---
layout: default
title: ICU Design
nav_order: 5
parent: ICU
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# ICU Architectural Design
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

# Overview

This chapter discusses the ICU design structure, ICU versioning support, and
the introduction of namespaces in C++.

## Java and ICU Basic Design Structure

The JDK internationalization components and the ICU components share the same
basic architecture with regard to the following:

1. [Locales](#locales)
2. [Data-driven services](#data-driven-services)
3. [ICU threading models and the open and close model](#icu-threading-model-and-open-and-close-model)
4. [Cloning customization](#cloning-customization)
5. [Error handling](#error-handling)
6. [Extensibility](#extensibility)
7. [Resource bundle inheritance model](#resource-bundle-inheritance-model)

There are design features in ICU4C that are not in the Java Development Kit
(JDK) due to programming language restrictions. These features include the
following:

### Locales

Locale IDs are composed of language, country, and variant information. The
language is identified by an
[ISO-639](http://lcweb.loc.gov/standards/iso639-2/englangn.html) language code
and the country by an
[ISO-3166](http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html)
country code; the links provide additional information about these ISO standards.
For example, Italian, Italy, and Euro are designated as `it_IT_EURO`.
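
As a rough illustration of how such an ID decomposes, the following standalone
sketch splits an ID on underscores. It does not call ICU: `LocaleParts` and
`splitLocaleId` are hypothetical names invented for this example, and the sketch
assumes the simple `language_country_variant` shape described above. In real
code, ICU's own locale APIs should be used instead.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper type for this sketch; not part of ICU.
struct LocaleParts {
    std::string language;  // e.g. "it"
    std::string country;   // e.g. "IT"
    std::string variant;   // e.g. "EURO"
};

// Split an ID of the form language[_country[_variant]] on underscores.
// Everything after the second underscore is kept as the variant.
LocaleParts splitLocaleId(const std::string& id) {
    const std::string::size_type npos = std::string::npos;
    LocaleParts parts;
    const std::string::size_type first = id.find('_');
    parts.language = id.substr(0, first);
    if (first == npos) return parts;
    const std::string::size_type second = id.find('_', first + 1);
    parts.country = id.substr(first + 1, second == npos ? npos : second - first - 1);
    if (second == npos) return parts;
    parts.variant = id.substr(second + 1);
    return parts;
}

// Usage check for the sketch above.
void demoSplitLocaleId() {
    const LocaleParts p = splitLocaleId("it_IT_EURO");
    assert(p.language == "it" && p.country == "IT" && p.variant == "EURO");
}
```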

### Data-driven Services

Data-driven services often use resource bundles for locale data. These services
map a key to data. The resources are designed not only to manage system locale
information but also to manage application-specific or general services data.
ICU resource bundles support string, numeric, and binary data types, and the
data can be structured into nested arrays and tables.

This results in the following:

1. Data used by the services can be built at compile time or run time.
2. For efficient loading, system data is pre-compiled to .dll files or files
   that can be mapped into memory.
3. Data for services can be added and modified without source code changes.

### ICU Threading Model and Open and Close Model

The "open and close" model supports multi-threading. It enables ICU users to use
the same kind of service for different locales, either in the same thread or in
different threads.

For example, a thread can open many collators for different languages, and
different threads can use different collators for the same locale
simultaneously. Constant data can be shared so that only the current state is
allocated for each collator.

The ICU threading model is designed to avoid contention for resources and to
enable you to use the services for multiple locales simultaneously within the
same thread. The ICU threading model, like the rest of the ICU architecture, is
the same model used for the international services in Java™.

When you use a service such as collation, the client opens the service using an
ID, typically a locale. This service allocates a small chunk of memory used for
the state of the service, with pointers to shared, read-only data in support of
that service. (In Java, you call `getInstance()` to create an object; in C++,
`createInstance()`. ICU uses the open and close metaphor in C because it is more
familiar to C programmers.)

If no locale is supplied when a service is opened, ICU uses the default locale.
Once a service is open, changing the default locale has no effect on it. Thus,
no thread synchronization is needed between the default locale and open
services.

When you open a second service for the same locale, another small chunk of
memory is used for the state of the service, with pointers to the same shared,
read-only data. Thus, the majority of the memory usage is shared. When any
service is closed, its chunk of memory is deallocated. Other connections
that point to the same shared data stay valid.

Any number of services, for the same locale or different locales, can be open
within the same thread or in different threads.

#### Thread-safe const APIs

In recent ICU releases, we have worked to make any service object *thread-safe*
(usable concurrently) *as long as all of the threads are using only const APIs*:
APIs that are declared const in C++, take a const this-like service pointer in
C, or are "logically const" in Java. This is an enhancement over the original
Java/ICU threading model. (Originally, concurrent use of even only const APIs
was not thread-safe.)

However, you cannot use a reference to an open service object in two threads at
the same time *if either of them calls any non-const API*. An individual open
service object is not thread-safe for concurrent "writes". Rather, for non-const
use, you must use the clone function to create a copy of the service you want
and then pass this copy to the second thread. This procedure allows you to use
the same service in different threads, but avoids any thread synchronization or
deadlock problems.

#### Freezable

Some classes also implement the `Freezable` interface (or a similar pattern in
C++), for example `UnicodeSet` or `Collator`: An object that typically starts
out mutable can be set up and then "frozen", which makes it immutable and thus
usable concurrently because all non-const APIs are disabled. A frozen object can
never be "thawed". For example, a `Collator` can be created, various attributes
set, then frozen and then used from many threads for comparing strings and
getting sort keys.
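
The freeze pattern can be pictured with a small standalone class. This is only
a sketch under stated assumptions: `FreezableIntSet` is a hypothetical stand-in,
not an ICU class, and the way a real ICU class reacts to a mutation attempt on a
frozen object may differ from the boolean return value used here.

```cpp
#include <cassert>
#include <set>

// Hypothetical stand-in for a freezable service object; not an ICU class.
class FreezableIntSet {
public:
    // Non-const API: refuses to modify the object once it is frozen.
    // Returns true if the value was accepted.
    bool add(int value) {
        if (frozen_) return false;
        values_.insert(value);
        return true;
    }
    // Const API: usable from many threads once the object is frozen.
    bool contains(int value) const { return values_.count(value) != 0; }
    // One-way switch: there is deliberately no thaw().
    void freeze() { frozen_ = true; }
    bool isFrozen() const { return frozen_; }

private:
    std::set<int> values_;
    bool frozen_ = false;
};

// Usage check: set up, freeze, then only read.
void demoFreeze() {
    FreezableIntSet s;
    assert(s.add(1));
    s.freeze();
    assert(!s.add(2));      // mutation is rejected after freeze()
    assert(s.contains(1));  // const queries still work
    assert(!s.contains(2));
}
```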

#### Clone vs. open

Clone operations are designed to be much faster than reopening the service with
initial parameters and copying the source's state. (With objects in C++ and
Java, the clone function is also much safer than trying to recreate a service,
since you get the proper subclass.) Once a service is cloned, changes to the
clone will not affect the original source service, or vice-versa.

Thus, the normal mode of operation is to:

1. Open a service with a given locale.
2. Use the service as long as needed. However, do not keep opening and closing
   a service within a tight loop.
3. Clone a service if it needs to be used in parallel in another thread.
4. Close any clones that you open as well as any instances of the services that
   you own.

> :point_right: **Note**: These service instances may be closed in any sequence.
The preceding steps are given as an example.
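
The relationship between a clone, its private state, and the shared read-only
data can be sketched as follows. The code is a standalone model, not ICU:
`SharedRules`, `SketchService`, and the `strength` attribute are hypothetical
names, and a `std::shared_ptr` merely stands in for whatever sharing mechanism
the library uses internally.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these names are ICU API.
struct SharedRules {  // large, read-only data shared by all instances
    std::string locale;
};

class SketchService {
public:
    explicit SketchService(std::shared_ptr<const SharedRules> rules)
        : rules_(std::move(rules)) {}
    // clone(): copies only the small per-instance state; shared data is reused.
    std::unique_ptr<SketchService> clone() const {
        return std::unique_ptr<SketchService>(new SketchService(*this));
    }
    void setStrength(int s) { strength_ = s; }  // non-const: per-instance state
    int strength() const { return strength_; }
    const SharedRules* rules() const { return rules_.get(); }

private:
    std::shared_ptr<const SharedRules> rules_;
    int strength_ = 3;
};

// Usage check: a change to the clone does not leak into the original.
void demoClone() {
    std::shared_ptr<const SharedRules> rules =
        std::make_shared<SharedRules>(SharedRules{"fr"});
    SketchService original(rules);
    std::unique_ptr<SketchService> copy = original.clone();
    copy->setStrength(1);                       // change the clone only
    assert(original.strength() == 3);           // original is unaffected
    assert(copy->rules() == original.rules());  // read-only data is shared
}
```

In this model the clone would be the object handed to a second thread, while
the original stays with the first.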

### Cloning Customization

Typically, the services supplied with ICU cover the vast majority of usages.
However, there are circumstances where the service needs to be customized for a
new locale. ICU (and Java) enable you to create customized services. For
example, you can create a `RuleBasedCollator` by merging the rules for French and
Arabic to get a custom French-Arabic collation sequence. After these rules are
merged, the pointer does not point to a read-only table that is shared between
threads. Instead, the pointer refers to a table that is specific to your
particular open service. If you clone the open service, the table is copied.
When you close the service, the table is destroyed.

For some services, ICU supplies registration. You can register a customized open
service under an ID, keeping a copy of that service even after you close the
original. A client in that thread or in other threads can recreate a copy of the
service by opening with that ID.

ICU may cache service instances. Therefore, registration should be done during
startup, before opening services by locale ID.

These registrations are not persistent; once your program finishes, ICU flushes
all the registrations. While you still might have multiple copies of data
tables, it is faster to create a service from a registered ID than it is to
create a service from rules.

> :point_right: **Note**: To work around the lack of persistent registration,
query the service for the parameters used to create it and then store those
parameters in a file on a disk.

For services whose IDs are locales, such as collation, the registered IDs must
also be locales. For those services (like Transliteration or Timezones) that are
cross-locale, the IDs can be any string.

Prospective future enhancements for this model are:

1. Having custom services share data tables, by making those tables reference
   counted. This will reduce memory consumption and speed clone operations (a
   performance enhancement chiefly useful for multiple threads using the same
   customized service).
2. Expanding registration for all the international services.
3. Allowing persistent registration of services.

#### Per-client Locale ID vs Per-thread Locale ID

Some application environments operate by setting a per-thread (or per-process)
locale ID, and then not passing the locale ID as a parameter during processing.
If this usage model were used with ICU in a multi-threaded server, it might
result in ICU being requested to constantly open, use, and then close service
objects. Instead, it is recommended that the locale ID associated with each
client be stored with other per-client data, along with any service objects
(such as collators or formatters) that client might use. If operations involving
a single client are short-lived, it might be more efficient to keep a pool of
service objects, organized according to locale. Then, if a particular locale's
formatter is in high demand, that formatter can be taken from the pool, used,
and then returned to the pool.
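
A pool of this kind might look roughly like the sketch below. It is a
standalone model rather than ICU code: `SketchFormatter` and `FormatterPool` are
hypothetical names, and the pool is intentionally not thread-safe, so a real
multi-threaded server would need to guard `acquire()` and `release()` with a
lock.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an expensive-to-open service such as a formatter.
struct SketchFormatter {
    explicit SketchFormatter(const std::string& loc) : locale(loc) {}
    std::string locale;
};

// A pool of idle service objects, organized by locale ID.
class FormatterPool {
public:
    // Hand out an idle formatter for the locale; open one only if none is idle.
    std::unique_ptr<SketchFormatter> acquire(const std::string& locale) {
        std::vector<std::unique_ptr<SketchFormatter> >& idle = idle_[locale];
        if (idle.empty()) {
            ++opened_;
            return std::unique_ptr<SketchFormatter>(new SketchFormatter(locale));
        }
        std::unique_ptr<SketchFormatter> f = std::move(idle.back());
        idle.pop_back();
        return f;
    }
    // Return a formatter so a later request for the same locale can reuse it.
    void release(std::unique_ptr<SketchFormatter> f) {
        const std::string locale = f->locale;  // copy before moving f
        idle_[locale].push_back(std::move(f));
    }
    int opened() const { return opened_; }

private:
    std::map<std::string, std::vector<std::unique_ptr<SketchFormatter> > > idle_;
    int opened_ = 0;
};

// Usage check: the second request reuses the object returned by the first.
void demoPool() {
    FormatterPool pool;
    std::unique_ptr<SketchFormatter> a = pool.acquire("de_DE");
    SketchFormatter* firstAddress = a.get();
    pool.release(std::move(a));
    std::unique_ptr<SketchFormatter> b = pool.acquire("de_DE");
    assert(b.get() == firstAddress);  // the idle object was reused
    assert(pool.opened() == 1);       // nothing was reopened
}
```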

### ICU Memory Usage

ICU4C APIs are designed to allow separate heaps for its libraries vs. the
application. This is achieved by providing functions to allocate and release
objects owned by ICU4C using only ICU4C library functions. For more details see
the Memory Usage section in the [Coding Guidelines](dev/codingguidelines.md#memory-usage).

### ICU4C Initialization and Termination

The ICU library does not normally require any explicit initialization prior to
use. An application begins use simply by calling any ICU API in the usual way.
There are, however, a few functions affecting ICU's configuration that, if used,
must be called first, before other use of ICU in a process. These are outlined below.

1. `u_setMemoryFunctions()`. This function replaces the standard library heap
allocation functions used by ICU with alternate versions, provided by the
application. If it is needed, `u_setMemoryFunctions()` must be called first, before
any other use of ICU. This functionality is not commonly used.

2. ICU Data Locating Functions, `u_setCommonData()`, `u_setDataDirectory()`, and
`u_setAppData()`. One or more of these functions will be required when ICU is
configured to load its data directly from files rather than taking it from the
default data DLL, and the files are not in the default location. Again, this is
not common. See [ICU Data](icudata#icu-data-directory).

3. Sanity check that ICU is functioning and able to access data. This is
important because configuration or installation problems that leave ICU unable
to load its data do occur, and the resulting failures can be confusing.
Since not all ICU APIs have `UErrorCode` parameters, in the absence of data they
may sometimes silently return incorrect results.

   The function `ulocdata_getCLDRVersion()` is suitable; it is small and
lightweight, requires data, and reports the error in the absence of data.


When an application is terminating it should call the function `u_cleanup()`,
which frees all heap storage and other system resources that are held internally
by the ICU library. While the use of `u_cleanup()` is not strictly required,
failure to call it will cause memory leak checking tools to report problems for
resources being held by the ICU library.

Before calling `u_cleanup()`, all ICU objects that were created by the
application must be deleted, and all ICU services (plain C APIs) must be closed.

For some platforms the configure option `--enable-auto-cleanup`, or defining
the option `UCLN_NO_AUTO_CLEANUP` to 0, will add code which automatically cleans
up ICU when its shared library is unloaded. See the comments in `ucln_imp.h`.

#### C++ Static Initialization and Destruction

The ICU library itself does not rely on C++ static initializers, meaning that
applications will not encounter order-of-initialization problems from the use of
ICU.

There are, however, some significant limitations for applications that make use
of ICU at C++ static initialization time:

1.  `u_setMemoryFunctions()` and the data locating functions, if needed, must
still be called before any other use of ICU. This includes any use during the
construction of static objects.

2.  `u_cleanup()` can only be called after all other ICU-using objects have been
deleted. Finding a suitable time and place for the call to `u_cleanup()` may be
difficult, however. Refer to the C++ literature on the order of static
initialization and destruction.

3.  Destruction of static objects that are scoped to a code block. These, by the
conventions of C++, are lazily initialized when the code block is first entered,
so there are no issues during static initialization. But object destruction
happens when the program terminates, leaving the problem of where to call
`u_cleanup()`, as discussed above.

#### Dynamically Loading and Unloading ICU

Applications may arrange to dynamically load the ICU library when it is needed,
and unload it when through, repeating the process as required. The specific
details for loading and unloading, and accessing such libraries, are operating
system dependent.

For ICU to be used in this way, before unloading, all ICU objects and services
must be closed or deleted, and `u_cleanup()` must be called.

On Windows, the loading and unloading of ICU should never be done inside
[DllMain](https://docs.microsoft.com/en-us/windows/win32/dlls/dllmain). Loading
one of the ICU libraries can cause other libraries or files to be loaded,
leading to potential deadlock.

#### Initializing ICU in Multithreaded Environments

There is one specialized case where extra care is needed to safely initialize
ICU. This situation will arise only when ALL of the following conditions occur:

1. The application main program is written in plain C, not C++.
2. The application is multithreaded, with the first use of ICU within the
   process possibly occurring simultaneously in more than one thread.
3. The application will be run on a platform that does not handle C++ static
   constructors from libraries when the main program is not in C++. Platforms
   known to exhibit this behavior are Mac OS X and HP/UX. Platforms that handle
   C++ libraries correctly include Windows, Linux and Solaris.

To safely initialize the ICU library when all of the above conditions apply, the
application must explicitly arrange for a first-use of ICU from a single thread
before the multi-threaded use of ICU begins. A convenient ICU operation for this
purpose is `uloc_getDefault()`, declared in the header file `unicode/uloc.h`.

> :point_right: **Note**: The status of this situation needs further
investigation. See issue
[ICU-21380](https://unicode-org.atlassian.net/browse/ICU-21380)


### Error Handling

In order for ICU to maximize portability, this version uses only the subset
of the C++ language that compiles correctly on older C++ compilers and provides a
usable C interface. Thus, there is no use of the C++ exception mechanism in the
code or Application Programming Interface (API).

To communicate errors reliably and support multi-threading, this version uses an
error code parameter mechanism. Every function that can fail takes an error-code
parameter by reference. This parameter is always the last parameter listed for
the function.

The `UErrorCode` parameter is defined as an enumerated type. Zero represents no
error, positive values represent errors, and negative values represent non-error
status codes. Macros (`U_SUCCESS` and `U_FAILURE`) are provided to check the
error code.

The `UErrorCode` parameter is an input-output parameter. Every function tests the
error code before performing any other task and immediately exits if the code
already indicates a failure. If the function fails later on, it sets the error code
appropriately and exits without performing any other work, except for any
cleanup it needs to do. If the function encounters a non-error condition that it
wants to signal, such as "encountered an unmapped character" in conversion, the
function sets the error code appropriately and continues. Otherwise, the
function leaves the error code unchanged.

Generally, only the functions that do not take a `UErrorCode` parameter, but
call functions that do, must declare a `UErrorCode` variable. Almost all functions
that take a `UErrorCode` parameter, and also call other functions that do, merely
have to propagate the error code that they were passed to the functions they call.
Functions that declare a new `UErrorCode` variable must initialize it to
`U_ZERO_ERROR` before calling any other functions.

ICU enables you to call several functions (that take error codes) successively
without having to check the error code after each function. Each function
usually must check the error code before doing any other processing, since it is
supposed to stop immediately after receiving an error code. Propagating the
error-code parameter down the call chain saves the programmer from having to
declare the parameter in every instance and also mimics the C++ exception
protocol more closely.
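
The convention can be modeled in a few lines of standalone C++. This sketch
does not use ICU's `UErrorCode`; `SketchStatus`, `sketchFailure`, `parseDigit`,
and `addDigits` are hypothetical names chosen to mirror the rules above (zero
for no error, positive for errors, negative for non-error status codes).

```cpp
#include <cassert>

// Hypothetical stand-ins that mirror the convention; not the ICU definitions.
enum SketchStatus {
    SKETCH_WARNING = -1,   // negative: non-error status
    SKETCH_OK = 0,         // zero: no error
    SKETCH_BAD_INPUT = 1   // positive: error
};
inline bool sketchFailure(SketchStatus s) { return s > SKETCH_OK; }

// Checks the incoming status first and does nothing if it is already a failure.
int parseDigit(char c, SketchStatus& status) {
    if (sketchFailure(status)) return 0;
    if (c < '0' || c > '9') {
        status = SKETCH_BAD_INPUT;  // fail: set the code and stop
        return 0;
    }
    return c - '0';
}

// Takes a status parameter and calls functions that do, so it only propagates
// the status it was given; no check is needed between the two calls.
int addDigits(char a, char b, SketchStatus& status) {
    if (sketchFailure(status)) return 0;
    const int x = parseDigit(a, status);
    const int y = parseDigit(b, status);  // no-op if the first call failed
    return sketchFailure(status) ? 0 : x + y;
}

// Usage check: the caller declares and initializes the status variable.
void demoStatus() {
    SketchStatus status = SKETCH_OK;           // initialize before the first call
    assert(addDigits('2', '5', status) == 7);
    assert(!sketchFailure(status));
    addDigits('x', '5', status);               // first digit is invalid
    assert(status == SKETCH_BAD_INPUT);
    assert(addDigits('1', '1', status) == 0);  // still failed: nothing is done
}
```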

### Extensibility

There are 3 major extensibility elements in ICU:

1. **Data Extensibility**:
   The user installs new locales or conversion data to enhance the existing ICU
   support. For more details, refer to the package tool (:construction: **TODO**: need link)
   chapter.
2. **Code Extensibility**:
   The classes, data, and design are fully extensible. Examples of this
   extensibility include the BreakIterator, RuleBasedBreakIterator and
   DictionaryBasedBreakIterator classes.
3. **Error Handling Extensibility**:
   There are mechanisms available to enhance the built-in error handling when
   it is necessary. For example, you can design and create your own conversion
   callback functions that are called when an error occurs. Refer to the
   [Conversion](conversion/index.md) chapter callback section for more
   information.

### Resource Bundle Inheritance Model

A resource bundle is a set of \<key,value> pairs that provide a mapping from key
to value. A given program can have different sets of resource bundles; one set
for error messages, one for menus, and so on. However, the program may be
organized to combine all of its resource bundles into a single related set.

The set is organized into a tree with "root" at the top, the language at the
first level, the country at the second level, and additional variants below
these levels. The set must contain a root that has all keys that can be used by
the program accessing the resource bundles.

Except for the root, each resource bundle has an immediate parent. For example,
if there is a resource bundle `X_Y_Z`, then there must be the resource bundles
`X_Y` and `X`. Each child resource bundle can omit any \<key,value> pair that is
identical to its parent's pair. (Such omission is strongly encouraged as it
reduces data size and maintenance effort.) It must override any \<key,value> pair
that is different from its parent's pair. If you have a resource bundle for the
locale ID `language_country_variant`, you must also have
a bundle for the ID `language_country` and one for the ID `language`.

If a program doesn't find a key in a child resource bundle, the child is assumed
to have the same \<key,value> pair as its parent. The default locale has no effect
on this. The particular language used for the root is commonly English, but it
depends on the developer's preference. Ideally, the root should contain
values that minimize the need for its children to override them.

The default locale is used only when there is not a resource bundle for a given
language. For example, there may not be an Italian resource bundle. (This is
very different from the case where there is an Italian resource bundle that is
missing a particular key.) When a resource bundle is missing, ICU uses the
parent unless that parent is the root. The root is an exception because the root
language may be completely different from its children. In this case, ICU uses a
modified lookup and the default locale. The following lookup methods are
available:

**Lookup chain** : Searching for a resource bundle.

    en_US_<some-variant>
    en_US
    en
    <defaultLang>_<defaultCountry>
    <defaultLang>
    root

**Lookup chain** : Searching for a \<key, value> pair after
`en_US_<some-variant>` has been loaded. ICU does not use the default locale in
this case.

    en_US_<some-variant>
    en_US
    en
    root
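
The second chain, together with key inheritance, can be approximated by the
standalone sketch below. It models bundles as plain in-memory maps and is not
how ICU stores or loads data; `keyLookupChain`, `Bundle`, `BundleSet`, and
`lookup` are hypothetical names. The bundle-search chain that involves the
default locale is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Key lookup chain for a loaded bundle: strip one "_field" at a time,
// then finish with "root".
std::vector<std::string> keyLookupChain(std::string id) {
    std::vector<std::string> chain;
    while (!id.empty()) {
        chain.push_back(id);
        const std::string::size_type cut = id.rfind('_');
        id = (cut == std::string::npos) ? std::string() : id.substr(0, cut);
    }
    chain.push_back("root");
    return chain;
}

typedef std::map<std::string, std::string> Bundle;  // key -> value
typedef std::map<std::string, Bundle> BundleSet;    // bundle ID -> bundle

// Return the value from the most specific bundle in the chain that defines
// the key, or an empty string if no bundle in the chain has it.
std::string lookup(const BundleSet& bundles, const std::string& id,
                   const std::string& key) {
    const std::vector<std::string> chain = keyLookupChain(id);
    for (std::size_t i = 0; i < chain.size(); ++i) {
        const BundleSet::const_iterator b = bundles.find(chain[i]);
        if (b == bundles.end()) continue;
        const Bundle::const_iterator v = b->second.find(key);
        if (v != b->second.end()) return v->second;
    }
    return std::string();
}

// Usage check: a child overrides one key and inherits the other from root.
void demoLookup() {
    BundleSet bundles;
    bundles["root"]["greeting"] = "Hello";
    bundles["root"]["color"] = "color";
    bundles["en"];                         // present, but overrides nothing
    bundles["en_GB"]["color"] = "colour";
    assert(lookup(bundles, "en_GB", "color") == "colour");    // overridden
    assert(lookup(bundles, "en_GB", "greeting") == "Hello");  // inherited
    assert(keyLookupChain("en_US_POSIX").size() == 4);
}
```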

## Other ICU Design Principles

ICU supports versioning of its code and data, and uses C++ namespaces.

### Version Numbers in ICU

Version changes show clients when parts of ICU change. ICU; its components (such
as `Collator`); each resource bundle, including all the locale data resource
bundles; and individual tagged items within a resource bundle, have their own
version numbers. Version numbers numerically and lexically increase as changes
are made.

All version numbers are used in Application Programming Interfaces (APIs) with a
`UVersionInfo` structure. The `UVersionInfo` structure is an array of four
unsigned bytes. These bytes are:

1. Major version number
2. Minor version number
3. Milli version number
4. Micro version number

Two `UVersionInfo` structures may be compared using binary comparison (`memcmp`)
to see which is larger or newer. Version numbers may be different for different
services. For instance, do not compare the ICU library version number to the ICU
collator version number.

`UVersionInfo` structures can be converted to and from string representations as
dotted integers (such as "1.4.5.0") using the `u_versionToString()` and
`u_versionFromString()` functions. String representations may omit trailing zeros.
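
To make the four-byte layout and the `memcmp` comparison concrete, here is a
standalone sketch. It deliberately avoids the ICU headers: `SketchVersion`,
`sketchVersionFromString`, and `compareVersions` are hypothetical stand-ins, and
the parser is a simplified assumption about the dotted format rather than a
copy of ICU's behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Stand-in for a four-byte version array: major, minor, milli, micro.
typedef std::uint8_t SketchVersion[4];

// Parse a dotted string such as "1.4.5"; omitted trailing fields become 0.
void sketchVersionFromString(SketchVersion out, const char* text) {
    std::memset(out, 0, 4);
    for (int field = 0; field < 4; ++field) {
        char* end = nullptr;
        const unsigned long value = std::strtoul(text, &end, 10);
        if (end == text) return;  // no more digits
        out[field] = static_cast<std::uint8_t>(value);
        if (*end != '.') return;  // last field reached
        text = end + 1;
    }
}

// Binary comparison: negative if a is older, 0 if equal, positive if newer.
int compareVersions(const SketchVersion a, const SketchVersion b) {
    return std::memcmp(a, b, 4);
}

// Usage check for the sketch above.
void demoVersions() {
    SketchVersion a, b, c;
    sketchVersionFromString(a, "1.4.5");
    sketchVersionFromString(b, "1.4.5.0");
    sketchVersionFromString(c, "1.6");
    assert(compareVersions(a, b) == 0);  // trailing zeros may be omitted
    assert(compareVersions(a, c) < 0);   // 1.4.5 is older than 1.6
}
```

Because the most significant field comes first and each field is one unsigned
byte, a plain byte-wise comparison orders versions correctly.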

The interpretation of version numbers depends on what is being described.

#### ICU Release Version Number (ICU 49 and later)

The first version number field contains the ICU release version number, for
example 49. Each new version might contain new features, new locale data, and
modified behavior. (See below for more information on
[ICU Binary Compatibility](#icu-binary-compatibility)).

The second field is 1 for the initial release (e.g., 49.1). The second and
sometimes third fields are incremented for binary compatible maintenance
releases.

* For maintenance releases for only either C or J, the third field is
  incremented (e.g., ICU4C 49.1.1).
* For shared updates for C & J, the second field is incremented to 2 and
  higher (e.g., ICU4C & ICU4J 49.2).

(The second field is 0 during development, with milestone numbers in the third
field during that time. For example, 49.0.1 for 49 milestone 1.)

#### ICU Release Version Number (ICU 1.4 to ICU 4.8)

In earlier releases, the first two version fields together indicated the ICU
release, for example 4.8. The third field was 0 for the initial release, and 1
and higher for binary compatible (bug fixes only) maintenance releases (e.g.,
4.8.1). The fourth field was used for updates specific to only one of Java, C++,
or ICU-in-Eclipse.

The second version field was *even* for formal releases ("reference releases")
(e.g., 1.6 or 4.8) and *odd* during their development (unreleased unstable
snapshot versions; e.g., 4.7). During development, the third field contained the
milestone number (e.g., 4.7.1 for 4.8 milestone 1). For very old ICU code, we
published semi-formal “enhancement” releases with odd second-field numbers
(e.g., 1.7).

Library filenames and some other internal uses already used a concatenation of
the first two fields ("48" for 4.8).

#### Resource Bundles and Elements

The data stored in resource bundles is tagged with version numbers. A resource
bundle can contain a tagged string named "Version" that declares the version
number in dotted-integer format. For example,

```text
en {
    Version { "1.0.3.5" }
    ...
}
```

A resource bundle may omit the "Version" element, in which case it inherits a
version along the usual chain. For example, if the resource bundle **en_US**
contained no "Version" element, it would inherit "1.0.3.5" from the parent
**en** bundle. If inheritance passes all the way to the root resource bundle and
it contains no "Version" resource, then the resource bundle receives the default
version number 0.

Elements within a resource bundle may also contain version numbers. For example:

```text
be {
    CollationElements {
        Version { "1.0.0.0" }
        ...
    }
}
```

In this example, the CollationElements data is version 1.0.0.0. This element
version is not related to the version of the bundle.

#### Internal version numbers

Internally, data files carry format and other version numbers. These version
numbers ensure that ICU can use the data file. The interpretation depends
entirely on the data file type. Often, the major number in the format version
stays the same for backwards-compatible changes to a data file format. The minor
format version number is incremented for additions that do not violate the
backwards compatibility of the data file.

#### Component Version Numbers

ICU component version numbers may be found using:

1. `u_getVersion()` returns the version number of ICU as a whole.
2. `ures_getVersion()` and `ResourceBundle::getVersion()` return the version
   number of a ResourceBundle. This is a data version number for the bundle as a
   whole and subject to inheritance.
3. `u_getUnicodeVersion()` and `Unicode::getUnicodeVersion()` return the version
   number of the Unicode character data that underlies ICU. This version
   reflects the numbering of the Unicode releases. See
   <http://www.unicode.org/> for more information.
4. `Collator::getVersion()` in C++ and `ucol_getVersion()` in C return the version
   number of the Collator. This is a code version number for the collation code
   and algorithm. It is a combination of version numbers for the collation
   implementation, the Unicode Collation Algorithm data (which is the data that
   is used for characters that are not mentioned in a locale's specific
   collation elements), and the collation elements.

#### Configuration and Management

A major new feature in ICU 2.0 is the ability to link to different versions of
ICU with the same program. Using this new feature, a program can keep using ICU
1.8 collation, for example, while using ICU 2.0 for other services. ICU now can
also be unloaded if needed, to free up resources, and then reloaded when it is
needed.

### Namespace in C++

ICU 2.0 introduced the use of a C++ namespace to avoid naming collisions between
ICU exported symbols and other libraries. All the public ICU C++ classes are
defined in the "icu_VersionNumber::" namespace, which is also aliased as
namespace "icu". Starting with ICU 2.0, including any public ICU C++ header by
default includes a "using namespace icu_VersionNumber" statement. This is for
backward compatibility, and should be turned off in favor of explicitly using
`icu::UnicodeString` etc. (see [How To Use ICU](howtouseicu.md)). (If entry point
renaming is turned off, then only the unversioned "icu" namespace is used.)
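
The versioned-namespace-plus-alias technique itself is plain C++ and can be
sketched without ICU. In the example below, the library name `sketchlib`, the
version suffix `70`, and the `Text` class are all made up for illustration; only
the pattern (a versioned namespace aliased to an unversioned name) reflects the
description above.

```cpp
#include <cassert>
#include <string>

// The version suffix (here 70) is made up for this sketch.
namespace sketchlib_70 {
class Text {
public:
    explicit Text(const std::string& s) : s_(s) {}
    std::string::size_type length() const { return s_.size(); }

private:
    std::string s_;
};
}  // namespace sketchlib_70

// Unversioned alias so that client code does not have to name the version.
namespace sketchlib = sketchlib_70;

// Usage check: client code spells the unversioned name explicitly.
void demoNamespace() {
    sketchlib::Text t("abc");  // resolves to sketchlib_70::Text
    assert(t.length() == 3);
}
```

The exported symbol names carry the version, so two library versions can
coexist in one program, while source code written against the alias stays
unchanged across releases.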
584
585Starting with ICU 49, ICU4C requires namespace support.
586
587### Library Dependencies (C++)
588
589It is sometimes useful to see a dependency chart between the public ICU APIs and
590ICU libraries. This chart can be useful to people that are new to ICU or to
591people that want only certain ICU libraries.
592
593> :construction: **TODO**: The dependency chart is currently not available.
594
595Here are some things to realize about the chart.
596
5971. It gives a general overview of the ICU library dependencies.
5982. Internal dependencies, like the mutex API, are left out for clarity.
5993. Similar APIs were lumped together for clarity (e.g. Formatting). Some of
600   these dependency details can be viewed from the ICU API reference.
6014. The descriptions of each API can be found in our [ICU API
602   reference](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/)
603
604### Code Dependencies (C++)
605
606Starting with ICU 49, the dependencies of code files (.o files compiled from
607.c/.cpp) are documented in
608[source/test/depstest/dependencies.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/depstest/dependencies.txt).
609Adjacent Python code is used to parse this file and to
610[verify](http://site.icu-project.org/processes/release/tasks/healthy-code#TOC-Check-library-dependencies)
611that it matches the actual dependencies of the code files.
612
613The dependency list can be used to build subset libraries. In addition, by
614reducing intra-library dependencies, the code size of statically linked ICU code
615has been reduced.
616
617### ICU API categories
618
619ICU APIs, as defined in header and class files, are either "external" or
620"internal". External APIs are meant to be used by applications, while internal
621APIs should be used only within ICU. APIs are marked to indicate whether they
622are external or internal, as follows. Every external API has a lifecycle label,
623see below.
624
625#### External ICU4C APIs
626
627External ICU4C APIs are
628
6291. declared in header files in unicode folders and exported at build/install
630   time to an `include/unicode` folder
6312. when C++ class members, are `public` or `protected`
6323. do not have an `@internal` label
633
634Exception: Layout engine header files are not in a unicode folder, although the
635public ones are still copied to the `include/unicode` folder at build/install
636time. External layout engine APIs are the ones that have lifecycle labels and
637not an `@internal` label.
638
#### External ICU4J APIs

External ICU4J APIs are those that

1. are declared in one of the ICU4J core packages (`com.ibm.icu.lang`,
   `com.ibm.icu.math`, `com.ibm.icu.text`, or `com.ibm.icu.util`)
2. are `public` or `protected` class members, or `public` or `protected`
   contained classes
3. do not have an `@internal` label

#### "System" APIs

"System" APIs are external APIs that are intended only for special use by
system-level code, for example `u_cleanup()`. Normal users should not use them,
although they are public and supported. System APIs have a `@system` label
in addition to the lifecycle label that all external APIs have (see below).

#### Internal APIs

All APIs that do not fit any of the descriptions above are internal, which means
that they are for ICU internal use only and may change at any time without
notice. Some of them are member functions of public C++ or Java classes, and are
"technically public but logistically internal" for implementation reasons,
typically because programming languages don't provide sufficient access
control (without clumsy mechanisms). In this case, such APIs have an
`@internal` label.

### ICU API compatibility

As ICU develops, it adds external APIs - functions, classes, constants, and so
on. Occasionally it is also necessary to remove or change external APIs. In
order to make this work, we use the following process:

For all API changes (and for significant/controversial/difficult implementation
changes), we use proposals to announce and discuss them. A proposal is simply an
email to the icu-design mailing list that details what is proposed to be
changed, with an expiration date of typically a week. This gives all mailing
list members a chance to review upcoming changes, and to discuss them. A
proposal often changes significantly as a result of discussion. Most proposals
will eventually find consensus among list members; otherwise, the ICU-TC decides
what to do. If the addition or change of APIs would affect you, please subscribe
to the main [icu-design mailing list](http://icu-project.org/contacts.html).

When a **new API** is added to ICU, it **is marked as draft with a
`@draft ICU x.y` label** in the API documentation, **where x.y is the ICU
version when the API *signature* was introduced or last changed**. A draft API
is not guaranteed to be stable! Although we will not make gratuitous changes,
sometimes a draft API turns out to be unsatisfactory in actual practice and may
need to be changed or even removed. Changes to draft APIs are subject to the
proposal process described above.

**When a `@draft ICU x.y` API is changed, it must remain `@draft` and its version
number must be updated.**

In ICU4J 3.4.2 and earlier, `@draft` APIs were also marked with Java's `@deprecated`
tag, so that uses of draft APIs in client code would be flagged by the compiler.
These uses of the `@deprecated` tag were indicated with the comment “This is a
draft API and might change in a future release of ICU.” Many clients found this
confusing and/or undesirable, so ICU4J 3.4.3 no longer marks draft APIs with
the `@deprecated` tag by default. For clients who prefer the earlier behavior,
ICU4J provides an ant build target, `restoreDeprecated`, which will update the
source files to use the `@deprecated` tag. Then clients can just rebuild the ICU4J
jar as usual.

When an API is judged to be stable and has not been changed for at least one ICU
release, it is relabeled as stable with a `@stable ICU x.y` label in the API
documentation. A stable API is expected to be available in this form for a long
time. The ICU version **x.y** indicates the release in which the API *signature*
was introduced or last changed. **The promotion from `@draft ICU x.y` to
`@stable ICU x.y` must not change the x.y version number.**

We occasionally make an exception and allow new APIs to be marked as
`@stable ICU x.y` in the x.y release itself if we believe that they have to
be stable. We might do this for enum constants that reflect 1:1 Unicode property
aliases and property value aliases, for a Unicode upgrade in the x.y release.

We sometimes **"broaden" a `@stable`** API function by changing its signature
in a compatible way. For example, in Java, we might change an input parameter
from a `String` to a `CharSequence`. In this case we keep the `@stable` label but
update the ICU version number to reflect the function signature change.

Even a stable API may eventually need to become deprecated or obsolete. Use of
such APIs is strongly discouraged. Typically, an improved API is introduced
at the time of deprecation/obsolescence of the old one.

1. Use of deprecated APIs is strongly discouraged, but they are retained for
   backward compatibility. These are marked with labels like
   `@deprecated ICU x.y Use u_abc() instead.`. **The ICU version x.y shows the
   ICU release in which the API was first declared "deprecated".**
2. In ICU4J, starting with release 57, a custom Javadoc tag `@discouraged`
   was added. It is similar to `@deprecated`, but is used either when ICU wants
   to discourage use of an API that the JDK has not deprecated, or when ICU
   needs to keep the API for compatibility reasons. These are marked with labels
   like `@discouraged ICU x.y. Use u_abc() instead.`.
3. Obsolete APIs are those whose continued retention will cause severe
   conflicts or user error, or whose continued support would be a very
   significant maintenance burden. We make every effort to keep these to a
   minimum. Obsolete APIs are marked with labels like `@obsolete ICU x.y. Use
   u_abc() instead since this API will be removed in that release.`.
   **The x.y indicates that we plan to remove it in ICU version x.y.**

Stable C or Java APIs will not be obsoleted because doing so would break
forward binary compatibility of the ICU library. Stable APIs may be
deprecated, but they will be retained in the library.

An "obsolete" API will remain unchanged until it is removed in the indicated
ICU release, which is usually one year after the API was declared
obsolete. Sometimes we still keep it available for some time via a
compile-time switch but stop maintaining it. On rare occasions, an API must
be replaced right away because of naming conflicts or severe defects; in
such cases we provide compile-time switches (`#ifdef` or other mechanisms) to
select the old API.

For example, here is how an API might be tagged in various versions:

* **In ICU 0.2**: The API is newly introduced as a draft in this release.

  ```text
  @draft ICU 0.2
  f(x)
  ```

* **In ICU 0.4**: The draft version number is updated, because the signature
  changed.

  ```text
  @draft ICU 0.4
  f(x, y)
  ```

* **In ICU 0.6**: The API is promoted from draft to stable, but the version
  number does not change, as the signature is the same.

  ```text
  @stable ICU 0.4
  f(x, y)
  ```

* **In ICU 1.0**: The API is "broadened" in a compatible way. For example,
  changing an input parameter from char to int or from some class to a base
  class. The signature is changed (so we update the ICU version number), but old
  calling code continues to work unchanged (so we retain `@stable` if that's what
  it was.)

  ```text
  @stable ICU 1.0
  f(xbase, y)
  ```

* **In ICU 1.2**: The API is demoted to deprecated (or obsolete) status.

  ```text
  @deprecated ICU 1.2 Use g(x,y,z) instead.
  f(xbase, y)
  ```

  or, when this API is planned to be removed in ICU 1.4:

  ```text
  @obsolete ICU 1.4. Use g(x,y,z) instead.
  f(xbase, y)
  ```

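The labeling rules illustrated above can also be modeled in a few lines of C. This is only an illustrative sketch, not code from ICU: the `ApiLabel` type and the `label_after_*` functions are hypothetical names, and a version is reduced to a major and a minor number. Replaying the example (draft in 0.2, signature change in 0.4, promotion in 0.6, broadening in 1.0, deprecation in 1.2) through these functions yields `@draft ICU 0.4`, `@stable ICU 0.4`, `@stable ICU 1.0`, and `@deprecated ICU 1.2` in turn.

```c
/* Hypothetical model of ICU lifecycle labels; none of these names exist in ICU. */
typedef enum { API_DRAFT, API_STABLE, API_DEPRECATED } ApiStatus;

typedef struct {
    ApiStatus status;
    int major; /* x in "ICU x.y" */
    int minor; /* y in "ICU x.y" */
} ApiLabel;

/* A signature change (including a compatible "broadening") keeps the
   status but updates the version to the release that made the change. */
ApiLabel label_after_signature_change(ApiLabel label, int major, int minor) {
    label.major = major;
    label.minor = minor;
    return label;
}

/* Promotion from draft to stable must not change the version number. */
ApiLabel label_after_promotion(ApiLabel label) {
    label.status = API_STABLE;
    return label;
}

/* Deprecation records the release in which the API was first deprecated. */
ApiLabel label_after_deprecation(ApiLabel label, int major, int minor) {
    label.status = API_DEPRECATED;
    label.major = major;
    label.minor = minor;
    return label;
}
```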
### ICU Binary Compatibility

*Using ICU as an Operating System Level Library*

ICU4C may be configured for use as a system library in an environment where
applications that are built with one version of ICU must continue to run without
change with later versions of the ICU shared library.

Here are the requirements for enabling binary compatibility for ICU4C:

1. Applications must use only APIs that are marked as stable.
2. Applications must use only plain C APIs, never C++.
3. ICU must be built with function renaming disabled.
4. Applications must be built using an ICU that was configured for binary
   compatibility.
5. Use ICU version 3.0 or later.
6. Provide both “common” and “i18n” libraries, or build a combined library.

**Stable APIs Only.** APIs in the ICU library that are tagged as being stable
will be maintained in future versions of the library. Stable functions will
continue to exist with the same signature and the same meaning, allowing
applications to continue to work without change.

Stable APIs do not guarantee that the results from every function will always be
completely identical between ICU versions (see the
[Version Numbers in ICU](#version-numbers-in-icu) section above). Bugs may be
fixed. The Unicode character data may change with new versions of the Unicode
standard. Locale data may be updated or changed, yielding different results for
operations like formatting or collation. Applications that require exact
bit-for-bit, bug-for-bug compatibility of ICU results should not rely on ICU
release-to-release binary compatibility, but should instead link against a
specific version of ICU.

To verify that an application uses only stable APIs, build it with the C
preprocessor symbols `U_HIDE_DRAFT_API` and `U_HIDE_DEPRECATED_API` defined. This
will produce build errors if any draft, deprecated or obsolete APIs are used. An
operating system level installation of ICU may set this option permanently.

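The effect of these symbols can be pictured with a small, self-contained sketch. It is not taken from the ICU headers: `demo_stable_double`, `demo_draft_triple`, and `app_compute` are hypothetical names, and the sketch assumes that draft declarations are wrapped in an `#ifndef U_HIDE_DRAFT_API` guard. The `#define` at the top stands in for defining the symbol on the compiler command line. Because the draft declaration is hidden, application code that sticks to the stable function builds normally, while a call to the draft function would be diagnosed at build time.

```c
/* Assumption: normally defined by the build system rather than in source. */
#define U_HIDE_DRAFT_API 1

/* --- hypothetical library header excerpt --- */

/** Returns twice the input. @stable ICU 2.0 */
int demo_stable_double(int x);

#ifndef U_HIDE_DRAFT_API
/** Returns three times the input. @draft ICU 4.0 */
int demo_draft_triple(int x);
#endif  /* U_HIDE_DRAFT_API */

/* --- hypothetical library implementation --- */
int demo_stable_double(int x) {
    return 2 * x;
}

/* --- application code --- */
int app_compute(int x) {
    /* A call to demo_draft_triple(x) here would not build cleanly,
       because its declaration is hidden by U_HIDE_DRAFT_API. */
    return demo_stable_double(x);
}
```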
**C APIs only.** Only plain C APIs remain compatible across ICU releases. The
reason C++ binary compatibility is not supported is primarily that the design
of the C++ language and its runtime environments presents extreme technical
difficulties to doing so. Stable C++ APIs are *source* compatible, but
applications using them must be recompiled when moving between ICU releases.

**Function renaming disabled.** Function renaming is an ICU feature that allows
an application to explicitly link against a specific version of the ICU library,
and to continue to use that version even when other ICU versions exist in the
runtime environment. This is the exact opposite of release-to-release binary
compatibility: instead of being able to transparently change ICU versions, an
application is explicitly tied to one specific version.

Function renaming is enabled by default, and must be disabled at ICU build time
to enable release-to-release binary compatibility. To disable renaming, use the
configure option

```shell
configure --disable-renaming [other configure options]
```

(Configure options may also be passed to the runConfigureICU script.)

To enable release-to-release binary compatibility, ICU must be built with
`--disable-renaming`, *and* applications must be built using the headers and
libraries that resulted from the `--disable-renaming` ICU build.

**ICU Version 3.0 or Later.** Binary compatibility of ICU releases is supported
beginning with ICU version 3.0. Older versions of ICU (2.8 and earlier) do not
provide for binary compatibility between versions.

**Provide both “common” and “i18n” libraries, or build a combined library.**
It is rare but possible that services/APIs move from one library to another.
For example, many years ago we moved the BreakIterator APIs from i18n to common,
so that word titlecasing functions no longer needed separate code to find
titlecasing or word break opportunities.

More recently, the ListFormatter moved from the common library to i18n
when its features grew beyond primitive patterns to also support
FieldPosition and FormattedValue features.

There is also a third, “io” library.
It is possible that some of its functionality may be moved to the i18n or common
libraries.
(A likely candidate might be `operator<<(std::ostream& stream, const UnicodeString& s)`,
although there are no actual plans to do so at the time of this writing.)

One can build a combined library which provides the exports from
both the “common” and “i18n” libraries,
in order to provide a single library for linking against.

This may be needed for some platforms where there is a strong relationship
between an API and the library that implements it.
For example, on Windows platforms, attempting to find an API that has been moved
with a `LoadLibrary`/`GetProcAddress` approach will fail,
unless you are using a combined library.

#### Linking against multiple versions of ICU4C

This section is intended to aid software developers who are implementing or
integrating solutions based on ICU, and who may need to consider having multiple
versions of ICU running within the same executable (address space) at once.
Typically, users of ICU are encouraged to update to the latest stable version.
Under certain circumstances, however, behavior from earlier versions is desired,
or an application links together code that was already built against
a different version of ICU.

The major and minor numbers are the first and second numbers in a version
number, separated by a period. For example, in the version numbers 3.4.2.1,
3.4.2, or 3.4, "3" is the major, and "4" is the minor. Normally, ICU employs
"symbol renaming", such that the C function names and C++ object names are
`#define`d to contain the major and minor numbers. So, for example, if your
application calls the function `ucnv_open()`, it will link against
`ucnv_open_3_4` if compiled against ICU 3.4, 3.4.2, or even 3.4.2.1. However, if
compiled against ICU 3.8, the same code will link against `ucnv_open_3_8`.
Similarly, `UnicodeString` is renamed to `UnicodeString_3_4`, etc. This is normally
transparent to the user; however, if you inspect the symbols of the library or
your code, you will see the modified symbols.

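The renaming mechanism can be imitated with a few preprocessor macros. The following is a minimal sketch of the technique, not ICU's actual headers: every `DEMO_` macro and the `demo_open` function are hypothetical names invented for illustration, and the `_3_4` suffix is hard-coded. With renaming active, the function is compiled under the name `demo_open_3_4` and returns that name as a string, even though the source only ever says `demo_open`. Defining the hypothetical `DEMO_DISABLE_RENAMING` before this code would leave the plain name in place.

```c
/* Hypothetical stand-ins for a version suffix and renaming macros. */
#define DEMO_VERSION_SUFFIX _3_4
#define DEMO_PASTE2(name, suffix) name##suffix
#define DEMO_PASTE(name, suffix) DEMO_PASTE2(name, suffix)
#define DEMO_RENAME(name) DEMO_PASTE(name, DEMO_VERSION_SUFFIX)

/* Stringify helpers so the final symbol name can be inspected at run time. */
#define DEMO_STR2(x) #x
#define DEMO_STR(x) DEMO_STR2(x)

#ifndef DEMO_DISABLE_RENAMING
/* With renaming enabled, every use of demo_open becomes demo_open_3_4. */
#define demo_open DEMO_RENAME(demo_open)
#endif

/* Returns the symbol name this function was actually compiled under. */
const char *demo_open(void) {
    return DEMO_STR(demo_open);
}
```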
If there are multiple versions of ICU being linked against in one application,
it will need to link against all relevant libraries for each version, for
example, common, i18n, and data. ICU uses standard library renaming, where, for
example, `libicuuc.so` on one platform will actually be a symbolic link to
`libicuuc.so.3.4`. When multiple ICU versions are used, the application may need
to explicitly link against the exact versions of ICU being used.

To disable renaming, build ICU with `--disable-renaming` passed to configure.
Alternatively, set the equivalent `#define U_DISABLE_RENAMING 1`. Renaming must be
disabled both in the ICU build and in the calling application.

### ICU Data Compatibility

Starting with ICU 3.8, the data library that comes with ICU is binary
compatible and structurally compatible with versions of ICU with the same major
and minor version, or a maintenance release. This allows multiple maintenance
releases of ICU to share the same data, but generally the latest maintenance
release of the data should be used.

The binary compatibility of the data refers to the binary formats used for the
resource bundles that contain the locale data, for the charset conversion
tables, and for the other file types supported by ICU. These binary formats are
readable by many versions of ICU. For example, resource bundles written with
ICU 3.6 are readable by ICU 3.8.

The structural compatibility of the data refers to the structural contents of
the ICU data. The structure of the locale data may change between reference
releases, but the keys to reference specific types of data will be the same
between maintenance releases. This means that resource keys to access data
within resource bundles will work between maintenance releases of a specific
reference release. For example, an ICU 3.8 calendar will be able to use ICU
3.8.1 data, and vice versa; however, ICU 3.6 may not be able to read ICU 3.8
locale data. Generally, these keys are not accessible by ICU users because only
the ICU implementation uses these resource keys.

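The "same major and minor version" rule can be written down as a small check. The sketch below is an illustration only: `same_reference_release` is a hypothetical helper, not an ICU API, and it assumes version strings of the form `major.minor` optionally followed by further fields, as used in this section. For the examples above, it reports that "3.8" and "3.8.1" belong to the same reference release, and that "3.6" and "3.8" do not.

```c
#include <stdio.h>

/* Parses the leading "major.minor" of a version string.
   Returns 1 on success, 0 if either number is missing. */
static int parse_major_minor(const char *version, int *major, int *minor) {
    return sscanf(version, "%d.%d", major, minor) == 2;
}

/* Hypothetical helper: returns 1 if both versions share the same major and
   minor numbers, 0 otherwise (including when a string cannot be parsed). */
int same_reference_release(const char *a, const char *b) {
    int a_major, a_minor, b_major, b_minor;
    if (!parse_major_minor(a, &a_major, &a_minor) ||
        !parse_major_minor(b, &b_major, &b_minor)) {
        return 0;
    }
    return a_major == b_major && a_minor == b_minor;
}
```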
The contents of the data library may change between ICU maintenance releases and
give you different results due to important updates and bug fixes. An example of
an important update would be a time zone rule update after a country changes
when daylight saving time occurs. So the results may be different between
maintenance releases.

### ICU4J Serialization Compatibility

Starting in ICU4J 3.6, ICU4J stable API classes (marked as `@stable`) implementing
`java.io.Serializable` support deserialization of their serialized objects by
ICU4J 3.6 or any newer version of ICU4J. Some classes perform only shallow
serialization; therefore, it is not guaranteed that a deserialized object behaves
exactly the same as the original object across ICU4J versions. Also, when it is
difficult to maintain serialization compatibility in a certain class across
different ICU4J versions for technical or other reasons, the ICU project
committee may approve the breakage. In such an event, a note explaining the
compatibility issue will be posted to the ICU public mailing lists and also
documented in the release notes of the new ICU4J version introducing the
incompatibility.
