---
layout: default
title: ICU FAQ
nav_order: 6
parent: Misc
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# ICU FAQs
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Introduction to ICU

#### What is ICU?

ICU is a cross-platform, Unicode-based globalization library. It includes support
for locale-sensitive string comparison, date/time/number/currency/message
formatting, text boundary detection, character set conversion and so on.

#### Where can I get ICU?

You can get ICU4C and ICU4J from <http://www.icu-project.org/download/>.

**Why don't you build binaries for my platform?**

There are so many compiler versions on so many platforms that we cannot build
binaries for all of them, or guarantee compatibility between them even on the
same platform. Because of this, we distribute only a limited number of binary
versions of ICU, but we will assist in building other versions from source.

**Why don't you provide project files for my MSVC version (MSVC 2008, etc)?**

You can use the Cygwin build environment to build ICU from source against the
MSVC compiler. See the ICU4C Readme.

#### How do I install the binary versions of ICU?

*   **Windows**:
    *   The DLLs you may need for your application are located in
        **bin\\icuXX##.dll**, where "XX" are two letters (such as "uc" for the
        "common" library, "in" for the "i18n" library, etc.) and ## is the major
        and the minor version number (such as **42** for **4.2** / **4.2**.0.1
        or **4.2**.4).
    *   Either place the DLLs in the same directory as your application's .EXE
        files, or set the PATH variable to point to the directory containing the
        ICU DLLs.
    *   For compiling applications, add the "include" directory (the parent of
        the "unicode" and "layout" directories) to the include search path.
    *   For linking applications, add the "lib" directory to the appropriate
        path.
*   **Other Platforms**:
    *   For other platforms, the .tgz file unpacks to a "/usr/local" type
        hierarchy. For system-wide installation, you can unpack all of the files
        into /usr/local/bin, /usr/local/include, etc.
    *   The configuration script **/usr/local/bin/icu-config** or the similar
        Makefile include fragment **/usr/local/lib/icu/current/Makefile.inc**
        can be used in building applications.

#### Can you help me build ICU4C for ...

We can try ... make sure you read the latest "readme" and also the [ICU
Data](../icudata.md) section. You might also try [searching the icu-support
archives](http://site.icu-project.org/contacts), and then posting a question
there. Additionally, sites such as
[StackOverflow](http://stackoverflow.com/search?q=icu) may have helpful tips for
your topic.

*   **Android NDK**
    *   Please try [searching the icu-support
        archives](http://site.icu-project.org/contacts) and also see
        [StackOverflow](http://stackoverflow.com/search?q=icu+android).
*   **iPhone**
    *   Please try [searching the icu-support
        archives](http://site.icu-project.org/contacts) and also see
        [StackOverflow](http://stackoverflow.com/search?q=icu+iphone).

#### What is the ICU binary compatibility policy?

Please see the section on
[binary compatibility](../design#icu-binary-compatibility)
in the [design chapter](../design.md).

#### How is ICU licensed?

The ICU license is intended to allow ICU to be included both in free software
projects and in proprietary or commercial products.

Since ICU 58, ICU is covered by the
[Unicode license](http://www.unicode.org/copyright.html), which is very similar to
the previous ICU license.

ICU 1.8.1–ICU 57 and ICU4J 1.3.1–ICU4J 57 are covered by the [ICU
license](https://github.com/unicode-org/icu/blob/release-57-1/icu4c/LICENSE),
a simple, permissive non-copyleft free software license, compatible with the GNU
GPL. The ICU license is identical to the version of the X license that was
formerly available at <http://www.x.org/Downloads_terms.html>. (This site no
longer exists, but can still be retrieved through internet archive services.)

#### Can I use ICU from other languages besides C/C++ and Java?

There are a number of wrappers available; please see the
[Related Projects](http://site.icu-project.org/related) page.

#### How do I upgrade to a new version of ICU? Should I be concerned about API changes, a new Unicode version or a new CLDR version?

Our goal is for ICU upgrades to go smoothly. Here are some steps you can take to
prepare for an upgrade, or to make sure that your usage of ICU is
upgrade-friendly.

*   **API:** Ensure that you are not using draft APIs, which may change in
    a future release. See the section on
    [API compatibility](../design#icu-api-compatibility) in the
    [design chapter](../design.md).
*   **Unicode:** See the release notes for particular versions of Unicode to
    ensure that your code is not affected by property changes or other
    specification changes.
*   **CLDR:** If your application has test cases which depend on specific
    translations, these assumptions may become invalid if the translation of an
    item changes, new support is added, or if a country changes its currency.
    Try not to depend on specific translations, or be prepared to change test
    cases. Also, a newer version may support additional translations,
    currencies, or types of calendars.
*   **Building/Deploying your Application (ICU4C):** ICU4C usually builds with
    symbol renaming (see
    [binary compatibility](../design#icu-binary-compatibility)
    in the [design chapter](../design.md)). Be sure that you build your
    application with the updated ICU header files, so that it will link against
    the current ICU. Also, don't hard-code the names of ICU libraries in your
    build scripts and projects. Where possible, link against just the
    'base name' such as `libicuuc.so` or `icuuc.lib` rather than a name
    containing the version number such as `libicuuc.so.46` or
    `icuuc46.dll`.

## Building and Testing ICU

#### How do I build ICU?

See the readme.html that is included with ICU.

#### How do I get 32- or 64-bit versions of the ICU libraries?

From ICU version 4.2 on, the configure script will build with the default bit
width of your platform. You can request 64 or 32 bits with the
**--with-library-bits=** option (e.g. `runConfigureICU Linux
--with-library-bits=64` or `runConfigureICU MacOSX
--with-library-bits=32`).
(For the behavior of attempting 64 bits if possible, use
**--with-library-bits=64else32**).

#### How do I build an optimized, non-debug ICU?

On Win32, choose the 'Release' configuration from the drop down menu. On other
platforms, use the runConfigureICU script, which uses the configure script. The
runConfigureICU script uses the safest level of optimization for the ICU
libraries. If your platform is not specified, set the following environment
variables before running configure or runConfigureICU: **CFLAGS=-O CXXFLAGS=-O**

#### Why am I getting so many test failures when I use "gmake check"?

Please view the readme that is included with ICU. It has all the details on how
to build and test ICU, and it usually answers most problems.

If you are using a compiler that hasn't been tested with ICU before, you may
have encountered an optimization bug with the compiler. On Unix platforms you
can specify **--disable-release** when you are using runConfigureICU (e.g.
`runConfigureICU --disable-release LinuxRedHat`). If this fixes your problem, it
is recommended that you report the optimization bug to the compiler
manufacturer.

If neither of these fixes your problem, please send an e-mail to the [ICU4C
Support List](http://icu-project.org/contacts.html).

#### How can I reduce the size of the ICU data library?

Use the [Data Customizer](https://unicode-org.atlassian.net/browse/ICU-12835)
or see
[Customizing ICU's Data Library](../icudata#customizing-icus-data-library)
in the [ICU Data Management](../icudata.md) chapter of this User's Guide.

#### Why am I seeing a small (only a few K) instead of a large (several megabytes) data shared library (icudt)?
#### Opening ICU services fails with U_MISSING_RESOURCE_ERROR and u_init() returns failure.

ICU libraries must always link with the ICU data library. However, so that ICU
can bootstrap itself, it first builds a 'stub' data library, in
**icu\\source\\stubdata**, so that the tools can function. You should only use
this in production if you are NOT using DLL-mode data access, in which case you
are accessing ICU data as individual files, as an archive (.dat) file, or some
other means. Normally, you should be using the larger library built from
**icu\\source\\data**. If you see this issue after ICU has completed building,
re-run 'make' in **icu\\source\\data**, or the '**makedata**' project in Visual
Studio.

#### Can I add or remove a converter from ICU?

Yes. Please see [Customizing ICU's Data Library](../icudata#customizing-icus-data-library)
in the [ICU Data Management](../icudata.md) chapter of this User's Guide. You can also
get extra converters from <http://www.icu-project.org/charts/charset/> or use
the [ICU Data Customizer](https://unicode-org.atlassian.net/browse/ICU-12835)
tool.

#### Why don't the makefiles work?

You need GNU's make program version 3.8 or later, and you need to run the
runConfigureICU script, which is located in the `icu/source` directory. You may
be using a platform that ICU does not support. If the first two answers do not
apply to you, then you should send an e-mail to the
[ICU4C Support List](http://www.icu-project.org/contacts.html).

Here are some places you can find gmake:

1.  GNU: <http://www.gnu.org/software/make/>

2.  Sun® Source/Binaries: <http://www.sunfreeware.com>

3.  z/OS (OS/390) Source/Binaries:
    <http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty1.html#opensrc>

4.  IBM i (OS/400) Source/Binaries:
    <http://www.ibm.com/servers/enable/site/porting/iseries/overview/gnu_utilities.html>

Due to differences in every platform's make program, we will not support other
versions of our make files.

#### What version of the C++ iostream is used in ICU4C?

ICU4C uses the latest available version of the iostream on the target platform.
Only the `io` library uses iostream.

#### I only want to use the C APIs, do I need a C++ compiler?

Yes. Large portions of ICU4C were always implemented in C++, and over time we are
moving further in that direction. We continue to support and add C APIs, in order
to provide binary-compatible APIs. For the implementation, C++ is much better:
It is generally easier to work with, which reduces bugs and maintenance. It is
closer to Java, which is important for porting between ICU4C and ICU4J. We use
[RAII](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)
(e.g., LocalPointer) to reduce opportunities for memory leaks, we use inline
functions and type-safe constants instead of #define, etc. However, we do not
use exceptions, and we do not use the Standard Template Library (STL), so
ICU4C's dependencies on the C++ library are minimal. See the new
[dependencies.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/test/depstest/dependencies.txt)
and search for "group: cplusplus".

As ICU does not use exceptions, the GCC option `-fno-exceptions` will reduce or
remove the dependencies on the standard C++ library. In
[GCC](http://gcc.gnu.org) 4.5 there is an option `-static-libstdc++` which will
remove C++ library dependencies. Visual Studio has the
[/MT option](http://msdn.microsoft.com/en-us/library/2kzt1wy3(v=VS.100).aspx),
and other compilers may have similar options. See the
[How To Use ICU](../howtouseicu.md) page for related information on this topic.

## Features of ICU

#### What computer languages does ICU support?

ICU4C (ICU) is written in C and C++, and ICU4J is written in Java™.

#### How are the APIs documented for deprecation?

Please read the [ICU API compatibility](../design#icu-api-compatibility)
section in the [ICU Design](../design.md) chapter.

#### What version of Unicode standard does ICU support?

ICU version 65 supports Unicode version 12.

The Unicode versions for older versions of ICU are listed on the ICU download
page, <http://www.icu-project.org/download/>.

#### Does ICU support UTF-16 surrogates and Unicode supplementary characters?

Yes.

#### Does Java support UTF-16 surrogates and Unicode supplementary characters?

Java 5 introduced support for Unicode supplementary characters. Java 1.4 and
earlier do not directly support them.

#### How does ICU relate to Java's java.text.\* package?

The International Components for Unicode are available both as a C/C++ library
and a Java class library. ICU provides internationalization utilities for
writing global applications in C, C++ or Java programming languages. ICU was
originally developed by the Unicode group at the IBM Globalization Center of
Competency in Cupertino, and ICU was contributed to Sun for inclusion into the
JDK 1.1. ICU4J includes enhanced versions of some of these contributed classes
plus additional classes that complement the classes in the JDK.

ICU4C started as a C++ port of the original Java Internationalization classes.
These classes are now partially implemented in C, with largely parallel C and
C++ APIs. ICU4C and ICU4J continue to leapfrog each other with features and bug
fixes. Over time, features from ICU4J get added to the JDK as well.

Both versions of ICU have a goal to implement the latest Unicode standard,
maintain a single portable source code base, and to make it easier for software
developers to create global applications.

## Using ICU

#### Can I use any of the features of ICU without Unicode strings?

No. In order to use the collation, text boundary analysis, formatting or other
ICU APIs, you must use Unicode strings. In order to get Unicode strings from
your native codepage, you can use the conversion API.

#### How do I declare a Unicode string in ICU?

Use the `U_STRING_DECL` and `U_STRING_INIT` macros or use the UnicodeString
class for C++. Strings are represented as `UChar *` as the base string type.

Even though most platforms declare wide strings as `wchar_t *` or `L""` as the
base string type, that declaration is not portable because `sizeof(wchar_t)`
can be 1, 2 or 4, and the encoding may not even be Unicode. On the platforms
where `sizeof(wchar_t)` is 2 bytes, `UChar` is defined as `wchar_t`. In that
case you can use ICU's strings with 3rd party legacy functions; however, we do
not suggest using Unicode strings without the `U_STRING_DECL` and
`U_STRING_INIT` macros or UnicodeString class because those are the
platform-independent implementations.
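
The portability concern with `wchar_t` can be seen without ICU at all. The following is a minimal sketch in standard C++, not ICU code: the function names `wideUnitCount` and `utf16UnitCount` are made up for illustration, and `char16_t` is used here only as a stand-in for a fixed 16-bit code unit type. It shows that one supplementary character takes a platform-dependent number of `wchar_t` units but always two UTF-16 code units.

```cpp
#include <cstddef>
#include <string>

// U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.

// Number of wchar_t units used for the character. This depends on the
// platform: 2 where wchar_t is 16 bits wide, 1 where it is 32 bits wide.
inline std::size_t wideUnitCount() {
    return std::wstring(L"\U0001D11E").size();
}

// Number of UTF-16 code units used for the same character: always 2.
inline std::size_t utf16UnitCount() {
    return std::u16string(u"\U0001D11E").size();
}
```

Code that assumes one answer for `wideUnitCount()` will behave differently when moved to a platform with a different `wchar_t` width.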

#### How is a Unicode string represented in ICU4C?

A Unicode string is currently represented as UTF-16. The endianness of UTF-16 is
platform dependent. You can guarantee the endianness of UTF-16 by using a
converter. UTF-16 strings can be converted to other Unicode forms by using a
converter or with the UTF conversion macros.

ICU does not use UCS-2. UCS-2 is a subset of UTF-16. UCS-2 does not support
surrogates, and UTF-16 does support surrogates. This means that UCS-2 only
supports UTF-16's Base Multilingual Plane (BMP). The notion of UCS-2 is
deprecated and dead. Unicode 2.0 in 1996 changed its default encoding to UTF-16.
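
To make the difference between UCS-2 and UTF-16 concrete, here is a small sketch of how a code point maps to UTF-16 code units. It uses only the standard library, and `encodeUtf16` is a hypothetical helper written for this illustration, not an ICU function. It assumes the input is a valid code point that is not itself a surrogate.

```cpp
#include <string>

// Encodes one Unicode code point (U+0000..U+10FFFF, not itself a surrogate)
// as UTF-16. BMP code points take one 16-bit unit; supplementary code points
// take a lead/trail surrogate pair, which UCS-2 cannot express.
inline std::u16string encodeUtf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t v = cp - 0x10000;  // 20 significant bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // lead
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // trail
    }
    return out;
}
```

For example, `encodeUtf16(0x1F600)` yields the two units `0xD83D 0xDE00`, while `encodeUtf16(U'A')` yields the single unit `0x0041`.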

If you need to do a quick and easy conversion between UTF-16 and UTF-8, UTF-32
or an encoding in `wchar_t`, you should take a look at unicode/ustring.h. In
that header file you will find `u_strToWCS`, `u_strFromWCS`, `u_strToUTF8`,
`u_strFromUTF8`, `u_strToUTF32` and `u_strFromUTF32` functions. These
functions are provided for your convenience instead of using the `ucnv_*` API.

You can also take a look at the `UTF_*`, `UTF8_*`, `UTF16_*` and `UTF32_*`
macros, which are defined in
[unicode/utf.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf.h),
[unicode/utf8.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf8.h),
[unicode/utf16.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf16.h)
and [unicode/utf32.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utf32.h).
These macros are helpful for programmers who need to manipulate and process
Unicode strings.

#### How do I index into a UTF-16 string?

Typically, indexes and offsets in strings count string units, not characters
(although in C and Java they have a char type).

For example, in old-fashioned MBCS strings, you would count indexes and offsets
by bytes, not by the variable-width character count. In UTF-16, you do the same,
just count 16-bit units (in ICU: UChar).
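
The distinction between code unit offsets and character counts can be sketched as follows. This is an illustration in standard C++ with made-up helper names (`isLead`, `isTrail`, `nextCodePoint`, `countCodePoints`); in real code you would more likely use the macros from the ICU UTF headers. `nextCodePoint` expects an offset that is smaller than the string length.

```cpp
#include <cstddef>
#include <string>

inline bool isLead(char16_t u)  { return (u & 0xFC00) == 0xD800; }
inline bool isTrail(char16_t u) { return (u & 0xFC00) == 0xDC00; }

// Reads the code point that starts at code unit offset i and advances i
// past it (by one unit, or by two for a well-formed surrogate pair).
inline char32_t nextCodePoint(const std::u16string& s, std::size_t& i) {
    const char16_t lead = s[i++];
    if (isLead(lead) && i < s.size() && isTrail(s[i])) {
        const char16_t trail = s[i++];
        return 0x10000 + ((static_cast<char32_t>(lead) - 0xD800) << 10)
                       + (static_cast<char32_t>(trail) - 0xDC00);
    }
    return lead;  // BMP code point, or an unpaired surrogate returned as is
}

// Counts code points; s.size() counts code units.
inline std::size_t countCodePoints(const std::u16string& s) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++n) {
        nextCodePoint(s, i);
    }
    return n;
}
```

For the string `u"a\U0001F600b"`, `size()` is 4 while `countCodePoints` is 3, and the letter `b` sits at offset 3, not 2. Offsets passed to string APIs are the code unit offsets.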

#### What is the performance difference between UTF-8 and UTF-16?

Most of the time, the memory throughput of the hard drive and RAM is the main
performance constraint. UTF-8 is 50% smaller than UTF-16 for US-ASCII, but UTF-8
is 50% larger than UTF-16 for East and South Asian scripts. There is no memory
difference for Latin extensions, Greek, Cyrillic, Hebrew, and Arabic.
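
The size figures above follow directly from the encoding rules. As a rough sketch (the helper names `utf8Bytes` and `utf16Bytes` are invented for this example and are not part of ICU), the per-code-point storage cost can be computed like this:

```cpp
#include <cstddef>

// Bytes needed to encode one code point (U+0000..U+10FFFF) in UTF-8.
constexpr std::size_t utf8Bytes(char32_t cp) {
    return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Bytes needed to encode one code point in UTF-16 (2 bytes per code unit).
constexpr std::size_t utf16Bytes(char32_t cp) {
    return cp < 0x10000 ? 2 : 4;
}
```

An ASCII letter costs 1 byte in UTF-8 and 2 in UTF-16. Greek alpha (U+03B1) costs 2 bytes in both. A CJK ideograph such as U+4E2D costs 3 bytes in UTF-8 and 2 in UTF-16. Supplementary characters cost 4 bytes in both.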

For processing Unicode data, UTF-16 is much easier to handle. You get a choice
between either one or two units per character, not a choice among four lengths.
UTF-16 also does not have illegal 16-bit unit values, while you might want to
check for illegal bytes in UTF-8. Incomplete character sequences in UTF-16 are
less important and more benign. If you want to quickly convert small strings
between the different UTF encodings or get a UChar32 value, you can use the
macros provided in `utf.h` and its siblings `utf8.h` and `utf16.h`. For larger
or partial strings, please use the conversion API.

#### How do the converters work?

The converters act like a data stream. This means that the state of the last
character is saved in the converter after each call to the `ucnv_fromUnicode()`
and `ucnv_toUnicode()` functions. So if the source buffer ends with part of a
surrogate Unicode character pair, the next call to `ucnv_toUnicode()` will
write out the equivalent character to the destination buffer. Please see the
[Conversion](../conversion/index.md) chapter of the User's Guide for details.
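
The idea of a converter that keeps state between calls can be illustrated with a toy decoder. This is not how ICU's converters are implemented. `Utf16StreamDecoder` is a hypothetical class written for this sketch, it only turns UTF-16 chunks into UTF-32, and its choice to substitute U+FFFD for unpaired surrogates is an assumption of the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a converter: turns UTF-16 chunks into UTF-32 and
// remembers an unfinished surrogate pair between calls.
class Utf16StreamDecoder {
public:
    // Appends the code points completed by this chunk to `out`.
    void feed(const char16_t* src, std::size_t length, std::u32string& out) {
        for (std::size_t i = 0; i < length; ++i) {
            const char16_t u = src[i];
            if (pendingLead_ != 0) {
                if ((u & 0xFC00) == 0xDC00) {  // trail completes the pair
                    out.push_back(0x10000
                        + ((static_cast<char32_t>(pendingLead_) - 0xD800) << 10)
                        + (static_cast<char32_t>(u) - 0xDC00));
                    pendingLead_ = 0;
                    continue;
                }
                out.push_back(0xFFFD);  // unpaired lead: substitute U+FFFD
                pendingLead_ = 0;
            }
            if ((u & 0xFC00) == 0xD800) {
                pendingLead_ = u;       // wait for the next unit or chunk
            } else if ((u & 0xFC00) == 0xDC00) {
                out.push_back(0xFFFD);  // trail without a lead
            } else {
                out.push_back(u);
            }
        }
    }

    // True if the last chunk ended in the middle of a surrogate pair.
    bool hasPending() const { return pendingLead_ != 0; }

private:
    char16_t pendingLead_ = 0;
};
```

If a first chunk ends with the lead surrogate `0xD83D`, nothing is written for it yet. When the next chunk starts with `0xDE00`, the decoder emits U+1F600 as a single code point.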

#### What does a locale look like in ICU?

ICU locales are lightweight, and they are represented by just a string.
Lightweight means that there is just a string to represent a locale and nothing
more. Many platforms have numbers and other data structures to represent a
locale, but ICU has one simple platform independent string to represent a
locale.

ICU locales usually contain an ISO-639 language name (2-3 characters), an
ISO-3166 country name (2-3 characters), and a variant name which is user
specified. When a language or country is not represented by these standards, ICU
uses 3 characters to represent that part of the locale. All three parts are
separated by an underscore "_". For example, US English is "en_US", and German
in Germany with the Euro symbol is represented as "de_DE_EURO". Traditionally
the language part of the locale is lowercase, the country is uppercase and the
variant is uppercase. More details are available from the [Locale
Chapter](../locale/index.md) of this User's Guide.
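
The three-part structure described above can be sketched with plain string handling. `LocaleParts` and `splitLocale` are hypothetical names used only for this illustration. Real locale IDs can carry more than this sketch handles, so production code should rely on ICU's locale API instead.

```cpp
#include <string>

// The three underscore-separated parts of a locale ID such as "de_DE_EURO".
struct LocaleParts {
    std::string language;
    std::string country;
    std::string variant;
};

// Splits at the first two underscores; everything after the second one is
// treated as the variant. Missing parts are left empty.
inline LocaleParts splitLocale(const std::string& id) {
    LocaleParts parts;
    const std::string::size_type first = id.find('_');
    parts.language = id.substr(0, first);
    if (first == std::string::npos) {
        return parts;
    }
    const std::string::size_type second = id.find('_', first + 1);
    parts.country = id.substr(first + 1,
        second == std::string::npos ? std::string::npos : second - first - 1);
    if (second != std::string::npos) {
        parts.variant = id.substr(second + 1);
    }
    return parts;
}
```

With this helper, `splitLocale("de_DE_EURO")` gives language `de`, country `DE` and variant `EURO`, while `splitLocale("en_US")` leaves the variant empty.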

#### How is ICU versioned?

Please read the [ICU Design](../design.md) chapter of the User's Guide.

#### What is the relationship between ICU locale data and system locale data?

There is no relationship. ICU is not dependent on the operating system for the
locale data.

This also means that `uloc_setDefault()` does not affect the operating system.
The function `uloc_setDefault()` only sets ICU's default locale. Normally the
default locale for ICU is whatever the operating system says is the default
locale.

#### How are errors handled in ICU?

Since not all compilers can handle exceptions, we return an error from functions
with a `UErrorCode` parameter. The `UErrorCode` parameter of a function will
return any errors that occurred while it was executing. It's usually a good idea
to check for errors after calling a function by using the `U_SUCCESS` and
`U_FAILURE` macros. `U_SUCCESS` returns true when the function did run properly,
and `U_FAILURE` returns true when the function did NOT run properly. You may
handle specific errors from a function by checking the exact value of error. The
possible values of `UErrorCode` are located in
[utypes.h](https://github.com/unicode-org/icu/blob/master/icu4c/source/common/unicode/utypes.h)
of the common project. Before any function is called with a `UErrorCode`, it
must be initialized to `U_ZERO_ERROR`.

Here is an example of `UErrorCode` being used.

```c++
UErrorCode err = U_ZERO_ERROR;   // must be initialized before the call
callMyFunction(&err);            // reports problems through err
if (U_FAILURE(err)) {
    puts("callMyFunction() Failed!");
}
```

Please see the [ICU Design](../design.md) chapter for details.
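
For readers who want to see the same pattern from the side of the function being called, here is a self-contained sketch. It deliberately does not use ICU headers: `DemoErrorCode`, `demoSuccess`, `demoFailure` and `copyChecked` are all hypothetical stand-ins, and the early return when the status already holds a failure is modelled on the usual ICU convention as an assumption of this example.

```cpp
#include <cstddef>

// Hypothetical stand-ins for ICU's UErrorCode, U_SUCCESS and U_FAILURE,
// defined here only so the sketch compiles without ICU headers.
enum DemoErrorCode {
    DEMO_ZERO_ERROR = 0,
    DEMO_ILLEGAL_ARGUMENT_ERROR,
    DEMO_BUFFER_OVERFLOW_ERROR
};
inline bool demoSuccess(DemoErrorCode e) { return e == DEMO_ZERO_ERROR; }
inline bool demoFailure(DemoErrorCode e) { return e != DEMO_ZERO_ERROR; }

// Copies `length` bytes into a buffer of `capacity` bytes. On failure the
// reason is reported through *status and nothing is copied.
inline std::size_t copyChecked(const char* src, std::size_t length,
                               char* dest, std::size_t capacity,
                               DemoErrorCode* status) {
    if (demoFailure(*status)) {
        return 0;  // an earlier call already failed: do nothing
    }
    if (src == nullptr || dest == nullptr) {
        *status = DEMO_ILLEGAL_ARGUMENT_ERROR;
        return 0;
    }
    if (length > capacity) {
        *status = DEMO_BUFFER_OVERFLOW_ERROR;
        return 0;
    }
    for (std::size_t i = 0; i < length; ++i) {
        dest[i] = src[i];
    }
    return length;
}
```

Because a failure stays in the status variable, a caller can chain several calls and check the status once at the end, as long as it was initialized to the "zero error" value first.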

#### With calendar classes, why are months 0-based?

"I have been using ICU for its calendar classes, and have found it to be
excellent. That said, I am wondering why the decision was made to keep months
0-based while almost all the other calendrical units (years, weeks of year,
weeks of month, date, days of year, days of week, days of week in month) are
1-based? This has been the source of several bugs whenever the mind is slightly
less than razor sharp." --Contributor

This was not our choice. We inherited it from the Java Calendar API,
unfortunately.

#### Is there a guideline for COBOL programs that want to use ICU?

There is a COBOL/ICU guideline available since ICU 2.2. For more details, please
refer to the [COBOL section](../usefrom/cobol.md) of this User's Guide.

#### Where can I get more information about using ICU?

Please send an e-mail to the [ICU4C Support
List](http://www.icu-project.org/contacts.html).