---
layout: default
title: ICU4C FAQ
nav_order: 1
parent: ICU4C
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# ICU FAQs
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Introduction to ICU

#### What is ICU?

ICU is a cross-platform, Unicode-based globalization library. It includes support
for locale-sensitive string comparison, date/time/number/currency/message
formatting, text boundary detection, character set conversion and so on.

#### Where can I get ICU?

You can get ICU4C and ICU4J from <https://icu.unicode.org/download>.

**Why don't you build binaries for my platform?**

There are so many compiler versions on so many platforms that we cannot build
for all of them, or guarantee compatibility between them even on the same
platform. Because of these restrictions, we distribute only a limited number of
binary versions of ICU, but we will assist in building other versions from source.

**Why don't you provide project files for my MSVC version (MSVC 2008, etc)?**

You can use the Cygwin build environment to build ICU from source against the
MSVC compiler. See the [Building ICU4C](./icu4c/build) page.

#### How do I install the binary versions of ICU?

* **Windows**:
    * The DLLs you may need for your application are located in
      **bin\\icuXX##.dll**, where "XX" are two letters (such as "uc" for the
      "common" library, "in" for the "i18n" library, etc.) and ## is the major
      and the minor version number (such as **42** for **4.2**, **4.2**.0.1
      or **4.2**.4).
    * Either place the DLLs in the same directory as your application's .EXE
      files, or set the PATH variable to point to the directory containing the
      ICU DLLs.
    * For compiling applications, add the "include" directory (the parent of
      the "unicode" and "layout" directories) to the include search path.
    * For linking applications, add the "lib" directory to the appropriate
      path.
* **Other Platforms**:
    * For other platforms, the .tgz file unpacks to a "/usr/local" type
      hierarchy. For system-wide installation, you can unpack all of the files
      into /usr/local/bin, /usr/local/include, etc.
    * The configuration script **/usr/local/bin/icu-config** or the similar
      Makefile include fragment **/usr/local/lib/icu/current/Makefile.inc**
      can be used in building applications.

#### Can you help me build ICU4C for ...

We can try ... make sure you read the [Building ICU4C](./icu4c/build) section and also the [ICU
Data](../icudata.md) section. You might also try [searching the icu-support
archives](https://icu.unicode.org/contacts), and then posting a question
there. Additionally, sites such as
[StackOverflow](http://stackoverflow.com/search?q=icu) may have helpful tips for
your topic.

* **Android NDK**
    * Please try [searching the icu-support
      archives](https://icu.unicode.org/contacts) and also see
      [StackOverflow](http://stackoverflow.com/search?q=icu+android).
* **iPhone**
    * Please try [searching the icu-support
      archives](https://icu.unicode.org/contacts) and also see
      [StackOverflow](http://stackoverflow.com/search?q=icu+iphone).

#### What is the ICU binary compatibility policy?

Please see the section on
[binary compatibility](../design#icu-binary-compatibility)
in the [design chapter](../design.md).

#### How is ICU licensed?

The ICU license is intended to allow ICU to be included both in free software
projects and in proprietary or commercial products.

Since ICU 58, ICU is covered by the
[Unicode license](http://www.unicode.org/copyright.html), which is very similar to
the previous ICU license.

ICU 1.8.1–ICU 57 and ICU4J 1.3.1–ICU4J 57 are covered by the [ICU
license](https://github.com/unicode-org/icu/blob/release-57-1/icu4c/LICENSE),
a simple, permissive non-copyleft free software license, compatible with the GNU
GPL. The ICU license is identical to the version of the X license that was
formerly available at <http://www.x.org/Downloads_terms.html>. (This site no
longer exists, but can still be retrieved through internet archive services.)

#### Can I use ICU from other languages besides C/C++ and Java?

There are a number of wrappers available; please see the
[Related Projects](https://icu.unicode.org/related) page.

#### How do I upgrade to a new version of ICU? Should I be concerned about API changes, a new Unicode version or a new CLDR version?

Our goal is for ICU upgrades to go smoothly. Here are some steps you can take to
prepare for an upgrade, or to make sure that your usage of ICU is
upgrade-friendly.

* **API:** ensure that you are not using draft APIs, which may change in
  a future release. See the section on
  [API compatibility](../design#icu-api-compatibility) in the
  [design chapter](../design.md).
* **Unicode:** See the release notes for particular versions of Unicode to
  ensure that your code is not affected by property changes or other
  specification changes.
* **CLDR:** If your application has test cases which depend on specific
  translations, these assumptions may become invalid if the translation of an
  item changes, new support is added, or if a country changes its currency.
  Try not to depend on specific translations, or be prepared to change test
  cases.
  Also, a newer version may support additional translations,
  currencies, and types of calendars.
* **Building/Deploying your Application (ICU4C):** ICU4C usually builds with
  symbol renaming (see
  [binary compatibility](../design#icu-binary-compatibility)
  in the [design chapter](../design.md)). Be sure that you build your
  application with the updated ICU header files, so that it will link against
  the current ICU. Also, don't hard-code the names of ICU libraries in your
  build scripts and projects. Where possible, link against just the
  'base name' such as `libicuuc.so` or `icuuc.lib` rather than a name
  containing the version number such as `libicuuc.so.46` or
  `icuuc46.dll`.

## Building and Testing ICU

#### How do I build ICU?

See the [Building ICU4C](./icu4c/build) section.

#### How do I get 32- or 64-bit versions of the ICU libraries?

From ICU version 4.2 on, the configure script will build with the default bit
width of your platform. You can request 64 or 32 bits with the
**--with-library-bits=** option (e.g. `runConfigureICU Linux
--with-library-bits=64` or `runConfigureICU MacOSX
--with-library-bits=32`).
(For the behavior of attempting 64 bits if possible, use
**--with-library-bits=64else32**).

#### How do I build an optimized, non-debug ICU?

On Win32, choose the 'Release' configuration from the drop-down menu. On other
platforms, use the runConfigureICU script, which uses the configure script. The
runConfigureICU script uses the safest level of optimization for the ICU
libraries. If your platform is not specified, set the following environment
variables before running configure or runConfigureICU: **CFLAGS=-O CXXFLAGS=-O**

#### Why am I getting so many test failures when I use "gmake check"?

Please view the readme that is included with ICU.
It has all the details on how
to build and test ICU, and it usually answers most problems.

If you are using a compiler that hasn't been tested with ICU before, you may
have encountered an optimization bug with the compiler. On Unix platforms you
can specify **--disable-release** when you are using runConfigureICU (e.g.
`runConfigureICU --disable-release LinuxRedHat`). If this fixes your problem, it
is recommended that you report the optimization bug to the compiler
manufacturer.

If neither of these fixes your problem, please send an e-mail to the [ICU4C
Support List](https://icu.unicode.org/contacts).

#### How can I reduce the size of the ICU data library?

Use the [Data Customizer](https://unicode-org.atlassian.net/browse/ICU-12835)
or see
[Customizing ICU's Data Library](../icudata#customizing-icus-data-library)
in the [ICU Data Management](../icudata.md) chapter of this User's Guide.

#### Why am I seeing a small ( only a few K ) instead of a large ( several megabytes ) data shared library (icudt)?
#### Opening ICU services fails with U_MISSING_RESOURCE_ERROR and u_init() returns failure.

ICU libraries must always link with the ICU data library. However, so that ICU
can bootstrap itself, it first builds a 'stub' data library, in
**icu\\source\\stubdata**, so that the tools can function. You should only use
this in production if you are NOT using DLL-mode data access, in which case you
are accessing ICU data as individual files, as an archive (.dat) file, or by some
other means. Normally, you should be using the larger library built from
**icu\\source\\data**. If you see this issue after ICU has completed building,
re-run 'make' in **icu\\source\\data**, or the '**makedata**' project in Visual
Studio.

#### Can I add or remove a converter from ICU?

Yes.
Please see [Customizing ICU's Data Library](../icudata#customizing-icus-data-library)
in the [ICU Data Management](../icudata.md) chapter of this User's Guide. You can also
get extra converters from <https://icu.unicode.org/charts/charset> or use
the [ICU Data Customizer](https://unicode-org.atlassian.net/browse/ICU-12835)
tool.

#### Why don't the makefiles work?

You need GNU's make program version 3.8 or later, and you need to run the
runConfigureICU script, which is located in the `icu/source` directory. You may
be using a platform that ICU does not support. If the first two answers do not
apply to you, then you should send an e-mail to the
[ICU4C Support List](https://icu.unicode.org/contacts).

Here are some places you can find gmake:

1. GNU: <http://www.gnu.org/software/make/>

2. Sun® Source/Binaries: <http://www.sunfreeware.com>

3. z/OS (OS/390) Source/Binaries:
   <http://www.ibm.com/servers/eserver/zseries/zos/unix/bpxa1ty1.html#opensrc>

4. IBM i (OS/400) Source/Binaries:
   <http://www.ibm.com/servers/enable/site/porting/iseries/overview/gnu_utilities.html>

Due to differences in every platform's make program, we will not support other
versions of our make files.

#### What version of the C iostream is used in ICU4C?

ICU4C uses the latest available version of the iostream on the target platform.
Only the `io` library uses iostream.

#### I only want to use the C APIs, do I need a C++ compiler?

Large portions of ICU4C were always implemented in C++, and over time we are
moving further in that direction. We continue to support and add C APIs, in order
to provide binary-compatible APIs. For the implementation, C++ is much better:
It is generally easier to work with, which reduces bugs and maintenance. It is
closer to Java, which is important for porting between ICU4C and ICU4J.
We use
[RAII](http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization)
(e.g., LocalPointer) to reduce opportunities for memory leaks, we use inline
functions and type-safe constants instead of #define, etc. However, we do not
use exceptions, and we do not use the Standard Template Library (STL), so
ICU4C's dependencies on the C++ library are minimal. See the new
[dependencies.txt](https://github.com/unicode-org/icu/blob/main/icu4c/source/test/depstest/dependencies.txt)
and search for "group: cplusplus".

As ICU does not use exceptions, the GCC option `-fno-exceptions` will reduce or
remove the dependencies on the standard C++ library. In
[GCC](http://gcc.gnu.org) 4.5 there is an option `-static-libstdc++` which will
remove C++ library dependencies. Visual Studio has the
[/MT option](http://msdn.microsoft.com/en-us/library/2kzt1wy3(v=VS.100).aspx),
and other compilers may have similar options. See the
[How To Use ICU](../icu/howtouseicu.md) page for related information on this topic.

## Features of ICU

#### What computer languages does ICU support?

ICU4C (ICU) is written in C and C++, and ICU4J is written in Java™.

#### How are the APIs documented for deprecation?

Please read the [ICU API compatibility](../design#icu-api-compatibility)
section in the [ICU Design](../design.md) chapter.

#### What version of Unicode standard does ICU support?

ICU version 65 supports Unicode version 12.

The Unicode versions for older versions of ICU are listed on the ICU download
page, <https://icu.unicode.org/download>.

#### Does ICU support UTF-16 surrogates and Unicode supplementary characters?

Yes.

#### Does Java support UTF-16 surrogates and Unicode supplementary characters?

Java 5 introduced support for Unicode supplementary characters. Java 1.4 and
earlier do not directly support them.
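
To make the surrogate mechanism behind the two questions above concrete, here is
a minimal sketch in plain C++ that does not use ICU at all. The helper name
`appendUtf16` is hypothetical and exists only for this illustration; it shows how
a supplementary code point (above U+FFFF) becomes a pair of 16-bit code units,
while a BMP code point stays a single unit.

```c++
#include <cassert>
#include <string>

// Hypothetical helper for illustration only (not an ICU API):
// appends one Unicode code point to a UTF-16 string.
// Assumes cp is a valid scalar value (not itself a surrogate code point).
static void appendUtf16(std::u16string &out, char32_t cp) {
    assert(cp <= 0x10FFFF);  // highest Unicode code point
    if (cp < 0x10000) {
        // BMP code point: one 16-bit code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary code point: split the 20-bit offset into
        // a lead (high) surrogate and a trail (low) surrogate.
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
}
```

Appending U+0041 followed by U+1F600 yields three code units: 0x0041, 0xD83D and
0xDE00. In real ICU4C code you would normally rely on the macros in
`unicode/utf16.h` (for example `U16_APPEND`) rather than a hand-written helper
like this one.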

#### How does ICU relate to Java's java.text.\* package?

The International Components for Unicode are available both as a C/C++ library
and a Java class library. ICU provides internationalization utilities for
writing global applications in C, C++ or Java programming languages. ICU was
originally developed by the Unicode group at the IBM Globalization Center of
Competency in Cupertino, and ICU was contributed to Sun for inclusion into the
JDK 1.1. ICU4J includes enhanced versions of some of these contributed classes
plus additional classes that complement the classes in the JDK.

ICU4C started as a C++ port of the original Java Internationalization classes.
These classes are now partially implemented in C, with largely parallel C and
C++ APIs. ICU4C and ICU4J continue to leapfrog each other with features and bug
fixes. Over time, features from ICU4J get added to the JDK as well.

Both versions of ICU have a goal to implement the latest Unicode standard,
maintain a single portable source code base, and to make it easier for software
developers to create global applications.

## Using ICU

#### Can I use any of the features of ICU without Unicode strings?

No. In order to use the collation, text boundary analysis, formatting or other
ICU APIs, you must use Unicode strings. In order to get Unicode strings from
your native codepage, you can use the conversion API.

#### How do I declare a Unicode string in ICU?

Use the `U_STRING_DECL` and `U_STRING_INIT` macros or use the UnicodeString
class for C++. The base string type is `UChar *`.

Even though most platforms declare wide strings as `wchar_t *` or `L""` as the
base string type, that declaration is not portable because `sizeof(wchar_t)`
can be 1, 2 or 4, and the encoding may not even be Unicode.
On the platforms
where `sizeof(wchar_t)` is 2, `UChar` is defined as `wchar_t`. In that
case you can use ICU's strings with 3rd party legacy functions; however, we do
not suggest using Unicode strings without the `U_STRING_DECL` and
`U_STRING_INIT` macros or UnicodeString class because they are platform
independent implementations.

#### How is a Unicode string represented in ICU4C?

A Unicode string is currently represented as UTF-16. The endianness of UTF-16 is
platform dependent. You can guarantee the endianness of UTF-16 by using a
converter. UTF-16 strings can be converted to other Unicode forms by using a
converter or with the UTF conversion macros.

ICU does not use UCS-2. UCS-2 is a subset of UTF-16. UCS-2 does not support
surrogates, and UTF-16 does support surrogates. This means that UCS-2 only
supports UTF-16's Base Multilingual Plane (BMP). The notion of UCS-2 is
deprecated and dead. Unicode 2.0 in 1996 changed its default encoding to UTF-16.

If you need to do a quick and easy conversion between UTF-16 and UTF-8, UTF-32
or an encoding in `wchar_t`, you should take a look at unicode/ustring.h. In
that header file you will find `u_strToWCS`, `u_strFromWCS`, `u_strToUTF8`,
`u_strFromUTF8`, `u_strToUTF32` and `u_strFromUTF32` functions. These
functions are provided for your convenience instead of using the `ucnv_*` API.

You can also take a look at the `UTF_*`, `UTF8_*`, `UTF16_*` and `UTF32_*`
macros, which are defined in
[unicode/utf.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/utf.h),
[unicode/utf8.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/utf8.h),
[unicode/utf16.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/utf16.h)
and [unicode/utf32.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/utf32.h).
These macros are helpful for programmers who need to manipulate and process
Unicode strings.

#### How do I index into a UTF-16 string?

Typically, indexes and offsets in strings count string units, not characters
(although in C and Java they have a char type).

For example, in old-fashioned MBCS strings, you would count indexes and offsets
by bytes, not by the variable-width character count. In UTF-16, you do the same,
just count 16-bit units (in ICU: UChar).

#### What is the performance difference between UTF-8 and UTF-16?

Most of the time, the memory throughput of the hard drive and RAM is the main
performance constraint. UTF-8 is 50% smaller than UTF-16 for US-ASCII, but UTF-8
is 50% larger than UTF-16 for East and South Asian scripts. There is no memory
difference for Latin extensions, Greek, Cyrillic, Hebrew, and Arabic.

For processing Unicode data, UTF-16 is much easier to handle. You get a choice
between either one or two units per character, not a choice among four lengths.
UTF-16 also does not have illegal 16-bit unit values, while you might want to
check for illegal bytes in UTF-8. Incomplete character sequences in UTF-16 are
less important and more benign. If you want to quickly convert small strings
between the different UTF encodings or get a UChar32 value, you can use the
macros provided in `utf.h` and its siblings `utf8.h` and `utf16.h`. For larger
or partial strings, please use the conversion API.

#### How do the converters work?

The converters act like a data stream. This means that the state of the last
character is saved in the converter after each call to the `ucnv_fromUnicode()`
and `ucnv_toUnicode()` functions. So if the source buffer ends with part of a
surrogate Unicode character pair, the next call to `ucnv_toUnicode()` will
write out the equivalent character to the destination buffer.
Please see the
[Conversion](../conversion/index.md) chapter of the User's Guide for details.

#### What does a locale look like in ICU?

ICU locales are lightweight, and they are represented by just a string.
Lightweight means that there is just a string to represent a locale and nothing
more. Many platforms have numbers and other data structures to represent a
locale, but ICU has one simple platform independent string to represent a
locale.

ICU locales usually contain an ISO-639 language name (2-3 characters), an
ISO-3166 country name (2-3 characters), and a variant name which is user
specified. When a language or country is not represented by these standards, ICU
uses 3 characters to represent that part of the locale. All three parts are
separated by an underscore "_". For example, US English is "en_US", and German
in Germany with the Euro symbol is represented as "de_DE_EURO". Traditionally
the language part of the locale is lowercase, the country is uppercase and the
variant is uppercase. More details are available from the [Locale
Chapter](../locale/index.md) of this User's Guide.

#### How is ICU versioned?

Please read the [ICU Design](../design.md) chapter of the User's Guide.

#### What is the relationship between ICU locale data and system locale data?

There is no relationship. ICU is not dependent on the operating system for the
locale data.

This also means that `uloc_setDefault()` does not affect the operating system.
The function `uloc_setDefault()` only sets ICU's default locale. Normally the
default locale for ICU is whatever the operating system says is the default
locale.

#### How are errors handled in ICU?

Since not all compilers can handle exceptions, we return an error from functions
with a `UErrorCode` parameter. The `UErrorCode` parameter of a function will
return any errors that occurred while it was executing.
It's usually a good idea
to check for errors after calling a function by using the `U_SUCCESS` and
`U_FAILURE` macros. `U_SUCCESS` returns true when the function did run properly,
and `U_FAILURE` returns true when the function did NOT run properly. You may
handle specific errors from a function by checking the exact value of the error
code. The possible values of `UErrorCode` are located in
[utypes.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/utypes.h)
of the common project. Before any function is called with a `UErrorCode`, it
must be initialized to `U_ZERO_ERROR`.

Here is an example of `UErrorCode` being used.

```c++
// The error code must be initialized before it is passed to any function.
UErrorCode err = U_ZERO_ERROR;
// callMyFunction() stands for any function that takes a UErrorCode pointer.
callMyFunction(&err);
if (U_FAILURE(err)) {
    puts("callMyFunction() Failed!");
}
```

Please see the [ICU Design](../design.md) chapter for details.

#### With calendar classes, why are months 0-based?

"I have been using ICU for its calendar classes, and have found it to be
excellent. That said, I am wondering why the decision was made to keep months
0-based while almost all the other calendrical units (years, weeks of year,
weeks of month, date, days of year, days of week, days of week in month) are
1-based? This has been the source of several bugs whenever the mind is slightly
less than razor sharp." --Contributor

This was not our choice. We inherited it from the Java Calendar API,
unfortunately.

#### Is there a guideline for COBOL programs that want to use ICU?

There is a COBOL/ICU guideline available since ICU 2.2. For more details, please
refer to the [COBOL section](../usefrom/cobol.md) of this User's Guide.

#### Where can I get more information about using ICU?

Please send an e-mail to the [ICU4C Support
List](https://icu.unicode.org/contacts).