Directory |
Description |
<ICU>/source/common/ |
The core Unicode and support functionality, such as resource bundles,
character properties, locales, codepage conversion, normalization,
and the Locale and UnicodeString classes. |
<ICU>/source/i18n/ |
The modules in i18n are generally the more data-driven (that is,
resource-bundle-driven) components. They deal with higher-level
internationalization issues such as formatting, collation, text break
analysis, and transliteration. |
<ICU>/source/layoutex/ |
Contains the ICU paragraph layout engine. |
<ICU>/source/io/ |
Contains the ICU I/O library. |
<ICU>/source/data/ |
This directory contains the source data in text format, which is
compiled into binary form during the ICU build process. It contains
several subdirectories, in which the data files are grouped by
function. Note that the build process must be run again after any
changes are made to this directory.
If some of the following directories are missing, it's probably
because you got an official download. If you need the data source files
for customization, then please download the complete ICU source code from the ICU repository.
- in/ A directory that contains a pre-built data library for
ICU. A standard source code package contains this file but omits
several of the following directories. This simplifies the build
process for most users and reduces platform porting issues.
- brkitr/ Data files for character, word, sentence, title
casing and line boundary analysis.
- coll/ Data for collation tailorings. The makefile
colfiles.mk contains the list of resource bundle files.
- locales/ These .txt files contain ICU language and
culture-specific localization data. Two special bundles are
root, which is the fallback data and parent of other bundles,
and index, which contains a list of installed bundles. The
makefile resfiles.mk contains the list of resource bundle
files. Some of the locale data is split out into the type-specific
directories curr, lang, region, unit, and zone, described below.
- curr/ Locale data for currency symbols and names (including
plural forms), with its own makefile resfiles.mk.
- lang/ Locale data for names of languages, scripts, and locale
key names and values, with its own makefile resfiles.mk.
- region/ Locale data for names of regions, with its own
makefile resfiles.mk.
- unit/ Locale data for measurement unit patterns and names,
with its own makefile resfiles.mk.
- zone/ Locale data for time zone names, with its own
makefile resfiles.mk.
- mappings/ Here are the code page converter tables. These
.ucm files contain mappings to and from Unicode. These are compiled
into .cnv files. convrtrs.txt is the alias mapping table from
various converter name formats to ICU internal format and vice versa.
It produces cnvalias.icu. The makefiles ucmfiles.mk,
ucmcore.mk, and ucmebcdic.mk contain the list of
converters to be built.
- translit/ This directory contains transliterator rules as
resource bundles, a makefile trnsfiles.mk containing the list
of installed system transliteration files, and the special
bundle translit_index, which lists the system transliterator
aliases.
- unidata/ This directory contains the Unicode data files.
Please see http://www.unicode.org/ for more
information.
- misc/ The misc directory contains other data files which
did not fit into the above categories, including time zone
information, region-specific data, and other data derived from CLDR
supplemental data.
- out/ This directory contains the assembled memory-mapped
files.
- out/build/ This directory contains intermediate (compiled)
files, such as .cnv, .res, etc.
If you are creating a special ICU build, you can set the ICU_DATA
environment variable to the out/ or the out/build/ directory, but
this is generally discouraged because most people set it incorrectly.
See the ICU Data Management section of the ICU User's Guide for details.
|
<ICU>/source/test/intltest/ |
A test suite including all C++ APIs. For information about running
the test suite, see the build instructions specific to your platform
later in this document. |
<ICU>/source/test/cintltst/ |
A test suite written in C, including all C APIs. For information
about running the test suite, see the build instructions specific to your
platform later in this document. |
<ICU>/source/test/iotest/ |
A test suite written in C and C++ to test the icuio library. For
information about running the test suite, see the build instructions
specific to your platform later in this document. |
<ICU>/source/test/testdata/ |
Source text files for data, which are read by the tests. It contains
the subdirectory out/build/, which is used for intermediate
files, and out/, which contains testdata.dat. |
<ICU>/source/tools/ |
Tools for generating the data files. Data files are generated by
invoking <ICU>/source/data/build/makedata.bat on Win32 or
<ICU>/source/make on UNIX. |
<ICU>/source/samples/ |
Various sample programs that use ICU |
<ICU>/source/extra/ |
Unsupported API additions. Currently, it contains the 'uconv' tool
for performing codepage conversion on files. |
<ICU>/packaging/ |
This directory contains scripts and tools for packaging the final
ICU build for various release platforms. |
<ICU>/source/config/ |
Contains helper makefiles for platform-specific build commands. Used
by 'configure'. |
<ICU>/source/allinone/ |
Contains top-level ICU workspace and project files, for instance to
build all of ICU under one MSVC project. |
<ICU>/include/ |
Contains the headers needed for developing software that uses ICU on
Windows. |
<ICU>/lib/ |
Contains the import libraries for linking ICU into your Windows
application. |
<ICU>/bin/ |
Contains the libraries and executables for using ICU on Windows. |
## How To Build And Install ICU
### Recommended Build Options
Depending on the platform and the type of installation, we recommend a small number of modifications and build options. Note that C99 compatibility is now required.
* **Namespace (ICU 61 and later):** Since ICU 61, call sites need to qualify ICU types explicitly, for example `icu::UnicodeString`, or do `using icu::UnicodeString;` where appropriate. If your code relies on the "using namespace icu;" that used to be in `unicode/uversion.h`, then you need to update your code.
You could temporarily (until you have more time to update your code) revert to the default "using" via `-DU_USING_ICU_NAMESPACE=1` or by modifying `unicode/uversion.h`:
```
Index: icu4c/source/common/unicode/uversion.h
===================================================================
--- icu4c/source/common/unicode/uversion.h (revision 40704)
+++ icu4c/source/common/unicode/uversion.h (working copy)
@@ -127,7 +127,7 @@
defined(U_LAYOUTEX_IMPLEMENTATION) || defined(U_TOOLUTIL_IMPLEMENTATION)
# define U_USING_ICU_NAMESPACE 0
# else
-# define U_USING_ICU_NAMESPACE 0
+# define U_USING_ICU_NAMESPACE 1
# endif
# endif
# if U_USING_ICU_NAMESPACE
```
* **Namespace (ICU 60 and earlier):** By default, unicode/uversion.h has "using namespace icu;" which defeats much of the purpose of the namespace. (This is for historical reasons: Originally, ICU4C did not use namespaces, and some compilers did not support them. The default "using" statement preserves source code compatibility.)
You should turn this off via `-DU_USING_ICU_NAMESPACE=0` or by modifying unicode/uversion.h:
```
Index: source/common/unicode/uversion.h
===================================================================
--- source/common/unicode/uversion.h (revision 26606)
+++ source/common/unicode/uversion.h (working copy)
@@ -180,7 +180,8 @@
# define U_NAMESPACE_QUALIFIER U_ICU_NAMESPACE::
# ifndef U_USING_ICU_NAMESPACE
-# define U_USING_ICU_NAMESPACE 1
+ // Set to 0 to force namespace declarations in ICU usage.
+# define U_USING_ICU_NAMESPACE 0
# endif
# if U_USING_ICU_NAMESPACE
U_NAMESPACE_USE
```
ICU call sites then either qualify ICU types explicitly, for example `icu::UnicodeString`, or do `using icu::UnicodeString;` where appropriate.
* **Hardcode the default charset to UTF-8:** On platforms where the default charset is always UTF-8, like MacOS X and some Linux distributions, we recommend hardcoding ICU's default charset to UTF-8. This means that some implementation code becomes simpler and faster, and statically linked ICU libraries become smaller. (See the [U_CHARSET_IS_UTF8](http://icu-project.org/apiref/icu4c/platform_8h.html#a0a33e1edf3cd23d9e9c972b63c9f7943) API documentation for more details.)
You can pass `-DU_CHARSET_IS_UTF8=1` to the compiler, or modify `unicode/utypes.h` (in ICU 4.8 and below) or `unicode/platform.h` (in ICU 49 and higher):
```
Index: source/common/unicode/utypes.h
===================================================================
--- source/common/unicode/utypes.h (revision 26606)
+++ source/common/unicode/utypes.h (working copy)
@@ -160,7 +160,7 @@
* @see UCONFIG_NO_CONVERSION
*/
#ifndef U_CHARSET_IS_UTF8
-# define U_CHARSET_IS_UTF8 0
+# define U_CHARSET_IS_UTF8 1
#endif
/*===========================================================================*/
```
* **UnicodeString constructors:** The UnicodeString class has several single-argument constructors that are not marked "explicit" for historical reasons. This can lead to inadvertent construction of a `UnicodeString` with a single character by using an integer, and it can lead to inadvertent dependency on the conversion framework by using a C string literal.
Beginning with ICU 49, you should do the following:
* Consider marking the from-`UChar` and from-`UChar32` constructors explicit via `-DUNISTR_FROM_CHAR_EXPLICIT=explicit` or similar.
* Consider marking the from-`const char*` and from-`const UChar*` constructors explicit via `-DUNISTR_FROM_STRING_EXPLICIT=explicit` or similar.
> :point_right: **Note**: The ICU test suites cannot be compiled with these settings.
* **utf.h, utf8.h, utf16.h, utf_old.h:** By default, utypes.h (and thus almost every public ICU header) includes all of these header files. Often, none of them are needed, or only one or two of them. All of utf_old.h is deprecated or obsolete.
Beginning with ICU 49, you should define `U_NO_DEFAULT_INCLUDE_UTF_HEADERS` to 1 (via -D or uconfig.h, as above) and explicitly include only the header files you actually need.
> :point_right: **Note**: The ICU test suites cannot be compiled with this setting.
* **utf_old.h:** All of utf_old.h is deprecated or obsolete.
Beginning with ICU 60, you should define `U_HIDE_OBSOLETE_UTF_OLD_H` to 1 (via -D or uconfig.h, as above). Replace any uses of these macros as described in the comment for each obsolete macro.
> :point_right: **Note**: The ICU test suites _can_ be compiled with this setting.
* **.dat file:** By default, the ICU data is built into a shared library (DLL). This is convenient because it requires no install-time or runtime configuration, but the library is platform-specific and cannot be modified. A .dat package file makes the opposite trade-off: Platform-portable (except for endianness and charset family, which can be changed with the icupkg tool) and modifiable (also with the icupkg tool). If a path is set, then single data files (e.g., .res files) can be copied to that location to provide new locale data or conversion tables etc.
The only drawback with a .dat package file is that the application needs to provide ICU with the file system path to the package file (e.g., by calling `u_setDataDirectory()`) or with a pointer to the data (`udata_setCommonData()`) before other ICU API calls. This is usually easy if ICU is used from an application where `main()` takes care of such initialization. It may be hard if ICU is shipped with another shared library (such as the Xerces-C++ XML parser) which does not control `main()`.
See the [User Guide ICU Data](https://unicode-org.github.io/icu/userguide/icudata) chapter for more details.
If possible, we recommend building the .dat package. Specify `--with-data-packaging=archive` on the configure command line, as in
`runConfigureICU Linux --with-data-packaging=archive`
(Read the configure script's output for further instructions. On Windows, the Visual Studio build generates both the .dat package and the data DLL.)
Be sure to install and use the tiny stubdata library rather than the large data DLL.
* **Static libraries:** It may make sense to build the ICU code into static libraries (.a) rather than shared libraries (.so/.dll). Static linking reduces the overall size of the binary by removing code that is never called.
Example configure command line:
`runConfigureICU Linux --enable-static --disable-shared`
* **Out-of-source build:** It is usually desirable to keep the ICU source file tree clean and have build output files written to a different location. This is called an "out-of-source build". Simply invoke the configure script from the target location:
```
~/icu$ git clone https://github.com/unicode-org/icu.git
~/icu$ mkdir icu4c-build
~/icu$ cd icu4c-build
~/icu/icu4c-build$ ../icu/icu4c/source/runConfigureICU Linux
~/icu/icu4c-build$ make check
```
> :point_right: **Note**: this example shows a relative path to `runConfigureICU`. If you experience difficulty, try using an absolute path to `runConfigureICU` instead.
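Two of the recommended options above, `U_USING_ICU_NAMESPACE` and the `UNISTR_FROM_*_EXPLICIT` macros, share one mechanism: a macro that the build may predefine changes what the public headers declare. The C++ sketch below illustrates that mechanism with hypothetical names (`demo_lib`, `DemoString`, `DEMO_FROM_CHAR_EXPLICIT`, `DEMO_USING_LIB_NAMESPACE`); it does not include or reproduce ICU headers. Note one deliberate difference: ICU leaves these constructors implicit for application code unless you opt in, while this sketch defaults to the recommended `explicit` setting.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-in for UNISTR_FROM_CHAR_EXPLICIT. A default is supplied
// only if the build did not already define the macro, so a -D flag wins.
#ifndef DEMO_FROM_CHAR_EXPLICIT
#  define DEMO_FROM_CHAR_EXPLICIT explicit  // the recommended setting
#endif

namespace demo_lib {  // hypothetical stand-in for namespace icu

class DemoString {
 public:
  DemoString() = default;
  // Single-character constructor; whether it can be used for implicit
  // conversions depends on the macro above.
  DEMO_FROM_CHAR_EXPLICIT DemoString(char32_t c) : text_(1, c) {}
  std::size_t length() const { return text_.size(); }

 private:
  std::u32string text_;
};

}  // namespace demo_lib

// Hypothetical stand-in for U_USING_ICU_NAMESPACE: 0 means the header does
// not inject a blanket using-directive into every including file.
#ifndef DEMO_USING_LIB_NAMESPACE
#  define DEMO_USING_LIB_NAMESPACE 0
#endif
#if DEMO_USING_LIB_NAMESPACE
using namespace demo_lib;
#endif

// Recommended call-site style: import only the names you actually use.
using demo_lib::DemoString;

// True only if an int would silently convert into a DemoString.
constexpr bool kImplicitFromInt = std::is_convertible<int, DemoString>::value;

// Direct initialization works whether or not the constructor is explicit.
std::size_t make_length(char32_t c) {
  DemoString s(c);
  return s.length();
}
```

With the constructor marked `explicit`, passing an integer to a (hypothetical) function that takes a `DemoString` fails to compile instead of quietly building a one-character string, which is the accident the ICU option is meant to prevent.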
#### ICU as a System-Level Library
If ICU is installed as a system-level library, there are further opportunities and restrictions to consider. For details, see the _Using ICU as an Operating System Level Library_ section of the [User Guide ICU Architectural Design](https://unicode-org.github.io/icu/userguide/design) chapter.
* **Data path:** For a system-level library, it is best to load ICU data from the .dat package file because the file system path to the .dat package file can be hardcoded. ICU will automatically set the path to the final install location using `U_ICU_DATA_DEFAULT_DIR`. Alternatively, you can set `-DICU_DATA_DIR=/path/to/icu/data` when building the ICU code. (Used by source/common/putil.c.)
Consider also setting `-DICU_NO_USER_DATA_OVERRIDE` if you do not want the `ICU_DATA` environment variable to be used. (An application can still override the data path via `u_setDataDirectory()` or `udata_setCommonData()`.)
* **Hide draft API:** API marked with `@draft` is new and not yet stable. Applications must not rely on unstable APIs from a system-level library. Define `U_HIDE_DRAFT_API`, `U_HIDE_INTERNAL_API` and `U_HIDE_SYSTEM_API` by modifying `unicode/utypes.h` before installing it.
* **Only C APIs:** Applications must not rely on C++ APIs from a system-level library because binary C++ compatibility across library and compiler versions is very hard to achieve. Most ICU C++ APIs are in header files that contain a comment with `\brief C++ API`. Consider not installing these header files, or define `U_SHOW_CPLUSPLUS_API` to be `0` by modifying `unicode/utypes.h` before installing it.
* **Disable renaming:** By default, ICU library entry point names have an ICU version suffix. Turn this off for a system-level installation, to enable upgrading ICU without breaking applications. For example:
`runConfigureICU Linux --disable-renaming`
The public header files from this configuration must be installed for applications to include and get the correct entry point names.
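The data-path settings above interact in a particular order. As a rough mental model (a simplified sketch, not ICU's actual code in `putil`; every name in it is hypothetical), a directory set by the application takes precedence, then the `ICU_DATA` environment variable unless the build defined `ICU_NO_USER_DATA_OVERRIDE`, and finally the directory compiled in via `ICU_DATA_DIR` or `U_ICU_DATA_DEFAULT_DIR`. The sketch models directory paths only; `udata_setCommonData()`, which supplies a data pointer rather than a path, is outside its scope.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical inputs; each field stands in for one of the mechanisms above.
struct DataPathInputs {
  std::optional<std::string> set_by_app;    // like u_setDataDirectory()
  std::optional<std::string> icu_data_env;  // like the ICU_DATA environment variable
  std::string build_default;                // like ICU_DATA_DIR / U_ICU_DATA_DEFAULT_DIR
  bool no_user_override = false;            // like -DICU_NO_USER_DATA_OVERRIDE
};

// Simplified precedence model, not ICU's implementation.
std::string resolve_data_dir(const DataPathInputs& in) {
  // 1. A directory set explicitly by the application always wins.
  if (in.set_by_app) {
    return *in.set_by_app;
  }
  // 2. Next, a non-empty environment variable, unless the build disabled it.
  if (!in.no_user_override && in.icu_data_env && !in.icu_data_env->empty()) {
    return *in.icu_data_env;
  }
  // 3. Otherwise, fall back to the directory hardcoded at build time.
  return in.build_default;
}
```

This is why a system-level build can hardcode the install location yet still let a well-behaved application point ICU elsewhere when it needs to.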
### User-Configurable Settings
ICU4C can be customized via a number of user-configurable settings. Many of them are controlled by preprocessor macros which are defined in the `source/common/unicode/uconfig.h` header file. Some turn off parts of ICU, for example conversion or collation, trading off a smaller library for reduced functionality. Other settings are recommended (see previous section) but their default values are set for better source code compatibility.
In order to change such user-configurable settings, you can either modify the `uconfig.h` header file by adding a specific `#define ...` for one or more of the macros before they are first tested, or set the compiler's preprocessor flags (`CPPFLAGS`) to include an equivalent `-D` macro definition.
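Both routes work because `uconfig.h` guards each setting with `#ifndef`, so it supplies a default only when the macro is not already defined. The C++ sketch below mimics that arrangement with hypothetical switches (`DEMO_NO_COLLATION` and `DEMO_NO_CONVERSION`, standing in for settings such as `UCONFIG_NO_COLLATION` and `UCONFIG_NO_CONVERSION`); the early `#define` plays the role of a user edit or a `-D` flag in `CPPFLAGS`.

```cpp
#include <cassert>

// User override placed before the defaults are tested. Compiling with
// -DDEMO_NO_COLLATION=1 instead of writing this line has the same effect.
#define DEMO_NO_COLLATION 1

// --- sketch of a uconfig.h-style header (hypothetical names) ---
// Each switch gets a default only if nobody defined it earlier.
#ifndef DEMO_NO_COLLATION
#  define DEMO_NO_COLLATION 0
#endif
#ifndef DEMO_NO_CONVERSION
#  define DEMO_NO_CONVERSION 0
#endif
// --- end of sketch header ---

// Library code then compiles features in or out based on the final values.
constexpr bool collation_enabled() {
#if DEMO_NO_COLLATION
  return false;
#else
  return true;
#endif
}

constexpr bool conversion_enabled() { return DEMO_NO_CONVERSION == 0; }
```

Because the guard only fires for undefined macros, the override must appear before the header is first included; a `#define` placed afterwards would come too late to change the tested value.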
### How To Build And Install On Windows
Building International Components for Unicode requires:
* Microsoft Windows
* Microsoft Visual C++ (part of [Visual Studio](https://www.visualstudio.com/)), from either Visual Studio 2015 or Visual Studio 2017
* _**Optional:**_ A version of the [Windows 10 SDK](https://developer.microsoft.com/windows/downloads) (if you want to build the UWP projects)
> :point_right: **Note**: [Cygwin](#how-to-build-and-install-on-windows-with-cygwin) is required if you use a version of MSVC other than the one compatible with the supplied project files, or if you build ICU with other compilers (e.g., GCC).
The steps are:
1. Unzip the `icu-XXXX.zip` file into any convenient location.
* You can use the built-in zip functionality of Windows Explorer to do this. Right-click on the .zip file and choose the "Extract All" option from the context menu. This will open a new window where you can choose the output location to put the files.
* Alternatively, you can use a 3