README.md

<!--
© 2019 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Basic instructions for running the LdmlConverter via Maven

> Note: While this document provides useful background information about the
  LdmlConverter, the actual complete process for integrating CLDR data into ICU
  is described in the document `../../../docs/processes/cldr-icu.md`, which is
  best viewed as
  [CLDR-ICU integration](https://unicode-org.github.io/icu/processes/cldr-icu.html).

## Requirements

* A CLDR release for supplying CLDR data and the CLDR API.
* The Maven build tool
* The Ant build tool (using JDK 11+)

## Important directories

| Directory       | Description |
|-----------------|-------------|
| `TOOLS_ROOT`    | Path to the root of the ICU tools directory, below which are (e.g.) the `cldr/` and `unicodetools/` directories. |
| `CLDR_DIR`      | Path to the root of the standard CLDR sources, below which are the `common/` and `tools/` directories. |
| `CLDR_DATA_DIR` | The top-level directory for the CLDR production data (typically the "production" directory in the staging repository). Usually generated locally or obtained from: https://github.com/unicode-org/cldr-staging/tree/main/production |

On POSIX systems, it's best to set these as exported shell variables; the
instructions that follow assume they have been set accordingly:

```
$ export TOOLS_ROOT=/path/to/icu/tools
$ export CLDR_DIR=/path/to/cldr
$ export CLDR_DATA_DIR=/path/to/cldr-staging/production
```
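
Because every later command depends on these variables, a quick sanity check before starting can save a failed build. The following is a minimal sketch, not part of the ICU tooling: the function name `check_ldml_dirs` is hypothetical, and the sub-directories it looks for are only those named in the table above.

```shell
# Sketch: verify the three directory variables before running the converter.
# Assumption: the layout checked here is taken from the table above; the
# function name is illustrative and does not exist in the ICU tools.
check_ldml_dirs() {
  rc=0
  for spec in "TOOLS_ROOT:cldr" "CLDR_DIR:common" "CLDR_DIR:tools" "CLDR_DATA_DIR:"; do
    name=${spec%%:*}   # variable name before the colon
    sub=${spec#*:}     # expected sub-directory (may be empty)
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      rc=1
    elif [ ! -d "$value/$sub" ]; then
      echo "$name: missing directory $value/$sub" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Calling `check_ldml_dirs && echo OK` after the exports above reports any variable that is unset or does not point at the expected layout.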

Note that you should not attempt to use data from the CLDR project directory
(where the CLDR API code exists) for conversion into ICU data. The process now
relies on a pre-processing step, and the CLDR data must come from the separate
"staging" repository (i.e. https://github.com/unicode-org/cldr-staging) or be
pre-processed locally into a different directory.


## Initial Setup

This project relies on the Maven build tool for managing dependencies and uses
Ant for configuration purposes, so both will need to be installed. On a
Debian-based system, this should be as simple as:

```
$ sudo apt-get install maven ant
```

You must also install an additional CLDR JAR file into the local Maven
repository at `$TOOLS_ROOT/cldr/lib` (see the `README.txt` in that directory
for more information).

```
$ cd "$TOOLS_ROOT/cldr/lib"
$ ./install-cldr-jars.sh "$CLDR_DIR"
```

## Generating all ICU data and source code

```
$ cd "$TOOLS_ROOT/cldr/cldr-to-icu"
$ ant -f build-icu-data.xml
```

## Other Examples

* Outputting a subset of the supplemental data into a specified directory:
  ```
  $ ant -f build-icu-data.xml -DoutDir=/tmp/cldr -DoutputTypes=plurals,dayPeriods -DdontGenCode=true
  ```
  Note: Output types can be listed with mixedCase, lower_underscore or UPPER_UNDERSCORE.
  Pass `-DoutputTypes=help` to see the full list.


* Outputting only a subset of locale IDs (and all the supplemental data):
  ```
  $ ant -f build-icu-data.xml -DoutDir=/tmp/cldr -DlocaleIdFilter='(zh|yue).*' -DdontGenCode=true
  ```

* Overriding the default CLDR version string (which normally matches the CLDR library code):
  ```
  $ ant -f build-icu-data.xml -DcldrVersion="36.1"
  ```
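
The `localeIdFilter` value in the second example is a regular expression over locale IDs. To preview what a pattern would select before running a full conversion, a sketch like the following may help. It assumes the filter has to match the entire locale ID (hence `grep -x`) and uses POSIX extended syntax, which can differ from the Java regex syntax the converter presumably uses; the helper name and the sample IDs are illustrative only.

```shell
# Sketch: preview which locale IDs a filter regex would select.
# Assumption: the filter must match the whole ID; the IDs passed in are
# sample input, not a list taken from CLDR.
preview_locale_filter() {
  filter=$1
  shift
  # One ID per line, keep only lines that match the filter in full.
  printf '%s\n' "$@" | grep -E -x -e "$filter"
}

preview_locale_filter '(zh|yue).*' en zh zh_Hant yue yue_Hans zu
# prints zh, zh_Hant, yue and yue_Hans, one per line
```

Note that `zu` is rejected: it shares a first letter with `zh` but does not start with either alternative in the pattern.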

### Using `alt="ascii"` CLDR alternate values from the CLDR XML

CLDR provides alternate values in addition to the default values for locale data.

For example, some locales have time formats using U+202F NARROW NO-BREAK SPACE (NNBSP) between the hours/minutes/seconds and the day periods.
To provide equivalent time formats that use the ASCII space (U+0020 SPACE) instead,
the alternate values carry the extra attribute `alt="ascii"`.
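
Since U+202F and U+0020 look almost identical on screen, it can be handy to check a file for NNBSP mechanically. The sketch below counts the lines of a UTF-8 text file that contain U+202F; the function name is hypothetical, and it only detects the literal character (UTF-8 bytes E2 80 AF), not escaped forms such as `\u202F`.

```shell
# Sketch: count lines of a UTF-8 file that contain U+202F NARROW NO-BREAK SPACE.
# Assumption: the function name is illustrative; it is not part of any ICU or CLDR tool.
count_nnbsp_lines() {
  # \342\200\257 is U+202F written as octal UTF-8 bytes, so no locale support is needed.
  nnbsp=$(printf '\342\200\257')
  # grep -c prints the number of matching lines (0 when there are none).
  LC_ALL=C grep -c -e "$nnbsp" "$1"
}
```

For example, `count_nnbsp_lines some-locale.xml` prints the number of affected lines; `grep -c` exits with status 1 when that number is 0, which matters in scripts that run with `set -e`.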

Follow these steps to generate ICU data using the ASCII versions of locale data:

1.  First, edit the `build-icu-data.xml` file where it mentions `ALTERNATE VALUES`,
    adding the correctly annotated source path, target path, and locales list
    as follows:

    ```diff
    @@ -384,6 +399,20 @@
              <!-- ALTERNATE VALUES -->
              <!-- The following elements configure alternate values for some special case paths.
                   The target path will only be replaced if both it, and the source path, exist in
                   the CLDR data (paths will not be modified if only the source path exists).
                   Since the paths must represent the same semantic type of data, they must be in the
                   same "namespace" (same element names) and must not contain value attributes. Thus
                   they can only differ by distinguishing attributes (either added or modified).
                   This feature is typically used to select alternate translations (e.g. short forms)
                   for certain paths. -->
               <!-- <altPath target="//path/to/value[@attr='foo']"
                             source="//path/to/value[@attr='bar']"
                             locales="xx,yy_ZZ"/> -->
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehm']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehm'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehms']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehms'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='h']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='h'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hm']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hm'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hms']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hms'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hmsv']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hmsv'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hmv']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/dateTimeFormats/availableFormats/dateFormatItem[@id='hmv'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='full']/timeFormat[@type='standard']/pattern[@type='standard']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='full']/timeFormat[@type='standard']/pattern[@alt='ascii'][@type='standard']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='long']/timeFormat[@type='standard']/pattern[@type='standard']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='long']/timeFormat[@type='standard']/pattern[@alt='ascii'][@type='standard']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='medium']/timeFormat[@type='standard']/pattern[@type='standard']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='medium']/timeFormat[@type='standard']/pattern[@alt='ascii'][@type='standard']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='short']/timeFormat[@type='standard']/pattern[@type='standard']"
    +                     source="//ldml/dates/calendars/calendar[@type='gregorian']/timeFormats/timeFormatLength[@type='short']/timeFormat[@type='standard']/pattern[@alt='ascii'][@type='standard']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehm']"
    +                     source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehm'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehms']"
    +                     source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='Ehms'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='h']"
    +                     source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='h'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='hm']"
    +                     source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='hm'][@alt='ascii']"/>
    +            <altPath target="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='hms']"
    +                     source="//ldml/dates/calendars/calendar[@type='generic']/dateTimeFormats/availableFormats/dateFormatItem[@id='hms'][@alt='ascii']"/>
    ```

1.  Then run the generator:

    ```
    $ ant -f build-icu-data.xml <options>
    ```

## Config syntax details

Note: some elements have implicit default attributes associated with them, according to [`ldml.dtd`](https://github.com/unicode-org/icu/blob/main/icu4c/source/data/dtd/cldr/common/dtd/ldml.dtd).
Paths in the configuration spell such defaults out explicitly, as in `timeFormat[@type='standard']` in the `altPath` entries above.
For example, for the `timeFormat` element,
the following excerpt of the DTD schema indicates that there is a default value `"standard"` for the `type` attribute:

```
<!ELEMENT timeFormat ... >
<!ATTLIST timeFormat type NMTOKEN "standard" >
```
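
To find other attributes that carry such defaults, one can scan the DTD for `ATTLIST` declarations ending in a quoted value. This is a rough sketch under stated assumptions: the function name is made up, it handles only declarations written on a single line, and it will also report `#FIXED` values.

```shell
# Sketch: list single-line ATTLIST declarations that end in a quoted default value.
# Assumption: the function name is illustrative; multi-line declarations are not handled.
list_dtd_defaults() {
  # Match "<!ATTLIST ... "value" >" where the quoted value is the last token before ">".
  grep -E '<!ATTLIST[[:space:]].*[[:space:]]"[^"]*"[[:space:]]*>' "$1"
}
```

For instance, `list_dtd_defaults "$CLDR_DIR/common/dtd/ldml.dtd"` would print the `timeFormat` declaration shown above along with any others of the same shape.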

See `build-icu-data.xml` for documentation of all options and additional customization.


## Running unit tests

```
$ mvn test -DCLDR_DIR="$CLDR_DATA_DIR"
```


## Importing and running from an IDE

This project should be easy to import into an IDE which supports Maven development, such
as IntelliJ or Eclipse. It uses a local Maven repository directory for the unpublished
CLDR libraries (which are included in the project), but otherwise gets all dependencies
via Maven's public repositories.


README.txt

*********************************************************************
*** © 2019 and later: Unicode, Inc. and others.                   ***
*** License & terms of use: http://www.unicode.org/copyright.html ***
*********************************************************************

The instructions for the LdmlConverter tool (a.k.a. CLDR-to-ICU converter) have
moved to README.md in this directory.

Please read README.md, or better yet, view the rendered form of its Markdown
contents online at GitHub
(e.g. https://github.com/unicode-org/icu/tree/main/tools/cldr/cldr-to-icu).