<!--
© 2019 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

ICU Data Build Tool
===================

ICU 64 provides a tool for configuring your ICU locale data file with finer
granularity.  This page explains how to use this tool to customize and reduce
your data file size.

## Overview: What is in the ICU data file?

There are hundreds of **locales** supported in ICU (including script and
region variants), and ICU supports many different **features**.  For each
locale and for each feature, data is stored in one or more data files.

Those data files are compiled and then bundled into a `.dat` file called
something like `icudt64l.dat`, which is little-endian data for ICU 64. This
dat file is packaged into the `libicudata.so` on Linux or `libicudata.dll.a`
on Windows. In ICU4J, it is bundled into a jar file named `icudata.jar`.

At a high level, the size of the ICU data file corresponds to the
cross-product of locales and features, except that not all features require
locale-specific data, and not all locales require data for all features. The
data file contents can be approximately visualized like this:

<img alt="Features vs. Locales" src="../assets/features_locales.svg" style="max-width:600px" />

The `icudt64l.dat` file is 27 MiB uncompressed and 11 MiB gzipped.  This file
size is too large for certain use cases, such as bundling the data file into a
smartphone app or an embedded device.  This is something the ICU Data Build
Tool aims to solve.

## ICU Data Configuration File

The ICU Data Build Tool enables you to write a configuration file that
specifies what features and locales to include in a custom data bundle.

The configuration file may be written in either [JSON](http://json.org/) or
[Hjson](https://hjson.org/).  To build ICU4C with custom data, set the
`ICU_DATA_FILTER_FILE` environment variable when running `runConfigureICU` on
Unix or when building the data package on Windows.  For example:

    ICU_DATA_FILTER_FILE=filters.json path/to/icu4c/source/runConfigureICU Linux

**Important:** You *must* have the data sources in order to use the ICU Data
Build Tool. Check for the file icu4c/source/data/locales/root.txt. If that file
is missing, you need to download "icu4c-\*-data.zip", delete the old
icu4c/source/data directory, and replace it with the data directory from the zip
file. If there is a \*.dat file in icu4c/source/data/in, that file will be used
even if you gave ICU custom filter rules.

In order to use Hjson syntax, the `hjson` pip module must be installed on
your system.  You should also consider installing the `jsonschema` module to
print messages when errors are found in your config file.

    $ pip3 install --user hjson jsonschema

To build ICU4J with custom data, you must first build ICU4C with custom data
and then generate the JAR file.  For more information, read
[icu4j-readme.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/icu4j-readme.txt).

### Locale Slicing

The simplest way to slice ICU data is by locale.  The ICU Data Build Tool
makes it easy to select your desired locales to suit a number of use cases.

#### Filtering by Language Only

Here is a *filters.json* file that builds ICU data with support for English,
Chinese, and German, including *all* script and regional variants for those
languages:

    {
      "localeFilter": {
        "filterType": "language",
        "whitelist": [
          "en",
          "de",
          "zh"
        ]
      }
    }

The *filterType* "language" only supports slicing by entire languages.

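As a rough mental model of *filterType* "language", the following Python sketch keeps every locale whose language subtag is whitelisted. It is not the build tool's implementation; the helper names and the candidate locale list are made up for illustration.

```python
def language_of(locale_id):
    """Return the language subtag of a locale ID such as 'zh_Hant_TW'."""
    return locale_id.split("_")[0]


def matches_language_filter(locale_id, whitelist):
    """Model of filterType "language": keep every variant of a listed language."""
    # "root" is kept unconditionally because every locale falls back to it.
    return locale_id == "root" or language_of(locale_id) in whitelist


whitelist = {"en", "de", "zh"}
# Hypothetical candidate locale files, chosen only for illustration.
candidates = ["root", "en", "en_GB", "de_CH", "zh_Hant_TW", "fr", "fr_CA"]
selected = [loc for loc in candidates if matches_language_filter(loc, whitelist)]
print(selected)
```

In this model, script and regional variants such as `zh_Hant_TW` are kept because only the leading language subtag is compared.
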
#### Filtering by Locale

For more control, use *filterType* "locale".  Here is a *filters.hjson* file that
includes the same three languages as above, including regional variants, but
only the default script (e.g., Simplified Han for Chinese):

    localeFilter: {
      filterType: locale
      whitelist: [
        en
        de
        zh
      ]
    }

#### Adding Script Variants (includeScripts = true)

You may set the *includeScripts* option to true to include all scripts for a
language while using *filterType* "locale".  This results in behavior similar
to *filterType* "language".  In the following JSON example, all scripts for
Chinese are included:

    {
      "localeFilter": {
        "filterType": "locale",
        "includeScripts": true,
        "whitelist": [
          "en",
          "de",
          "zh"
        ]
      }
    }

If you wish to explicitly list the scripts, you may put the script code in the
locale tag in the whitelist, and you do not need the *includeScripts* option
enabled.  For example, in Hjson, to include Han Traditional ***but not Han
Simplified***:

    localeFilter: {
      filterType: locale
      whitelist: [
        en
        de
        zh_Hant
      ]
    }

Note: the option *includeScripts* is only supported at the language level;
i.e., in order to include all scripts for a particular language, you must
specify the language alone, without a region tag.

#### Removing Regional Variants (includeChildren = false)

If you wish to enumerate exactly which regional variants you wish to support,
you may use *filterType* "locale" with the *includeChildren* setting turned to
false.  The following *filters.hjson* file includes English (US), English
(UK), German (Germany), and Chinese (China, Han Simplified), as well as their
dependencies, *but not* other regional variants like English (Australia),
German (Switzerland), or Chinese (Taiwan, Han Traditional):

    localeFilter: {
      filterType: locale
      includeChildren: false
      whitelist: [
        en_US
        en_GB
        de_DE
        zh_CN
      ]
    }

Including dependencies, the above filter would include the following data files:

- root.txt
- en.txt
- en_US.txt
- en_001.txt
- en_GB.txt
- de.txt
- de_DE.txt
- zh.txt
- zh_Hans.txt
- zh_Hans_CN.txt
- zh_CN.txt

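The dependency list above can be reproduced with a small sketch that follows aliases and parent links until it reaches `root`. This is only an illustration: the `PARENT` and `ALIAS` tables are hand-written for the four whitelisted locales, whereas the build tool derives these relationships from the data sources.

```python
# Illustrative parent and alias tables covering only the locales in this example.
# The real relationships come from the ICU/CLDR data sources, not from this sketch.
PARENT = {
    "en": "root", "en_US": "en", "en_001": "en", "en_GB": "en_001",
    "de": "root", "de_DE": "de",
    "zh": "root", "zh_Hans": "zh", "zh_Hans_CN": "zh_Hans",
}
ALIAS = {"zh_CN": "zh_Hans_CN"}


def with_dependencies(whitelist):
    """Return the whitelisted locales plus everything they fall back to."""
    needed = set()
    for locale in whitelist:
        needed.add(locale)
        # Follow an alias first (zh_CN is treated as a pointer to zh_Hans_CN).
        current = ALIAS.get(locale, locale)
        # Then walk up the parent chain until root is reached.
        while current != "root":
            needed.add(current)
            current = PARENT[current]
    needed.add("root")
    return sorted(needed)


files = [name + ".txt" for name in with_dependencies(["en_US", "en_GB", "de_DE", "zh_CN"])]
print(files)
```

Note how `en_GB` pulls in `en_001`, and how `zh_CN` pulls in both `zh_Hans_CN` and `zh_Hans`, even though neither appears in the whitelist.
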
### File Slicing (coarse-grained features)

ICU provides a lot of features, of which you probably need only a small subset
for your application.  Feature slicing is a powerful way to prune out data for
any features you are not using.

***CAUTION:*** When slicing by features, you must manually include all
dependencies.  For example, if you are formatting dates, you must include not
only the date formatting data but also the number formatting data, since dates
contain numbers.  Expect to spend a fair bit of time debugging your feature
filter to get it to work the way you expect it to.

The data for many ICU features lives in individual files.  The ICU Data Build
Tool puts similar *types* of files into categories.  The following table
summarizes the ICU data files and their corresponding features and categories:

| Feature | Category ID(s) | Data Files <br/> ([icu4c/source/data](https://github.com/unicode-org/icu/tree/master/icu4c/source/data)) | Resource Size <br/> (as of ICU 64) |
|---|---|---|---|
| Break Iteration | `"brkitr_rules"` <br/> `"brkitr_dictionaries"` <br/> `"brkitr_tree"` | brkitr/rules/\*.txt <br/> brkitr/dictionaries/\*.txt <br/> brkitr/\*.txt | 522 KiB <br/> **2.8 MiB** <br/> 14 KiB |
| Charset Conversion | `"conversion_mappings"` | mappings/\*.ucm | **4.9 MiB** |
| Collation <br/> *[more info](#collation-ucadata)* | `"coll_ucadata"` <br/> `"coll_tree"` | in/coll/ucadata-\*.icu <br/> coll/\*.txt | 511 KiB <br/> **2.8 MiB** |
| Confusables | `"confusables"` | unidata/confusables\*.txt | 45 KiB |
| Currencies | `"misc"` <br/> `"curr_supplemental"` <br/> `"curr_tree"` | misc/currencyNumericCodes.txt <br/> curr/supplementalData.txt <br/> curr/\*.txt | 3.1 KiB <br/> 27 KiB <br/> **2.5 MiB** |
| Language Display <br/> Names | `"lang_tree"` | lang/\*.txt | **2.1 MiB** |
| Language Tags | `"misc"` | misc/keyTypeData.txt <br/> misc/langInfo.txt <br/> misc/likelySubtags.txt <br/> misc/metadata.txt | 6.8 KiB <br/> 37 KiB <br/> 53 KiB <br/> 33 KiB |
| Normalization | `"normalization"` | in/\*.nrm except in/nfc.nrm | 160 KiB |
| Plural Rules | `"misc"` | misc/pluralRanges.txt <br/> misc/plurals.txt | 3.3 KiB <br/> 33 KiB |
| Region Display <br/> Names | `"region_tree"` | region/\*.txt | **1.1 MiB** |
| Rule-Based <br/> Number Formatting <br/> (Spellout, Ordinals) | `"rbnf_tree"` | rbnf/\*.txt | 538 KiB |
| StringPrep | `"stringprep"` | sprep/\*.txt | 193 KiB |
| Time Zones | `"misc"` <br/> `"zone_tree"` <br/> `"zone_supplemental"` | misc/metaZones.txt <br/> misc/timezoneTypes.txt <br/> misc/windowsZones.txt <br/> misc/zoneinfo64.txt <br/> zone/\*.txt <br/> zone/tzdbNames.txt | 41 KiB <br/> 20 KiB <br/> 22 KiB <br/> 151 KiB <br/> **2.7 MiB** <br/> 4.8 KiB |
| Transliteration | `"translit"` | translit/\*.txt | 685 KiB |
| Unicode Character <br/> Names | `"unames"` | in/unames.icu | 269 KiB |
| Unicode Text Layout | `"ulayout"` | in/ulayout.icu | 14 KiB |
| Units | `"unit_tree"` | unit/\*.txt | **1.7 MiB** |
| **OTHER** | `"cnvalias"` <br/> `"misc"` <br/> `"locales_tree"` | mappings/convrtrs.txt <br/> misc/dayPeriods.txt <br/> misc/genderList.txt <br/> misc/numberingSystems.txt <br/> misc/supplementalData.txt <br/> locales/\*.txt | 63 KiB <br/> 19 KiB <br/> 0.5 KiB <br/> 5.6 KiB <br/> 228 KiB <br/> **2.4 MiB** |

#### Additive and Subtractive Modes

The ICU Data Build Tool allows two strategies for selecting features:
*additive* mode and *subtractive* mode.

The default is to use subtractive mode. This means that all ICU data is
included, and your configurations can remove or change data from that baseline.
Additive mode means that you start with an *empty* ICU data file, and you must
explicitly add the data required for your application.

There are two concrete differences between additive and subtractive mode:

|                         | Additive    | Subtractive |
|-------------------------|-------------|-------------|
| Default Feature Filter  | `"exclude"` | `"include"` |
| Default Resource Filter | `"-/"`, `"+/%%ALIAS"`, `"+/%%Parent"` | `"+/"` |

To enable additive mode, add the following setting to your filter file:

    strategy: "additive"

**Caution:** If using `"-/"` or similar top-level exclusion rules, be aware of
the fields `"+/%%Parent"` and `"+/%%ALIAS"`, which are required in locale tree
resource bundles. Excluding these paths may cause unexpected locale fallback
behavior.

#### Filter Types

You may list *filters* for each category in the *featureFilters* section of
your config file.  What follows are examples of the possible types of filters.

##### Inclusion Filter

To include a category, use the string `"include"` as your filter.

    featureFilters: {
      locales_tree: include
    }

If the category is a locale tree (ends with `_tree`), the inclusion filter
resolves to the `localeFilter`; for more information, see the section
"Locale-Tree Categories." Otherwise, the inclusion filter causes all files in
the category to be included.

**NOTE:** When subtractive mode is used (default), all categories implicitly
start with `"include"` as their filter.

##### Exclusion Filter

To exclude an entire category, use *filterType* "exclude".  For example, to
exclude all confusables data:

    featureFilters: {
      confusables: {
        filterType: exclude
      }
    }

Since ICU 65, you can also write simply:

    featureFilters: {
      confusables: exclude
    }

**NOTE:** When additive mode is used, all categories implicitly start with
`"exclude"` as their filter.

##### File Name Filter

To include or exclude specific files within a category, use the file name
filter, which is the default type of filter when *filterType* is not
specified.  For example, to include the Burmese break iteration dictionary but
not any other dictionaries:

    featureFilters: {
      brkitr_dictionaries: {
        whitelist: [
          burmesedict
        ]
      }
    }

Do *not* include directories or file extensions.  They will be added
automatically for you.  Note that all files in a particular category have the
same directory and extension.

You can use either a whitelist or a blacklist for the file name filter.

##### Regex Filter

To exclude filenames matching a certain regular expression, use *filterType*
"regex".  For example, to reject the CJK-specific break iteration rules:

    featureFilters: {
      brkitr_rules: {
        filterType: regex
        blacklist: [
          ^.*_cj$
        ]
      }
    }

The Python standard library [*re*
module](https://docs.python.org/3/library/re.html) is used for evaluating the
regular expressions.  In case the regular expression engine is changed in the
future, however, you are encouraged to restrict yourself to a simple set of
regular expression operators.

As above, do not include directories or file extensions, and you can use
either a whitelist or a blacklist.

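Because the patterns are evaluated with Python's *re* module, you can try a pattern in a Python shell before building. The sketch below applies the blacklist pattern from the example to a list of file stems. The stems are illustrative rather than an authoritative listing of *brkitr/rules*, and whether the tool calls `search` or `match` is an assumption here; with the `^` and `$` anchors both behave the same.

```python
import re

# The blacklist pattern from the regex filter example above.
blacklist = [re.compile(r"^.*_cj$")]

# Illustrative file stems (no directory, no extension), not an exact listing.
stems = ["char", "line", "line_cj", "line_loose_cj", "sent", "word"]

# A stem is kept only if no blacklist pattern matches it.
kept = [stem for stem in stems if not any(p.search(stem) for p in blacklist)]
print(kept)
```
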
##### Union Filter

You can combine the results of multiple filters with *filterType* "union".
This filter matches files that match *at least one* of the provided filters.
The syntax is:

    {
      filterType: union
      unionOf: [
        { /* filter 1 */ },
        { /* filter 2 */ },
        // ...
      ]
    }

This filter type is useful for combining "locale" filters with different
includeScripts or includeChildren options.

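To make the placeholder syntax concrete, the following sketch builds one possible union filter as a Python dictionary and prints it as JSON that could be pasted into a filter file. The choice of the `curr_tree` category and of the two whitelists is purely illustrative; only the option names come from this page.

```python
import json

# One possible union: exact regional variants for English, plus all scripts
# for Chinese. The category and whitelists are illustrative choices.
config = {
    "featureFilters": {
        "curr_tree": {
            "filterType": "union",
            "unionOf": [
                {
                    "filterType": "locale",
                    "includeChildren": False,
                    "whitelist": ["en_US", "en_GB"],
                },
                {
                    "filterType": "locale",
                    "includeScripts": True,
                    "whitelist": ["zh"],
                },
            ],
        }
    }
}

text = json.dumps(config, indent=2)
print(text)
```

Generating the file from a script like this can be convenient when the whitelists are derived from your application's own list of supported locales.
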
#### Locale-Tree Categories

Several categories have the `_tree` suffix.  These categories are for "locale
trees": they contain locale-specific data.  ***The [localeFilter configuration
option](#locale-slicing) sets the default file filter for all `_tree`
categories.***

If you want to include different locales for different locale file trees, you
can override their filter in the *featureFilters* section of the config file.
For example, to include only Italian data for currency symbols *instead of*
the common locales specified in *localeFilter*, you can do the following:

    featureFilters: {
      curr_tree: {
        filterType: locale
        whitelist: [
          it
        ]
      }
    }

You can exclude an entire `_tree` category without affecting other categories.
For example, to exclude region display names:

    featureFilters: {
      region_tree: {
        filterType: exclude
      }
    }

Note that you are able to use any of the other filter types for `_tree`
categories, but you must be very careful that you are including all of the
correct files.  For example, `en_GB` requires `en_001`, and you must always
include `root`.  If you use the "language" or "locale" filter types, this
logic is done for you.

### Resource Bundle Slicing (fine-grained features)

The third section of the ICU filter config file is *resourceFilters*.  With
this section, you can dive inside resource bundle files to remove even more
data.

You can apply resource filters to all locale tree categories as well as to
categories that include resource bundles, such as the `"misc"` category.

For example, consider measurement units.  There is one unit file per locale (example:
[en.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unit/en.txt)),
and that file contains data for all measurement units in CLDR.  However, if
you are only formatting distances, for example, you may need the data for only
a small set of units.

Here is how you could include units of length in the "short" style but no
other units:

    resourceFilters: [
      {
        categories: [
          unit_tree
        ]
        rules: [
          -/units
          -/unitsNarrow
          -/unitsShort
          +/unitsShort/length
        ]
      }
    ]

Conceptually, the rules are applied from top to bottom.  First, all data for
all three styles of units is removed, and then the short length units are
added back.

**NOTE:** In subtractive mode, resource paths are *included* by default. In
additive mode, resource paths are *excluded* by default.

#### Wildcard Character

You can use the wildcard character (`*`) to match a piece of the resource
path.  For example, to include length units for all three styles, you can do:

    resourceFilters: [
      {
        categories: [
          unit_tree
        ]
        rules: [
          -/units
          -/unitsNarrow
          -/unitsShort
          +/*/length
        ]
      }
    ]

The wildcard must be the only character in its path segment. Future ICU
versions may expand the syntax.

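The top-to-bottom evaluation and the `*` wildcard can be modeled with a few lines of Python. This is a conceptual sketch only, not how genrb implements filtering: it decides inclusion for leaf resource paths, treats each rule as a path prefix, and lets the last matching rule win. The default rules per strategy are taken from the additive/subtractive table; the function names and sample paths are hypothetical.

```python
# Default rules per strategy, as listed in the additive/subtractive table.
DEFAULT_RULES = {
    "subtractive": ["+/"],
    "additive": ["-/", "+/%%ALIAS", "+/%%Parent"],
}


def parse_rule(rule):
    """Split '+/unitsShort/length' into (True, ['unitsShort', 'length'])."""
    include = rule.startswith("+")
    segments = [part for part in rule[1:].split("/") if part]
    return include, segments


def rule_matches(segments, path_parts):
    """A rule matches when it is a prefix of the path; '*' matches one segment."""
    if len(segments) > len(path_parts):
        return False
    return all(seg in ("*", part) for seg, part in zip(segments, path_parts))


def is_included(path, rules, strategy="subtractive"):
    """Apply the rules top to bottom; the last matching rule decides."""
    path_parts = [part for part in path.split("/") if part]
    included = True
    for rule in DEFAULT_RULES[strategy] + list(rules):
        include, segments = parse_rule(rule)
        if rule_matches(segments, path_parts):
            included = include
    return included


rules = ["-/units", "-/unitsNarrow", "-/unitsShort", "+/*/length"]
for path in ["/units/length/meter", "/unitsShort/length/meter", "/unitsShort/mass/gram"]:
    print(path, is_included(path, rules))
```

The model ignores details such as retained empty tables, so use it to reason about rule order rather than to predict exact output sizes.
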
#### Resource Filter for Specific File

The resource filter object takes an optional *files* setting which accepts a
file filter in the same syntax used above for file filtering.  For example, if
you wanted to apply a filter to misc/supplementalData.txt, you could do the
following (this example removes calendar data):

    resourceFilters: [
      {
        categories: ["misc"]
        files: {
          whitelist: ["supplementalData"]
        }
        rules: [
          -/calendarData
        ]
      }
    ]

#### Combining Multiple Resource Filter Specs

You can also list multiple resource filter objects in the *resourceFilters*
array; the filters are applied from top to bottom.  For example, here is an
advanced configuration that includes "mile" for en-US and "kilometer" for
en-CA; this also makes use of the *files* option:

    resourceFilters: [
      {
        categories: ["unit_tree"]
        rules: [
          -/units
          -/unitsNarrow
          -/unitsShort
        ]
      },
      {
        categories: ["unit_tree"]
        files: {
          filterType: locale
          whitelist: ["en_US"]
        }
        rules: [
          +/*/length/mile
        ]
      },
      {
        categories: ["unit_tree"]
        files: {
          filterType: locale
          whitelist: ["en_CA"]
        }
        rules: [
          +/*/length/kilometer
        ]
      }
    ]

The above example would give en-US these resource filter rules:

    -/units
    -/unitsNarrow
    -/unitsShort
    +/*/length/mile

and en-CA these resource filter rules:

    -/units
    -/unitsNarrow
    -/unitsShort
    +/*/length/kilometer

In accordance with *filterType* "locale", the parent locales *en* and *root*
would get both units; this is required since both en-US and en-CA may inherit
from the parent locale:

    -/units
    -/unitsNarrow
    -/unitsShort
    +/*/length/mile
    +/*/length/kilometer

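The way the three filter objects combine per locale can be sketched as follows. This is a simplified model under stated assumptions: each spec is reduced to an optional locale whitelist plus its rules, a spec applies to a locale if the locale is whitelisted or is an ancestor of a whitelisted locale, and child locales are ignored. The parent table, including the intermediate `en_001`, is hand-written for illustration.

```python
# Illustrative parent table for the locales in this example only.
PARENT = {"en_US": "en", "en_CA": "en_001", "en_001": "en", "en": "root"}

# Simplified form of the three filter objects above: "files" holds the locale
# whitelist, and a missing "files" key means the spec applies to every file.
UNIT_SPECS = [
    {"rules": ["-/units", "-/unitsNarrow", "-/unitsShort"]},
    {"files": ["en_US"], "rules": ["+/*/length/mile"]},
    {"files": ["en_CA"], "rules": ["+/*/length/kilometer"]},
]


def ancestors(locale):
    """Return the chain of parents of a locale, nearest first."""
    chain = []
    while locale in PARENT:
        locale = PARENT[locale]
        chain.append(locale)
    return chain


def spec_applies(spec, locale):
    """A spec applies to whitelisted locales and to their ancestors."""
    whitelist = spec.get("files")
    if whitelist is None:
        return True
    return any(locale == entry or locale in ancestors(entry) for entry in whitelist)


def rules_for(locale, specs):
    """Concatenate, top to bottom, the rules of every spec that applies."""
    rules = []
    for spec in specs:
        if spec_applies(spec, locale):
            rules.extend(spec["rules"])
    return rules


for locale in ["en_US", "en_CA", "en", "root"]:
    print(locale, rules_for(locale, UNIT_SPECS))
```

To see the rules the build actually compiled for each locale, inspect the files under *icu4c/source/data/out/tmp/filters* as described in the debugging tips.
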
## Debugging Tips

**Run Python directly:** If you do not want to wait for ./runConfigureICU to
finish, you can directly re-generate the rules using your filter file with the
following command line run from *icu4c/source*.

    $ PYTHONPATH=python python3 -m icutools.databuilder \
      --mode=gnumake --src_dir=data > data/rules.mk

**Install jsonschema:** Install the `jsonschema` pip package to get warnings
about problems with your filter file.

**See what data is being used:** ICU is instrumented to allow you to trace
which resources are used at runtime. This can help you determine what data you
need to include. For more information, see [tracing.md](tracing.md).

**Inspect data/rules.mk:** The Python script outputs the file *rules.mk*
inside *icu4c/source/data*. To see what is going to get built, you can inspect
that file. First build ICU normally, and copy *rules.mk* to
*rules_default.mk*. Then build ICU with your filter file. Now you can take the
diff between *rules_default.mk* and *rules.mk* to see exactly what your filter
file is removing.

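Any diff tool works for comparing the two files. As one option, here is a minimal sketch using Python's standard `difflib` module. The sample lines are made-up stand-ins, not real *rules.mk* content; in practice you would read the two files from disk.

```python
import difflib


def diff_rules(default_lines, filtered_lines):
    """Return unified-diff lines between two versions of rules.mk."""
    return list(difflib.unified_diff(
        default_lines, filtered_lines,
        fromfile="rules_default.mk", tofile="rules.mk", lineterm=""))


# Stand-in contents; in practice, read the two files, for example with
# open("data/rules_default.mk").read().splitlines().
default_lines = ["coll/de.txt", "coll/fr.txt", "coll/root.txt"]
filtered_lines = ["coll/de.txt", "coll/root.txt"]
for line in diff_rules(default_lines, filtered_lines):
    print(line)
```
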
**Inspect the output:** After a `make clean` and `make` with a new *rules.mk*,
you can look inside the directory *icu4c/source/data/out* to see the files
that got built.

**Inspect the compiled resource filter rules:** If you are using a resource
filter, the resource filter rules get compiled for each individual locale
inside *icu4c/source/data/out/tmp/filters*. You can look at those files to see
what filter rules are being applied to each individual locale.

**Run genrb in verbose mode:** For debugging a resource filter, you can run
genrb in verbose mode to see which resources got stripped. To do this, first
inspect the make output and find a command line like this:

    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/genrb --filterDir ./out/tmp/filters/unit_tree -s ./unit -d ./out/build/icudt64l/unit/ -i ./out/build/icudt64l --usePoolBundle ./out/build/icudt64l/unit/ -k en.txt

Copy that command line and re-run it from *icu4c/source/data* with the `-v`
flag added to the end. The command will print out exactly which resource paths
are being included and excluded as well as a model of the filter rules applied
to this file.

**Inspect .res files with derb:** The `derb` tool can convert .res files back
to .txt files after filtering. For example, to convert the above unit res file
back to a txt file, you can run this command from *icu4c/source*:

    LD_LIBRARY_PATH=lib bin/derb data/out/build/icudt64l/unit/en.res

That will produce a file *en.txt* in your current directory, which is the
original *data/unit/en.txt* but after resource filters were applied.

*Tip:* derb expects your res files to be rooted in a directory named
`icudt64l` (corresponding to your current ICU version and endianness). If your
files are not in such a directory, derb fails with U_MISSING_RESOURCE_ERROR.

**Put complex rules first** and **use the wildcard `*` sparingly:** The order
of the filter rules matters a great deal in how effective your data size
reduction can be, and the wildcard `*` can sometimes produce behavior that is
tricky to reason about. For example, these three lists of filter rules look
similar at first glance but actually produce different output:

<table>
<tr>
<th>Unit Resource Filter Rules</th>
<th>Unit Resource Size</th>
<th>Commentary</th>
<th>Result</th>
</tr>
<tr><td><pre>
-/*/*
+/*/digital
-/*/digital/*/dnam
-/durationUnits
-/units
-/unitsNarrow
</pre></td><td>77 KiB</td><td>
First, remove all unit types. Then, add back digital units across all unit
widths. Then, remove display names from digital units. Then, remove duration
unit patterns and long and narrow forms.
</td><td>
Digital units in short form are included; all other units are removed.
</td></tr>
<tr><td><pre>
-/durationUnits
-/units
-/unitsNarrow
-/*/*
+/*/digital
-/*/digital/*/dnam
</pre></td><td>125 KiB</td><td>
First, remove duration unit patterns and long and narrow forms. Then, remove
all unit types. Then, add back digital units across all unit widths. Then,
remove display names from digital units.
</td><td>
Digital units are included <em>in all widths</em>; all other units are removed.
</td></tr>
<tr><td><pre>
-/*/*
+/*/digital
-/*/*/*/dnam
-/durationUnits
-/units
-/unitsNarrow
</pre></td><td>191 KiB</td><td>
First, remove all unit types. Then, add back digital units across all unit
widths. Then, remove display names from all units. Then, remove duration unit
patterns and long and narrow forms.
</td><td>
Digital units in short form are included, as is the <em>tree structure</em>
for all other units, even though the other units have no real data.
</td></tr>
</table>

By design, empty tree structure is retained in the unit bundle. This is
because there are numerous instances in ICU data where the presence of an
empty tree carries meaning. However, it means that you must be careful when
building resource filter rules in order to achieve the optimal data bundle
size.

Using the `-v` option in genrb (described above) is helpful when debugging
these types of issues.

## Other Features of the ICU Data Build Tool

While data filtering is the primary reason the ICU Data Build Tool was
developed, there are additional use cases.

### Running Data Build without Configure/Make

You can build the dat file outside of the ICU build system by directly
invoking the Python icutools.databuilder.  Run the following command to see the
help text for the CLI tool:

    $ PYTHONPATH=path/to/icu4c/source/python python3 -m icutools.databuilder --help

### Collation UCAData

For using collation (sorting and searching) in any language, the "root"
collation data file must be included. It provides the Unicode CLDR default
sort order for all code points, and forms the basis for language-specific
tailorings as well as for custom collators built at runtime.

There are two versions of the root collation data file:

- ucadata-unihan.txt (compiled size: 511 KiB)
- ucadata-implicithan.txt (compiled size: 178 KiB)

The unihan version sorts Han characters in radical-stroke order according to
Unicode, which is a somewhat useful default sort order, especially for use
with non-CJK languages.  The implicithan version sorts Han characters in the
order of their Unicode assignment, which is similar to radical-stroke order
for common characters but arbitrary for others.  For more information, see
[UTS #10 §10.1.3](https://www.unicode.org/reports/tr10/#Implicit_Weights).

By default, the unihan version is used.  The unihan version of the data file
is much larger than that for implicithan, so if you need collation but also
small data, then you may want to select the implicithan version.  To use the
implicithan version, put the following setting in your *filters.json* file:

    {
      "collationUCAData": "implicithan"
    }

### Disable Pool Bundle

By default, ICU uses a "pool bundle" to store strings shared between locales.
This saves space and is recommended for most users. However, when developing
a system where locale data files may be added "on the fly" and not included in
the original ICU distribution, those additional data files may not be able to
use a pool bundle due to name collisions with the existing pool bundle.

To disable the pool bundle in the current ICU build, put the following setting
in your *filters.json* file:

    {
      "usePoolBundle": false
    }

### File Substitution

Using the configuration file, you can perform whole-file substitutions.  For
example, suppose you want to replace the transliteration rules for
*Zawgyi_my*.  You could create a directory called `my_icu_substitutions`
containing your new `Zawgyi_my.txt` rule file, and then put this in your
configuration file:

    fileReplacements: {
      directory: "/path/to/my_icu_substitutions"
      replacements: [
        {
          src: "Zawgyi_my.txt"
          dest: "translit/Zawgyi_my.txt"
        },
        "misc/dayPeriods.txt"
      ]
    }

`directory` should either be an absolute path, or a path starting with one of
the following, and it should not contain a trailing slash:

- "$SRC" for the *icu4c/source/data* directory in the source tree
- "$FILTERS" for the directory containing filters.json
- "$CWD" for your current working directory

When the entry in the `replacements` array is an object, the `src` and `dest`
fields indicate, for each file in the source directory (`src`), what file in
the ICU hierarchy it should replace (`dest`). When the entry is a string, the
same relative path is used for both `src` and `dest`.

Whole-file substitution happens before all other filters are applied.

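The two accepted forms of a `replacements` entry can be illustrated with a short sketch that normalizes each entry into a `(src, dest)` pair. The function name is hypothetical and this is not the tool's own code; it only mirrors the rule described above.

```python
def normalize_replacements(entries):
    """Turn each replacements entry into a (src, dest) pair of relative paths."""
    pairs = []
    for entry in entries:
        if isinstance(entry, str):
            # A bare string uses the same relative path on both sides.
            pairs.append((entry, entry))
        else:
            # An object names the source and destination separately.
            pairs.append((entry["src"], entry["dest"]))
    return pairs


replacements = [
    {"src": "Zawgyi_my.txt", "dest": "translit/Zawgyi_my.txt"},
    "misc/dayPeriods.txt",
]
for src, dest in normalize_replacements(replacements):
    print(src, "->", dest)
```

With the example configuration, `Zawgyi_my.txt` from the substitution directory replaces `translit/Zawgyi_my.txt`, and `misc/dayPeriods.txt` replaces the file at the same relative path.
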