<!--
© 2019 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

ICU Data Build Tool
===================

ICU 64 provides a tool for configuring your ICU locale data file with finer
granularity. This page explains how to use this tool to customize and reduce
your data file size.

## Overview: What is in the ICU data file?

There are hundreds of **locales** supported in ICU (including script and
region variants), and ICU supports many different **features**. For each
locale and for each feature, data is stored in one or more data files.

Those data files are compiled and then bundled into a `.dat` file called
something like `icudt64l.dat`, which is little-endian data for ICU 64. This
dat file is packaged into the `libicudata.so` on Linux or `libicudata.dll.a`
on Windows. In ICU4J, it is bundled into a jar file named `icudata.jar`.

At a high level, the size of the ICU data file corresponds to the
cross-product of locales and features, except that not all features require
locale-specific data, and not all locales require data for all features. The
data file contents can be approximately visualized like this:

<img alt="Features vs. Locales" src="../assets/features_locales.svg" style="max-width:600px" />

The `icudt64l.dat` file is 27 MiB uncompressed and 11 MiB gzipped. This file
size is too large for certain use cases, such as bundling the data file into a
smartphone app or an embedded device. This is something the ICU Data Build
Tool aims to solve.

## ICU Data Configuration File

The ICU Data Build Tool enables you to write a configuration file that
specifies what features and locales to include in a custom data bundle.

The configuration file may be written in either [JSON](http://json.org/) or
[Hjson](https://hjson.org/).
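
If you would rather generate the filter file from a script than write it by hand, Python's standard `json` module is enough to emit valid JSON. The sketch below is illustrative only: the helper names `language_filter_config` and `write_language_filter` are hypothetical, and the output mirrors the language-only filter shown under "Filtering by Language Only".

```python
import json
from pathlib import Path


def language_filter_config(languages):
    """Return a config dict with a "language" localeFilter (hypothetical helper)."""
    return {
        "localeFilter": {
            "filterType": "language",
            "whitelist": list(languages),
        }
    }


def write_language_filter(path, languages):
    """Write the config as pretty-printed JSON and return the dict.

    The resulting file is what you would point ICU_DATA_FILTER_FILE at.
    """
    config = language_filter_config(languages)
    text = json.dumps(config, indent=2) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return config
```

For example, `write_language_filter("filters.json", ["en", "de", "zh"])` writes a file equivalent to that example. Emitting plain JSON this way avoids the `hjson` dependency, which is needed only for Hjson syntax.
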
To build ICU4C with custom data, set the
`ICU_DATA_FILTER_FILE` environment variable when running `runConfigureICU` on
Unix or when building the data package on Windows. For example:

    ICU_DATA_FILTER_FILE=filters.json path/to/icu4c/source/runConfigureICU Linux

**Important:** You *must* have the data sources in order to use the ICU Data
Build Tool. Check for the file icu4c/source/data/locales/root.txt. If that file
is missing, you need to download "icu4c-\*-data.zip", delete the old
icu4c/source/data directory, and replace it with the data directory from the zip
file. If there is a \*.dat file in icu4c/source/data/in, that file will be used
even if you gave ICU custom filter rules.

In order to use Hjson syntax, the `hjson` pip module must be installed on
your system. You should also consider installing the `jsonschema` module to
print messages when errors are found in your config file.

    $ pip3 install --user hjson jsonschema

To build ICU4J with custom data, you must first build ICU4C with custom data
and then generate the JAR file. For more information, read
[icu4j-readme.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/icu4j-readme.txt).

### Locale Slicing

The simplest way to slice ICU data is by locale. The ICU Data Build Tool
makes it easy to select your desired locales to suit a number of use cases.

#### Filtering by Language Only

Here is a *filters.json* file that builds ICU data with support for English,
Chinese, and German, including *all* script and regional variants for those
languages:

    {
      "localeFilter": {
        "filterType": "language",
        "whitelist": [
          "en",
          "de",
          "zh"
        ]
      }
    }

The *filterType* "language" only supports slicing by entire languages.

#### Filtering by Locale

For more control, use *filterType* "locale".
Here is a *filters.hjson* file that
includes the same three languages as above, including regional variants, but
only the default script (e.g., Simplified Han for Chinese):

    localeFilter: {
      filterType: locale
      whitelist: [
        en
        de
        zh
      ]
    }

#### Adding Script Variants (includeScripts = true)

You may set the *includeScripts* option to true to include all scripts for a
language while using *filterType* "locale". This results in behavior similar
to *filterType* "language". In the following JSON example, all scripts for
Chinese are included:

    {
      "localeFilter": {
        "filterType": "locale",
        "includeScripts": true,
        "whitelist": [
          "en",
          "de",
          "zh"
        ]
      }
    }

If you wish to explicitly list the scripts, you may put the script code in the
locale tag in the whitelist, and you do not need the *includeScripts* option
enabled. For example, in Hjson, to include Han Traditional ***but not Han
Simplified***:

    localeFilter: {
      filterType: locale
      whitelist: [
        en
        de
        zh_Hant
      ]
    }

Note: the option *includeScripts* is only supported at the language level;
i.e., in order to include all scripts for a particular language, you must
specify the language alone, without a region tag.

#### Removing Regional Variants (includeChildren = false)

If you want to enumerate exactly which regional variants to support, use
*filterType* "locale" with the *includeChildren* setting set to false.
The following *filters.hjson* file includes English (US), English
(UK), German (Germany), and Chinese (China, Han Simplified), as well as their
dependencies, *but not* other regional variants like English (Australia),
German (Switzerland), or Chinese (Taiwan, Han Traditional):

    localeFilter: {
      filterType: locale
      includeChildren: false
      whitelist: [
        en_US
        en_GB
        de_DE
        zh_CN
      ]
    }

Including dependencies, the above filter would include the following data files:

- root.txt
- en.txt
- en_US.txt
- en_001.txt
- en_GB.txt
- de.txt
- de_DE.txt
- zh.txt
- zh_Hans.txt
- zh_Hans_CN.txt
- zh_CN.txt

### File Slicing (coarse-grained features)

ICU provides a lot of features, of which you probably need only a small subset
for your application. Feature slicing is a powerful way to prune out data for
any features you are not using.

***CAUTION:*** When slicing by features, you must manually include all
dependencies. For example, if you are formatting dates, you must include not
only the date formatting data but also the number formatting data, since dates
contain numbers. Expect to spend a fair bit of time debugging your feature
filter to get it to work the way you expect it to.

The data for many ICU features live in individual files. The ICU Data Build
Tool puts similar *types* of files into categories.
The following table
summarizes the ICU data files and their corresponding features and categories:

| Feature | Category ID(s) | Data Files <br/> ([icu4c/source/data](https://github.com/unicode-org/icu/tree/master/icu4c/source/data)) | Resource Size <br/> (as of ICU 64) |
|---|---|---|---|
| Break Iteration | `"brkitr_rules"` <br/> `"brkitr_dictionaries"` <br/> `"brkitr_tree"` | brkitr/rules/\*.txt <br/> brkitr/dictionaries/\*.txt <br/> brkitr/\*.txt | 522 KiB <br/> **2.8 MiB** <br/> 14 KiB |
| Charset Conversion | `"conversion_mappings"` | mappings/\*.ucm | **4.9 MiB** |
| Collation <br/> *[more info](#collation-ucadata)* | `"coll_ucadata"` <br/> `"coll_tree"` | in/coll/ucadata-\*.icu <br/> coll/\*.txt | 511 KiB <br/> **2.8 MiB** |
| Confusables | `"confusables"` | unidata/confusables\*.txt | 45 KiB |
| Currencies | `"misc"` <br/> `"curr_supplemental"` <br/> `"curr_tree"` | misc/currencyNumericCodes.txt <br/> curr/supplementalData.txt <br/> curr/\*.txt | 3.1 KiB <br/> 27 KiB <br/> **2.5 MiB** |
| Language Display <br/> Names | `"lang_tree"` | lang/\*.txt | **2.1 MiB** |
| Language Tags | `"misc"` | misc/keyTypeData.txt <br/> misc/langInfo.txt <br/> misc/likelySubtags.txt <br/> misc/metadata.txt | 6.8 KiB <br/> 37 KiB <br/> 53 KiB <br/> 33 KiB |
| Normalization | `"normalization"` | in/\*.nrm except in/nfc.nrm | 160 KiB |
| Plural Rules | `"misc"` | misc/pluralRanges.txt <br/> misc/plurals.txt | 3.3 KiB <br/> 33 KiB |
| Region Display <br/> Names | `"region_tree"` | region/\*.txt | **1.1 MiB** |
| Rule-Based <br/> Number Formatting <br/> (Spellout, Ordinals) | `"rbnf_tree"` | rbnf/\*.txt | 538 KiB |
| StringPrep | `"stringprep"` | sprep/\*.txt | 193 KiB |
| Time Zones | `"misc"` <br/> `"zone_tree"` <br/> `"zone_supplemental"` | misc/metaZones.txt <br/> misc/timezoneTypes.txt <br/> misc/windowsZones.txt <br/> misc/zoneinfo64.txt <br/> zone/\*.txt <br/> zone/tzdbNames.txt | 41 KiB <br/> 20 KiB <br/> 22 KiB <br/> 151 KiB <br/> **2.7 MiB** <br/> 4.8 KiB |
| Transliteration | `"translit"` | translit/\*.txt | 685 KiB |
| Unicode Character <br/> Names | `"unames"` | in/unames.icu | 269 KiB |
| Unicode Text Layout | `"ulayout"` | in/ulayout.icu | 14 KiB |
| Units | `"unit_tree"` | unit/\*.txt | **1.7 MiB** |
| **OTHER** | `"cnvalias"` <br/> `"misc"` <br/> `"locales_tree"` | mappings/convrtrs.txt <br/> misc/dayPeriods.txt <br/> misc/genderList.txt <br/> misc/numberingSystems.txt <br/> misc/supplementalData.txt <br/> locales/\*.txt | 63 KiB <br/> 19 KiB <br/> 0.5 KiB <br/> 5.6 KiB <br/> 228 KiB <br/> **2.4 MiB** |

#### Additive and Subtractive Modes

The ICU Data Build Tool allows two strategies for selecting features:
*additive* mode and *subtractive* mode.

The default is to use subtractive mode. This means that all ICU data is
included, and your configurations can remove or change data from that baseline.
Additive mode means that you start with an *empty* ICU data file, and you must
explicitly add the data required for your application.

There are two concrete differences between additive and subtractive mode:

|                         | Additive    | Subtractive |
|-------------------------|-------------|-------------|
| Default Feature Filter  | `"exclude"` | `"include"` |
| Default Resource Filter | `"-/"`, `"+/%%ALIAS"`, `"+/%%Parent"` | `"+/"` |

To enable additive mode, add the following setting to your filter file:

    strategy: "additive"

**Caution:** If using `"-/"` or similar top-level exclusion rules, be aware of
the fields `"+/%%Parent"` and `"+/%%ALIAS"`, which are required in locale tree
resource bundles. Excluding these paths may cause unexpected locale fallback
behavior.

#### Filter Types

You may list *filters* for each category in the *featureFilters* section of
your config file. What follows are examples of the possible types of filters.
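
The two defaults in the table above can be captured in a small lookup, which is handy when reasoning about what a config will do. The following Python is a hypothetical model, not code from `icutools.databuilder`; it assumes, as this section describes, that subtractive mode applies when *strategy* is absent and that any category without an entry in *featureFilters* falls back to the strategy's default feature filter.

```python
# Hypothetical model of the additive/subtractive defaults; not ICU's code.
STRATEGY_DEFAULTS = {
    "subtractive": {
        "feature_filter": "include",
        "resource_rules": ["+/"],
    },
    "additive": {
        "feature_filter": "exclude",
        "resource_rules": ["-/", "+/%%ALIAS", "+/%%Parent"],
    },
}


def strategy_of(config):
    """Subtractive mode is the default when no strategy is given."""
    strategy = config.get("strategy", "subtractive")
    if strategy not in STRATEGY_DEFAULTS:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return strategy


def effective_feature_filter(config, category):
    """Return the explicit featureFilters entry, else the strategy default."""
    explicit = config.get("featureFilters", {}).get(category)
    if explicit is not None:
        return explicit
    return STRATEGY_DEFAULTS[strategy_of(config)]["feature_filter"]


def default_resource_rules(config):
    """Starting resource rules for the strategy (a copy, safe to extend)."""
    return list(STRATEGY_DEFAULTS[strategy_of(config)]["resource_rules"])
```

With `{"strategy": "additive", "featureFilters": {"unames": "include"}}`, `effective_feature_filter` returns `"include"` for `unames` and `"exclude"` for every other category, which matches the additive-mode behavior described above.
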

##### Inclusion Filter

To include a category, use the string `"include"` as your filter.

    featureFilters: {
      locales_tree: include
    }

If the category is a locale tree (ends with `_tree`), the inclusion filter
resolves to the `localeFilter`; for more information, see the section
"Locale-Tree Categories." Otherwise, the inclusion filter causes all files in
the category to be included.

**NOTE:** When subtractive mode is used (default), all categories implicitly
start with `"include"` as their filter.

##### Exclusion Filter

To exclude an entire category, use *filterType* "exclude". For example, to
exclude all confusables data:

    featureFilters: {
      confusables: {
        filterType: exclude
      }
    }

Since ICU 65, you can also write simply:

    featureFilters: {
      confusables: exclude
    }

**NOTE:** When additive mode is used, all categories implicitly start with
`"exclude"` as their filter.

##### File Name Filter

To include or exclude specific files within a category, use the file name
filter, which is the default type of filter when *filterType* is not
specified. For example, to include the Burmese break iteration dictionary but
not any other dictionaries:

    featureFilters: {
      brkitr_dictionaries: {
        whitelist: [
          burmesedict
        ]
      }
    }

Do *not* include directories or file extensions. They will be added
automatically for you. Note that all files in a particular category have the
same directory and extension.

You can use either a whitelist or a blacklist for the file name filter.

##### Regex Filter

To include or exclude files whose names match a regular expression, use
*filterType* "regex".
For example, to reject the CJK-specific break iteration rules:

    featureFilters: {
      brkitr_rules: {
        filterType: regex
        blacklist: [
          ^.*_cj$
        ]
      }
    }

The Python standard library [*re*
module](https://docs.python.org/3/library/re.html) is used for evaluating the
regular expressions. However, because the regular expression engine may change
in the future, you are encouraged to restrict yourself to a simple set of
regular expression operators.

As above, do not include directories or file extensions, and you can use
either a whitelist or a blacklist.

##### Union Filter

You can combine the results of multiple filters with *filterType* "union".
This filter matches files that match *at least one* of the provided filters.
The syntax is:

    {
      filterType: union
      unionOf: [
        { /* filter 1 */ },
        { /* filter 2 */ },
        // ...
      ]
    }

This filter type is useful for combining "locale" filters with different
includeScripts or includeChildren options.

#### Locale-Tree Categories

Several categories have the `_tree` suffix. These categories are for "locale
trees": they contain locale-specific data. ***The [localeFilter configuration
option](#locale-slicing) sets the default file filter for all `_tree`
categories.***

If you want to include different locales for different locale file trees, you
can override their filter in the *featureFilters* section of the config file.
For example, to include only Italian data for currency symbols *instead of*
the common locales specified in *localeFilter*, you can do the following:

    featureFilters: {
      curr_tree: {
        filterType: locale
        whitelist: [
          it
        ]
      }
    }

You can exclude an entire `_tree` category without affecting other categories.
For example, to exclude region display names:

    featureFilters: {
      region_tree: {
        filterType: exclude
      }
    }

You can use any of the other filter types for `_tree` categories, but you
must be very careful to include all of the correct files. For example,
`en_GB` requires `en_001`, and you must always include `root`. If you use the
"language" or "locale" filter types, this logic is done for you.

### Resource Bundle Slicing (fine-grained features)

The third section of the ICU filter config file is *resourceFilters*. With
this section, you can dive inside resource bundle files to remove even more
data.

You can apply resource filters to all locale tree categories as well as to
categories that include resource bundles, such as the `"misc"` category.

For example, consider measurement units. There is one unit file per locale (example:
[en.txt](https://github.com/unicode-org/icu/blob/master/icu4c/source/data/unit/en.txt)),
and that file contains data for all measurement units in CLDR. However, if
you are only formatting distances, you may need the data for only a small set
of units.

Here is how you could include units of length in the "short" style but no
other units:

    resourceFilters: [
      {
        categories: [
          unit_tree
        ]
        rules: [
          -/units
          -/unitsNarrow
          -/unitsShort
          +/unitsShort/length
        ]
      }
    ]

Conceptually, the rules are applied from top to bottom. First, all data for
all three styles of units is removed, and then the short length units are
added back.

**NOTE:** In subtractive mode, resource paths are *included* by default. In
additive mode, resource paths are *excluded* by default.

#### Wildcard Character

You can use the wildcard character (`*`) to match any single segment of the
resource path.
For example, to include length units for all three styles, you can do:

    resourceFilters: [
      {
        categories: [
          unit_tree
        ]
        rules: [
          -/units
          -/unitsNarrow
          -/unitsShort
          +/*/length
        ]
      }
    ]

The wildcard must be the only character in its path segment. Future ICU
versions may expand the syntax.

#### Resource Filter for Specific File

The resource filter object takes an optional *files* setting which accepts a
file filter in the same syntax used above for file filtering. For example, if
you wanted to apply a filter to misc/supplementalData.txt, you could do the
following (this example removes calendar data):

    resourceFilters: [
      {
        categories: ["misc"]
        files: {
          whitelist: ["supplementalData"]
        }
        rules: [
          -/calendarData
        ]
      }
    ]

#### Combining Multiple Resource Filter Specs

You can also list multiple resource filter objects in the *resourceFilters*
array; the filters are added from top to bottom.
For example, here is an
advanced configuration that includes "mile" for en-US and "kilometer" for
en-CA; this also makes use of the *files* option:

    resourceFilters: [
      {
        categories: ["unit_tree"]
        rules: [
          -/units
          -/unitsNarrow
          -/unitsShort
        ]
      },
      {
        categories: ["unit_tree"]
        files: {
          filterType: locale
          whitelist: ["en_US"]
        }
        rules: [
          +/*/length/mile
        ]
      },
      {
        categories: ["unit_tree"]
        files: {
          filterType: locale
          whitelist: ["en_CA"]
        }
        rules: [
          +/*/length/kilometer
        ]
      }
    ]

The above example would give en-US these resource filter rules:

    -/units
    -/unitsNarrow
    -/unitsShort
    +/*/length/mile

and en-CA these resource filter rules:

    -/units
    -/unitsNarrow
    -/unitsShort
    +/*/length/kilometer

In accordance with *filterType* "locale", the parent locales *en* and *root*
would get both units; this is required since both en-US and en-CA may inherit
from the parent locale:

    -/units
    -/unitsNarrow
    -/unitsShort
    +/*/length/mile
    +/*/length/kilometer

## Debugging Tips

**Run Python directly:** If you do not want to wait for ./runConfigureICU to
finish, you can directly re-generate the rules using your filter file with the
following command, run from *icu4c/source*:

    $ PYTHONPATH=python python3 -m icutools.databuilder \
      --mode=gnumake --src_dir=data > data/rules.mk

**Install jsonschema:** Install the `jsonschema` pip package to get warnings
about problems with your filter file.

**See what data is being used:** ICU is instrumented to allow you to trace
which resources are used at runtime. This can help you determine what data you
need to include. For more information, see [tracing.md](tracing.md).

**Inspect data/rules.mk:** The Python script outputs the file *rules.mk*
inside *icu4c/source/data*.
To see what is going to get built, you can inspect
that file. First build ICU normally, and copy *rules.mk* to
*rules_default.mk*. Then build ICU with your filter file. Now you can take the
diff between *rules_default.mk* and *rules.mk* to see exactly what your filter
file is removing.

**Inspect the output:** After a `make clean` and `make` with a new *rules.mk*,
you can look inside the directory *icu4c/source/data/out* to see the files
that got built.

**Inspect the compiled resource filter rules:** If you are using a resource
filter, the resource filter rules get compiled for each individual locale
inside *icu4c/source/data/out/tmp/filters*. You can look at those files to see
what filter rules are being applied to each individual locale.

**Run genrb in verbose mode:** For debugging a resource filter, you can run
genrb in verbose mode to see which resources got stripped. To do this, first
inspect the make output and find a command line like this:

    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/genrb --filterDir ./out/tmp/filters/unit_tree -s ./unit -d ./out/build/icudt64l/unit/ -i ./out/build/icudt64l --usePoolBundle ./out/build/icudt64l/unit/ -k en.txt

Copy that command line and re-run it from *icu4c/source/data* with the `-v`
flag added to the end. The command will print out exactly which resource paths
are being included and excluded as well as a model of the filter rules applied
to this file.

**Inspect .res files with derb:** The `derb` tool can convert .res files back
to .txt files after filtering. For example, to convert the above unit res file
back to a txt file, you can run this command from *icu4c/source*:

    LD_LIBRARY_PATH=lib bin/derb data/out/build/icudt64l/unit/en.res

That will produce a file *en.txt* in your current directory, which is the
original *data/unit/en.txt* but after resource filters were applied.
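
Before running genrb with `-v`, it can help to predict what an ordered rule list will keep. A rough mental model: a rule whose path matches a prefix of a resource path decides that path unless a later matching rule overrides it, and `*` matches exactly one path segment. The Python below is a simplified, hypothetical sketch of that model, not genrb's implementation; in particular it only answers whether a leaf's data survives and ignores the empty tree structure that genrb retains (discussed under **Put complex rules first**). The resource paths in the usage note are illustrative.

```python
# Simplified, hypothetical model of ordered resource filter rules.
# This is NOT genrb's implementation: it ignores the empty tree structure
# that genrb retains and supports only "+"/"-" rules with whole-segment "*".


def parse_rule(rule):
    """Split a rule such as "+/*/length" into (True, ["*", "length"])."""
    sign, path = rule[:1], rule[1:]
    if sign not in ("+", "-") or not path.startswith("/"):
        raise ValueError(f"malformed rule: {rule!r}")
    return sign == "+", [seg for seg in path.split("/") if seg]


def rule_matches(rule_segments, resource_segments):
    """True if the rule path is a prefix of the resource path.

    A "*" segment matches exactly one resource path segment.
    """
    if len(rule_segments) > len(resource_segments):
        return False
    return all(r == "*" or r == s for r, s in zip(rule_segments, resource_segments))


def is_included(rules, resource_path, default=True):
    """Apply rules top to bottom; the last rule that matches decides.

    default=True mirrors subtractive mode, where paths start out included.
    """
    segments = [seg for seg in resource_path.split("/") if seg]
    included = default
    for rule in rules:
        inclusive, rule_segments = parse_rule(rule)
        if rule_matches(rule_segments, segments):
            included = inclusive
    return included
```

Running `is_included` on the first two rule lists from that table with the path `/units/digital/byte/unitPattern` returns `False` and `True` respectively, consistent with the table's "short form only" versus "all widths" results. For anything beyond a first approximation, genrb's `-v` output remains the authoritative answer.
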

*Tip:* derb expects your res files to be rooted in a directory named
`icudt64l` (corresponding to your current ICU version and endianness). If your
files are not in such a directory, derb fails with U_MISSING_RESOURCE_ERROR.

**Put complex rules first** and **use the wildcard `*` sparingly:** The order
of the filter rules has a large effect on how much your data size is reduced,
and the wildcard `*` can sometimes produce behavior that is tricky to reason
about. For example, these three lists of filter rules look similar at first
glance but actually produce different output:

<table>
<tr>
<th>Unit Resource Filter Rules</th>
<th>Unit Resource Size</th>
<th>Commentary</th>
<th>Result</th>
</tr>
<tr><td><pre>
-/*/*
+/*/digital
-/*/digital/*/dnam
-/durationUnits
-/units
-/unitsNarrow
</pre></td><td>77 KiB</td><td>
First, remove all unit types. Then, add back digital units across all unit
widths. Then, remove display names from digital units. Then, remove duration
unit patterns and long and narrow forms.
</td><td>
Digital units in short form are included; all other units are removed.
</td></tr>
<tr><td><pre>
-/durationUnits
-/units
-/unitsNarrow
-/*/*
+/*/digital
-/*/digital/*/dnam
</pre></td><td>125 KiB</td><td>
First, remove duration unit patterns and long and narrow forms. Then, remove
all unit types. Then, add back digital units across all unit widths. Then,
remove display names from digital units.
</td><td>
Digital units are included <em>in all widths</em>; all other units are removed.
</td></tr>
<tr><td><pre>
-/*/*
+/*/digital
-/*/*/*/dnam
-/durationUnits
-/units
-/unitsNarrow
</pre></td><td>191 KiB</td><td>
First, remove all unit types. Then, add back digital units across all unit
widths. Then, remove display names from all units. Then, remove duration unit
patterns and long and narrow forms.
</td><td>
Digital units in short form are included, as is the <em>tree structure</em>
for all other units, even though the other units have no real data.
</td></tr>
</table>

By design, empty tree structure is retained in the unit bundle. This is
because there are numerous instances in ICU data where the presence of an
empty tree carries meaning. However, it means that you must be careful when
building resource filter rules in order to achieve the optimal data bundle
size.

Using the `-v` option in genrb (described above) is helpful when debugging
these types of issues.

## Other Features of the ICU Data Build Tool

While data filtering is the primary reason the ICU Data Build Tool was
developed, there are additional use cases.

### Running Data Build without Configure/Make

You can build the dat file outside of the ICU build system by directly
invoking the Python module `icutools.databuilder`. Run the following command
to see the help text for the CLI tool:

    $ PYTHONPATH=path/to/icu4c/source/python python3 -m icutools.databuilder --help

### Collation UCAData

For using collation (sorting and searching) in any language, the "root"
collation data file must be included. It provides the Unicode CLDR default
sort order for all code points, and forms the basis for language-specific
tailorings as well as for custom collators built at runtime.

There are two versions of the root collation data file:

- ucadata-unihan.txt (compiled size: 511 KiB)
- ucadata-implicithan.txt (compiled size: 178 KiB)

The unihan version sorts Han characters in radical-stroke order according to
Unicode, which is a somewhat useful default sort order, especially for use
with non-CJK languages.
The implicithan version sorts Han characters in the
order of their Unicode assignment, which is similar to radical-stroke order
for common characters but arbitrary for others. For more information, see
[UTS #10 §10.1.3](https://www.unicode.org/reports/tr10/#Implicit_Weights).

By default, the unihan version is used. The unihan data file is much larger
than the implicithan one (511 KiB versus 178 KiB compiled), so if you need
collation but also small data, you may want to select the implicithan version.
To use the implicithan version, put the following setting in your
*filters.json* file:

    {
      "collationUCAData": "implicithan"
    }

### Disable Pool Bundle

By default, ICU uses a "pool bundle" to store strings shared between locales.
This saves space and is recommended for most users. However, when developing
a system where locale data files may be added "on the fly" and not included in
the original ICU distribution, those additional data files may not be able to
use a pool bundle due to name collisions with the existing pool bundle.

To disable the pool bundle in the current ICU build, put the following setting
in your *filters.json* file:

    {
      "usePoolBundle": false
    }

### File Substitution

Using the configuration file, you can perform whole-file substitutions. For
example, suppose you want to replace the transliteration rules for
*Zawgyi_my*.
You could create a directory called `my_icu_substitutions`
containing your new `Zawgyi_my.txt` rule file, and then put this in your
configuration file:

    fileReplacements: {
      directory: "/path/to/my_icu_substitutions"
      replacements: [
        {
          src: "Zawgyi_my.txt"
          dest: "translit/Zawgyi_my.txt"
        },
        "misc/dayPeriods.txt"
      ]
    }

`directory` should be either an absolute path or a path starting with one of
the following, and it should not contain a trailing slash:

- "$SRC" for the *icu4c/source/data* directory in the source tree
- "$FILTERS" for the directory containing filters.json
- "$CWD" for your current working directory

When the entry in the `replacements` array is an object, the `src` and `dest`
fields indicate, for each file in the source directory (`src`), what file in
the ICU hierarchy it should replace (`dest`). When the entry is a string, the
same relative path is used for both `src` and `dest`.

Whole-file substitution happens before all other filters are applied.
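
To make the `directory` and `replacements` rules above concrete, here is a hypothetical Python sketch that expands the `$SRC`, `$FILTERS`, and `$CWD` prefixes and normalizes each entry into a (source file, destination) pair. The function names and argument layout are assumptions for illustration, it assumes POSIX-style paths, and it does not reproduce the internals of `icutools.databuilder`.

```python
from pathlib import PurePosixPath

# Prefixes documented for fileReplacements.directory.
PREFIXES = ("$SRC", "$FILTERS", "$CWD")


def expand_directory(directory, src_dir, filters_dir, cwd):
    """Expand a documented prefix, or require an absolute POSIX path."""
    if directory.endswith("/"):
        raise ValueError("directory must not have a trailing slash")
    bases = dict(zip(PREFIXES, (src_dir, filters_dir, cwd)))
    for prefix, base in bases.items():
        if directory == prefix or directory.startswith(prefix + "/"):
            return base + directory[len(prefix):]
    if not PurePosixPath(directory).is_absolute():
        raise ValueError(
            "directory must be absolute or start with $SRC, $FILTERS, or $CWD: "
            f"{directory!r}"
        )
    return directory


def resolve_replacements(file_replacements, src_dir, filters_dir, cwd):
    """Return (source file path, destination relative to the data tree) pairs."""
    directory = expand_directory(
        file_replacements["directory"], src_dir, filters_dir, cwd
    )
    pairs = []
    for entry in file_replacements["replacements"]:
        if isinstance(entry, str):
            # A string entry uses the same relative path for src and dest.
            src, dest = entry, entry
        else:
            src, dest = entry["src"], entry["dest"]
        pairs.append((str(PurePosixPath(directory) / src), dest))
    return pairs
```

Applied to the configuration above, the object entry maps `Zawgyi_my.txt` to `translit/Zawgyi_my.txt`, while the string entry expects `misc/dayPeriods.txt` to exist under the substitutions directory itself.
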