Name: icu
URL: https://github.com/unicode-org/icu
Version: 72-1
CPEPrefix: cpe:/a:icu-project:international_components_for_unicode:72.1
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 72.1 for C/C++.

A. How to update ICU

1. Run "scripts/update.sh <version>" (e.g. 72-1).
   This downloads ICU from the upstream git repository.
   It preserves Chrome-specific build files and
   converter files (see section C).

   source.gni and icu.gyp* files are automatically updated, too.

2. Review and apply patches/changes in "D. Local Modifications" if
   necessary/applicable. Update patch files in patches/.

3. Follow the instructions in section B on building ICU data files.

B. How to build ICU data files

Pre-built data files are generated and checked in with the following steps.

1. ICU data files for Chrome OS, Linux, Mac and Windows

  a. Make an ICU data build directory outside the Chromium source tree
     and cd to that directory (say, $ICUBUILDIR).

  b. Run
       ${CHROME_ICU_TREE_TOP}/scripts/make_data_all.sh

     This script takes the following steps:

     i) Run
          ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout --disable-tests

     ii) Run make

     iii) (cd data && make clean)

     iv) scripts/config_data.sh common
         This configures the build with the filter for common.

     v) Run make

     vi) scripts/copy_data.sh common
         This copies the ICU data files for non-Android platforms
         (both Little and Big Endian) to the following locations:

           common/icudtl.dat
           common/icudtb.dat

     vii) Repeat steps iii) - vi) for chromeos to produce chromeos/icudtl.dat

     viii) cast/patch_locale.sh
           Modifies the locale data files for cast, android, ios and flutter.
     ix) Repeat steps iii) - vi) for cast, android and ios to produce
           cast/icudtl.dat
           android/icudtl.dat
           ios/icudtl.dat

     x) flutter/patch_brkitr.sh
        On top of cast/patch_locale.sh (step viii)), further patches
        the code for flutter.

     xi) Repeat steps iii) - vi) for flutter to produce
           flutter/icudtl.dat

     xii) scripts/clean_up_data_source.sh

          This reverts the changes made by cast/patch_locale.sh and
          flutter/patch_brkitr.sh and makes the tree ready for committing
          updated ICU data files for non-Android and Android platforms.

  c. Whenever data is updated (e.g. a timezone update), repeat step b,
     as long as the ICU build directory used in step a is kept.

2. Notes on the locale data customization

  - filter/chromeos.json
    a. Filter the locale data for ChromeOS's UI languages:
       locales, lang, region, currency, zone
    b. Filter the locale data for non-UI languages to the bare minimum:
       ExemplarCharacters, LocaleScript, layout, and the name of the
       language for a locale in its native language.
    c. Filter the legacy Chinese character set-based collations
       (big5han/gb2312han) that don't make any sense and that nobody uses.

  - filter/common.json
    Same as above in filter/chromeos.json, AND
    d. Filter exemplar cities in timezone data (data/zone).

  - filter/android.json and filter/ios.json
    a. Filter the locale data for Android / iOS UI languages:
       locales, lang, region, currency, zone
    b. Filter the locale data for non-UI languages to the bare minimum:
       ExemplarCharacters, LocaleScript, layout, and the name of the
       language for a locale in its native language.
    c. Filter the legacy Chinese character set-based collations.
    d. Filter source/data/{region,lang} to exclude these data
       except the language and script names of zh_Hans and zh_Hant.
    e. Keep only the minimal calendar data in data/locales.
    f. Include currency display names for a smaller subset of currencies.
    g. Minimize the locale data for 9 locales to which Chrome on Android
       is not localized.

C. Chromium-specific data build files and converters

They're preserved in step A.1 above. In general, there's no need to touch
them when updating ICU.

1. source/data/mappings
  - convrtrs.txt: Lists encodings and aliases required by the WHATWG
    Encoding spec, plus a few extras (see the file for why).

  - ucmlocal.txt: Lists only the converters we need.

  - *html.ucm: Mapping files per the WHATWG Encoding Standard for EUC-JP,
    Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single-byte encodings.
    They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.

  - gb18030.ucm and windows-936.ucm
    gb_table.patch was applied for the following changes. There is no need
    to apply it again; the patch is kept for the record.
    a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
       the encoding spec (one-way mapping in the toUnicode direction).
    b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add a one-way mapping
       from U+1E3F to \xA8\xBC (windows-936/GBK).
       See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3

2. source/data/brkitr
  - dictionaries/khmerdict.txt: Abridged Khmer dictionary. See
    https://unicode-org.atlassian.net/browse/ICU-9451
  - dictionaries/laodict.txt: Abridged Lao dictionary. We keep using the
    smaller old version from ICU69-1.
  - rules/word_ja.txt (used only on Android)
    Added for Japanese-specific word-breaking without the C+J dictionary.
  - rules/{root,zh,zh_Hant}.txt
    a. Use line_normal by default.
    b. Drop local patches we used to have for the following issues. They'll
       be dealt with upstream (Unicode/CLDR).
       http://unicode.org/cldr/trac/ticket/6557
       http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)

3. Add {an,ku,tg,wa}.txt to source/data/{locale,lang}
   with the minimal locale data necessary for the spellchecker and
   language menus.

D. Local Modifications

1. Applied locale data patches from Google, obtained by diff'ing
   the upstream copy and Google's internal copy for source/data

  - patches/locale_google.patch:
    * Google's internal ICU locale changes
    * Simpler region names for Hong Kong and Macau in all locales
    * Currency signs in ru and uk locales (do not include 'tr' locale changes)
    * AM/PM, midnight, noon formatting for a few Indian locales
    * Timezone name changes in Korean and Chinese locales
    * Default digits for the Arabic locale are European digits.

  - patches/locale1.patch: Minor fixes for Korean

2. Break iterator patches
  - patches/wordbrk.patch for word.txt, word_POSIX.txt, and word_fi_sv.txt
    a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
       FQDN labels can be split at '.'
    b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
       See http://unicode.org/cldr/trac/ticket/6555
    c. Restore the pre-ICU 72 behavior of breaking at '@'. The new upstream
       behavior of not breaking at '@' interacted badly with the local change
       to break at '.' (D.2.a above): although not breaking at '@' is intended
       to avoid breaking within e-mail addresses, this is not possible with
       Chromium's break-at-'.' behavior.

  - patches/khmer-dictbe.patch
    Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
    https://unicode-org.atlassian.net/browse/ICU-9451

  - Add several common Chinese words that were dropped previously to
    source/data/cjdict/brkitr/cjdict.txt
    patch: patches/cjdict.patch
    upstream bug: https://unicode-org.atlassian.net/browse/ICU-10888

3. Timezone data update
   Run scripts/update_tz.sh to grab the latest version of the
   following timezone data files and put them in source/data/misc:

     metaZones.txt
     timezoneTypes.txt
     windowsZones.txt
     zoneinfo64.txt

   As of Mar 31, 2023, the latest version is 2023c,
   and the above files are available in the ICU GitHub repository.

4. Build-related changes

  - patches/configure.patch:
    * Remove a section of configure that would cause breakage while
      running runConfigureICU.

  - patches/wpo.patch (only needed when the icudata DLL is used).
    upstream bugs: https://unicode-org.atlassian.net/browse/ICU-8043
                   https://unicode-org.atlassian.net/browse/ICU-5701

  - patches/data_symb.patch:
    Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
    the ICU data file or icudt.dll.

5. ISO-2022-JP encoding (fromUnicode) change per the WHATWG Encoding spec.
  - patches/iso2022jp.patch
  - upstream bug:
    https://unicode-org.atlassian.net/browse/ICU-20251

6. Enable tracing of files but not resources, only for Chromium,
   to reduce performance impact/risk.
  - patches/restrace.patch

7. Patch the Arabic date-time pattern back to the ICU 67 value to avoid test
   breakage in
   third_party/blink/web_tests/fast/forms/datetimelocal/datetimelocal-appearance-l10n.html
  - patches/ardatepattern.patch
  - https://bugs.chromium.org/p/chromium/issues/detail?id=1139186

8. Remove explicit std::atomic<NumberRangeFormatterImpl*> template
   instantiation
   patches/atomic_template_instantiation.patch
  - The explicit instantiation was added to silence MSVC C4251 warnings:
    https://unicode-org.atlassian.net/browse/ICU-20157
    Small test cases show that it is generally an error to instantiate
    std::atomic<T*> with an incomplete type T with MSVC, clang, and GCC, so this
    instantiation should never have worked:
    https://gcc.godbolt.org/z/34xx8h
    At this time, it's not clear if this particular instantiation with
    NumberRangeFormatterImpl* was ever necessary for MSVC. Further testing with
    MSVC is required to upstream this patch.
  - https://unicode-org.atlassian.net/browse/ICU-21482

9. Patch source/common/uposixdefs.h so that it compiles for Fuchsia on Macs.
   patches/fuchsia.patch
  - context bug: https://bugs.chromium.org/p/chromium/issues/detail?id=1184527

10. Patch ICU to fix mixed usage of UBool, which breaks win64_msvc.
  - patches/fix-bool-mix-use.patch
  - https://github.com/unicode-org/icu/pull/2255

11. Patch the ICU en-CA date pattern back to the y-MM-dd format.
  patches/revert-en-ca-date.patch
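The ordering in section B.1.b (which patch script runs before which per-variant build) is easy to lose track of. The Python sketch below builds, but does not run, the command sequence those steps describe. It is an illustration under assumptions, not the real scripts/make_data_all.sh. The function names build_plan and per_variant_steps are hypothetical. It also assumes that scripts/config_data.sh and scripts/copy_data.sh take the variant name (chromeos, cast, android, ios, flutter) in the same way the README shows for common.

```python
"""Dry-run sketch of the ICU data build order from section B.1.b.

Assumption: this only mirrors the documented steps. It is NOT the real
scripts/make_data_all.sh and it never executes any command.
"""

from typing import List

# Hypothetical grouping of data variants, in the order the README lists them.
PLAIN_VARIANTS = ["common", "chromeos"]          # built before any patching
CAST_VARIANTS = ["cast", "android", "ios"]       # after cast/patch_locale.sh
FLUTTER_VARIANTS = ["flutter"]                   # after flutter/patch_brkitr.sh


def per_variant_steps(variant: str) -> List[str]:
    """Steps iii) - vi) for one data variant (assumed argument convention)."""
    return [
        "(cd data && make clean)",
        f"scripts/config_data.sh {variant}",
        "make",
        f"scripts/copy_data.sh {variant}",
    ]


def build_plan(tree_top: str) -> List[str]:
    """Return the full command sequence as strings; nothing is executed."""
    plan = [
        # Steps i) and ii): configure once, then build.
        f"{tree_top}/source/runConfigureICU Linux --disable-layout --disable-tests",
        "make",
    ]
    for variant in PLAIN_VARIANTS:               # steps iii) - vii)
        plan += per_variant_steps(variant)
    plan.append("cast/patch_locale.sh")          # step viii)
    for variant in CAST_VARIANTS:                # step ix)
        plan += per_variant_steps(variant)
    plan.append("flutter/patch_brkitr.sh")       # step x)
    for variant in FLUTTER_VARIANTS:             # step xi)
        plan += per_variant_steps(variant)
    plan.append("scripts/clean_up_data_source.sh")  # step xii)
    return plan


if __name__ == "__main__":
    for command in build_plan("$CHROME_ICU_TREE_TOP"):
        print(command)
```

Printing the plan is a cheap way to confirm that both patch scripts are applied only after the common and chromeos data have been copied out, and that the final clean-up step comes last.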