This directory contains the source code of ICU 4.2.1 for C/C++.

1. It was obtained with the following:

  $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-2-1 icu42

2. The following directories were removed because they're not used by Chromium
   at the moment:
     as_is
     packaging
     source/extra
     source/sample
     source/layout
     source/layoutex

3. Platform header files for Linux and Mac OS X:
   On Linux and Mac OS X, 'runConfigureICU Linux' and 'runConfigureICU MacOSX'
   are run to generate source/common/unicode/platform.h.
   Rename it to 'plinux.h' on Linux and 'pmac.h' on Mac and check both in.

   The Mac 'pmac.h' file needs to have patches/pmac.h.patch applied.

   Change source/common/unicode/umachine.h to refer to plinux.h and pmac.h
   on Linux and Mac, respectively.

4. To avoid name collisions (two different versions of StringPiece
   are in Chrome's base and ICU), make the use of the 'icu::' namespace
   qualifier required by setting U_USING_ICU_NAMESPACE to 0 in
   source/common/unicode/uversion.h

   In addition, the patches for ICU ticket 6935
   (http://icu-project.org/trac/ticket/6935) are applied.

   The combined patch is patches/namespace.patch.txt

5. The word breaking for Chinese and Japanese was modified to use a word
   frequency list with the following patch and cjdict.txt.

   In addition, the word breaking rule for ASCII and full-width full stop (period)
   surrounded by letters has been modified to fit our need for segmenting
   a host name into its components (e.g. treating 'www.google.com' not as
   a single word but as 5 words). This is what ICU 3.8 did before UTR 29
   changed the rule (WB #6, #7). This also lets us pass
   LayoutTests/css1/text_properties/text_transform.html without rebaselining.

   These patches alone will not work without the build-related changes
   mentioned in #10 below.

  - patches/segmentation.patch.txt :
    Adds dictionary (word-frequency)-based word breaking for CJK.
    (Korean is supported in the code, but it does not do anything
    because we don't have a Korean word list.)

  - source/data/brkitr/cjdict.txt :
    Chinese and Japanese word frequency list.
    See the file for the license/copyright notice.

  - source/data/brkitr/cc_edict.txt :
    The list of words derived from CC-Edict.

   The following two files were removed (because the Japanese breaking rules
   are now the same as those of other languages).

  - source/data/brkitr/word_ja.txt
  - source/data/brkitr/ja.txt

   If you want to run the ICU tests, you have to copy source/data/brkitr/cjdict.txt
   to source/test/testdata/cjdict-truncated.txt to pass the TestTrieWithValue test.

6. A minor break iterator change

  - patches/brkitr.patch.txt

7. Converter changes : converters.patch.txt
  - Include only what we really need. See source/data/mappings/ucmlocal.txt
  - Alias and mapping changes : source/data/mappings/convrtrs.txt
  - Changes several tables and adds six new tables, three of which
    are 'fake' tables for ISO-2022-CN(-Ext).
  - ucnv2022.c is modified to use the 3 'fake' tables added above for
    ISO-2022-CN(-Ext).

8. Locale changes
  - patches/locale1.patch.txt :
    Filipino locale, exemplar character set changes for CJK + 9 Indian
    locales, with minor fixes for Danish, Hungarian, Turkish, Korean
    and Catalan.

  - patches/locale2.patch.txt :
    The minimum locale data Chrome needs for 35 languages Chrome is
    not localized to. Each locale data file has ExemplarCharacters,
    LocaleScript, layout, and the name of the language for a locale
    in its native language.

  - patches/locale3.patch.txt : Locale build configuration files

9. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

  - patches/unihan.patch.txt:
    The unihan collation tables are never used in Chrome/Webkit, but they take
    up about 1MB in the uncompressed ICU data file in ICU 4.2.1.

10. Build-related changes

  - patches/wpo.patch
  - patches/windows.patch
  - patches/data.build.patch :
    Removes some data files we don't use to cut down the data size.
  - patches/data.build.win.patch :
    Windows-only data build patch. Adds a new target DATALIB to makedata.mak
  - Add an empty file (stubdatabuilt.txt) to source/stubdata

11. Pre-built data libraries are checked in.

  - source/data/in/icudt42l.dat : Built on Linux with all the patches
    above applied.
  - icudt42.dll : With icudt42l.dat in place, all the patches applied
    and the header files moved (#12 below), generated by building the
    icudt_build project of build/icudt_build.sln on Windows.
    It is generated in bin/, then moved to the top directory and checked in.
  - {mac,linux}/icudt42l_dat.s : Built on Mac and Linux with all the
    patches above applied and checked in.
    The Linux file needs the '@' in the preamble changed to '%'. See
    http://codereview.chromium.org/215026.
    mac/icudt42l_dat.s needs one line added after it is generated. A
    .private_extern directive needs to be added so that the top of the
    file looks like:

        .globl _icudt42_dat
        .private_extern _icudt42_dat
        .data

12. The header files were moved as shown below:

  source/common/unicode ==> public/common/unicode
  source/i18n/unicode ==> public/i18n/unicode

13. The patch for a memory leak in i18n/timezone.cpp (Windows only):
    see http://bugs.icu-project.org/trac/ticket/7135

  - patches/tzmemory.patch

14. The patch for a crash in common/putil.c (Linux only):
    see http://bugs.icu-project.org/trac/ticket/7177

  - patches/linuxtz.patch

15. The patch for Linux locale detection

  - patches/locdet.patch
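
Note on item 4: the sketch below illustrates why setting
U_USING_ICU_NAMESPACE to 0 avoids the StringPiece collision. It is not
ICU or Chrome code. Both StringPiece classes are hypothetical stand-ins,
the application's class is assumed to live at global scope, and the
macro is defined locally rather than taken from uversion.h. With the
setting at 1, the using-directive would make an unqualified
'StringPiece' ambiguous; with it at 0, ICU's class has to be spelled
'icu::StringPiece'.

```cpp
#include <cassert>
#include <string>

// Local stand-in for the setting made in source/common/unicode/uversion.h.
#define U_USING_ICU_NAMESPACE 0

// Hypothetical stand-in for the application's own class
// (assumed here to be declared at global scope).
class StringPiece {
 public:
  static const char* Origin() { return "app"; }
};

// Hypothetical stand-in for the class ICU declares in its namespace.
namespace icu {
class StringPiece {
 public:
  static const char* Origin() { return "icu"; }
};
}  // namespace icu

#if U_USING_ICU_NAMESPACE
// If this directive were active, the unqualified name 'StringPiece'
// below would be ambiguous and the file would not compile.
using namespace icu;
#endif

// Unqualified name: resolves to the global class because the
// using-directive above is disabled.
std::string UnqualifiedOrigin() { return StringPiece::Origin(); }

// ICU's class must be named with the explicit 'icu::' qualifier.
std::string QualifiedOrigin() { return icu::StringPiece::Origin(); }

// Small self-check that each name resolves to the intended class.
inline void CheckOrigins() {
  assert(UnqualifiedOrigin() == "app");
  assert(QualifiedOrigin() == "icu");
}
```

Calling CheckOrigins() from a test or from main() passes as written.
Changing the local define to 1 turns the unqualified use into a compile
error, which is the collision the setting is meant to prevent.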