This directory contains the source code of ICU 4.2.1 for C/C++.

1. It was obtained with the following:

    $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-2-1 icu42

2. The following directories were removed because they're not used by Chromium
   at the moment:
   as_is
   packaging
   source/extra
   source/sample
   source/layout
   source/layoutex

3. Platform header files for Linux and Mac OS X:
   On Linux and Mac OS X, 'runConfigureICU Linux' and 'runConfigureICU MacOSX'
   are run, respectively, to generate source/common/unicode/platform.h.
   Rename it to 'plinux.h' on Linux and 'pmac.h' on Mac and check them in.

   The Mac 'pmac.h' file needs to have patches/pmac.h.patch applied.

   Change source/common/unicode/umachine.h to refer to plinux.h and pmac.h
   on Linux and Mac, respectively.

4. To avoid name collisions (two different versions of StringPiece
   are in Chrome's base and ICU), make the use of the 'icu::' namespace
   qualifier required by setting U_USING_ICU_NAMESPACE to 0 in
   source/common/unicode/uversion.h.

   In addition, the patches for ICU ticket 6935
   (http://icu-project.org/trac/ticket/6935) are applied.

   The combined patch is patches/namespace.patch.txt.

5. The word breaking for Chinese and Japanese was modified to use a word
   frequency list with the following patch and cjdict.txt.

   In addition, the word breaking rule for ASCII and full-width full stop (period)
   surrounded by letters has been modified to fit our need for segmenting
   a host name into its components (e.g. treating 'www.google.com' not as
   a single word but as 5 words). It's what ICU 3.8 did before UTR 29
   changed the rule (WB #6, #7). This also lets us pass
   LayoutTests/css1/text_properties/text_transform.html without rebaselining.

   These patches alone will not work without the build-related changes mentioned
   in #10 below.

   - patches/segmentation.patch.txt :
       Adds dictionary (word-frequency)-based word breaking for CJK.
       (Korean is supported in the code, but it does not do anything
        because we don't have a Korean word list.)

   - source/data/brkitr/cjdict.txt :
       Chinese and Japanese word frequency list.
       See the file for the license/copyright notice.

   - source/data/brkitr/cc_edict.txt :
       The list of words derived from CC-Edict.

   The following two files were removed (because Japanese breaking rules
   are now the same as those of other languages).

   - source/data/brkitr/word_ja.txt
   - source/data/brkitr/ja.txt

   If you want to run the ICU tests, you have to copy source/data/brkitr/cjdict.txt
   to source/test/testdata/cjdict-truncated.txt to pass the TestTrieWithValue test.
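
Illustration for the host-name rule in item 5: the toy Python sketch below
contrasts the two behaviours described there. It is not ICU's rule-based
break iterator and does not use any ICU API; the function name split_words
and its break_at_period flag are hypothetical names chosen for this example.
With the flag on, a full stop between letters is a segment of its own (the
behaviour the patch restores); with it off, "letters . letters" stays glued
together, roughly what WB #6 and #7 prescribe.

```python
import re

# ASCII full stop and full-width full stop (U+FF0E).
PERIODS = {".", "\uFF0E"}


def split_words(text, break_at_period=True):
    """Toy word segmentation; an illustration only, not ICU's algorithm."""
    # Cut the text into alphanumeric runs and single periods.
    # Anything else (spaces, other punctuation) is dropped in this sketch.
    pieces = re.findall(r"[^\W_]+|[.\uFF0E]", text)
    if break_at_period:
        # Patched behaviour: every period is a separate segment.
        return pieces

    # WB #6/#7-like behaviour: a period with an alphanumeric run on both
    # sides does not break, so glue "run . run" back together.
    merged = []
    glue_next = False
    for i, piece in enumerate(pieces):
        if glue_next:
            merged[-1] += piece
            glue_next = False
        elif (piece in PERIODS
              and merged and merged[-1][-1] not in PERIODS
              and i + 1 < len(pieces) and pieces[i + 1] not in PERIODS):
            merged[-1] += piece
            glue_next = True
        else:
            merged.append(piece)
    return merged
```

Under these assumptions, split_words('www.google.com') returns the five
pieces 'www', '.', 'google', '.', 'com', while
split_words('www.google.com', break_at_period=False) returns the single
piece 'www.google.com'.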

6. A minor break iterator change

   - patches/brkitr.patch.txt

7. Converter changes : converters.patch.txt
  - Include what we really need. See source/data/mappings/ucmlocal.txt
  - Alias and mapping changes : source/data/mappings/convrtrs.txt
  - Changes several tables and adds six new tables, three of which
    are 'fake' tables for ISO-2022-CN(-Ext).
  - ucnv2022.c is modified to use the 3 'fake' tables added above for
    ISO-2022-CN(-Ext).

8. Locale changes
  - patches/locale1.patch.txt :
      Filipino locale, exemplar character set changes for CJK + 9 Indian
      locales with minor fixes for Danish, Hungarian, Turkish, Korean
      and Catalan.

  - patches/locale2.patch.txt :
      The minimum locale data Chrome needs for 35 languages Chrome is
      not localized to. Each locale data file has ExemplarCharacters,
      LocaleScript, layout, and the name of the language for a locale
      in its native language.

  - patches/locale3.patch.txt : Locale build configuration files

9. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

  - patches/unihan.patch.txt:
    The unihan collation tables are never used in Chrome/WebKit, but they take
    about 1MB in the uncompressed ICU data file in ICU 4.2.1.

10. Build-related changes

  - patches/wpo.patch
  - patches/windows.patch
  - patches/data.build.patch :
      To remove some data files we don't use and cut down the data size.
  - patches/data.build.win.patch :
      Windows-only data build patch. Adds a new target DATALIB to makedata.mak
  - add an empty file (stubdatabuilt.txt) to source/stubdata

11. Pre-built data libraries are checked in.

    - source/data/in/icudt42l.dat : Built on Linux with all the patches
      above applied.
    - icudt42.dll : With icudt42l.dat in place, all the patches applied
      and header files moved (#12 below), generated in bin/ by building
      the icudt_build project of build/icudt_build.sln on Windows.
      It's made in bin/ and moved to the top and checked in.
    - {mac,linux}/icudt42l_dat.s : Built on Mac and Linux with all the
      patches above applied and checked in.
      The Linux file needs the '@' in the preamble changed to '%'. See
      http://codereview.chromium.org/215026.
      mac/icudt42l_dat.s needs one line added after it is generated. A
      .private_extern directive needs to be added so that the top of the
      file looks like:

        .globl _icudt42_dat
        .private_extern _icudt42_dat
        .data
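
The manual edit to mac/icudt42l_dat.s described in item 11 could be scripted
along the following lines. This is a minimal sketch, not part of the checked-in
tooling: the function name add_private_extern is hypothetical, and it assumes
the generated file contains a line consisting of '.globl _icudt42_dat', as in
the excerpt above. It works on the file's text and is safe to run twice.

```python
def add_private_extern(asm_text, symbol="_icudt42_dat"):
    """Insert '.private_extern <symbol>' right after '.globl <symbol>'.

    Assumes (not verified against a real generated file) that the
    '.globl' directive sits on a line of its own.
    """
    lines = asm_text.splitlines(keepends=True)
    # Do nothing if the directive is already present (idempotent).
    done = any(line.split() == [".private_extern", symbol] for line in lines)
    out = []
    for line in lines:
        out.append(line)
        if not done and line.split() == [".globl", symbol]:
            # Indentation mirrors the excerpt shown in item 11.
            out.append("        .private_extern " + symbol + "\n")
            done = True
    return "".join(out)
```

To apply it, read mac/icudt42l_dat.s, pass its contents through
add_private_extern, and write the result back. The '@' to '%' change for the
Linux file is not covered here.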

12. The header files were moved as shown below:

   source/common/unicode ==> public/common/unicode
   source/i18n/unicode   ==> public/i18n/unicode
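
The header move in item 12 can be sketched as a small script. This is only an
illustration under stated assumptions: the function name move_headers is
hypothetical, icu_root is taken to be this directory, and the destination
directories are assumed not to exist yet. How the move was actually performed
(for example with version-control commands) is not recorded here.

```python
import shutil
from pathlib import Path

# (source, destination) pairs, relative to the ICU directory, from item 12.
HEADER_MOVES = [
    ("source/common/unicode", "public/common/unicode"),
    ("source/i18n/unicode", "public/i18n/unicode"),
]


def move_headers(icu_root):
    """Move the public header directories; return the sources that moved."""
    root = Path(icu_root)
    moved = []
    for src, dst in HEADER_MOVES:
        src_dir, dst_dir = root / src, root / dst
        if not src_dir.is_dir():
            # Already moved, or not present in this checkout.
            continue
        # Create public/common or public/i18n if needed, then move.
        dst_dir.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src_dir), str(dst_dir))
        moved.append(src)
    return moved
```

Calling move_headers('.') from this directory would perform both moves and
return the list of source paths that were actually relocated.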

13. The patch for a memory leak in i18n/timezone.cpp (Windows only):
    see http://bugs.icu-project.org/trac/ticket/7135

    - patches/tzmemory.patch

14. The patch for a crash in common/putil.c (Linux only):
    see http://bugs.icu-project.org/trac/ticket/7177

    - patches/linuxtz.patch

15. The patch for Linux locale detection

    - patches/locdet.patch