Name: icu
URL: http://site.icu-project.org/
Version: 4.6
License: MIT
Security Critical: yes

Description:
This directory contains the source code of ICU 4.6 for C/C++.

1. It was obtained with the following:

    $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-4-6 icu46

2. Platform header files for Linux, FreeBSD, OpenBSD, Android and Mac OS X:

   - Apply platform.patch in the patches directory. It applies the upstream
     patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248)
     and changes source/common/unicode/ptypes.h to refer to plinux.h and
     pmac.h generated below.

   - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and
     'runConfigureICU MacOSX' are run to generate
     source/common/unicode/platform.h.

   - On OpenBSD, source/common/unicode/platform.h is generated
     by the icu4c port in the ports directory and not by runConfigureICU.
     If the file has to be updated, run:
     cd /home/ports/textproc/icu4c && make configure

   - Rename the generated platform.h to 'plinux.h', 'pfreebsd.h',
     'popenbsd.h' or 'pmac.h', according to the platform it was generated on.

   - Apply patches/pmach.h.patch on Mac to pmac.h

   - On Android, pandroid.h was generated by copying plinux.h to
     pandroid.h and applying patches/pandroid.h.patch.

   - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h
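   The renaming step above can be sketched as a small lookup. This is only
   an illustration: the function name platform_header_name is hypothetical,
   and its argument is assumed to be the platform name passed to
   runConfigureICU (or 'OpenBSD' for the ports-generated header).

```shell
# Hypothetical helper: map a platform name to the header file name that the
# generated source/common/unicode/platform.h is renamed to.
platform_header_name() {
  case "$1" in
    Linux)   echo plinux.h ;;
    FreeBSD) echo pfreebsd.h ;;
    OpenBSD) echo popenbsd.h ;;
    MacOSX)  echo pmac.h ;;
    *)       return 1 ;;   # unknown platform: no header name is defined
  esac
}
```

   For example, after 'runConfigureICU Linux' one might run
   mv source/common/unicode/platform.h "source/common/unicode/$(platform_header_name Linux)".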

3. The following directories were removed because they're not used by Chromium
   at the moment:
   as_is
   packaging
   source/extra
   source/sample
   source/layout
   source/layoutex
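   A minimal sketch of the removal, assuming the directories are deleted
   straight from the exported tree. The function name prune_unused_icu_dirs
   is hypothetical; its argument is the root of the ICU tree (for example
   the icu46 directory from step 1).

```shell
# Hypothetical helper: delete the directories listed in step 3 from an ICU tree.
# ${icu_root:?} aborts if no root was given, so this never expands to "/as_is" etc.
prune_unused_icu_dirs() {
  icu_root="$1"
  for dir in as_is packaging source/extra source/sample \
             source/layout source/layoutex; do
    rm -rf "${icu_root:?}/${dir}"
  done
}
```

   Usage would be along the lines of: prune_unused_icu_dirs icu46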


4. The word breaking for Chinese and Japanese was modified to use a word
   frequency list with the following patch and cjdict.txt.

   - patches/segmentation.patch :
       Adds dictionary (word-frequency)-based word breaking for CJK.
       (Korean is supported in the code, but it does not do anything
        because we don't have a Korean word list.)

   - source/data/brkitr/cjdict.txt :
       Chinese and Japanese word frequency list.
       See the file for the license/copyright notice.

   - source/data/brkitr/cc_edict.txt :
       The list of words derived from CC-Edict.

   - patches/brkitr.patch
     * word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific
                  handling of U+0022, and splitting of FQDN into labels at '.'.
                  For Hebrew, see http://unicode.org/cldr/track/ticket/3120
     * line.txt : Incorporated line_he and minor changes in CL, OP and ID
                  definitions.
                  For Hebrew, see http://unicode.org/cldr/track/ticket/4004
                  For others, see http://unicode.org/cldr/track/ticket/3974
                                  http://unicode.org/cldr/track/ticket/4200
                                  http://unicode.org/cldr/track/ticket/
     * brklocal.mk : build file changes to drop unnecessary brkitr rule
                     files (e.g. word_ja.txt, line_he.txt)

   - android/brkitr.patch (to be applied for Android build only) :
       Reverts some of the Chinese/Japanese segmentation rule changes in
       patches/brkitr.patch to reduce binary size for Android.

   If you want to run ICU tests, you have to copy source/data/brkitr/cjdict.txt
   to source/test/testdata/cjdict-truncated.txt to pass the TestTrieWithValue test.

5. Converter changes : converters.patch
  - Include what we really need. See source/data/mappings/ucmlocal.txt
  - Alias and mapping changes : source/data/mappings/convrtrs.txt
  - Changes several tables and adds six new tables, three of which
    are 'fake' tables for ISO-2022-CN(-Ext).
  - ucnv2022.c is modified to use the 3 'fake' tables added above for
    ISO-2022-CN(-Ext).

6. Locale changes
  - patches/locale1.patch :
      Filipino, Amharic, and Swahili locales
      exemplar character set changes for CJK + 9 Indian locales
      Minor fixes for Danish, , Turkish, and Korean.

  - patches/locale2.patch :
      The minimum locale data Chrome needs for 47 languages Chrome is
      not localized to. Each locale data file has ExemplarCharacters,
      LocaleScript, layout, and the name of the language for a locale
      in its native language.

  - patches/locale3.patch : Locale build configuration files. They
    add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
    source/data/{coll,curr,lang.locale,curr,region,translit,zone,rbnf,sprep}.

  - In source/data/region, run the following command to get rid of numeric region
    display names we don't use (everything other than 419).
     $ sed -i  '/[0-35-9][0-9][0-9]{/ d' *.txt

  - android/patch_locale.sh (to be run for Android build only):
      Makes changes to source/data/{curr,region,lang} to exclude these data
      except the language and script names of zh_Hans and zh_Hant.
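
  To see what the sed expression used for source/data/region does, the
  sketch below runs it over a few made-up lines written in the style of a
  region display-name table (the sample lines are an assumption, not copied
  from the real data files). The expression deletes any line containing a
  three-digit code followed by '{' unless the code starts with 4, so 419 is
  kept. Note also that 'sed -i' with no suffix argument is GNU sed syntax.

```shell
# Made-up sample lines; real region files contain many more entries.
sample='001{"World"}
150{"Europe"}
419{"Latin America"}
US{"United States"}'

# Same expression as in the README, applied to stdin instead of editing
# files in place, so that it is safe to experiment with.
filtered=$(printf '%s\n' "$sample" | sed '/[0-35-9][0-9][0-9]{/ d')
printf '%s\n' "$filtered"
```

  Only the 419 entry and the non-numeric US entry should remain.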

7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt

  - patches/unihan.patch:
    unihan collation tables are never used in Chrome/Webkit, but they take
    about 1MB in the uncompressed ICU data file in ICU 4.2.1.

8. Timezone data update
  - Grab the latest version of the following timezone data files and
    put them in source/data/misc.

     metaZones.txt
     timezoneTypes.txt
     windowsZones.txt
     zoneinfo64.txt

   As of Dec 2013, the latest version is 2013h and the above files
   are available at
   http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2013h/44/
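
   One possible way to script the download is sketched below. The names
   TZ_BASE_URL and tzdata_urls are hypothetical, the base URL is the one
   quoted above (it may no longer be reachable), and curl is an assumed
   tool; the README does not say how the files were fetched.

```shell
# Base URL quoted in the README for tzdata 2013h.
TZ_BASE_URL="http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2013h/44"

# Hypothetical helper: print one download URL per timezone data file.
tzdata_urls() {
  for f in metaZones.txt timezoneTypes.txt windowsZones.txt zoneinfo64.txt; do
    printf '%s/%s\n' "$TZ_BASE_URL" "$f"
  done
}

# To fetch the files into source/data/misc (not run here; needs network access):
#   tzdata_urls | (cd source/data/misc && xargs -n 1 curl -fsSLO)
```

   For a newer tzdata release, only TZ_BASE_URL would need to change.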

9. Transliterator customization

   - Add the following files taken from ICU 52 to source/data/translit

     {tr,el,az}_{Upper,Lower,Title}.txt

   - Also add css3transform.txt to the same directory
   - Put the following line in trnslocal.mk

     TRANSLIT_SOURCE=css3transform.txt

10. Build-related changes

  - patches/wpo.patch
  - patches/vscomp.patch
    (see http://bugs.icu-project.org/trac/ticket/8355 and
         http://bugs.icu-project.org/trac/ticket/8356 )
  - patches/rtti.patch : Make RTTI work without exception handling on Windows
    (see http://bugs.icu-project.org/trac/ticket/8343)
  - patches/data.build.patch :
      To remove some data files we don't use and cut down the data size.
  - patches/data.build.win.patch :
      Windows-only data build patch. Add a new target DATALIB to makedata.mak
  - patches/clang.patch: To build with Clang.
    (see http://bugs.icu-project.org/trac/ticket/8954 ; two other chunks in
    the patch have already been fixed in the ICU trunk.)
  - add an empty file (stubdatabuilt.txt) to source/stubdata

11. Pre-built data libraries are checked in.

    Before building the data file on Linux, re-run 'runConfigureICU Linux'
    if it was run without data.build.patch in #10 above.

    Because we removed layout and layoutex directories in step 3,
    'runConfigureICU Linux' will fail even with '--disable-layout'. A
    work-around is to have a copy of our icu tree in a separate build directory
    and add back directories we removed in step 3 before
    running 'runConfigureICU'.

    'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu
    to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make'
    in {BUILD_DIR_ROOT}/data.

    'make' will fail again when pkgdata looks for css3transform.res. Edit
    data/out/tmp/icudata.lst to replace 'css3transform.res' with 'root.res'
    (see http://bugs.icu-project.org/trac/ticket/10570 ) and run 'make' again.
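
    The icudata.lst edit described above can be scripted roughly as
    follows. The function name fix_icudata_lst is hypothetical, and the
    sketch assumes a plain substring replacement is what is wanted; it
    writes through a temporary file rather than using 'sed -i' so that it
    behaves the same with GNU and BSD sed.

```shell
# Hypothetical helper: replace the css3transform.res entry in a pkgdata list
# file with root.res, as the work-around above describes.
fix_icudata_lst() {
  lst="$1"
  sed 's/css3transform\.res/root.res/' "$lst" > "$lst.tmp" && mv "$lst.tmp" "$lst"
}

# Possible use between the failing and the final 'make' (paths assumed to be
# relative to the build directory root, here called BUILD_DIR_ROOT):
#   fix_icudata_lst "$BUILD_DIR_ROOT/data/out/tmp/icudata.lst"
```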


    - source/data/in/icudt46l.dat : Built on Linux with all the patches
      above applied. This file will be generated in
      {BUILD_DIR_ROOT}/data/out/tmp.

    - windows/icudt.dll : With icudt46l.dat in place, all the patches applied
      and header files moved (#11 below), generated by building icudt_build
      project of build/icudt_build.sln on Windows. icudt46.dll is
      generated in bin/{Release,Debug} and copied to windows/icudt.dll
      and checked in. Note that we drop the version number ('46') from the
      dll name to avoid having to update our build scripts/configuration
      files every time ICU is upgraded to a new version.

    - {mac,linux}/icudt46l_dat.S : Built on Mac and Linux with all the
      patches above (except android/brkitr.patch) applied and checked in.
      This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp.

      Alternatively, one can just generate icudt46l_dat.S on Linux and adapt
      the header portion to match the current header in mac/icudt46l_dat.S.
      That header is as follows (without the leading spaces on each line):

          .globl _icudt46_dat
          #ifdef U_HIDE_DATA_SYMBOL
                 .private_extern _icudt46_dat
          #endif
                 .data
                 .const
                 .align 4
          _icudt46_dat:


    - android/icudt46l_dat.S : Built on Linux with all the patches above and
      android/brkitr.patch applied and android/patch_locale.sh executed, and
      checked in.

12. Apply the fixes found with static analysis tools such as PSV and Coverity

  - patches/static.analysis.patch
  - upstream trunk/4.8 does not have this code any more.

13. Fix for msvs2010 applied:
--- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp (revision 78292)
+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp (working copy)
@@ -75,7 +75,7 @@
 * Visual Studios 9.0.
 * Cygwin with MSVC 9.0 also complains here about redefinition.
 */
-#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC)
+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC)
 const int32_t StringPiece::npos;
 #endif

14. Fix for locales that don't use '.' as decimal separator: patches/nan.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8561
  - Handle other chars besides the dot. This is required because decNumber's
    parser expects the dot as a decimal separator.
  - Locales that don't use the dot were producing "NaN" values.

15. Fix a bug in the regex engine.
  - patches/regex.patch
  - upstream bug: http://bugs.icu-project.org/trac/ticket/8666 (fixed in the upstream)

16. Apply the upstream patch for Korean search collator support (ICU 4.6.1).
   - patches/search_collation.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8290

17. Fix a use-of-uninitialized-memory bug in regular expression matching
   - patches/rematch.patch
   - upstream bug: http://bugs.icu-project.org/trac/ticket/8824

18. Make it compile with -Werror on gcc 4.6
   - patches/gcc46.patch (ToT upstream does not have this code any more).

19. Fix four out-of-bounds memory access errors in common/uloc.c
    and common/uresbund.c
   - patches/uloc.patch
   - upstream bugs:
     1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize)
     2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords)
     3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund)
        http://bugs.icu-project.org/trac/ticket/8813 (uresbund)
     4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords)

20. Fix a null pointer error in ubrk_setText in ubrk.cpp.
    - patches/ubrk.patch
    - upstream bug : http://bugs.icu-project.org/trac/ticket/9115

21. Fix a clang warning in rbbi.cpp by merging in an upstream change.
    - patches/changeset_30255.patch
    - upstream change : http://bugs.icu-project.org/trac/changeset/30255

22. Fix time zone handling and compilation on iOS.
    - patches/ios_timezone.patch
    - upstream bugs : http://bugs.icu-project.org/trac/ticket/9051
                      http://bugs.icu-project.org/trac/ticket/8661

23. Fix a buffer overflow in utext
    - patches/utext.patch
    - upstream change : http://bugs.icu-project.org/trac/changeset/29356

24. Fix compilation errors on VS2012 and above.
    - patches/vs2012.patch

25. Fix a buffer overflow in UTF-16/32 detection.
    - patches/csetdet.patch
    - upstream bug: http://bugs.icu-project.org/trac/ticket/10318

26. Add BreakIterator::getRuleStatus
    - patches/breakiterator.patch
    - Copy and paste the BreakIterator::getRuleStatus API from ICU 52