* Copyright (C) 2016 and later: Unicode, Inc. and others.
* License & terms of use: http://www.unicode.org/copyright.html
* Copyright (C) 2004-2016, International Business Machines
* Corporation and others.  All Rights Reserved.
*
*   file name:  changes.txt
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2004may06
*   created by: Markus W. Scherer

* change log for Unicode updates

For an overview, see https://unicode-org.github.io/icu/processes/unicode-update

---------------------------------------------------------------------------- ***

* New ISO 15924 script codes

Normally, add new script codes as part of a Unicode update.
See https://unicode-org.github.io/icu/processes/release/tasks/standards#update-script-code-enums
and see the change logs below.

---------------------------------------------------------------------------- ***

Unicode 14.0 update for ICU 70

https://www.unicode.org/versions/Unicode14.0.0/
https://www.unicode.org/versions/beta-14.0.0.html
https://www.unicode.org/Public/14.0.0/ucd/
https://www.unicode.org/reports/uax-proposed-updates.html
https://www.unicode.org/reports/tr44/tr44-27.html

https://unicode-org.atlassian.net/browse/CLDR-14801
https://unicode-org.atlassian.net/browse/ICU-21635

* Command-line environment setup

export UNICODE_DATA=~/unidata/uni14/20210903
export CLDR_SRC=~/cldr/uni/src
export ICU_ROOT=~/icu/uni
export ICU_SRC=$ICU_ROOT/src
export ICUDT=icudt70b
export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
    so that the makefiles see the new version number.
  cd $ICU_ROOT/dbg/icu4c
  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh

*** data files & enums & parser code

* download files
- same as for the early Unicode Tools setup and data refresh:
  https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
  https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
- mkdir -p $UNICODE_DATA
- download Unicode files into $UNICODE_DATA
  + subfolders: emoji, idna, security, ucd, uca
  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
  + split Unihan into single-property files
    ~/unitools/mine/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
  + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
    or from the UCD/cldr/ output folder of the Unicode Tools:
    Since Unicode 12/CLDR 35/ICU 64 CLDR uses modified break rules.
  cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
    or
  cp ~/unitools/mine/Generated/UCD/d19/cldr/GraphemeBreakTest-cldr-14.0.0d19.txt icu4c/source/test/testdata/GraphemeBreakTest.txt

* for manual diffs and for Unicode Tools input data updates:
  remove version suffixes from the file names
    ~$ unidata/desuffixucd.py $UNICODE_DATA
  (see https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md)

* process and/or copy files
- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
  + For debugging, and tweaking how ppucd.txt is written,
    the tool has an --only_ppucd option:
    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile

- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA

* new constants for new property values
- preparseucd.py error:
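
The suffix removal can be sketched in Python as follows. This is only an illustration, not the actual desuffixucd.py: the function name strip_version_suffix is hypothetical, and the suffix shape (-<major>.<minor>.<update>, optionally followed by a draft tag such as d19) is an assumption based on file names like GraphemeBreakTest-cldr-14.0.0d19.txt.

```python
import re

# Assumed suffix shape: "-14.0.0" or "-14.0.0d19", directly before the file extension.
_VERSION_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def strip_version_suffix(file_name):
    """Return file_name without a trailing Unicode version suffix (hypothetical helper)."""
    return _VERSION_SUFFIX.sub("", file_name)


if __name__ == "__main__":
    for name in ("GraphemeBreakTest-cldr-14.0.0d19.txt", "Blocks-14.0.0.txt", "UnicodeData.txt"):
        print(name, "->", strip_version_suffix(name))
```

Names without a version suffix pass through unchanged, so running such a helper twice over the same folder would be harmless.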
    ValueError: missing uchar.h enum constants for some property values:
    [(u'blk', set([u'Toto', u'Tangsa', u'Cypro_Minoan', u'Arabic_Ext_B', u'Vithkuqi', u'Old_Uyghur', u'Latin_Ext_F', u'UCAS_Ext_A', u'Kana_Ext_B', u'Ethiopic_Ext_B', u'Latin_Ext_G', u'Znamenny_Music'])),
    (u'jg', set([u'Vertical_Tail', u'Thin_Yeh'])),
    (u'sc', set([u'Toto', u'Ougr', u'Vith', u'Tnsa', u'Cpmn']))]
  = PropertyValueAliases.txt new property values (diff old & new .txt files)
    ~/unidata$ diff -u uni13/20200304/ucd/PropertyValueAliases.txt uni14/20210609/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]'
    +age; 14.0                             ; V14_0
    +blk; Arabic_Ext_B                     ; Arabic_Extended_B
    +blk; Cypro_Minoan                     ; Cypro_Minoan
    +blk; Ethiopic_Ext_B                   ; Ethiopic_Extended_B
    +blk; Kana_Ext_B                       ; Kana_Extended_B
    +blk; Latin_Ext_F                      ; Latin_Extended_F
    +blk; Latin_Ext_G                      ; Latin_Extended_G
    +blk; Old_Uyghur                       ; Old_Uyghur
    +blk; Tangsa                           ; Tangsa
    +blk; Toto                             ; Toto
    +blk; UCAS_Ext_A                       ; Unified_Canadian_Aboriginal_Syllabics_Extended_A
    +blk; Vithkuqi                         ; Vithkuqi
    +blk; Znamenny_Music                   ; Znamenny_Musical_Notation
    +jg ; Thin_Yeh                         ; Thin_Yeh
    +jg ; Vertical_Tail                    ; Vertical_Tail
    +sc ; Cpmn                             ; Cypro_Minoan
    +sc ; Ougr                             ; Old_Uyghur
    +sc ; Tnsa                             ; Tangsa
    +sc ; Toto                             ; Toto
    +sc ; Vith                             ; Vithkuqi
  -> add new blocks to uchar.h before UBLOCK_COUNT
    use long property names for enum constants,
    for the trailing comment get the block start code point: diff old & new Blocks.txt
    ~/unidata$ diff -u uni13/20200304/ucd/Blocks.txt uni14/20210609/ucd/Blocks.txt | egrep '^[-+][0-9A-Z]'
    +0870..089F; Arabic Extended-B
    +10570..105BF; Vithkuqi
    +10780..107BF; Latin Extended-F
    +10F70..10FAF; Old Uyghur
    -11700..1173F; Ahom
    +11700..1174F; Ahom
    +11AB0..11ABF; Unified Canadian Aboriginal Syllabics Extended-A
    +12F90..12FFF; Cypro-Minoan
    +16A70..16ACF; Tangsa
    -18D00..18D8F; Tangut Supplement
    +18D00..18D7F; Tangut Supplement
    +1AFF0..1AFFF; Kana Extended-B
    +1CF00..1CFCF; Znamenny Musical Notation
    +1DF00..1DFFF; Latin Extended-G
    +1E290..1E2BF; Toto
    +1E7E0..1E7FF; Ethiopic Extended-B
    (ignore blocks whose end code point changed)
  -> add new blocks to UCharacter.UnicodeBlock IDs
    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
            replace  public static final int \1_ID = \2; \3
  -> add new blocks to UCharacter.UnicodeBlock objects
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
  -> add new scripts to uscript.h & com.ibm.icu.lang.UScript
    Eclipse find     USCRIPT_([^ ]+) *= ([0-9]+),(/.+)
            replace  public static final int \1 = \2; \3
  -> for new scripts: fix expectedLong names in cintltst/cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
  -> add new joining groups to uchar.h & UCharacter.JoiningGroup

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
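
For anyone not using Eclipse, the find/replace patterns above carry over to Python's re module nearly unchanged. The sketch below is an illustration only: the function names are hypothetical, and the sample enum values (999) and the shape of the trailing comments are made up rather than copied from uchar.h or uscript.h.

```python
import re

# Same patterns and replacement templates as in the Eclipse find/replace steps above.
BLOCK_ID = (r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
            r"public static final int \1_ID = \2; \3")
BLOCK_OBJECT = (r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
                r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
SCRIPT_CODE = (r"USCRIPT_([^ ]+) *= ([0-9]+),(/.+)",
               r"public static final int \1 = \2; \3")


def convert(rule, c_line):
    """Apply one (pattern, replacement) rule to a C enum line (hypothetical helper)."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, c_line)


if __name__ == "__main__":
    # The value 999 and the comment text are placeholders, not real ICU constants.
    block_line = "UBLOCK_TOTO = 999, /*[1E290]*/"
    script_line = "USCRIPT_VITHKUQI = 999,/* Vith */"
    print(convert(BLOCK_ID, block_line))
    print(convert(BLOCK_OBJECT, block_line))
    print(convert(SCRIPT_CODE, script_line))
```

Text before the match (such as indentation) is left in place, since re.sub only rewrites the matched part of the line.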
  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt

* build ICU
  to make sure that there are no syntax errors

  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 tests &> out.txt ; tail -n 30 out.txt ; date

* update spoof checker UnicodeSet initializers:
    inclusionPat & recommendedPat in i18n/uspoof.cpp
    INCLUSION & RECOMMENDED in SpoofChecker.java
- make sure that the Unicode Tools tree contains the latest security data files
- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
- run the tool (no special environment variables needed)
- copy & paste from the Console output into the .cpp & .java files

* Bazel build process

See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
for an overview and for setup instructions.

Consider running `bazelisk --version` outside of the $ICU_SRC folder
to find out the latest `bazel` version, and
copying that version number into the $ICU_SRC/.bazeliskrc config file.
(Revert if you find incompatibilities, or, better, update our build & config files.)

* generate data files

- remember to define the environment variables
  (see the start of the section for this Unicode version)
- cd $ICU_SRC
- optional:
    bazelisk clean
- build/bootstrap/generate new files:
    icu4c/source/data/unidata/generate.sh

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..14.0: U+2260, U+226E, U+226F
- nothing new in this Unicode version, no test file to update

* run & fix ICU4C tests
- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
- update CLDR GraphemeBreakTest.txt
    cd ~/unitools/mine/Generated
    cp UCD/d22d/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
    cp UCD/d22d/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
    cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt $ICU_SRC/icu4c/source/test/testdata
- Andy helps with RBBI & spoof check test failures

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools,
  and a tool-tailored version goes into CLDR, see
    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md

- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
    (note removing the underscore before "Rules")
    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
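
The grep step above can also be sketched in Python. This is a rough illustration, assuming the usual semicolon-separated layout of IdnaMappingTable.txt (code point or range, then status, with "#" comments); the function name and the sample lines are made up for this example.

```python
def non_ascii_std3_valid(lines):
    """Return (start, end) ranges with status disallowed_STD3_valid that reach beyond ASCII."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        start_cp = int(start, 16)
        end_cp = int(end, 16) if end else start_cp
        if end_cp > 0x7F:
            found.append((start_cp, end_cp))
    return found


if __name__ == "__main__":
    # Sample lines written in the assumed file layout, not copied from the data file.
    sample = [
        "0000..002C    ; disallowed_STD3_valid  # control characters through COMMA",
        "002D..002E    ; valid                  # HYPHEN-MINUS..FULL STOP",
        "2260          ; disallowed_STD3_valid  # NOT EQUAL TO",
        "226E..226F    ; disallowed_STD3_valid  # NOT LESS-THAN..NOT GREATER-THAN",
    ]
    for start, end in non_ascii_std3_valid(sample):
        print("U+%04X..U+%04X" % (start, end))
```

Comparing the result for the old and new data files would show whether the test files need new characters.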
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt

- generate data files, as above (generate.sh), now to pick up new collation data
- update CollationFCD.java:
  copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
- rebuild ICU4C (make clean, make check, as usual)

* Unihan collators
    https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
- run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
  check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
- generate ICU zh collation data
    instructions inspired by
    https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
    https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
  + setup:
    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
        (didn't work without setting JAVA_HOME,
         nor with the Google default of /usr/local/buildtools/java/jdk
         [Google security limitations in the XML parser])
    export TOOLS_ROOT=~/icu/uni/src/tools
    export CLDR_DIR=~/cldr/uni/src
    export CLDR_DATA_DIR=~/cldr/uni/src
        (pointing to the "raw" data, not cldr-staging/.../production should be ok for the relevant files)
    cd "$TOOLS_ROOT/cldr/lib"
    ./install-cldr-jars.sh "$CLDR_DIR"
  + generate the files we need
    cd "$TOOLS_ROOT/cldr/cldr-to-icu"
    ant -f build-icu-data.xml -DoutDir=/tmp/icu -DoutputTypes=coll,transforms -DlocaleIdFilter='zh.*'
  + diff
    cd $ICU_SRC
    meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
    meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
  + copy into the source tree
    cd $ICU_SRC
    cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
    cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
  (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
    you need to reconfigure with unicore data; see the "configure" line above.
  output:
    ...
    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt70b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt70b
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt70l.dat ./out/icu4j/icudt70b.dat -s ./out/build/icudt70l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt70b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt70b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt70b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt70b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* When refreshing all of ICU4J data from ICU4C
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
or
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC/icu4c/source/data/unidata
    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
  for example:
    ~/icu/mine/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-13.txt
    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-14.txt
    ~/icu/uni/src$ diff -u /tmp/icu/nv4-13.txt /tmp/icu/nv4-14.txt
    -->
    +cp;16AC4;-Alpha;gc=Nd;-IDS;lb=NU;na=TANGSA DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
  Unicode 14:
    tnsa 16AC0..16AC9 Tangsa
    https://github.com/unicode-org/cldr/pull/1326

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools are checked in:
  https://github.com/unicode-org/unicodetools

---------------------------------------------------------------------------- ***

Unicode 13.0 update for ICU 66

https://www.unicode.org/versions/Unicode13.0.0/
https://www.unicode.org/versions/beta-13.0.0.html
https://www.unicode.org/Public/13.0.0/ucd/
https://www.unicode.org/reports/uax-proposed-updates.html
https://www.unicode.org/reports/tr44/tr44-25.html

https://unicode-org.atlassian.net/browse/CLDR-13387
https://unicode-org.atlassian.net/browse/ICU-20893

* Command-line environment setup

UNICODE_DATA=~/unidata/uni13/20200212
CLDR_SRC=~/cldr/uni/src
ICU_ROOT=~/icu/uni
ICU_SRC=$ICU_ROOT/src
ICUDT=icudt66b
ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib

*** Unicode version numbers
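
The egrep-and-diff step can be sketched in Python as well. This is a minimal illustration with hypothetical function names; it assumes the semicolon-separated ppucd.txt line shape shown in the diff output, and it assumes that each set of decimal digits occupies ten consecutive code points, so that the range follows from the position of DIGIT FOUR.

```python
def digit_four_code_points(ppucd_lines):
    """Collect code points of lines with gc=Nd and nv=4 (single code points only)."""
    code_points = set()
    for line in ppucd_lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 2 or fields[0] != "cp" or ".." in fields[1]:
            continue
        if "gc=Nd" in fields and "nv=4" in fields:
            code_points.add(int(fields[1], 16))
    return code_points


def new_digit_ranges(old_lines, new_lines):
    """Return (zero, nine) code point pairs for digit sets only present in new_lines."""
    new_fours = digit_four_code_points(new_lines) - digit_four_code_points(old_lines)
    # Assumption: digits 0..9 are contiguous, so zero = four - 4 and nine = four + 5.
    return [(four - 4, four + 5) for four in sorted(new_fours)]


if __name__ == "__main__":
    new = ["cp;16AC4;-Alpha;gc=Nd;-IDS;lb=NU;na=TANGSA DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS"]
    for zero, nine in new_digit_ranges([], new):
        print("%04X..%04X" % (zero, nine))
```

For the Tangsa line above this yields the same 16AC0..16AC9 range that was added to CLDR.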
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
    so that the makefiles see the new version number.
  cd $ICU_ROOT/dbg/icu4c
  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh

*** data files & enums & parser code

* download files
- mkdir -p $UNICODE_DATA
- download Unicode files into $UNICODE_DATA
  + subfolders: emoji, idna, security, ucd, uca
  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
  + split Unihan into single-property files
    ~/unitools/trunk/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
  + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
    or from the ucd/cldr/ output folder of the Unicode Tools:
    Since Unicode 12/CLDR 35/ICU 64 CLDR uses modified break rules.
  cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata

* for manual diffs and for Unicode Tools input data updates:
  remove version suffixes from the file names
    ~$ unidata/desuffixucd.py $UNICODE_DATA
  (see https://sites.google.com/site/unicodetools/inputdata)

* process and/or copy files
- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
  + For debugging, and tweaking how ppucd.txt is written,
    the tool has an --only_ppucd option:
    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile

- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA

* new constants for new property values
- preparseucd.py error:
    ValueError: missing uchar.h enum constants for some property values:
    [(u'blk', set([u'Symbols_For_Legacy_Computing', u'Dives_Akuru', u'Yezidi',
        u'Tangut_Sup', u'CJK_Ext_G', u'Khitan_Small_Script', u'Chorasmian', u'Lisu_Sup'])),
    (u'sc', set([u'Chrs', u'Diak', u'Kits', u'Yezi'])),
    (u'InPC', set([u'Top_And_Bottom_And_Left']))]
  = PropertyValueAliases.txt new property values (diff old & new .txt files)
    blk; Chorasmian                       ; Chorasmian
    blk; CJK_Ext_G                        ; CJK_Unified_Ideographs_Extension_G
    blk; Dives_Akuru                      ; Dives_Akuru
    blk; Khitan_Small_Script              ; Khitan_Small_Script
    blk; Lisu_Sup                         ; Lisu_Supplement
    blk; Symbols_For_Legacy_Computing     ; Symbols_For_Legacy_Computing
    blk; Tangut_Sup                       ; Tangut_Supplement
    blk; Yezidi                           ; Yezidi
  -> add to uchar.h before UBLOCK_COUNT
    use long property names for enum constants,
    for the trailing comment get the block start code point: diff old & new Blocks.txt
  -> add to UCharacter.UnicodeBlock IDs
    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
            replace  public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2

    sc ; Chrs                             ; Chorasmian
    sc ; Diak                             ; Dives_Akuru
    sc ; Kits                             ; Khitan_Small_Script
    sc ; Yezi                             ; Yezidi
  -> uscript.h & com.ibm.icu.lang.UScript
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java

    InPC; Top_And_Bottom_And_Left         ; Top_And_Bottom_And_Left
  -> uchar.h enum UIndicPositionalCategory & UCharacter.java IndicPositionalCategory

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
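
Finding the new property values by diffing the old and new PropertyValueAliases.txt can be sketched in Python as a set difference. This is an illustration with hypothetical function names; it assumes the "property ; short name ; long name" layout shown above, with "#" comments, and the sample lines are abbreviated.

```python
def parse_aliases(lines):
    """Return a set of (property, short_name, long_name) tuples from alias lines."""
    values = set()
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) >= 3:
            values.add((fields[0], fields[1], fields[2]))
    return values


def new_property_values(old_lines, new_lines):
    """Group the values that appear only in new_lines by property name."""
    added = parse_aliases(new_lines) - parse_aliases(old_lines)
    by_property = {}
    for prop, short_name, long_name in sorted(added):
        by_property.setdefault(prop, []).append((short_name, long_name))
    return by_property


if __name__ == "__main__":
    old = ["sc ; Zyyy                             ; Common"]
    new = old + [
        "blk; Yezidi                           ; Yezidi",
        "sc ; Chrs                             ; Chorasmian",
        "InPC; Top_And_Bottom_And_Left         ; Top_And_Bottom_And_Left",
    ]
    for prop, values in new_property_values(old, new).items():
        print(prop, values)
```

Grouping by property makes it easier to see which header (uchar.h or uscript.h) and which Java class need new constants.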
    (not strictly necessary for NOT_ENCODED scripts)
  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt

* build ICU (make install)
  to make sure that there are no syntax errors, and
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date

* update spoof checker UnicodeSet initializers:
    inclusionPat & recommendedPat in i18n/uspoof.cpp
    INCLUSION & RECOMMENDED in SpoofChecker.java
- make sure that the Unicode Tools tree contains the latest security data files
- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
- update the hardcoded version number there in the DIRECTORY path
- run the tool (no special environment variables needed)
- copy & paste from the Console output into the .cpp & .java files

* generate normalization data files
  cd $ICU_ROOT/dbg/icu4c
  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date

* build Unicode tools using CMake+make

$ICU_SRC/tools/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
# Location of the ICU4C source tree.
set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)

  $ICU_ROOT/dbg$
    mkdir -p tools/unicode/c
    cd tools/unicode/c

  $ICU_ROOT/dbg/tools/unicode/c$
    cmake ../../../../src/tools/unicode/c
    make

* generate core properties data files
  $ICU_ROOT/dbg/tools/unicode/c$
    genprops/genprops $ICU_SRC/icu4c
- tool failure:
    genprops: Script_Extensions indexes overflow bit field
    genprops: error parsing or setting values from ppucd.txt line 32696 - U_BUFFER_OVERFLOW_ERROR
  -> uprops.icu data file format :
     add two more bits to store a script code or Script_Extensions index
  -> generator code, C++ & Java runtime, uprops.icu format version 7.7
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..13.0: U+2260, U+226E, U+226F
- nothing new in this Unicode version, no test file to update

* run & fix ICU4C tests
- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
- Andy helps with RBBI & spoof check test failures

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools, see
    https://sites.google.com/site/unicodetools/home#TOC-UCA
  diff the main mapping file, look for bad changes
  (for example, more bytes per weight for common characters)
    ~/svn.unitools/trunk$ sed -r -f ~/cldr/uni/src/tools/scripts/uca/blankweights.sed ../Generated/UCA/13.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-13.0.txt
    ~/svn.unitools/trunk$ meld ../frac-12.1.txt ../frac-13.0.txt

- CLDR root data files are checked into $CLDR_SRC/common/uca/
    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/

- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
    (note removing the underscore before "Rules")
    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt

- run genuca
  $ICU_ROOT/dbg/tools/unicode/c$
    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
- rebuild ICU4C

* Unihan collators
    https://sites.google.com/site/unicodetools/unihan
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollators
  with VM arguments
    -ea
    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
    -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
    -DUVERSION=13.0.0
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollatorFiles
  with the same arguments
- check CLDR diffs
    cd $CLDR_SRC
    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
- copy to CLDR
    cd $CLDR_SRC
    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
- run CLDR unit tests, commit to CLDR
- generate ICU zh collation data: run CLDR
    org.unicode.cldr.icu.NewLdml2IcuConverter
  with program arguments
    -t collation
    -s /usr/local/google/home/mscherer/cldr/uni/src/common/collation
    -m /usr/local/google/home/mscherer/cldr/uni/src/common/supplemental
    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
    zh
  and VM arguments
    -ea
    -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
  (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt66b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt66l.dat ./out/icu4j/icudt66b.dat -s ./out/build/icudt66l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt66b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt66b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt66b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
619- copy the big-endian Unicode data files to another location,
620  separate from the other data files,
621  and then refresh ICU4J
622    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
623    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
624    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
625    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
626    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
627    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
628    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
629    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
630    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
631    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
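
  A minimal Python sketch of which files the cp/rm steps above keep, for
  checking a file listing before running "jar uvf". The function name and the
  sample file names are illustrative assumptions; paths are relative to
  com/ibm/icu/impl/data/$ICUDT in the icu4j output folder.

```python
import fnmatch

# Top-level patterns and subfolders taken from the cp commands above.
TOP_LEVEL_PATTERNS = ["confusables.cfu", "*.icu", "*.nrm"]
SUBFOLDERS = ["coll", "brkitr"]
# Removed again by the "rm" step above.
EXCLUDED = ["cnvalias.icu"]


def select_unicode_data(rel_paths):
    """Return the relative paths that the manual copy steps would keep."""
    selected = []
    for path in rel_paths:
        parts = path.split("/")
        if len(parts) == 1:
            # Shell globs like *.icu do not cross directory boundaries.
            keep = path not in EXCLUDED and any(
                fnmatch.fnmatchcase(path, pattern)
                for pattern in TOP_LEVEL_PATTERNS)
        else:
            # "cp coll/*" copies only the direct children of the folder.
            keep = len(parts) == 2 and parts[0] in SUBFOLDERS
        if keep:
            selected.append(path)
    return selected


# Illustrative listing, not a complete inventory of the data folder.
SAMPLE = ["uprops.icu", "cnvalias.icu", "nfc.nrm", "confusables.cfu",
          "coll/ucadata.icu", "brkitr/char.brk", "zone/en.res", "root.res"]
```

  With the sample listing, select_unicode_data(SAMPLE) keeps the .icu/.nrm/.cfu
  files and the coll and brkitr entries, and drops cnvalias.icu and the .res files.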
632
633* When refreshing all of ICU4J data from ICU4C
634- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
635- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
636or
637- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
638
639* update CollationFCD.java
640  + copy & paste the initializers of lcccIndex[] etc. from
641    ICU4C/source/i18n/collationfcd.cpp to
642    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
643
644* refresh Java test .txt files
645- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
646    cd $ICU_SRC/icu4c/source/data/unidata
647    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
648    cd ../../test/testdata
649    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
650    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
651
652* run & fix ICU4J tests
653
654*** API additions
655- send notice to icu-design about new born-@stable API (enum constants etc.)
656
657*** CLDR numbering systems
658- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
659  for example, look for
660    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
661    in new blocks (Blocks.txt)
662  Unicode 13:
663    diak 11950..11959 Dives_Akuru
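
  The egrep search above can be sketched in Python as follows. This is only an
  illustration: the sample lines are hypothetical and simplified (real
  ppucd.txt lines carry more fields), and digit_range() assumes the usual
  contiguous zero..nine layout of a decimal digit set.

```python
import re

# Same pattern as the egrep command above.
DIGIT_FOUR = re.compile(r";gc=Nd.+;nv=4")


def find_digit_fours(lines):
    """Return the code point field of each "cp" line that matches the pattern."""
    hits = []
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) >= 2 and fields[0] == "cp" and DIGIT_FOUR.search(line):
            hits.append(fields[1])
    return hits


def digit_range(digit_four_hex):
    """Derive the zero..nine range from the code point of the digit four."""
    four = int(digit_four_hex, 16)
    return "%04X..%04X" % (four - 4, four + 5)


# Hypothetical, simplified sample lines.
SAMPLE = [
    "cp;0041;gc=Lu",
    "cp;0034;gc=Nd;nt=De;nv=4",
    "cp;11954;gc=Nd;nt=De;nv=4",
]
```

  For the sample, digit_range("11954") gives 11950..11959, which is the range
  noted for Dives_Akuru above.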
664
665*** merge the Unicode update branches back onto the trunk
666- do not merge the icudata.jar and testdata.jar,
667  instead rebuild them from merged & tested ICU4C
668- make sure that changes to Unicode tools are checked in:
669  http://www.unicode.org/utility/trac/log/trunk/unicodetools
670
671---------------------------------------------------------------------------- ***
672
673Unicode 12.1 update for ICU 64.2
674
675** This is an abbreviated update with one new character for the new
676** Japanese era expected to start on 2019-May-01: U+32FF SQUARE ERA NAME REIWA
677https://en.wikipedia.org/wiki/Reiwa_period
678
679http://www.unicode.org/versions/Unicode12.1.0/
680
681ICU-20497 Unicode 12.1
682
683cldrbug 11978: Unicode 12.1
684
685* Command-line environment setup
686
687UNICODE_DATA=~/unidata/uni121/20190403
688CLDR_SRC=~/svn.cldr/uni
689ICU_ROOT=~/icu/uni
690ICU_SRC=$ICU_ROOT/src
691ICUDT=icudt64b
692ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
693ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
694export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
695
696*** Unicode version numbers
697- makedata.mak
698- uchar.h
699- com.ibm.icu.util.VersionInfo
700- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
701
702- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
703  so that the makefiles see the new version number.
704  cd $ICU_ROOT/dbg/icu4c
705  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
706
707*** data files & enums & parser code
708
709* download files
710- mkdir -p $UNICODE_DATA
711- download Unicode files into $UNICODE_DATA
712  + subfolders: emoji, idna, security, ucd, uca
713  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
714
715* for manual diffs and for Unicode Tools input data updates:
716  remove version suffixes from the file names
717    ~$ unidata/desuffixucd.py $UNICODE_DATA
718  (see https://sites.google.com/site/unicodetools/inputdata)
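
  What "remove version suffixes" means can be sketched as below. This is not
  the code of desuffixucd.py; the suffix shape (a "-" followed by a dotted
  version, optionally with a draft marker like "d3", right before the file
  extension) is an assumption for illustration.

```python
import re

# Assumed suffix shape: "-12.1.0" or "-12.1.0d3" directly before the extension.
_VERSION_SUFFIX = re.compile(r"-\d+(?:\.\d+)+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def strip_version_suffix(name):
    """Return the file name without its version suffix, if it has one."""
    return _VERSION_SUFFIX.sub("", name)
```

  For example, strip_version_suffix("UnicodeData-12.1.0d3.txt") yields
  "UnicodeData.txt", while names without a version suffix stay unchanged.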
719
720* process and/or copy files
721- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
722  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
723  + For debugging, and tweaking how ppucd.txt is written,
724    the tool has an --only_ppucd option:
725    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
726
727- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
728
729* build ICU (make install)
730  so that the tools build can pick up the new definitions from the installed header files.
731
732  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
733
734* update spoof checker UnicodeSet initializers:
735    inclusionPat & recommendedPat in uspoof.cpp
736    INCLUSION & RECOMMENDED in SpoofChecker.java
737- make sure that the Unicode Tools tree contains the latest security data files
738- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
739- update the hardcoded version number there in the DIRECTORY path
740- run the tool (no special environment variables needed)
741- copy & paste from the Console output into the .cpp & .java files
742
743* generate normalization data files
744  cd $ICU_ROOT/dbg/icu4c
745  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
746  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
747  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
748  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
749  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
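
  The gennorm2 calls above follow one pattern: every output layers additional
  input files on top of nfc.txt. A small Python sketch that builds the same
  argument lists (it does not run the tool; the function and parameter names
  are made up for illustration):

```python
def gennorm2_commands(data_in, unidata, common_dir):
    """Build the argument lists for the five gennorm2 calls listed above."""
    norm2_dir = unidata + "/norm2"
    # Output file name -> input files, in the order given on the command line.
    outputs = [
        ("nfc.nrm", ["nfc.txt"]),
        ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
        ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
        ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
    ]
    # The first call writes the NFC data as a C header instead of a .nrm file.
    commands = [["bin/gennorm2", "-o", common_dir + "/norm2_nfc_data.h",
                 "-s", norm2_dir, "nfc.txt", "--csource"]]
    for name, inputs in outputs:
        commands.append(["bin/gennorm2", "-o", data_in + "/" + name,
                         "-s", norm2_dir] + inputs)
    return commands
```

  Such lists could be passed to subprocess.run() from $ICU_ROOT/dbg/icu4c, or
  joined with spaces to compare against the commands above.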
750
751* build ICU (make install)
752  so that the tools build can pick up the new definitions from the installed header files.
753
754  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
755
756* build Unicode tools using CMake+make
757
758$ICU_SRC/tools/unicode/c/icudefs.txt:
759
760# Location (--prefix) of where ICU was installed.
761set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
762# Location of the ICU4C source tree.
763set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
764
765  $ICU_ROOT/dbg$
766    mkdir -p tools/unicode/c
767    cd tools/unicode/c
768
769  $ICU_ROOT/dbg/tools/unicode/c$
770    cmake ../../../../src/tools/unicode/c
771    make
772
773* generate core properties data files
774  $ICU_ROOT/dbg/tools/unicode/c$
775    genprops/genprops $ICU_SRC/icu4c
776    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
777    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
778- rebuild ICU (make install) & tools
779
780* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
781  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
782- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
783- Unicode 6.0..12.1: U+2260, U+226E, U+226F
784- nothing new in this Unicode version, no test file to update
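
  The grep step above can be sketched in Python like this. The sample lines
  only imitate the general "code points ; status ; mapping # comment" layout of
  IdnaMappingTable.txt and are not copied from a particular version; the
  function name is made up.

```python
def non_ascii_std3_valid(lines):
    """Return non-ASCII code points whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        # The first field is a single code point or a start..end range.
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        found.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return found


# Illustrative sample lines.
SAMPLE = [
    "# comment line",
    "0000..002C    ; disallowed_STD3_valid                  # <control>..COMMA",
    "00A0          ; disallowed_STD3_mapped ; 0020          # NO-BREAK SPACE",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..",
]
```

  For the sample, the result is U+2260, U+226E, U+226F, the same characters as
  listed above; any additional code point in a new Unicode version would call
  for a test update.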
785
786* run & fix ICU4C tests
787- Andy handles RBBI & spoof check test failures
788
789* collation: CLDR collation root, UCA DUCET
790
791- UCA DUCET goes into Mark's Unicode tools, see
792    https://sites.google.com/site/unicodetools/home#TOC-UCA
793  diff the main mapping file, look for bad changes
794  (for example, more bytes per weight for common characters)
795    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.1.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.1.txt
796    ~/svn.unitools/trunk$ meld ../frac-12.txt ../frac-12.1.txt
797
798- CLDR root data files are checked into $CLDR_SRC/common/uca/
799    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
800
801- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
802    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
803- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
804    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
805    (note removing the underscore before "Rules")
806    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
807- restore TODO diffs in UCARules.txt
808    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
809- update (ICU4C)/source/test/testdata/CollationTest_*.txt
810  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
811  from the CLDR root files (..._CLDR_..._SHORT.txt)
812    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
813    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
814    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
815- if CLDR common/uca/unihan-index.txt changes, then update
816  CLDR common/collation/root.xml <collation type="private-unihan">
817  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
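
  The cp commands above rename the CLDR root files on the way into ICU4C. As a
  compact overview, here is a Python sketch of that mapping; it only builds
  (source, destination) path pairs and copies nothing. The names UCA_COPIES
  and copy_plan are made up for illustration.

```python
# (file in CLDR common/uca, folder below icu4c/source, file name in ICU4C)
UCA_COPIES = [
    ("FractionalUCA_SHORT.txt", "data/unidata", "FractionalUCA.txt"),
    # Note the removed underscore before "Rules".
    ("UCA_Rules_SHORT.txt", "data/unidata", "UCARules.txt"),
    ("CollationTest_CLDR_NON_IGNORABLE_SHORT.txt", "test/testdata",
     "CollationTest_NON_IGNORABLE_SHORT.txt"),
    ("CollationTest_CLDR_SHIFTED_SHORT.txt", "test/testdata",
     "CollationTest_SHIFTED_SHORT.txt"),
]


def copy_plan(cldr_src, icu_src):
    """Return (source, destination) pairs for the CLDR-to-ICU4C copies."""
    return [
        (cldr_src + "/common/uca/" + src,
         icu_src + "/icu4c/source/" + folder + "/" + dst)
        for src, folder, dst in UCA_COPIES
    ]
```

  The ICU4J copies of the CollationTest_*.txt files are then taken from the
  ICU4C testdata folder, as in the last cp command above.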
818
819- run genuca, see command line above
820- rebuild ICU4C
821
822* Unihan collators
823    https://sites.google.com/site/unicodetools/unihan
824- run Unicode Tools
825    org.unicode.draft.GenerateUnihanCollators
826  with VM arguments
827    -ea
828    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
829    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
830    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
831    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
832    -DUVERSION=12.1.0
833- run Unicode Tools
834    org.unicode.draft.GenerateUnihanCollatorFiles
835  with the same arguments
836- check CLDR diffs
837    cd $CLDR_SRC
838    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
839    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
840- copy to CLDR
841    cd $CLDR_SRC
842    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
843    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
844- run CLDR unit tests, commit to CLDR
845- generate ICU zh collation data: run CLDR
846    org.unicode.cldr.icu.NewLdml2IcuConverter
847  with program arguments
848    -t collation
849    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
850    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
851    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
852    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
853    zh
854  and VM arguments
855    -ea
856    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
857- rebuild ICU4C
858
859* run & fix ICU4C tests, now with new CLDR collation root data
860- run all tests with the collation test data *_SHORT.txt or the full files
861  (the full ones have comments, useful for debugging)
862- note on intltest: if collate/UCAConformanceTest fails, then
863  utility/MultithreadTest/TestCollators will fail as well;
864  fix the conformance test before looking into the multi-thread test
865
866* update Java data files
867- refresh just the UCD/UCA-related/derived files, just to be safe
868- see (ICU4C)/source/data/icu4j-readme.txt
869- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
870- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
871  output:
872    ...
873    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
874    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt64b
875    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b
876    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt64l.dat ./out/icu4j/icudt64b.dat -s ./out/build/icudt64l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt64b
877    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b"
878    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt64b/
879    mkdir -p /tmp/icu4j/main/shared/data
880    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
881    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt64b/
882    mkdir -p /tmp/icu4j/main/shared/data
883    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
884    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
885- copy the big-endian Unicode data files to another location,
886  separate from the other data files,
887  and then refresh ICU4J
888    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
889    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
890    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
891    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
892    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
893    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
894    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
895    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
896    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
897    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
898
899* When refreshing all of ICU4J data from ICU4C
900- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
901- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
902or
903- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
904
905* update CollationFCD.java
906  + copy & paste the initializers of lcccIndex[] etc. from
907    ICU4C/source/i18n/collationfcd.cpp to
908    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
909
910* refresh Java test .txt files
911- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
912    cd $ICU_SRC/icu4c/source/data/unidata
913    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
914    cd ../../test/testdata
915    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
916    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
917
918* run & fix ICU4J tests
919
920*** API additions
921- send notice to icu-design about new born-@stable API (enum constants etc.)
922
923*** CLDR numbering systems
924- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
925  for example, look for
926    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
927    in new blocks (Blocks.txt)
928  Unicode 12: using Unicode 12 CLDR ticket #11478
929    hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
930    wcho 1E2F0..1E2F9 Wancho
931  Unicode 11: using Unicode 11 CLDR ticket #10978
932    rohg 10D30..10D39 Hanifi_Rohingya
933    gong 11DA0..11DA9 Gunjala_Gondi
934  Earlier: CLDR tickets specific to adding new numbering systems.
935  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
936  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
937
938*** merge the Unicode update branches back onto the trunk
939- do not merge the icudata.jar and testdata.jar,
940  instead rebuild them from merged & tested ICU4C
941- make sure that changes to Unicode tools are checked in:
942  http://www.unicode.org/utility/trac/log/trunk/unicodetools
943
944---------------------------------------------------------------------------- ***
945
946Unicode 12.0 update for ICU 64
947
948http://www.unicode.org/versions/Unicode12.0.0/
949http://unicode.org/versions/beta-12.0.0.html
950https://www.unicode.org/review/pri389/
951http://www.unicode.org/reports/uax-proposed-updates.html
952http://www.unicode.org/reports/tr44/tr44-23.html
953
954ICU-20203 Unicode 12
955
956ICU-20111 move text layout properties data into a data file
957
958cldrbug 11478: Unicode 12
959Accidentally used ^/trunk instead of ^/branches/markus/uni12
960
961* Command-line environment setup
962
963UNICODE_DATA=~/unidata/uni12/20190309
964CLDR_SRC=~/svn.cldr/uni
965ICU_ROOT=~/icu/uni
966ICU_SRC=$ICU_ROOT/src
967ICUDT=icudt63b
968ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
969ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
970export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
971
972*** Unicode version numbers
973- makedata.mak
974- uchar.h
975- com.ibm.icu.util.VersionInfo
976- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
977
978- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
979  so that the makefiles see the new version number.
980
981*** data files & enums & parser code
982
983* download files
984- mkdir -p $UNICODE_DATA
985- download Unicode files into $UNICODE_DATA
986  + subfolders: emoji, idna, security, ucd, uca
987  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
988
989* for manual diffs and for Unicode Tools input data updates:
990  remove version suffixes from the file names
991    ~$ unidata/desuffixucd.py $UNICODE_DATA
992  (see https://sites.google.com/site/unicodetools/inputdata)
993
994* process and/or copy files
995- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
996  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
997  + For debugging, and tweaking how ppucd.txt is written,
998    the tool has an --only_ppucd option:
999    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1000
1001- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1002
1003* build ICU (make install)
1004  so that the tools build can pick up the new definitions from the installed header files.
1005
1006  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
1007
1008* new constants for new property values
1009- preparseucd.py error:
1010    ValueError: missing uchar.h enum constants for some property values:
1011    [(u'blk', set([u'Symbols_And_Pictographs_Ext_A', u'Elymaic',
1012        u'Ottoman_Siyaq_Numbers', u'Nandinagari', u'Nyiakeng_Puachue_Hmong',
1013        u'Small_Kana_Ext', u'Egyptian_Hieroglyph_Format_Controls', u'Wancho', u'Tamil_Sup'])),
1014    (u'sc', set([u'Nand', u'Wcho', u'Elym', u'Hmnp']))]
1015  = PropertyValueAliases.txt new property values (diff old & new .txt files)
1016    blk; Egyptian_Hieroglyph_Format_Controls; Egyptian_Hieroglyph_Format_Controls
1017    blk; Elymaic                          ; Elymaic
1018    blk; Nandinagari                      ; Nandinagari
1019    blk; Nyiakeng_Puachue_Hmong           ; Nyiakeng_Puachue_Hmong
1020    blk; Ottoman_Siyaq_Numbers            ; Ottoman_Siyaq_Numbers
1021    blk; Small_Kana_Ext                   ; Small_Kana_Extension
1022    blk; Symbols_And_Pictographs_Ext_A    ; Symbols_And_Pictographs_Extended_A
1023    blk; Tamil_Sup                        ; Tamil_Supplement
1024    blk; Wancho                           ; Wancho
1025  -> add to uchar.h
1026    use long property names for enum constants,
1027    for the trailing comment get the block start code point: diff old & new Blocks.txt
1028  -> add to UCharacter.UnicodeBlock IDs
1029    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1030            replace  public static final int \1_ID = \2; \3
1031  -> add to UCharacter.UnicodeBlock objects
1032    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1033            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1034
1035    sc ; Elym                             ; Elymaic
1036    sc ; Hmnp                             ; Nyiakeng_Puachue_Hmong
1037    sc ; Nand                             ; Nandinagari
1038    sc ; Wcho                             ; Wancho
1039  -> uscript.h & com.ibm.icu.lang.UScript
1040  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1041      and in com.ibm.icu.dev.test.lang.TestUScript.java
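
  The Eclipse find/replace steps for UCharacter.UnicodeBlock can also be tried
  out in Python. This sketch uses the three-group find pattern for both
  replacements, so the trailing comment is always group 3. The sample enum
  line is illustrative; its numeric value was not checked against uchar.h.

```python
import re

# Find pattern from the Eclipse step: name, numeric value, trailing comment.
ENUM_LINE = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")


def to_java_id(line):
    """Turn a uchar.h UBLOCK_ enum line into a Java ..._ID constant."""
    return ENUM_LINE.sub(r"public static final int \1_ID = \2; \3", line)


def to_java_object(line):
    """Turn a uchar.h UBLOCK_ enum line into a Java UnicodeBlock object."""
    return ENUM_LINE.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \3',
        line)


# Illustrative input line in the style of uchar.h.
SAMPLE = "UBLOCK_ELYMAIC = 273, /*[10FE0]*/"
```

  to_java_id(SAMPLE) produces the ELYMAIC_ID constant and
  to_java_object(SAMPLE) the matching UnicodeBlock declaration, each followed
  by the original block start comment.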
1042
1043* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1044    (not strictly necessary for NOT_ENCODED scripts)
1045  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1046
1047* update spoof checker UnicodeSet initializers:
1048    inclusionPat & recommendedPat in uspoof.cpp
1049    INCLUSION & RECOMMENDED in SpoofChecker.java
1050- make sure that the Unicode Tools tree contains the latest security data files
1051- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
1052- update the hardcoded version number there in the DIRECTORY path
1053- run the tool (no special environment variables needed)
1054- copy & paste from the Console output into the .cpp & .java files
1055
1056* generate normalization data files
1057  cd $ICU_ROOT/dbg/icu4c
1058  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1059  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
1060  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1061  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1062  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1063
1064* build ICU (make install)
1065  so that the tools build can pick up the new definitions from the installed header files.
1066
1067  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
1068
1069* build Unicode tools using CMake+make
1070
1071$ICU_SRC/tools/unicode/c/icudefs.txt:
1072
1073# Location (--prefix) of where ICU was installed.
1074set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
1075# Location of the ICU4C source tree.
1076set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
1077
1078  $ICU_ROOT/dbg$
1079    mkdir -p tools/unicode/c
1080    cd tools/unicode/c
1081
1082  $ICU_ROOT/dbg/tools/unicode/c$
1083    cmake ../../../../src/tools/unicode/c
1084    make
1085
1086* generate core properties data files
1087  $ICU_ROOT/dbg/tools/unicode/c$
1088    genprops/genprops $ICU_SRC/icu4c
1089    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
1090    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1091- rebuild ICU (make install) & tools
1092
1093* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1094  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1095- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1096- Unicode 6.0..12.0: U+2260, U+226E, U+226F
1097- nothing new in this Unicode version, no test file to update
1098
1099* run & fix ICU4C tests
1100- update test of default bidi classes:
1101  Bidi range \U0001ED00-\U0001ED4F changes default from R to AL,
1102  see diffs in DerivedBidiClass.txt
1103  + /tsutil/cucdtst/TestUnicodeData enumDefaultsRange() defaultBidi[]
1104  + UCharacterTest.java TestIteration() defaultBidi[]
1105- Andy handles RBBI & spoof check test failures
1106
1107* collation: CLDR collation root, UCA DUCET
1108
1109- UCA DUCET goes into Mark's Unicode tools, see
1110    https://sites.google.com/site/unicodetools/home#TOC-UCA
1111  diff the main mapping file, look for bad changes
1112  (for example, more bytes per weight for common characters)
1113    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.txt
1114    ~/svn.unitools/trunk$ meld ../frac-11.txt ../frac-12.txt
1115
1116- CLDR root data files are checked into $CLDR_SRC/common/uca/
1117    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1118
1119- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1120    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1121- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1122    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1123    (note removing the underscore before "Rules")
1124    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1125- restore TODO diffs in UCARules.txt
1126    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1127- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1128  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1129  from the CLDR root files (..._CLDR_..._SHORT.txt)
1130    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1131    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1132    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1133- if CLDR common/uca/unihan-index.txt changes, then update
1134  CLDR common/collation/root.xml <collation type="private-unihan">
1135  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1136
1137- run genuca, see command line above;
1138  deal with
1139    Error: Unknown script for first-primary sample character U+119CE on line 29233 of /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
1140    FDD1 119CE;	[71 CD 02, 05, 05]	# Nandinagari first primary (compressible)
1141        (add the character to genuca.cpp sampleCharsToScripts[])
1142  + This time, I added code to genuca.cpp to use uscript_getSampleUnicodeString(script)
1143    and cache its values.
1144    This works as long as the script metadata is updated before the collation data.
1145- rebuild ICU4C
1146
1147* Unihan collators
1148    https://sites.google.com/site/unicodetools/unihan
1149- run Unicode Tools
1150    org.unicode.draft.GenerateUnihanCollators
1151  with VM arguments
1152    -ea
1153    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1154    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1155    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1156    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1157    -DUVERSION=12.0.0
1158- run Unicode Tools
1159    org.unicode.draft.GenerateUnihanCollatorFiles
1160  with the same arguments
1161- check CLDR diffs
1162    cd $CLDR_SRC
1163    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1164    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1165- copy to CLDR
1166    cd $CLDR_SRC
1167    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1168    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1169- run CLDR unit tests, commit to CLDR
1170- generate ICU zh collation data: run CLDR
1171    org.unicode.cldr.icu.NewLdml2IcuConverter
1172  with program arguments
1173    -t collation
1174    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
1175    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
1176    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
1177    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
1178    zh
1179  and VM arguments
1180    -ea
1181    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1182- rebuild ICU4C
1183
1184* run & fix ICU4C tests, now with new CLDR collation root data
1185- run all tests with the collation test data *_SHORT.txt or the full files
1186  (the full ones have comments, useful for debugging)
1187- note on intltest: if collate/UCAConformanceTest fails, then
1188  utility/MultithreadTest/TestCollators will fail as well;
1189  fix the conformance test before looking into the multi-thread test
1190
1191* update Java data files
1192- refresh just the UCD/UCA-related/derived files, just to be safe
1193- see (ICU4C)/source/data/icu4j-readme.txt
1194- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1195- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1196  output:
1197    ...
1198    Unicode .icu files built to ./out/build/icudt63l
1199    echo timestamp > uni-core-data
1200    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt63b
1201    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b
1202    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1203    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt63l.dat ./out/icu4j/icudt63b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt63l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt63b
1204    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b"
1205    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt63b/
1206    mkdir -p /tmp/icu4j/main/shared/data
1207    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1208    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt63b/
1209    mkdir -p /tmp/icu4j/main/shared/data
1210    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1211    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
1212- copy the big-endian Unicode data files to another location,
1213  separate from the other data files,
1214  and then refresh ICU4J
1215    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1216    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1217    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1218    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1219    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1220    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1221    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1222    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1223    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1224    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1225
1226* When refreshing all of ICU4J data from ICU4C
1227- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1228- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1229or
1230- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1231
1232* update CollationFCD.java
1233  + copy & paste the initializers of lcccIndex[] etc. from
1234    ICU4C/source/i18n/collationfcd.cpp to
1235    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1236
1237* refresh Java test .txt files
1238- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1239    cd $ICU_SRC/icu4c/source/data/unidata
1240    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1241    cd ../../test/testdata
1242    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1243    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1244
1245* run & fix ICU4J tests
1246
1247*** API additions
1248- send notice to icu-design about new born-@stable API (enum constants etc.)
1249
1250*** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
1252  for example, look for
1253    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
1254    in new blocks (Blocks.txt)
1255  Unicode 12: using Unicode 12 CLDR ticket #11478
1256    hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
1257    wcho 1E2F0..1E2F9 Wancho
1258  Unicode 11: using Unicode 11 CLDR ticket #10978
1259    rohg 10D30..10D39 Hanifi_Rohingya
1260    gong 11DA0..11DA9 Gunjala_Gondi
1261  Earlier: CLDR tickets specific to adding new numbering systems.
1262  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1263  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
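
A minimal Python sketch of the egrep search above, assuming simplified ppucd.txt-style lines.
The sample lines are made up for illustration and carry far fewer fields than the real file;
find_digit_sets() is a hypothetical helper, not part of the ICU tools. It assumes that the
code point with nv=4 sits at offset 4 in a contiguous set of ten decimal digits.

```python
import re

# Same pattern as in the egrep command line above.
_DIGIT_FOUR_RE = re.compile(r";gc=Nd.+;nv=4")


def find_digit_sets(lines):
    """Return (first, last) code point pairs for sets of decimal digits.

    Each matching line describes the digit with numeric value 4,
    so the set is assumed to span digit-4 .. digit+5.
    """
    ranges = []
    for line in lines:
        if not line.startswith("cp;") or not _DIGIT_FOUR_RE.search(line):
            continue
        # Second field is the code point (or the start of a range).
        four = int(line.split(";")[1].split("..")[0], 16)
        ranges.append((four - 4, four + 5))
    return ranges


# Hypothetical, heavily shortened sample lines.
sample = [
    "cp;1E143;gc=Nd;lb=NU;nv=3",
    "cp;1E144;gc=Nd;lb=NU;nv=4",
    "cp;1E2F4;gc=Nd;lb=NU;nv=4",
    "cp;0041;gc=Lu",
]

for first, last in find_digit_sets(sample):
    print("%04X..%04X" % (first, last))
```

Compare the printed ranges with Blocks.txt to see which ones fall into new blocks.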
1264
1265*** merge the Unicode update branches back onto the trunk
1266- do not merge the icudata.jar and testdata.jar,
1267  instead rebuild them from merged & tested ICU4C
1268- make sure that changes to Unicode tools are checked in:
1269  http://www.unicode.org/utility/trac/log/trunk/unicodetools
1270
1271---------------------------------------------------------------------------- ***
1272
ICU 63: add support for the text layout properties InPC, InSC, vo
1274
1275* Command-line environment setup
1276
1277UNICODE_DATA=~/unidata/uni11/20180609
1278CLDR_SRC=~/svn.cldr/uni
1279ICU_ROOT=~/icu/mine
1280ICU_SRC=$ICU_ROOT/src
1281ICUDT=icudt62b
1282ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1283ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1284export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1285
1286*** Links
1287
1288https://unicode-org.atlassian.net/browse/ICU-8966 InPC & InSC
1289https://unicode-org.atlassian.net/browse/ICU-12850 vo
1290
1291*** data files & enums & parser code
1292
1293* API additions
1294- for each of the three new enumerated properties
1295  + uchar.h: add the enum UProperty constant UCHAR_<long prop name>
1296  + uchar.h: update UCHAR_INT_LIMIT
1297  + uchar.h: add the enum U<long prop name>
1298    with constants U_<short prop name>_<long value name>
1299  + UProperty.java: add the constant <long prop name>
1300  + UProperty.java: update INT_LIMIT
1301  + UCharacter.java: add the interface <long prop name>
1302    with constants <long value name>
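
The naming patterns above can be sketched in Python as follows. This is only an illustration:
the helper names are hypothetical, and upper-casing the UCD names is an assumption based on the
existing constants, not something the tools do for you.

```python
def c_property_constant(long_prop_name):
    """uchar.h enum UProperty constant: UCHAR_<long prop name>."""
    return "UCHAR_" + long_prop_name.upper()


def c_value_constant(short_prop_name, long_value_name):
    """uchar.h value constant: U_<short prop name>_<long value name>."""
    return "U_%s_%s" % (short_prop_name.upper(), long_value_name.upper())


def java_property_constant(long_prop_name):
    """UProperty.java constant: <long prop name>."""
    return long_prop_name.upper()


def java_value_constant(long_value_name):
    """UCharacter.java nested-interface constant: <long value name>."""
    return long_value_name.upper()


# Example with one of the three new properties.
print(c_property_constant("Indic_Positional_Category"))
print(c_value_constant("InPC", "Bottom_And_Right"))
print(java_property_constant("Indic_Positional_Category"))
print(java_value_constant("Bottom_And_Right"))
```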
1303
1304* process and/or copy files
1305- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1306  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1307  + It also writes tools/unicode/c/genprops/pnames_data.h with property and value
1308    names and aliases.
1309  + For debugging, and tweaking how ppucd.txt is written,
1310    the tool has an --only_ppucd option:
1311    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1312
1313* preparseucd.py changes
1314- add new property short names (uppercase) to _prop_and_value_re
1315  so that ParseUCharHeader() parses the new enum constants
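
A rough sketch of why the short names matter. The regular expression below is a hypothetical
stand-in, NOT the actual _prop_and_value_re from preparseucd.py; it only shows that an enum
constant is recognized when its property short name is in the list.

```python
import re

# Hypothetical list of uppercase property short names.
_short_names = ["INPC", "INSC", "VO"]

# Hypothetical stand-in for _prop_and_value_re: U_<short prop name>_<value name>
_enum_re = re.compile(r"\s*U_(%s)_([A-Z0-9_]+)\b" % "|".join(_short_names))


def parse_enum_line(line):
    """Return (short property name, value name), or None if unrecognized."""
    match = _enum_re.match(line)
    if not match:
        return None
    return match.group(1), match.group(2)


print(parse_enum_line("    U_INPC_BOTTOM,"))
print(parse_enum_line("    U_VO_ROTATED = 0,"))
# A property whose short name is not in the list is not parsed.
print(parse_enum_line("    U_XYZ_SOMETHING,"))
```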
1316
1317* build ICU (make install)
1318  so that the tools build can pick up the new definitions from the installed header files.
1319
1320  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1321
1322* build Unicode tools using CMake+make
1323
1324$ICU_SRC/tools/unicode/c/icudefs.txt:
1325
1326# Location (--prefix) of where ICU was installed.
1327set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
1328# Location of the ICU4C source tree.
1329set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/mine/src/icu4c)
1330
1331  $ICU_ROOT/dbg$
1332    mkdir -p tools/unicode/c
1333    cd tools/unicode/c
1334
1335  $ICU_ROOT/dbg/tools/unicode/c$
    cmake ../../../../src/tools/unicode/c
1337    make
1338
1339* generate core properties data files
1340  $ICU_ROOT/dbg/tools/unicode/c$
1341    genprops/genprops $ICU_SRC/icu4c
1342- rebuild ICU (make install) & tools
1343
1344* write data for runtime, hardcoded for now
1345- add genprops/layoutpropsbuilder.cpp with pieces from sibling files
1346- generate new icu4c/source/common/ulayout_props_data.h
1347- for each of the three new enumerated properties
1348  + int property max value
1349  + small, 8-bit UCPTrie
    (A single small 16-bit trie with bit fields for all three properties
    would be very nearly the same size as the three 8-bit tries combined.)
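
For comparison, a sketch of the bit-field alternative mentioned in the parenthetical note:
packing three small integer property values into one 16-bit word. The max values used in the
example are illustrative assumptions; use the ones that genprops actually prints.

```python
def field_widths(max_values):
    """Number of bits needed to store each value in 0..max."""
    return [max(1, m.bit_length()) for m in max_values]


def pack(values, max_values):
    """Pack the values into one 16-bit word, first value in the lowest bits."""
    word, shift = 0, 0
    for value, limit, width in zip(values, max_values, field_widths(max_values)):
        if not 0 <= value <= limit:
            raise ValueError("value %d out of range 0..%d" % (value, limit))
        word |= value << shift
        shift += width
    if shift > 16:
        raise ValueError("fields need %d bits, more than 16" % shift)
    return word


def unpack(word, max_values):
    """Inverse of pack()."""
    values, shift = [], 0
    for width in field_widths(max_values):
        values.append((word >> shift) & ((1 << width) - 1))
        shift += width
    return values


# Assumed max values for three enumerated properties (illustration only).
MAX_VALUES = [14, 35, 3]
word = pack([5, 20, 2], MAX_VALUES)
print(field_widths(MAX_VALUES), hex(word), unpack(word, MAX_VALUES))
```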
1352
1353* wire into C++
1354- uprops.cpp: #include ulayout_props_data.h
1355- uprops.cpp: add getInPC() etc. functions
1356- uprops.cpp: add lines to intProps[], include max values
1357- uprops.h: add UPropertySource constants
1358- uprops.cpp: add uprops_addPropertyStarts(src)
1359- uniset_props.cpp: add to UnicodeSet_initInclusion()
1360- intltest/ucdtest.cpp: write unit tests
1361
1362* update Java data files
1363- refresh just the pnames.icu file with the new property [value] names, just to be safe
1364- see $ICU_SRC/icu4c/source/data/icu4j-readme.txt
1365- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1366- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1367- copy the big-endian Unicode data files to another location,
1368  separate from the other data files,
1369  and then refresh ICU4J
1370    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1371    cp com/ibm/icu/impl/data/$ICUDT/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1372    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1373
1374* wire into Java
1375- UCharacterProperty.java: add new SRC_INPC etc. constants as in C++
1376- UCharacterProperty.java: for each new property
1377  + create a nested class to hold its CodePointTrie
1378  + initialize it from a string literal
1379  + paste in the initializer printed by genprops
1380  + add a new IntProperty object to the intProps[] array
1381  + use the correct max int value for each property, also printed by genprops
1382- UCharacterProperty.java: add ulayout_addPropertyStarts(src, set)
1383- UnicodeSet.java: add to getInclusions()
1384- UCharacterTest.java: write unit tests
1385
1386---------------------------------------------------------------------------- ***
1387
1388Unicode 11.0 update for ICU 62
1389
1390http://www.unicode.org/versions/Unicode11.0.0/
1391http://unicode.org/versions/beta-11.0.0.html
1392https://www.unicode.org/review/pri372/
1393http://www.unicode.org/reports/uax-proposed-updates.html
1394http://www.unicode.org/reports/tr44/tr44-21.html
1395
1396* Command-line environment setup
1397
1398UNICODE_DATA=~/unidata/uni11/20180521
1399CLDR_SRC=~/svn.cldr/uni
1400ICU_ROOT=~/svn.icu/uni
1401ICU_SRC=$ICU_ROOT/src
1402ICUDT=icudt61b
1403ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1404ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1405export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1406
1407*** ICU Trac
1408
1409- ticket:13630: Unicode 11
1410- ^/branches/markus/uni11
1411
1412*** CLDR Trac
1413
1414- cldrbug 10978: Unicode 11
1415- ^/branches/markus/uni11
1416
1417*** Unicode version numbers
1418- makedata.mak
1419- uchar.h
1420- com.ibm.icu.util.VersionInfo
1421- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1422
1423- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1424  so that the makefiles see the new version number.
1425
1426*** data files & enums & parser code
1427
1428* download files
1429- mkdir -p $UNICODE_DATA
1430- download Unicode files into $UNICODE_DATA
1431  + subfolders: emoji, idna, security, ucd, uca
1432  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1433
1434* for manual diffs and for Unicode Tools input data updates:
1435  remove version suffixes from the file names
1436    ~$ unidata/desuffixucd.py $UNICODE_DATA
1437  (see https://sites.google.com/site/unicodetools/inputdata)
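
A sketch of the kind of renaming involved. This is not the code of desuffixucd.py; the
version-suffix pattern and the sample file names are assumptions for illustration.

```python
import re

# Assumed suffix shape: -<major>.<minor>.<update>, optionally followed by a
# draft number like d12, directly before the file extension.
_VERSION_SUFFIX_RE = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z]+$)")


def desuffix(file_name):
    """Return the file name without a version suffix; other names pass through."""
    return _VERSION_SUFFIX_RE.sub("", file_name)


for name in ["UnicodeData-11.0.0d12.txt", "Blocks-11.0.0.txt", "emoji-data.txt"]:
    print(name, "->", desuffix(name))
```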
1438
1439* process and/or copy files
1440- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1441  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1442  + For debugging, and tweaking how ppucd.txt is written,
1443    the tool has an --only_ppucd option:
1444    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1445
1446- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1447
1448* build ICU (make install)
1449  so that the tools build can pick up the new definitions from the installed header files.
1450
1451  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1452
1453* preparseucd.py changes
1454- fix other errors
1455    NameError: unknown property Extended_Pictographic
1456  -> add Extended_Pictographic binary property
1457  -> add new short names for all Emoji properties
1458
1459* new constants for new property values
1460- preparseucd.py error:
1461    ValueError: missing uchar.h enum constants for some property values:
1462    [(u'blk', set([u'Georgian_Ext', u'Hanifi_Rohingya', u'Medefaidrin', u'Sogdian', u'Makasar',
1463                   u'Old_Sogdian', u'Dogra', u'Gunjala_Gondi', u'Chess_Symbols', u'Mayan_Numerals',
1464                   u'Indic_Siyaq_Numbers'])),
1465     (u'jg', set([u'Hanifi_Rohingya_Kinna_Ya', u'Hanifi_Rohingya_Pa'])),
1466     (u'sc', set([u'Medf', u'Sogd', u'Dogr', u'Rohg', u'Maka', u'Sogo', u'Gong'])),
1467     (u'GCB', set([u'LinkC', u'Virama'])),
1468     (u'WB', set([u'WSegSpace']))]
1469  = PropertyValueAliases.txt new property values (diff old & new .txt files)
1470    blk; Chess_Symbols                    ; Chess_Symbols
1471    blk; Dogra                            ; Dogra
1472    blk; Georgian_Ext                     ; Georgian_Extended
1473    blk; Gunjala_Gondi                    ; Gunjala_Gondi
1474    blk; Hanifi_Rohingya                  ; Hanifi_Rohingya
1475    blk; Indic_Siyaq_Numbers              ; Indic_Siyaq_Numbers
1476    blk; Makasar                          ; Makasar
1477    blk; Mayan_Numerals                   ; Mayan_Numerals
1478    blk; Medefaidrin                      ; Medefaidrin
1479    blk; Old_Sogdian                      ; Old_Sogdian
1480    blk; Sogdian                          ; Sogdian
1481  -> add to uchar.h
1482    use long property names for enum constants,
1483    for the trailing comment get the block start code point: diff old & new Blocks.txt
1484  -> add to UCharacter.UnicodeBlock IDs
1485    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1486            replace  public static final int \1_ID = \2; \3
1487  -> add to UCharacter.UnicodeBlock objects
1488    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1489            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
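
     The two Eclipse find/replace steps above, sketched with Python's re.sub() for use
     outside the IDE. The patterns are the ones shown above; the sample input line is
     made up (in particular the numeric block ID 999 is a placeholder).

```python
import re

_ID_FIND = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
_ID_REPLACE = r"public static final int \1_ID = \2; \3"

_OBJECT_FIND = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
_OBJECT_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_block_id(uchar_h_line):
    """uchar.h enum line -> UCharacter.UnicodeBlock ID constant."""
    return re.sub(_ID_FIND, _ID_REPLACE, uchar_h_line)


def to_block_object(uchar_h_line):
    """uchar.h enum line -> UCharacter.UnicodeBlock object constant."""
    return re.sub(_OBJECT_FIND, _OBJECT_REPLACE, uchar_h_line)


# Placeholder enum line; 999 is not the real block ID.
line = "    UBLOCK_DOGRA = 999, /*[11800]*/"
print(to_block_id(line))
print(to_block_object(line))
```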
1490
1491    GCB; LinkC                            ; LinkingConsonant
1492    GCB; Virama                           ; Virama
1493  -> uchar.h & UCharacter.GraphemeClusterBreak
1494  -> these two later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
1495
1496    InSC; Consonant_Initial_Postfixed     ; Consonant_Initial_Postfixed
1497  -> ignore: ICU does not yet support this property
1498
1499    jg ; Hanifi_Rohingya_Kinna_Ya         ; Hanifi_Rohingya_Kinna_Ya
1500    jg ; Hanifi_Rohingya_Pa               ; Hanifi_Rohingya_Pa
1501  -> uchar.h & UCharacter.JoiningGroup
1502
1503    sc ; Dogr                             ; Dogra
1504    sc ; Gong                             ; Gunjala_Gondi
1505    sc ; Maka                             ; Makasar
1506    sc ; Medf                             ; Medefaidrin
1507    sc ; Rohg                             ; Hanifi_Rohingya
1508    sc ; Sogd                             ; Sogdian
1509    sc ; Sogo                             ; Old_Sogdian
1510  -> uscript.h & com.ibm.icu.lang.UScript
1511  -> Nushu had been added already
1512  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1513      and in com.ibm.icu.dev.test.lang.TestUScript.java
1514
1515    WB ; WSegSpace                        ; WSegSpace
1516  -> uchar.h & UCharacter.WordBreak
1517
1518* New short names for emoji properties
1519- see UTS #51
1520- short names set in preparseucd.py
1521
1522* New properties
1523- boolean emoji property Extended_Pictographic
1524  -> added in preparseucd.py
1525  -> uchar.h & UProperty.java
1526- misc. property Equivalent_Unified_Ideograph (EqUIdeo)
1527  as shown in PropertyValueAliases.txt
1528  -> ignore for now
1529
1530* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1531    (not strictly necessary for NOT_ENCODED scripts)
1532  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1533
1534* update spoof checker UnicodeSet initializers:
1535    inclusionPat & recommendedPat in uspoof.cpp
1536    INCLUSION & RECOMMENDED in SpoofChecker.java
1537- make sure that the Unicode Tools tree contains the latest security data files
1538- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
1539- update the hardcoded version number there in the DIRECTORY path
1540- run the tool (no special environment variables needed)
1541- copy & paste from the Console output into the .cpp & .java files
1542
1543* generate normalization data files
1544  cd $ICU_ROOT/dbg/icu4c
1545  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1546  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
1547  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1548  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1549  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1550
1551* build ICU (make install)
1552  so that the tools build can pick up the new definitions from the installed header files.
1553
1554  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1555
1556* build Unicode tools using CMake+make
1557
1558$ICU_SRC/tools/unicode/c/icudefs.txt:
1559
1560# Location (--prefix) of where ICU was installed.
1561set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1562# Location of the ICU4C source tree.
1563set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c)
1564
1565  $ICU_ROOT/dbg$
1566    mkdir -p tools/unicode/c
1567    cd tools/unicode/c
1568
1569  $ICU_ROOT/dbg/tools/unicode/c$
1570    cmake ../../../../src/tools/unicode/c
1571    make
1572
1573* generate core properties data files
1574  $ICU_ROOT/dbg/tools/unicode/c$
1575    genprops/genprops $ICU_SRC/icu4c
1576    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
1577    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1578- rebuild ICU (make install) & tools
1579
1580* Fix case props
1581    genprops error: casepropsbuilder: too many exceptions words
1582    genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
1583- With the addition of Georgian Mtavruli capital letters,
1584  there are now too many simple case mappings with big mapping deltas
1585  that yield uncompressible exceptions.
- Changed the data structure (now formatVersion 4):
  added one bit for no-simple-case-folding (for Cherokee), and
  one optional slot for a big delta (for most faraway mappings),
  together with another bit for whether that delta is negative.
  This makes most Cherokee & Georgian etc. case mappings compressible,
  reducing the number of exceptions words.
- Made further changes to gain one more bit for the exceptions index,
  for future growth. For details, see casepropsbuilder.cpp.
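
A small sketch of the "big delta plus negative bit" idea, assuming only what is described
above: a faraway simple case mapping is stored as an unsigned distance and a sign flag
instead of a full mapped code point. The function names are hypothetical, and this is not
the actual exceptions-word layout; see casepropsbuilder.cpp for that.

```python
def encode_delta(code_point, mapped):
    """Return (magnitude, is_negative) for the mapping code_point -> mapped."""
    delta = mapped - code_point
    return abs(delta), delta < 0


def apply_delta(code_point, magnitude, is_negative):
    """Recover the mapped code point from the stored magnitude and sign flag."""
    return code_point - magnitude if is_negative else code_point + magnitude


# U+10D0 GEORGIAN LETTER AN <-> U+1C90 GEORGIAN MTAVRULI CAPITAL LETTER AN
upper = encode_delta(0x10D0, 0x1C90)
lower = encode_delta(0x1C90, 0x10D0)
print("to upper: magnitude=0x%X negative=%s" % upper)
print("to lower: magnitude=0x%X negative=%s" % lower)
```

Both directions share the same magnitude, so one slot plus one sign bit covers the pair.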
1594
1595* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1596  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1597- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1598- Unicode 6.0..11.0: U+2260, U+226E, U+226F
1599- nothing new in this Unicode version, no test file to update
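
A Python sketch of the grep step above, assuming the semicolon-separated
"code point(s) ; status ; ..." layout with # comments. The sample lines are simplified
and made up for illustration; non_ascii_std3_valid() is a hypothetical helper.

```python
def non_ascii_std3_valid(lines):
    """Return the non-ASCII code points with status disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]  # strip the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        found.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return found


# Simplified sample lines.
sample_lines = [
    "003A..0040 ; disallowed_STD3_valid # ASCII, not of interest here",
    "00E0       ; valid",
    "2260       ; disallowed_STD3_valid # NOT EQUAL TO",
    "226E..226F ; disallowed_STD3_valid",
    "",
]

print(["U+%04X" % cp for cp in non_ascii_std3_valid(sample_lines)])
```

Any code point in the result that is not yet covered by the tests is a candidate
for uts46test.cpp and UTS46Test.java.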
1600
1601* run & fix ICU4C tests
1602- Andy handles RBBI & spoof check test failures
1603
1604- Errors in char.txt, word.txt, word_POSIX.txt like
1605    createRuleBasedBreakIterator: ICU Error "U_BRK_RULE_EMPTY_SET"  at line 46, column 16
1606  because \p{Grapheme_Cluster_Break = EBG} and \p{Word_Break = EBG} are empty.
1607  -> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
1608     not empty, just to get ICU building.
1609  -> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
1610     and properties together with the rules that used them (GB 10, WB 14).
1611  -> Andy adjusts the rule sets further to sync with
1612     Unicode 11 grapheme, word, and line break spec changes.
1613
1614* collation: CLDR collation root, UCA DUCET
1615
1616- UCA DUCET goes into Mark's Unicode tools, see
1617    https://sites.google.com/site/unicodetools/home#TOC-UCA
1618  diff the main mapping file, look for bad changes
1619  (for example, more bytes per weight for common characters)
1620    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-11.txt
1621    ~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
1622
1623- CLDR root data files are checked into $CLDR_SRC/common/uca/
1624    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1625
1626- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1627    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1628- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1629    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
    (note: the copy below drops the underscore before "Rules" in the file name)
1631    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1632- restore TODO diffs in UCARules.txt
1633    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1634- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1635  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1636  from the CLDR root files (..._CLDR_..._SHORT.txt)
1637    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1638    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1639    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1640- if CLDR common/uca/unihan-index.txt changes, then update
1641  CLDR common/collation/root.xml <collation type="private-unihan">
1642  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1643
1644- run genuca, see command line above;
1645  deal with
1646    Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
1647    FDD1 1180B;	[71 CC 02, 05, 05]	# Dogra first primary (compressible)
1648        (add the character to genuca.cpp sampleCharsToScripts[])
1649  + look up the USCRIPT_ code for the new sample characters
1650    (should be obvious from the comment in the error output)
1651  + *add* mappings to sampleCharsToScripts[], do not replace them
1652    (in case the script sample characters flip-flop)
1653  + insert new scripts in DUCET script order, see the top_byte table
1654    at the beginning of FractionalUCA.txt
1655- rebuild ICU4C
1656
1657* Unihan collators
1658    https://sites.google.com/site/unicodetools/unihan
1659- run Unicode Tools
1660    org.unicode.draft.GenerateUnihanCollators
1661  with VM arguments
1662    -ea
1663    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1664    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1665    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1666    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1667    -DUVERSION=11.0.0
1668- run Unicode Tools
1669    org.unicode.draft.GenerateUnihanCollatorFiles
1670  with the same arguments
1671- check CLDR diffs
1672    cd $CLDR_SRC
1673    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1674    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1675- copy to CLDR
1676    cd $CLDR_SRC
1677    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1678    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1679- run CLDR unit tests, commit to CLDR
1680- generate ICU zh collation data: run CLDR
1681    org.unicode.cldr.icu.NewLdml2IcuConverter
1682  with program arguments
1683    -t collation
1684    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
1685    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
1686    -d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
1687    -p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
1688    zh
1689  and VM arguments
1690    -ea
1691    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1692- rebuild ICU4C
1693
1694* run & fix ICU4C tests, now with new CLDR collation root data
1695- run all tests with the collation test data *_SHORT.txt or the full files
1696  (the full ones have comments, useful for debugging)
1697- note on intltest: if collate/UCAConformanceTest fails, then
1698  utility/MultithreadTest/TestCollators will fail as well;
1699  fix the conformance test before looking into the multi-thread test
1700
1701* update Java data files
1702- refresh just the UCD/UCA-related/derived files, just to be safe
1703- see (ICU4C)/source/data/icu4j-readme.txt
1704- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1705- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1706  output:
1707    ...
1708    Unicode .icu files built to ./out/build/icudt61l
1709    echo timestamp > uni-core-data
1710    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
1711    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
1712    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1713    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt61l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt61b
1714    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b"
1715    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
1716    mkdir -p /tmp/icu4j/main/shared/data
1717    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1718    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
1719    mkdir -p /tmp/icu4j/main/shared/data
1720    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1721    make[1]: Leaving directory '/usr/local/google/home/mscherer/svn.icu/uni/dbg/icu4c/data'
1722- copy the big-endian Unicode data files to another location,
1723  separate from the other data files,
1724  and then refresh ICU4J
1725    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1726    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1727    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1728    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1729    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1730    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1731    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1732    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1733    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1734    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1735
1736* When refreshing all of ICU4J data from ICU4C
1737- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1738- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1739or
1740- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1741
1742* update CollationFCD.java
1743  + copy & paste the initializers of lcccIndex[] etc. from
1744    ICU4C/source/i18n/collationfcd.cpp to
1745    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1746
1747* refresh Java test .txt files
1748- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1749    cd $ICU_SRC/icu4c/source/data/unidata
1750    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1751    cd ../../test/testdata
1752    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1753    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1754
1755* run & fix ICU4J tests
1756
1757*** API additions
1758- send notice to icu-design about new born-@stable API (enum constants etc.)
1759
1760*** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
1762  Unicode 11: using Unicode 11 CLDR ticket #10978
1763    rohg 10D30..10D39 Hanifi_Rohingya
1764    gong 11DA0..11DA9 Gunjala_Gondi
1765  Earlier: CLDR tickets specific to adding new numbering systems.
1766  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1767  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
1768
1769*** merge the Unicode update branches back onto the trunk
1770- do not merge the icudata.jar and testdata.jar,
1771  instead rebuild them from merged & tested ICU4C
1772- make sure that changes to Unicode tools are checked in:
1773  http://www.unicode.org/utility/trac/log/trunk/unicodetools
1774
1775---------------------------------------------------------------------------- ***
1776
1777Unicode 10.0 update for ICU 60
1778
1779http://www.unicode.org/versions/Unicode10.0.0/
1780http://www.unicode.org/versions/beta-10.0.0.html
1781http://blog.unicode.org/2017/03/unicode-100-beta-review.html
1782http://www.unicode.org/review/pri350/
1783http://www.unicode.org/reports/uax-proposed-updates.html
1784http://www.unicode.org/reports/tr44/tr44-19.html
1785
1786* Command-line environment setup
1787
1788UNICODE_DATA=~/unidata/uni10/20170605
1789CLDR_SRC=~/svn.cldr/uni10
1790ICU_ROOT=~/svn.icu/uni10
1791ICU_SRC=$ICU_ROOT/src
1792ICUDT=icudt60b
1793ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1794ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1795export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1796
1797*** ICU Trac
1798
1799- ticket:12985: Unicode 10
1800- ticket:13061: undo hacks from emoji 5.0 update
1801- ticket:13062: add Emoji_Component property
1802- ^/branches/markus/uni10
1803
1804*** CLDR Trac
1805
1806- cldrbug 10055: Unicode 10
1807- cldrbug 9882: Unicode 10 script metadata
1808- cldrbug 10219: numbering systems for Unicode 10
1809
1810*** Unicode version numbers
1811- makedata.mak
1812- uchar.h
1813- com.ibm.icu.util.VersionInfo
1814- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1815
1816- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1817  so that the makefiles see the new version number.
1818
1819*** data files & enums & parser code
1820
1821* download files
1822- mkdir -p $UNICODE_DATA
1823- download Unicode 10.0 files into $UNICODE_DATA
1824  + subfolders: ucd, uca, idna, security
1825  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1826- download emoji 5.0 files into $UNICODE_DATA/emoji
1827
1828* for manual diffs: remove version suffixes from the file names
1829  ~$ unidata/desuffixucd.py $UNICODE_DATA
1830  (see https://sites.google.com/site/unicodetools/inputdata)
1831
1832* process and/or copy files
1833- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1834  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1835  + For debugging, and tweaking how ppucd.txt is written,
1836    the tool has an --only_ppucd option:
1837    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1838
1839- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1840
1841* build ICU (make install)
1842  so that the tools build can pick up the new definitions from the installed header files.
1843
1844  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1845
1846* preparseucd.py changes
1847- remove or add new Unicode scripts from/to the
1848  only-in-ISO-15924 list according to the error messages:
1849    ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
1850  -> adjust _scripts_only_in_iso15924 as indicated
1851- fix other errors
1852    Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
1853  -> add vo=Vertical_Orientation to _ignored_properties
  -> later removed from _ignored_properties again: the file is now parsed,
     even though we do not yet store its data for runtime use

* new constants for new property values
- preparseucd.py error:
    ValueError: missing uchar.h enum constants for some property values:
    [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
                   u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
     (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
                  u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
                  u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
     (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
  = PropertyValueAliases.txt new property values (diff old & new .txt files)
    blk; CJK_Ext_F                        ; CJK_Unified_Ideographs_Extension_F
    blk; Kana_Ext_A                       ; Kana_Extended_A
    blk; Masaram_Gondi                    ; Masaram_Gondi
    blk; Nushu                            ; Nushu
    blk; Soyombo                          ; Soyombo
    blk; Syriac_Sup                       ; Syriac_Supplement
    blk; Zanabazar_Square                 ; Zanabazar_Square
  -> add to uchar.h
    use long property names for enum constants,
    for the trailing comment get the block start code point: diff old & new Blocks.txt
  -> add to UCharacter.UnicodeBlock IDs
    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
            replace  public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
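
Sketch (Python): the two Eclipse find/replace steps above, applied with re.sub
outside the IDE. One pattern with three groups serves both replacements, so the
comment is \3 in both. The helper names are hypothetical, and the sample line only
illustrates the uchar.h enum layout; its numeric value is an assumption.

```python
import re

# Same shape as the Eclipse "find" expression: name, numeric value, trailing comment.
UBLOCK_LINE = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")


def to_java_id(c_enum_line):
    """uchar.h enum line -> UCharacter.UnicodeBlock *_ID constant."""
    return UBLOCK_LINE.sub(r"public static final int \1_ID = \2; \3", c_enum_line)


def to_java_object(c_enum_line):
    """uchar.h enum line -> UCharacter.UnicodeBlock object declaration."""
    return UBLOCK_LINE.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \3',
        c_enum_line)


# Illustrative input line (value assumed, not copied from uchar.h).
sample = "    UBLOCK_SOYOMBO = 278, /*[11A50]*/"
print(to_java_id(sample))
print(to_java_object(sample))
```

Lines that do not match the pattern pass through unchanged, so the functions can be
mapped over a pasted range of the header.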

    jg ; Malayalam_Bha                    ; Malayalam_Bha
    jg ; Malayalam_Ja                     ; Malayalam_Ja
    jg ; Malayalam_Lla                    ; Malayalam_Lla
    jg ; Malayalam_Llla                   ; Malayalam_Llla
    jg ; Malayalam_Nga                    ; Malayalam_Nga
    jg ; Malayalam_Nna                    ; Malayalam_Nna
    jg ; Malayalam_Nnna                   ; Malayalam_Nnna
    jg ; Malayalam_Nya                    ; Malayalam_Nya
    jg ; Malayalam_Ra                     ; Malayalam_Ra
    jg ; Malayalam_Ssa                    ; Malayalam_Ssa
    jg ; Malayalam_Tta                    ; Malayalam_Tta
  -> uchar.h & UCharacter.JoiningGroup

    sc ; Gonm                             ; Masaram_Gondi
    sc ; Nshu                             ; Nushu
    sc ; Soyo                             ; Soyombo
    sc ; Zanb                             ; Zanabazar_Square
  -> uscript.h & com.ibm.icu.lang.UScript
  -> Nushu had been added already
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
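
Sketch (Python): one way to do the "diff old & new .txt files" step for
PropertyValueAliases.txt, as a set difference instead of a textual diff. The
function names and the two sample texts are hypothetical. Only the first three
fields of each line are kept, which is a simplification: lines with extra aliases,
and ccc lines with their numeric field, need more care.

```python
def parse_value_aliases(text):
    """Return a set of (property, short_name, long_name) tuples."""
    result = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) >= 3:
            result.add((fields[0], fields[1], fields[2]))
    return result


def new_values(old_text, new_text):
    """Property values present in the new file but not in the old one."""
    return sorted(parse_value_aliases(new_text) - parse_value_aliases(old_text))


# Tiny illustrative excerpts, not complete files.
old = """
# Script (sc)
sc ; Latn                             ; Latin
"""
new = old + """
sc ; Gonm                             ; Masaram_Gondi
sc ; Soyo                             ; Soyombo
"""
added = new_values(old, new)
print(added)
```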

* New properties as shown in PropertyValueAliases.txt changes
- boolean Emoji_Component from emoji 5
  -> uchar.h & UProperty.java
- boolean
    # Regional_Indicator (RI)

    RI ; N                                ; No                               ; F                                ; False
    RI ; Y                                ; Yes                              ; T                                ; True
  -> uchar.h & UProperty.java
  -> single immutable range, to be hardcoded
- boolean
    # Prepended_Concatenation_Mark (PCM)

    PCM; N                                ; No                               ; F                                ; False
    PCM; Y                                ; Yes                              ; T                                ; True
  -> was new in Unicode 9
  -> uchar.h & UProperty.java
- enumerated
    # Vertical_Orientation (vo)

    vo ; R                                ; Rotated
    vo ; Tr                               ; Transformed_Rotated
    vo ; Tu                               ; Transformed_Upright
    vo ; U                                ; Upright
  -> only pre-parsed for now, but not yet stored for runtime use

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt

* generate normalization data files
  cd $ICU_ROOT/dbg/icu4c
  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date

* build Unicode tools using CMake+make

$ICU_SRC/tools/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
# Location of the ICU4C source tree.
set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)

  $ICU_ROOT/dbg/tools/unicode/c$
    cmake ../../../../src/tools/unicode/c
    make

* generate core properties data files
  $ICU_ROOT/dbg/tools/unicode/c$
    genprops/genprops $ICU_SRC/icu4c
    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..10.0: U+2260, U+226E, U+226F
- nothing new in this Unicode version, no test file to update
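
Sketch (Python): a scripted version of the grep above, assuming the
IdnaMappingTable.txt layout of "code point or range ; status ; ... # comment".
The function name is hypothetical and the sample lines are abbreviated
illustrations in that layout, not a copy of the real file.

```python
def std3_valid_non_ascii(lines):
    """Return (start, end) ranges above U+007F with status disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]  # strip the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start_hex, _, end_hex = fields[0].partition("..")
        start = int(start_hex, 16)
        end = int(end_hex, 16) if end_hex else start
        if end > 0x7F:
            # Clip a range that straddles the ASCII boundary.
            found.append((max(start, 0x80), end))
    return found


sample_lines = [
    "0000..002C    ; disallowed_STD3_valid                  # <control-0000>..COMMA",
    "00C0          ; mapped                 ; 00E0          # LATIN CAPITAL LETTER A WITH GRAVE",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
]
hits = std3_valid_non_ascii(sample_lines)
print(["U+%04X..U+%04X" % r for r in hits])
```

Any range reported beyond the three code points listed above would be a candidate
for a new test case.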

* run & fix ICU4C tests
- Andy handles RBBI & spoof check test failures

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools, see
  https://sites.google.com/site/unicodetools/home#TOC-UCA
- CLDR root data files are checked into $CLDR_SRC/common/uca/
    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/

- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
    (note removing the underscore before "Rules")
    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt

- run genuca, see command line above;
  deal with
    Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
    FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)
        (add the character to genuca.cpp sampleCharsToScripts[])
  + look up the USCRIPT_ code for the new sample characters
    (should be obvious from the comment in the error output)
  + *add* mappings to sampleCharsToScripts[], do not replace them
    (in case the script sample characters flip-flop)
  + insert new scripts in DUCET script order, see the top_byte table
    at the beginning of FractionalUCA.txt
- rebuild ICU4C
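
Sketch (Python): pulling the sample code point and a likely USCRIPT_ constant out of
the FractionalUCA.txt line quoted in the genuca error above. The helper name is
hypothetical. Building the constant as "USCRIPT_" plus the upper-cased script name
is only a guess that fits this example; verify every result in uscript.h.

```python
import re

# Matches first-primary lines such as the one quoted in the error message.
FIRST_PRIMARY = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*#\s*(\w+) first primary")


def suggest_mapping(fractional_uca_line):
    """Return (code_point, guessed USCRIPT_ name), or None if the line does not match."""
    m = FIRST_PRIMARY.match(fractional_uca_line)
    if not m:
        return None
    return int(m.group(1), 16), "USCRIPT_" + m.group(2).upper()


line = "FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)"
suggestion = suggest_mapping(line)
print("0x%04X -> %s" % suggestion)
```

This only proposes the pair to add; where it goes in sampleCharsToScripts[] still
follows the DUCET script order described above.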

* Unihan collators
    https://sites.google.com/site/unicodetools/unihan
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollators
  with VM arguments
    -ea
    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
    -DUVERSION=10.0.0
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollatorFiles
  with the same arguments
- check CLDR diffs
    cd $CLDR_SRC
    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
- copy to CLDR
    cd $CLDR_SRC
    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
- run CLDR unit tests, commit to CLDR
- generate ICU zh collation data: run CLDR
    org.unicode.cldr.icu.NewLdml2IcuConverter
  with program arguments
    -t collation
    -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
    -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
    -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
    -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
    zh
  and VM arguments
    -ea
    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
  (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt60l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* When refreshing all of ICU4J data from ICU4C
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
or
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC/icu4c/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and submit a CLDR ticket
  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
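
Sketch (Python): listing the sets of decimal digits by searching for characters with
gc=Nd and numeric value 4, as described above. It relies on the Unicode guarantee
that the ten digits of a set are contiguous and ascending, so the zero digit is four
code points below the match. The function name is hypothetical. Python's unicodedata
module only knows the Unicode version it was built with (unicodedata.unidata_version),
which may be older than the version being integrated.

```python
import sys
import unicodedata


def decimal_digit_zeros():
    """Return the code point of digit zero for every decimal digit set known to Python."""
    zeros = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch, None) == 4:
            zeros.append(cp - 4)
    return zeros


zeros = decimal_digit_zeros()
print("Unicode", unicodedata.unidata_version, "has", len(zeros), "sets of decimal digits")
```

Comparing this list against the digit strings in CLDR's numbering systems data is
still a manual step.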

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools are checked in:
  http://www.unicode.org/utility/trac/log/trunk/unicodetools

---------------------------------------------------------------------------- ***

Emoji 5.0 update for ICU 59
- ICU 59 mostly remains on Unicode 9.0
- except that it updates bidi and segmentation data to Unicode 10 beta

First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.

* Command-line environment setup

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
ICUDT=icudt59b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
UNIDATA=$ICU4C_SRC_DIR/source/data/unidata

*** ICU Trac

- ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
- changes directly on trunk

*** data files & enums & parser code

* download files

- download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
- download emoji 5.0 beta files into the same uni90e50 folder
- download Unicode 10.0 beta files: ucd
  + copy Unicode 10 bidi files to the uni90e50/ucd folder:
    BidiBrackets.txt
    BidiCharacterTest.txt
    BidiMirroring.txt
    BidiTest.txt
    extracted/DerivedBidiClass.txt
  + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
    LineBreak.txt
    auxiliary/*

* preparseucd.py changes
- adjust for combined trunks
- write new copyright lines
- ignore new Emoji_Component property for now

* process and/or copy files
- ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.

- cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date

* build Unicode tools using CMake+make

~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
# Location of the ICU4C source tree.
set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)

  ~/svn.icu/trunk/dbg/tools/unicode/c$
    cmake ../../../../src/tools/unicode/c
    make

* generate core properties data files
  ~/svn.icu/trunk/dbg/tools/unicode/c$
    genprops/genprops $ICU4C_SRC_DIR
- rebuild ICU (make install) & tools

* run & fix ICU4C tests
- Andy handles RBBI & spoof check test failures

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt59l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
or
- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU4C_SRC_DIR/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

---------------------------------------------------------------------------- ***

Unicode 9.0 update for ICU 58

* Command-line environment setup

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt58b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata

http://www.unicode.org/review/pri323/  -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-9.0.0.html
http://www.unicode.org/versions/Unicode9.0.0/
http://www.unicode.org/reports/tr44/tr44-17.html

*** ICU Trac

- ticket:12526: integrate Unicode 9
- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b

*** CLDR Trac

- cldrbug 9414: UCA 9
- ^/branches/markus/uni90 at r11518 from trunk at r11517

- cldrbug 8745: Unicode 9.0 script metadata

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- only for manual diffs: remove version suffixes from the file names
  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
  (see https://sites.google.com/site/unicodetools/inputdata)
- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.

- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
  and copy to $UNIDATA
    cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA

* preparseucd.py changes
- remove or add new Unicode scripts from/to the
  only-in-ISO-15924 list according to the error messages:
    ValueError: remove ['Tang'] from _scripts_only_in_iso15924
    ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
    ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
    ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
- DerivedNumericValues.txt new numeric values
    0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
    0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH
    0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS
    0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH
    0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS
  -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
     uchar.c, UCharacterProperty.java
     to support a new series of values
- adjust preparseucd.py for Tangut algorithmic names
  in ppucd.txt:
    algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
  ->
    algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
- avoid block-compressing most String/Miscellaneous property values,
  triggered by genprops not coping with a multi-code point Case_Folding on
    block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
  keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
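
Sketch (Python): reading the new DerivedNumericValues.txt lines quoted above as exact
fractions, to see which denominators the numeric-value encoding has to cover. The
function name is hypothetical. The sketch handles only single-code-point lines like
the ones shown. For non-terminating values such as 1/3, the decimal field in the file
is rounded, so the two fields will not compare equal.

```python
from fractions import Fraction


def parse_numeric_value_line(line):
    """Parse one line into (code point, decimal field, fraction field)."""
    data = line.split("#", 1)[0]  # drop the comment
    fields = [f.strip() for f in data.split(";")]
    return int(fields[0], 16), Fraction(fields[1]), Fraction(fields[3])


LINES = [
    "0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH",
    "0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH",
    "0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS",
    "0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH",
    "0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS",
]
parsed = [parse_numeric_value_line(line) for line in LINES]
denominators = sorted({frac.denominator for _, _, frac in parsed})
print(denominators)
```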

* PropertyAliases.txt changes
- 1 new property PCM=Prepended_Concatenation_Mark
  Ignore: Only useful for layout engines.
  Ok to list in ppucd.txt.

* PropertyValueAliases.txt new property values
    blk; Adlam                            ; Adlam
    blk; Bhaiksuki                        ; Bhaiksuki
    blk; Cyrillic_Ext_C                   ; Cyrillic_Extended_C
    blk; Glagolitic_Sup                   ; Glagolitic_Supplement
    blk; Ideographic_Symbols              ; Ideographic_Symbols_And_Punctuation
    blk; Marchen                          ; Marchen
    blk; Mongolian_Sup                    ; Mongolian_Supplement
    blk; Newa                             ; Newa
    blk; Osage                            ; Osage
    blk; Tangut                           ; Tangut
    blk; Tangut_Components                ; Tangut_Components
  -> add to uchar.h
    use long property names for enum constants
  -> add to UCharacter.UnicodeBlock IDs
    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
            replace  public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
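
Sketch (Python): deriving the uchar.h constant name from a blk line, following the
"use long property names for enum constants" rule above. The function name is
hypothetical, and the assumption that the constant is "UBLOCK_" plus the upper-cased
long name should be checked against the existing entries in uchar.h.

```python
def ublock_constant(alias_line):
    """'blk; short ; long' line from PropertyValueAliases.txt -> UBLOCK_ constant name."""
    fields = [f.strip() for f in alias_line.split(";")]
    if len(fields) < 3 or fields[0] != "blk":
        raise ValueError("not a blk alias line: %r" % alias_line)
    return "UBLOCK_" + fields[2].upper()


names = [
    ublock_constant("blk; Cyrillic_Ext_C                   ; Cyrillic_Extended_C"),
    ublock_constant("blk; Ideographic_Symbols              ; Ideographic_Symbols_And_Punctuation"),
]
print(names)
```

The numeric enum value and the block-start comment still have to be added by hand.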

    GCB; EB                               ; E_Base
    GCB; EBG                              ; E_Base_GAZ
    GCB; EM                               ; E_Modifier
    GCB; GAZ                              ; Glue_After_Zwj
    GCB; ZWJ                              ; ZWJ
  -> uchar.h & UCharacter.GraphemeClusterBreak

    jg ; African_Feh                      ; African_Feh
    jg ; African_Noon                     ; African_Noon
    jg ; African_Qaf                      ; African_Qaf
  -> uchar.h & UCharacter.JoiningGroup

    lb ; EB                               ; E_Base
    lb ; EM                               ; E_Modifier
    lb ; ZWJ                              ; ZWJ
  -> uchar.h & UCharacter.LineBreak

    sc ; Adlm                             ; Adlam
    sc ; Bhks                             ; Bhaiksuki
    sc ; Marc                             ; Marchen
    sc ; Newa                             ; Newa
    sc ; Osge                             ; Osage
    sc ; Tang                             ; Tangut
  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript

    WB ; EB                               ; E_Base
    WB ; EBG                              ; E_Base_GAZ
    WB ; EM                               ; E_Modifier
    WB ; GAZ                              ; Glue_After_Zwj
    WB ; ZWJ                              ; ZWJ
  -> uchar.h & UCharacter.WordBreak

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt

* generate normalization data files
  cd $ICU_ROOT/dbg
  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

  # Location (--prefix) of where ICU was installed.
  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
  # Location of the ICU source tree.
  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)

  ~/svn.icutools/trunk/dbg/unicode/c$
    cmake ../../../src/unicode/c
    make

* generate core properties data files
  ~/svn.icutools/trunk/dbg/unicode/c$
    genprops/genprops $ICU_SRC_DIR
    genuca/genuca --hanOrder implicit $ICU_SRC_DIR
    genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..9.0: U+2260, U+226E, U+226F
- nothing new in 9.0, no test file to update

* run & fix ICU4C tests
- Andy handles RBBI & spoof check test failures

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools, see
  https://sites.google.com/site/unicodetools/home#TOC-UCA
- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
    cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/

- cd (CLDR UCA branch)/common/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
    (note removing the underscore before "Rules")
    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt

- run genuca, see command line above;
  deal with
    Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
    FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)
        (add the character to genuca.cpp sampleCharsToScripts[])
  + look up the USCRIPT_ code for the new sample characters
    (should be obvious from the comment in the error output)
  + *add* mappings to sampleCharsToScripts[], do not replace them
    (in case the script sample characters flip-flop)
  + insert new scripts in DUCET script order, see the top_byte table
    at the beginning of FractionalUCA.txt
- rebuild ICU4C

* Unihan collators
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollators
  with VM arguments
    -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
    -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
    -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
    -DUVERSION=9.0.0
    -ea
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollatorFiles
  with the same arguments
- check CLDR diffs
    cd ~/svn.cldr/trunk
    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
- copy to CLDR
    cd ~/svn.cldr/trunk
    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
- commit to CLDR
- generate ICU zh collation data: run CLDR
    org.unicode.cldr.icu.NewLdml2IcuConverter
  with program arguments
    -t collation
    -s /home/mscherer/svn.cldr/trunk/common/collation
    -m /home/mscherer/svn.cldr/trunk/common/supplemental
    -d /home/mscherer/svn.icu/trunk/src/source/data/coll
    -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
    zh
  and VM arguments
    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
2521  (the full ones have comments, useful for debugging)
2522- note on intltest: if collate/UCAConformanceTest fails, then
2523  utility/MultithreadTest/TestCollators will fail as well;
2524  fix the conformance test before looking into the multi-thread test
2525
2526* update Java data files
- refresh only the UCD/UCA-related/derived files, to be safe
2528- see (ICU4C)/source/data/icu4j-readme.txt
2529- mkdir /tmp/icu4j
2530- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2531  output:
2532    ...
2533    Unicode .icu files built to ./out/build/icudt58l
2534    echo timestamp > uni-core-data
2535    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
2536    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
2537    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2538    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
2539    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
2540    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
2541    mkdir -p /tmp/icu4j/main/shared/data
2542    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2543    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
2544    mkdir -p /tmp/icu4j/main/shared/data
2545    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2546    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
2547- copy the big-endian Unicode data files to another location,
2548  separate from the other data files,
2549  and then refresh ICU4J
2550    cd ~/svn.icu/trunk/dbg/data/out/icu4j
2551    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2552    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2553    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2554    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2555    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2556    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2557    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2558    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2559    jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2560
2561* When refreshing all of ICU4J data from ICU4C
2562- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2563- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2564or
2565- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2566
2567* update CollationFCD.java
2568  + copy & paste the initializers of lcccIndex[] etc. from
2569    ICU4C/source/i18n/collationfcd.cpp to
2570    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2571
2572* refresh Java test .txt files
2573- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2574    cd $ICU_SRC_DIR/source/data/unidata
2575    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2576    cd ../../test/testdata
2577    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2578    cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2579
2580* run & fix ICU4J tests
2581
2582*** LayoutEngine script information
2583
2584* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2585  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2586  in the working directory.
2587
2588  (It also generates ScriptRunData.cpp, which is no longer needed.)
2589
2590  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2591  (a plain text file)
2592  which maps ICU versions to the numbers of script/language constants
2593  that were added then.
2594  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2595
2596  The generated files have a current copyright date and "@deprecated" statement.
2597
2598* Review changes, fix Java tool if necessary, and copy to ICU4C
2599  cd ~/svn.icu4j/trunk/src
2600  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2601  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2602  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2603
2604*** API additions
2605- send notice to icu-design about new born-@stable API (enum constants etc.)
2606
2607*** merge the Unicode update branches back onto the trunk
2608- do not merge the icudata.jar and testdata.jar,
2609  instead rebuild them from merged & tested ICU4C
2610- make sure that changes to Unicode tools & ICU tools are checked in
2611  http://www.unicode.org/utility/trac/log/trunk/unicodetools
2612  http://bugs.icu-project.org/trac/log/tools/trunk
2613
2614---------------------------------------------------------------------------- ***
2615
2616New script codes early in ICU 58: https://unicode-org.atlassian.net/browse/ICU-11764
2617
2618Adding
2619- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
2620- new combination/alias codes: Hanb, Jamo
2621  - used in CLDR 29 and in spoof checker
2622- new Z* code: Zsye
2623
2624Add new codes to uscript.h & UScript.java, see Unicode update logs.
2625  -> com.ibm.icu.lang.UScript
2626    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2627    replace  public static final int \1 = \2; \3
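
For reference, the same find/replace can be tried outside the editor.
This is a sketch using Python's re module, not part of the ICU tools;
the sample enum line and its numeric value are illustrative.

```python
import re

# Same pattern and replacement as the find/replace pair above.
FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
REPLACE = r"public static final int \1 = \2; \3"


def c_enum_to_java(line):
    """Turn one uscript.h enum line into a UScript.java constant declaration."""
    return FIND.sub(REPLACE, line)


# Illustrative input line in the style of uscript.h.
sample = "      USCRIPT_ADLAM                         = 167,/* Adlm */"
java_line = c_enum_to_java(sample)
```

Leading indentation is preserved because the pattern starts matching at USCRIPT_.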
2628
2629Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
2630add new script codes.
2631"Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
2632
2633Note: If we have to run preparseucd.py again before the Unicode 9 update,
2634then we need to manually keep/restore the new script codes.
2635
2636ICU_ROOT=~/svn.icu/trunk
2637ICU_SRC_DIR=$ICU_ROOT/src
2638ICUDT=icudt57b
2639export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2640SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2641UNIDATA=$ICU_SRC_DIR/source/data/unidata
2642
2643Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
2644see https://unicode-org.atlassian.net/browse/ICU-12141
2645
2646make install, then icutools cmake & make, then
2647~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
2648
2649Generate Java data as usual, only update pnames.icu & uprops.icu.
2650
2651*** LayoutEngine script information
2652
2653* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2654  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2655  in the working directory.
2656
2657  (It also generates ScriptRunData.cpp, which is no longer needed.)
2658
2659  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2660  (a plain text file)
2661  which maps ICU versions to the numbers of script/language constants
2662  that were added then.
2663  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2664
2665  The generated files have a current copyright date and "@deprecated" statement.
2666
2667* Review changes, fix Java tool if necessary, and copy to ICU4C
2668  cd ~/svn.icu4j/trunk/src
2669  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2670  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2671  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2672
2673---------------------------------------------------------------------------- ***
2674
2675Emoji properties added in ICU 57: https://unicode-org.atlassian.net/browse/ICU-11802
2676
2677Edit preparseucd.py to add & parse new properties.
2678They share the UCD property namespace but are not listed in PropertyAliases.txt.
2679
2680Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
2681Initial data from emoji/2.0/
2682
2683ICU_ROOT=~/svn.icu/trunk
2684ICU_SRC_DIR=$ICU_ROOT/src
2685ICUDT=icudt56b
2686export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2687SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2688UNIDATA=$ICU_SRC_DIR/source/data/unidata
2689
2690Add binary-property constants to uchar.h enum UProperty & UProperty.java.
2691
2692~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2693(Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
2694
2695Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
2696
2697make install, then icutools cmake & make, then
2698~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
2699
2700Generate Java data as usual, only update pnames.icu & uprops.icu.
2701
2702---------------------------------------------------------------------------- ***
2703
2704Unicode 8.0 update for ICU 56
2705
2706* Command-line environment setup
2707
2708ICU_ROOT=~/svn.icu/trunk
2709ICU_SRC_DIR=$ICU_ROOT/src
2710ICUDT=icudt56b
2711export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2712SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2713UNIDATA=$ICU_SRC_DIR/source/data/unidata
2714
2715http://www.unicode.org/review/pri297/  -- beta review
2716http://www.unicode.org/reports/uax-proposed-updates.html
2717http://unicode.org/versions/beta-8.0.0.html
2718http://www.unicode.org/versions/Unicode8.0.0/
2719http://www.unicode.org/reports/tr44/tr44-15.html
2720
2721*** ICU Trac
2722
2723- ticket:11574: Unicode 8
2724- C++ branches/markus/uni80 at r37351 from trunk at r37343
2725- Java branches/markus/uni80 at r37352 from trunk at r37338
2726
2727*** CLDR Trac
2728
2729- cldrbug 8311: UCA 8
2730- branches/markus/uni80 at r11518 from trunk at r11517
2731
2732- cldrbug 8109: Unicode 8.0 script metadata
2733- cldrbug 8418: Updated segmentation for Unicode 8.0
2734
2735*** Unicode version numbers
2736- makedata.mak
2737- uchar.h
2738- com.ibm.icu.util.VersionInfo
2739- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2740
2741- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2742  so that the makefiles see the new version number.
2743
2744*** data files & enums & parser code
2745
2746* file preparation
2747
2748- download UCD & IDNA files
2749- make sure that the Unicode data folder passed into preparseucd.py
2750  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2751- only for manual diffs: remove version suffixes from the file names
2752  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2753  (see https://sites.google.com/site/unicodetools/inputdata)
2754- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2755- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2756- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2757
2758- also: from http://unicode.org/Public/security/8.0.0/ download new
2759  confusables.txt & confusablesWholeScript.txt
2760  and copy to $UNIDATA
2761    ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
2762    ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
2763
2764* initial preparseucd.py changes
2765- remove new Unicode scripts from the
2766  only-in-ISO-15924 list according to the error message:
2767    ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
2768    from _scripts_only_in_iso15924
2769  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2770      and in com.ibm.icu.dev.test.lang.TestUScript.java
2771- property and file name change:
2772    IndicMatraCategory -> IndicPositionalCategory
- UnicodeData.txt has unusual numeric values (fractions not in lowest terms)
    109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
    109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
    109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
    109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
    109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
    109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
    109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
    109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
    109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
    109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
  -> change preparseucd.py to map them to fractions in lowest terms (e.g., 2/12 -> 1/6)
     which are listed in DerivedNumericValues.txt;
     keeps storage in data file simple
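
A minimal sketch of the numeric-value mapping, using Python's fractions module.
The function name is hypothetical; this is not the actual preparseucd.py code.

```python
from fractions import Fraction


def reduce_numeric_value(nv):
    """Return a UnicodeData.txt numeric value with fractions in lowest terms."""
    if "/" not in nv:
        return nv  # plain integers are passed through unchanged
    value = Fraction(nv)  # Fraction parses "n/d" strings and reduces them
    if value.denominator == 1:
        return str(value.numerator)
    return "%d/%d" % (value.numerator, value.denominator)


# The Meroitic cursive fraction values listed above.
twelfths = ["%d/12" % n for n in range(1, 11)]
reduced = [reduce_numeric_value(nv) for nv in twelfths]
```

Values such as 1/12, 5/12 and 7/12 are already in lowest terms and come out unchanged.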
2787
2788* PropertyValueAliases.txt changes
2789- 10 new Block (blk) values:
2790    blk; Ahom                             ; Ahom
2791    blk; Anatolian_Hieroglyphs            ; Anatolian_Hieroglyphs
2792    blk; Cherokee_Sup                     ; Cherokee_Supplement
2793    blk; CJK_Ext_E                        ; CJK_Unified_Ideographs_Extension_E
2794    blk; Early_Dynastic_Cuneiform         ; Early_Dynastic_Cuneiform
2795    blk; Hatran                           ; Hatran
2796    blk; Multani                          ; Multani
2797    blk; Old_Hungarian                    ; Old_Hungarian
2798    blk; Sup_Symbols_And_Pictographs      ; Supplemental_Symbols_And_Pictographs
2799    blk; Sutton_SignWriting               ; Sutton_SignWriting
2800  -> add to uchar.h
2801    use long property names for enum constants
2802  -> add to UCharacter.UnicodeBlock IDs
2803    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2804            replace  public static final int \1_ID = \2; \3
2805  -> add to UCharacter.UnicodeBlock objects
2806    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
2807            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2808- 6 new Script (sc) values:
2809    sc ; Ahom                             ; Ahom
2810    sc ; Hatr                             ; Hatran
2811    sc ; Hluw                             ; Anatolian_Hieroglyphs
2812    sc ; Hung                             ; Old_Hungarian
2813    sc ; Mult                             ; Multani
2814    sc ; Sgnw                             ; SignWriting
2815  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
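
The two Eclipse find/replace pairs for the block constants can be checked
with a short Python sketch. The sample uchar.h line is illustrative, and
\g<1> is Python's spelling of the \1 back-reference (used so that the
following "_ID" is unambiguous).

```python
import re

# First pair: UnicodeBlock integer IDs.
FIND_ID = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
REPLACE_ID = r"public static final int \g<1>_ID = \g<2>; \g<3>"

# Second pair: UnicodeBlock objects.
FIND_OBJ = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
REPLACE_OBJ = r'public static final UnicodeBlock \g<1> = new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>'


def block_id_line(line):
    """uchar.h enum line -> UCharacter.UnicodeBlock ID constant."""
    return FIND_ID.sub(REPLACE_ID, line)


def block_object_line(line):
    """uchar.h enum line -> UCharacter.UnicodeBlock object constant."""
    return FIND_OBJ.sub(REPLACE_OBJ, line)


# Illustrative input line in the style of uchar.h.
sample = "    UBLOCK_AHOM = 253, /*[11700]*/"
id_line = block_id_line(sample)
object_line = block_object_line(sample)
```

Both replacements start from the same C enum lines, so the IDs and the
objects are generated in two separate passes over a copy of those lines.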
2816
2817* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2818    (not strictly necessary for NOT_ENCODED scripts)
2819  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2820
2821* generate normalization data files
2822  cd $ICU_ROOT/dbg
2823  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2824  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
2825  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
2826  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2827  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
2828
2829* build ICU (make install)
2830  so that the tools build can pick up the new definitions from the installed header files.
2831
2832  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2833
2834* build Unicode tools using CMake+make
2835
2836~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2837
2838  # Location (--prefix) of where ICU was installed.
2839  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
2840  # Location of the ICU source tree.
2841  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
2842
2843  ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
2844  ~/svn.icutools/trunk/dbg/unicode/c$ make
2845
2846* generate core properties data files
2847- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
2848- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
2849- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
2850- rebuild ICU (make install) & tools
2851- run genuca again (see step above) so that it picks up the new nfc.nrm
2852- rebuild ICU (make install) & tools
2853
2854* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2855  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2856- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2857- Unicode 6.0..8.0: U+2260, U+226E, U+226F
2858- nothing new in 8.0, no test file to update
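
The grep step can also be scripted. A sketch, assuming data lines of the
form "code point or range ; status ..." as in IdnaMappingTable.txt; the
sample lines are illustrative, not quoted from the file, and the function
name is made up.

```python
import re

# A code point or a start..end range, then the status field.
LINE_RE = re.compile(r"^([0-9A-F]{4,6})(?:\.\.([0-9A-F]{4,6}))?\s*;\s*([A-Za-z0-9_]+)")


def non_ascii_with_status(lines, status="disallowed_STD3_valid"):
    """Return (start, end) ranges above U+007F that have the given status."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match.group(3) != status:
            continue
        start = int(match.group(1), 16)
        end = int(match.group(2) or match.group(1), 16)
        if end > 0x7F:
            hits.append((max(start, 0x80), end))
    return hits


# Illustrative lines shaped like the data file.
sample_lines = [
    "# comment line",
    "0021..002C    ; disallowed_STD3_valid                  # ASCII punctuation",
    "00C0          ; mapped                 ; 00E0          # A WITH GRAVE",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
]
found = non_ascii_with_status(sample_lines)
```

Comparing the result for the old and new data shows whether there are
new characters that the UTS #46 tests need to cover.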
2859
2860* run & fix ICU4C tests
2861- bad Cherokee case folding due to difference in fallbacks:
2862  UCD case folding falls back to no mapping,
2863  ICU runtime case folding falls back to lowercasing;
2864  fixed casepropsbuilder.cpp to generate scf mappings to self
2865  when there is an slc mapping but no scf
2866- Andy handles RBBI & spoof check test failures
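
The case folding fallback difference can be modeled in a few lines. This
is a simplified sketch, not the casepropsbuilder.cpp code: slc and scf are
plain dicts of simple lowercase and simple case folding mappings, and both
function names are hypothetical. The example uses the Cherokee letter A
pair, U+13A0 (uppercase) and U+AB70 (lowercase).

```python
def runtime_fold(c, slc, scf):
    """Model of a runtime lookup: use scf, else fall back to lowercasing."""
    return scf.get(c, slc.get(c, c))


def add_self_foldings(slc, scf):
    """Add scf mappings to self where there is an slc mapping but no scf."""
    fixed = dict(scf)
    for c in slc:
        fixed.setdefault(c, c)
    return fixed


slc = {0x13A0: 0xAB70}  # uppercase Cherokee A lowercases to U+AB70
scf = {0xAB70: 0x13A0}  # the lowercase letter folds to the uppercase one

bad = runtime_fold(0x13A0, slc, scf)     # falls back to lowercasing
fixed_scf = add_self_foldings(slc, scf)
good = runtime_fold(0x13A0, slc, fixed_scf)
```

Without the explicit self-mapping, the two letters fold to each other
instead of to one common form.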
2867
2868* collation: CLDR collation root, UCA DUCET
2869
2870- UCA DUCET goes into Mark's Unicode tools, see
2871  https://sites.google.com/site/unicodetools/home#TOC-UCA
2872- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
2873- cd (CLDR UCA branch)/common/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note: the ICU file name has no underscore before "Rules")
    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2880- restore TODO diffs in UCARules.txt
2881    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2882- update (ICU4C)/source/test/testdata/CollationTest_*.txt
2883  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2884  from the CLDR root files (..._CLDR_..._SHORT.txt)
2885    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2886    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2887    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
2888- if CLDR common/uca/unihan-index.txt changes, then update
2889  CLDR common/collation/root.xml <collation type="private-unihan">
2890  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
2891- run genuca, see command line above;
2892  deal with
2893    Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
2894        (add the character to genuca.cpp sampleCharsToScripts[])
2895  + look up the script for the new sample characters
2896    (e.g., in FractionalUCA.txt)
2897  + *add* mappings to sampleCharsToScripts[], do not replace them
2898    (in case the script sample characters flip-flop)
2899  + insert new scripts in DUCET script order, see the top_byte table
2900    at the beginning of FractionalUCA.txt
2901- rebuild ICU4C
2902
2903* run & fix ICU4C tests, now with new CLDR collation root data
2904- run all tests with the collation test data *_SHORT.txt or the full files
2905  (the full ones have comments, useful for debugging)
2906- note on intltest: if collate/UCAConformanceTest fails, then
2907  utility/MultithreadTest/TestCollators will fail as well;
2908  fix the conformance test before looking into the multi-thread test
2909- fixed bug in CollationWeights::getWeightRanges()
2910  exposed by new data and CollationTest::TestRootElements
2911
2912* update Java data files
- refresh only the UCD/UCA-related/derived files, to be safe
2914- see (ICU4C)/source/data/icu4j-readme.txt
2915- mkdir /tmp/icu4j
2916- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2917  output:
2918    ...
2919    Unicode .icu files built to ./out/build/icudt56l
2920    echo timestamp > uni-core-data
2921    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
2922    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
2923    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2924    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
2925    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
2926    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
2927    mkdir -p /tmp/icu4j/main/shared/data
2928    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2929    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
2930    mkdir -p /tmp/icu4j/main/shared/data
2931    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2932    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
2933- copy the big-endian Unicode data files to another location,
2934  separate from the other data files,
2935  and then refresh ICU4J
2936    cd ~/svn.icu/trunk/dbg/data/out/icu4j
2937    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2938    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2939    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2940    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2941    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2942    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2943    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2944    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2945    jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2946
2947* When refreshing all of ICU4J data from ICU4C
2948- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2949- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2950or
2951- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2952
2953* update CollationFCD.java
2954  + copy & paste the initializers of lcccIndex[] etc. from
2955    ICU4C/source/i18n/collationfcd.cpp to
2956    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2957
2958* refresh Java test .txt files
2959- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2960    cd $ICU_SRC_DIR/source/data/unidata
2961    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2962    cd ../../test/testdata
2963    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2964    cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2965
2966* run & fix ICU4J tests
2967
2968*** LayoutEngine script information
2969
2970* ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
2971  because the layout engine was deprecated in ICU 54.
2972  Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
2973  to write lines that we used to add manually.
2974
2975* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2976  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2977  in the working directory.
2978
2979  (It also generates ScriptRunData.cpp, which is no longer needed.)
2980
2981  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2982  (a plain text file)
2983  which maps ICU versions to the numbers of script/language constants
2984  that were added then.
2985  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2986
2987  The generated files have a current copyright date and "@deprecated" statement.
2988
2989* Review changes, fix Java tool if necessary, and copy to ICU4C
2990  cd ~/svn.icu4j/trunk/src
2991  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2992  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2993  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2994
2995*** API additions
2996- send notice to icu-design about new born-@stable API (enum constants etc.)
2997
2998*** merge the Unicode update branches back onto the trunk
2999- do not merge the icudata.jar and testdata.jar,
3000  instead rebuild them from merged & tested ICU4C
3001- make sure that changes to Unicode tools & ICU tools are checked in
3002  http://www.unicode.org/utility/trac/log/trunk/unicodetools
3003  http://bugs.icu-project.org/trac/log/tools/trunk
3004
3005---------------------------------------------------------------------------- ***
3006
3007Unicode 7.0 update for ICU 54
3008
3009http://www.unicode.org/review/pri271/  -- beta review
3010http://www.unicode.org/reports/uax-proposed-updates.html
3011http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
3012http://www.unicode.org/reports/tr44/tr44-13.html
3013
3014*** ICU Trac
3015
3016- ticket 10821: Unicode 7.0, UCA 7.0
3017- C++ branches/markus/uni70 at r35584 from trunk at r35580
3018- Java branches/markus/uni70 at r35587 from trunk at r35545
3019
3020*** CLDR Trac
3021
3022- ticket 7195: UCA 7.0 CLDR root collation
3023- branches/markus/uni70 at r10062 from trunk at r10061
3024
3025- ticket 6762: script metadata for Unicode 7.0 new scripts
3026
3027*** Unicode version numbers
3028- makedata.mak
3029- uchar.h
3030- com.ibm.icu.util.VersionInfo
3031- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3032
3033- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
3034  so that the makefiles see the new version number.
3035
3036*** data files & enums & parser code
3037
3038* file preparation
3039
3040- download UCD & IDNA files
3041- make sure that the Unicode data folder passed into preparseucd.py
3042  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3043- only for manual diffs: remove version suffixes from the file names
3044  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
3045  (see https://sites.google.com/site/unicodetools/inputdata)
3046- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
3047- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
3048- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3049- Restore TODO diffs in source/data/unidata/UCARules.txt
3050    cd $ICU_SRC_DIR
3051    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
3052- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
3053
3054- also: from http://unicode.org/Public/security/7.0.0/ download new
3055  confusables.txt & confusablesWholeScript.txt
3056  and copy to $ICU_ROOT/src/source/data/unidata/
3057
3058* initial preparseucd.py changes
3059- remove new Unicode scripts from the
3060  only-in-ISO-15924 list according to the error message:
3061    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
3062                        'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
3063                        'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
3064    from _scripts_only_in_iso15924
3065  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3066      and in com.ibm.icu.dev.test.lang.TestUScript.java
3067- NamesList.txt now has a heading with a non-ASCII character
3068  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
3069  + escape non-ASCII characters in heading comments
- preparseucd.py gets the Unicode copyright line from PropertyAliases.txt, which is currently still at 2013
  + get the copyright from the first file whose copyright line contains the current year
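
A sketch of the copyright selection rule. The function name, the input
shape (file name plus lines) and the sample copyright lines are
assumptions for illustration, not the actual preparseucd.py code.

```python
import re


def pick_copyright_line(files, year):
    """Return the first copyright line that mentions the given year, or None.

    files is a sequence of (file name, list of lines) pairs in parse order.
    """
    year_re = re.compile(r"\b%d\b" % year)
    for _name, lines in files:
        for line in lines:
            if "Copyright" in line and year_re.search(line):
                return line.strip()
    return None


# Hypothetical file heads; the exact wording of the real lines may differ.
files = [
    ("PropertyAliases.txt", ["# PropertyAliases.txt", "# Copyright (c) 1991-2013 Unicode, Inc."]),
    ("UnicodeData.txt", []),
    ("Blocks.txt", ["# Blocks.txt", "# Copyright (c) 1991-2014 Unicode, Inc."]),
]
line = pick_copyright_line(files, 2014)
```

Files with a stale year are skipped rather than treated as an error.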

* PropertyValueAliases.txt changes
- 32 new Block (blk) values:
    blk; Bassa_Vah                        ; Bassa_Vah
    blk; Caucasian_Albanian               ; Caucasian_Albanian
    blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
    blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
    blk; Duployan                         ; Duployan
    blk; Elbasan                          ; Elbasan
    blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
    blk; Grantha                          ; Grantha
    blk; Khojki                           ; Khojki
    blk; Khudawadi                        ; Khudawadi
    blk; Latin_Ext_E                      ; Latin_Extended_E
    blk; Linear_A                         ; Linear_A
    blk; Mahajani                         ; Mahajani
    blk; Manichaean                       ; Manichaean
    blk; Mende_Kikakui                    ; Mende_Kikakui
    blk; Modi                             ; Modi
    blk; Mro                              ; Mro
    blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
    blk; Nabataean                        ; Nabataean
    blk; Old_North_Arabian                ; Old_North_Arabian
    blk; Old_Permic                       ; Old_Permic
    blk; Ornamental_Dingbats              ; Ornamental_Dingbats
    blk; Pahawh_Hmong                     ; Pahawh_Hmong
    blk; Palmyrene                        ; Palmyrene
    blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
    blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
    blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
    blk; Siddham                          ; Siddham
    blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
    blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
    blk; Tirhuta                          ; Tirhuta
    blk; Warang_Citi                      ; Warang_Citi
  -> add to uchar.h
    use long property names for enum constants
  -> add to UCharacter.UnicodeBlock IDs
    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
            replace  public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 28 new Joining_Group (jg) values:
    jg ; Manichaean_Aleph                 ; Manichaean_Aleph
    jg ; Manichaean_Ayin                  ; Manichaean_Ayin
    jg ; Manichaean_Beth                  ; Manichaean_Beth
    jg ; Manichaean_Daleth                ; Manichaean_Daleth
    jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
    jg ; Manichaean_Five                  ; Manichaean_Five
    jg ; Manichaean_Gimel                 ; Manichaean_Gimel
    jg ; Manichaean_Heth                  ; Manichaean_Heth
    jg ; Manichaean_Hundred               ; Manichaean_Hundred
    jg ; Manichaean_Kaph                  ; Manichaean_Kaph
    jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
    jg ; Manichaean_Mem                   ; Manichaean_Mem
    jg ; Manichaean_Nun                   ; Manichaean_Nun
    jg ; Manichaean_One                   ; Manichaean_One
    jg ; Manichaean_Pe                    ; Manichaean_Pe
    jg ; Manichaean_Qoph                  ; Manichaean_Qoph
    jg ; Manichaean_Resh                  ; Manichaean_Resh
    jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
    jg ; Manichaean_Samekh                ; Manichaean_Samekh
    jg ; Manichaean_Taw                   ; Manichaean_Taw
    jg ; Manichaean_Ten                   ; Manichaean_Ten
    jg ; Manichaean_Teth                  ; Manichaean_Teth
    jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
    jg ; Manichaean_Twenty                ; Manichaean_Twenty
    jg ; Manichaean_Waw                   ; Manichaean_Waw
    jg ; Manichaean_Yodh                  ; Manichaean_Yodh
    jg ; Manichaean_Zayin                 ; Manichaean_Zayin
    jg ; Straight_Waw                     ; Straight_Waw
  -> uchar.h & UCharacter.JoiningGroup
- 23 new Script (sc) values:
    sc ; Aghb                             ; Caucasian_Albanian
    sc ; Bass                             ; Bassa_Vah
    sc ; Dupl                             ; Duployan
    sc ; Elba                             ; Elbasan
    sc ; Gran                             ; Grantha
    sc ; Hmng                             ; Pahawh_Hmong
    sc ; Khoj                             ; Khojki
    sc ; Lina                             ; Linear_A
    sc ; Mahj                             ; Mahajani
    sc ; Mani                             ; Manichaean
    sc ; Mend                             ; Mende_Kikakui
    sc ; Modi                             ; Modi
    sc ; Mroo                             ; Mro
    sc ; Narb                             ; Old_North_Arabian
    sc ; Nbat                             ; Nabataean
    sc ; Palm                             ; Palmyrene
    sc ; Pauc                             ; Pau_Cin_Hau
    sc ; Perm                             ; Old_Permic
    sc ; Phlp                             ; Psalter_Pahlavi
    sc ; Sidd                             ; Siddham
    sc ; Sind                             ; Khudawadi
    sc ; Tirh                             ; Tirhuta
    sc ; Wara                             ; Warang_Citi
  -> uscript.h (many were added before)
    comment "Mende Kikakui" for USCRIPT_MENDE
    add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
  -> com.ibm.icu.lang.UScript
    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
    replace  public static final int \1 = \2; \3
- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2012-11-01)
    Ahom        338     Ahom
    Hatr        127     Hatran
    Mult        323     Multani
  (added 2013-10-12)
    Modi        324     Modi
    Pauc        263     Pau Cin Hau
    Sidd        302     Siddham
  -> uscript.h (some overlap with additions from Unicode)
  -> com.ibm.icu.lang.UScript
    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
    replace  public static final int \1 = \2; \3
  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
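
The find/replace recipes in this section (UBLOCK_ to UnicodeBlock IDs and objects, USCRIPT_ to UScript constants) can be tried outside Eclipse. The sketch below applies the same three patterns with Python's re.sub. The function names are made up for illustration, and the sample input lines only imitate the style of uchar.h and uscript.h; their numeric values and comments are assumptions, not quotes from the headers.

```python
import re


def to_block_id(line):
    # UBLOCK_X = n, /*...*/  ->  public static final int X_ID = n; /*...*/
    return re.sub(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
                  r"public static final int \1_ID = \2; \3", line)


def to_block_object(line):
    # UBLOCK_X = n, /*...*/  ->  ... UnicodeBlock X = new UnicodeBlock("X", X_ID); /*...*/
    return re.sub(
        r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
        line)


def to_script_constant(line):
    # USCRIPT_X = n,/*...*/  ->  public static final int X = n; /*...*/
    return re.sub(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
                  r"public static final int \1 = \2; \3", line)


# Hypothetical header lines for illustration only.
block_line = "UBLOCK_BASSA_VAH = 221, /*[16AD0]*/"
script_line = "USCRIPT_KHUDAWADI = 145,/* Sind */"

block_id = to_block_id(block_line)
block_object = to_block_object(block_line)
script_constant = to_script_constant(script_line)
```

Lines that do not match a pattern pass through unchanged, so the functions can be mapped over a whole pasted enum block.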

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt

* generate normalization data files
- cd $ICU_ROOT/dbg
- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
- UNIDATA=$ICU_SRC_DIR/source/data/unidata
- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
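
The four .nrm command lines above differ only in the output name and the list of input files, so they can be generated from a small table. This is a sketch of such a helper, assuming the same -o/-s options shown above; the function name and table are hypothetical, and the --csource invocation for norm2_nfc_data.h is deliberately left out.

```python
import posixpath

# (output file, input files) pairs copied from the command lines above.
NORM2_TARGETS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(src_data_in, unidata, gennorm2="bin/gennorm2"):
    """Return one argv list per .nrm file; nothing is executed here."""
    commands = []
    for out_name, inputs in NORM2_TARGETS:
        commands.append(
            [gennorm2,
             "-o", posixpath.join(src_data_in, out_name),
             "-s", posixpath.join(unidata, "norm2")] + inputs)
    return commands


commands = gennorm2_commands("source/data/in", "source/data/unidata")
```

Each returned list could be handed to subprocess.run(cmd, check=True) from the build directory; keeping generation separate from execution makes the commands easy to print and review first.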

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
# Location of the ICU source tree.
set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)

~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
~/svn.icutools/trunk/dbg/unicode/c$ make

* genprops work
- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
  + add second array of Joining_Group values for at most 10800..10FFF
    icutools: unicode/c/genprops/bidipropsbuilder.cpp
    icu: source/common/ubidi_props.h/.c/_data.h
    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
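
A second array avoids stretching one table across the large gap between the existing Joining_Group range and the Manichaean block. The lookup then checks two ranges in turn. The sketch below illustrates the idea only: the range bounds, array contents and names are placeholders, not the data that genprops writes.

```python
# Hypothetical, tiny tables; the real ones come from the genprops output.
JG_START, JG_LIMIT = 0x0620, 0x0624
JG_ARRAY = [0, 1, 2, 3]            # placeholder Joining_Group values

JG_START2, JG_LIMIT2 = 0x10AC0, 0x10AC4
JG_ARRAY2 = [29, 30, 31, 32]       # placeholder Joining_Group values

NO_JOINING_GROUP = 0


def joining_group(c):
    """Look up the Joining_Group value of code point c in two small arrays."""
    if JG_START <= c < JG_LIMIT:
        return JG_ARRAY[c - JG_START]
    if JG_START2 <= c < JG_LIMIT2:
        return JG_ARRAY2[c - JG_START2]
    return NO_JOINING_GROUP
```

Code points outside both ranges fall through to the default, so most lookups cost two comparisons and no table access.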

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..7.0: U+2260, U+226E, U+226F
- nothing new in 7.0, no test file to update
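
The grep step can also be scripted so that ASCII entries are filtered out automatically. The sketch below assumes the usual semicolon-separated layout (code point or range, then status, then an optional '#' comment); the function name is hypothetical and the sample lines are abbreviated illustrations, not copied from IdnaMappingTable.txt.

```python
def non_ascii_std3_valid(lines):
    """Return non-ASCII code points whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]          # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        for cp in range(first, last + 1):
            if cp > 0x7F:
                found.append(cp)
    return found


# Illustrative sample in the style of the data file.
SAMPLE = """\
# comment line
003D          ; disallowed_STD3_valid  # EQUALS SIGN
2260          ; disallowed_STD3_valid  # NOT EQUAL TO
226E..226F    ; disallowed_STD3_valid  # NOT LESS-THAN..NOT GREATER-THAN
00C0          ; mapped ; 00E0          # LATIN CAPITAL LETTER A WITH GRAVE
""".splitlines()

result = non_ascii_std3_valid(SAMPLE)
```

Comparing the result against the list recorded for the previous Unicode version shows at a glance whether the test files need new entries.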

* run & fix ICU4C tests

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt53l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    ICUDT=icudt54b
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cd ~/svn.icu/uni70/dbg/data/out/icu4j
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
- refresh ICU4J
    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
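
The cp/rm sequence above amounts to a selection rule over paths relative to com/ibm/icu/impl/data/$ICUDT. Written as a predicate, the rule is easier to check before anything is copied. This is a sketch that mirrors the commands above; the function name and the sample file names in the comments are illustrative assumptions.

```python
from fnmatch import fnmatchcase


def is_unicode_data_file(rel_path):
    """True if rel_path (relative to the $ICUDT folder) is copied above."""
    if rel_path == "cnvalias.icu":
        return False                      # copied by *.icu, then removed
    if "/" not in rel_path:
        # Top-level files: confusables.cfu, *.icu and *.nrm.
        return (rel_path == "confusables.cfu"
                or fnmatchcase(rel_path, "*.icu")
                or fnmatchcase(rel_path, "*.nrm"))
    # Subfolders: only coll/*.icu and everything under brkitr/.
    return (fnmatchcase(rel_path, "coll/*.icu")
            or fnmatchcase(rel_path, "brkitr/*"))
```

The explicit "/" test matters because fnmatch-style "*" also matches path separators, so "*.icu" alone would pick up files in subfolders.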

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC_DIR/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA

- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
- output files are in ~/svn.unitools/Generated/uca/7.0.0/
- review data; compare files, use blankweights.sed or similar
  ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
- cd ~/svn.unitools/Generated/uca/7.0.0/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
  cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    (note removing the underscore before "Rules")
    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ICUDT=icudt54b
    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test
- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
  ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

* run & fix ICU4J tests

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.
  (It also generates ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@stable" statement.
  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
  for "born stable" Unicode API constants, and to stop parsing ICU version numbers
  which may not contain dots any more.

- diff current <icu>/source/layout files vs. generated ones
    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
  review and manually merge desired changes;
  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
- if you just copy the above files, then
  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Unicode 6.3 update

http://www.unicode.org/review/pri249/  -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
http://www.unicode.org/reports/tr44/tr44-11.html

*** ICU Trac

- ticket 10128: update ICU to Unicode 6.3 beta
- ticket 10168: update ICU to Unicode 6.3 final
- C++ branches/markus/uni63 at r33552 from trunk at r33551
- Java branches/markus/uni63 at r33550 from trunk at r33553

- ticket 10142: implement Unicode 6.3 bidi algorithm additions

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD, UCA & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- modify preparseucd.py:
  parse new file BidiBrackets.txt
  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.

* PropertyAliases.txt changes
- 1 new Enumerated Property
  bpt                      ; Bidi_Paired_Bracket_Type
  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
  -> ubidi_props.h & .c & UBiDiProps.java
  -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
  -> uprops.cpp
  -> change ubidi.icu format version from 2.0 to 2.1
- 1 new Miscellaneous Property
  bpb                      ; Bidi_Paired_Bracket
  -> uchar.h & UProperty.java
  -> ppucd.h & .cpp

* PropertyValueAliases.txt changes
- 3 Bidi_Paired_Bracket_Type (bpt) values:
  bpt; c                                ; Close
  bpt; n                                ; None
  bpt; o                                ; Open
  -> uchar.h & UCharacter.BidiPairedBracketType
  -> ubidi_props.h & .c & UBiDiProps.java
  -> change ubidi.icu format version from 2.0 to 2.1
- 4 new Bidi_Class (bc) values:
  bc ; FSI                              ; First_Strong_Isolate
  bc ; LRI                              ; Left_To_Right_Isolate
  bc ; RLI                              ; Right_To_Left_Isolate
  bc ; PDI                              ; Pop_Directional_Isolate
  -> uchar.h & UCharacterEnums.ECharacterDirection
  -> until the bidi code gets updated,
     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
- 3 new Word_Break (WB) values:
  WB ; HL                               ; Hebrew_Letter
  WB ; SQ                               ; Single_Quote
  WB ; DQ                               ; Double_Quote
  -> uchar.h & UCharacter.WordBreak
  -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2012-10-16)
  Aghb  239     Caucasian Albanian
  Mahj  314     Mahajani
  -> uscript.h
  -> com.ibm.icu.lang.UScript
    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
    replace  public static final int \1 = \2;\3
  -> preparseucd.py _scripts_only_in_iso15924
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
     (not strictly necessary for NOT_ENCODED scripts)
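
The Word_Break remark above ("exceed 4 bits, now 17 values") is plain arithmetic: constants 0..15 fit in 4 bits, and the 17th value, 16, needs a fifth bit. A short worked check follows; the helper name is made up for illustration.

```python
def bits_needed(value_count):
    """Bits needed to store enum constants 0 .. value_count-1."""
    return max(1, (value_count - 1).bit_length())


bits_for_16_values = bits_needed(16)   # largest constant is 15
bits_for_17_values = bits_needed(17)   # largest constant is 16
```

Any data format that packs the property into a fixed-width bit field has to be reviewed when such a boundary is crossed.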

* generate normalization data files
- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
# Location of the ICU source tree.
set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)

~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
~/svn.icutools/trunk/dbg/unicode/c$ make

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.3: U+2260, U+226E, U+226F
- nothing new in 6.3, no test file to update

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt52l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
    ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
- refresh ICU4J
    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files

- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note removing the underscore before "Rules")
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* test ICU, fix test code where necessary

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information
- skipped for Unicode 6.3: no new scripts

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Unicode 6.2 update

http://www.unicode.org/review/pri230/
http://www.unicode.org/versions/beta-6.2.0.html
http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
http://unicode.org/Public/idna/6.2.0/

*** ICU Trac

- ticket 9515: Unicode 6.2: final ICU update

- ticket 9514: UCA 6.2: fix UCARules.txt

- ticket 9437: update ICU to Unicode 6.2
- C++ branches/markus/uni62 at r32050 from trunk at r32041
- Java branches/markus/uni62 at r32068 from trunk at r32066

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

*** data files & enums & parser code

* file preparation

- download UCD, UCA & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- modify preparseucd.py: NamesList.txt is now in UTF-8
- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3618- Check test file diffs for previously commented-out, known-failing data lines;
3619  probably need to keep those commented out.
3620
3621* PropertyValueAliases.txt changes
3622- 1 new Line_Break (lb) value:
3623  lb ; RI                               ; Regional_Indicator
3624  -> uchar.h & UCharacter.LineBreak
3625- 1 new Word_Break (WB) value:
3626  WB ; RI                               ; Regional_Indicator
3627  -> uchar.h & UCharacter.WordBreak
3628- 1 new Grapheme_Cluster_Break (GCB) value:
3629  GCB; RI                               ; Regional_Indicator
3630  -> uchar.h & UCharacter.GraphemeClusterBreak
3631
* 3 new numeric values
  The new value -1 was really supposed to be NaN, but that would have required
  new UnicodeData.txt syntax. It can already be represented as a "fraction" of -1/1,
  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
  -> uprops.h, uchar.c & UCharacterProperty.java
  -> cucdtst.c & UCharacterTest.java
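
A minimal sketch of the arithmetic behind the three new values. It shows -1 as the
fraction -1/1, and shows that 216000 and 432000 are small multiples of a power of 60.
The function names are hypothetical. This is not the bit layout in uprops.h; whether
the actual addition to the encoding uses a mantissa times a power of 60 is an
assumption here.

```python
from fractions import Fraction


def as_fraction(value):
    """Represent a numeric value as (numerator, denominator); -1 becomes (-1, 1)."""
    f = Fraction(value)
    return f.numerator, f.denominator


def as_base60(value):
    """Split a positive integer into (mantissa, exponent) so that
    value == mantissa * 60**exponent, using the largest possible exponent."""
    if value <= 0:
        raise ValueError("expected a positive integer")
    exponent = 0
    while value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent


print(as_fraction(-1))    # (-1, 1)
print(as_base60(216000))  # (1, 3), that is 1 * 60**3
print(as_base60(432000))  # (2, 3), that is 2 * 60**3
```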
3643
* generate normalization data files
- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
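
The four gennorm2 invocations differ only in the output file and the list of input
files. A sketch that builds the same argument lists from a table; the function name
and the sample paths are hypothetical, and only the -o and -s options shown in the
commands above are used.

```python
import posixpath

# Output file -> input files, copied from the commands above.
NORM_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in, unidata, tool="bin/gennorm2"):
    """Return one argv list per normalization data file."""
    commands = []
    for out_name, inputs in NORM_INPUTS.items():
        commands.append(
            [tool, "-o", posixpath.join(src_data_in, out_name),
             "-s", posixpath.join(unidata, "norm2")] + inputs)
    return commands


# Sample paths are placeholders.
for argv in gennorm2_commands("/tmp/data/in", "/tmp/data/unidata"):
    print(" ".join(argv))
```

Each list could be passed to subprocess.run(argv, check=True) from the build folder,
with LD_LIBRARY_PATH set as above.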
3652
3653* build ICU (make install)
3654  so that the tools build can pick up the new definitions from the installed header files.
3655* build Unicode tools using CMake+make
3656
3657* generate core properties data files
3658- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
3659- in initial bootstrapping, change the UCA version
3660  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
3661- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
3662- rebuild ICU (make install) & tools
3663  + if genrb fails to build coll/root.res with an U_INVALID_FORMAT_ERROR,
3664    check if the UCA version in FractionalUCA.txt matches the new Unicode version
3665    (see step above)
3666- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
3667- rebuild ICU (make install) & tools
3668
3669* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3670  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3671- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3672- Unicode 6.0..6.2: U+2260, U+226E, U+226F
3673- nothing new in 6.2, no test file to update
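
The grep step can also be sketched in Python. This assumes the usual layout of
IdnaMappingTable.txt lines (code point or range; status; optional mapping; # comment).
The function name is hypothetical and the sample lines are illustrative, not copied
from the file.

```python
def non_ascii_std3_valid(lines):
    """Yield (first, last) code point ranges whose status is
    disallowed_STD3_valid and that start above ASCII."""
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # comment-only or empty line
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if first > 0x7F:
            yield first, last


# Illustrative lines in the assumed layout.
sample = [
    "# comment line",
    "0000..002C    ; disallowed_STD3_valid            # <control>..COMMA",
    "0041          ; mapped                 ; 0061    # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid            # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid            # NOT LESS-THAN..NOT GREATER-THAN",
]
for first, last in non_ascii_std3_valid(sample):
    print("U+%04X..U+%04X" % (first, last))
```

Any range that is not already covered by the tests would call for an update of
uts46test.cpp and UTS46Test.java.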
3674
3675* update Java data files
3676- refresh just the UCD-related files, just to be safe
3677- see (ICU4C)/source/data/icu4j-readme.txt
3678- mkdir /tmp/icu4j
3679- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3680  output:
3681    ...
3682    Unicode .icu files built to ./out/build/icudt50l
3683    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
3684    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
3685    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3686    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
3687    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
3688    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
3689    mkdir -p /tmp/icu4j/main/shared/data
3690    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3691    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
3692    mkdir -p /tmp/icu4j/main/shared/data
3693    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3694    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
3695- copy the big-endian Unicode data files to another location,
3696  separate from the other data files
3697    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3698    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
3699    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
3700    ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
3701    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
3702    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3703    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
3704- refresh ICU4J
3705    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
3706
3707* refresh Java test .txt files
3708- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3709
3710* UCA
3711
3712- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
3713- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
3714- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3715- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3716  (note removing the underscore before "Rules")
3717- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3718  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3719  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3720- check test file diffs for previously commented-out, known-failing data lines;
3721  probably need to keep those commented out
3722- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3723- run genuca, see command line above
3724- rebuild ICU4C
3725- refresh ICU4J collation data:
3726  (subset of instructions above for properties data refresh, except copies all coll/*)
3727    ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3728    ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3729    ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3730    ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
3731- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3732- note on intltest: if collate/UCAConformanceTest fails, then
3733  utility/MultithreadTest/TestCollators will fail as well;
3734  fix the conformance test before looking into the multi-thread test
3735
3736* test ICU, fix test code where necessary
3737
3738* When refreshing all of ICU4J data from ICU4C
3739- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3740- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3741or
3742- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3743
3744*** LayoutEngine script information
3745- skipped for Unicode 6.2: no new scripts
3746
3747*** merge the Unicode update branches back onto the trunk
3748- do not merge the icudata.jar and testdata.jar,
3749  instead rebuild them from merged & tested ICU4C
3750
3751---------------------------------------------------------------------------- ***
3752
3753Future Unicode update
3754
The tools have been simplified since the Unicode 6.1 update. See
- https://icu.unicode.org/design/props/ppucd
- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
3758
3759* Unicode version numbers
3760- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
3761
3762* file preparation
3763- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
3764- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
3765- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3766- Check test file diffs for previously commented-out, known-failing data lines;
3767  probably need to keep those commented out.
3768
3769* PropertyValueAliases.txt changes
3770- Script codes that are in ISO 15924 but not in Unicode are now listed in
3771  preparseucd.py, in the _scripts_only_in_iso15924 variable.
3772  If there are new ISO codes, then add them.
3773  If Unicode adds some of them, then remove them from the .py variable.
3774
3775* UnicodeData.txt changes
3776- No more manual changes for CJK ranges for algorithmic names;
3777  those are now written to ppucd.txt and genprops reads them from there.
3778
3779* generate core properties data files (makeprops.sh was deleted)
3780- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
3781
3782* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
3783- it is now generated by preparseucd.py
3784
3785* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
3786- it is now generated by preparseucd.py
3787- make sure that the Unicode data folder passed into preparseucd.py
3788  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
3789  (can be in some subfolder)
3790
3791* generate normalization data files
3792- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
3793- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
3794- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
3795- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3796- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3797- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3798- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
3799
3800* build ICU (make install)
3801* build Unicode tools using CMake+make
3802
3803* new way to call genuca (makeuca.sh was deleted)
3804- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
3805
3806---------------------------------------------------------------------------- ***
3807
3808Unicode 6.1 update
3809
3810*** ICU Trac
3811
3812- ticket 8995 final update to Unicode 6.1
3813- ticket 8994 regenerate source/layout/CanonData.cpp
3814
3815- ticket 8961 support Unicode "Age" value *names*
3816- ticket 8963 support multiple character name aliases & types
3817
3818- ticket 8827 "update ICU to Unicode 6.1"
3819- C++ branches/markus/uni61 at r30864 from trunk at r30843
3820- Java branches/markus/uni61 at r30865 from trunk at r30863
3821
3822*** Unicode version numbers
3823- makedata.mak
3824- uchar.h
3825  (configure.in & configure: have been modified to extract the version from uchar.h)
3826- com.ibm.icu.util.VersionInfo
3827- icutools/unicode/makedefs.sh
3828  + also review & update other definitions in that file,
3829    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
3830
3831*** data files & enums & parser code
3832
3833* file preparation
3834
3835~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
3836- This prepares both unidata and testdata files in respective output subfolders.
3837- Check test file diffs for previously commented-out, known-failing data lines;
3838  probably need to keep those commented out.
3839
3840* PropertyValueAliases.txt changes
3841- 11 new block names:
3842  Arabic_Extended_A
3843  Arabic_Mathematical_Alphabetic_Symbols
3844  Chakma
3845  Meetei_Mayek_Extensions
3846  Meroitic_Cursive
3847  Meroitic_Hieroglyphs
3848  Miao
3849  Sharada
3850  Sora_Sompeng
3851  Sundanese_Supplement
3852  Takri
3853  -> add to uchar.h
3854  -> add to UCharacter.UnicodeBlock IDs
3855    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
3856            replace  public static final int \1_ID = \2; \3
3857  -> add to UCharacter.UnicodeBlock objects
3858    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
3859            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3860- 1 new Joining_Group (jg) value:
3861  Rohingya_Yeh
3862  -> uchar.h & UCharacter.JoiningGroup
3863- 2 new Line_Break (lb) values:
3864  CJ=Conditional_Japanese_Starter
3865  HL=Hebrew_Letter
3866  -> uchar.h & UCharacter.LineBreak
3867- 7 new scripts:
3868  sc ; Cakm      ; Chakma
3869  sc ; Merc      ; Meroitic_Cursive
3870  sc ; Mero      ; Meroitic_Hieroglyphs
3871  sc ; Plrd      ; Miao
3872  sc ; Shrd      ; Sharada
3873  sc ; Sora      ; Sora_Sompeng
3874  sc ; Takr      ; Takri
3875  -> remove these from SyntheticPropertyValueAliases.txt
3876  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3877      and in com.ibm.icu.dev.test.lang.TestUScript.java
3878- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3879  (added 2011-06-21)
3880  Khoj        322     Khojki
3881  Tirh        326     Tirhuta
3882    and another one added 2011-12-09
3883  Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
3884  -> uscript.h
3885  -> com.ibm.icu.lang.UScript
3886    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3887    replace  public static final int \1 = \2;\3
3888  -> SyntheticPropertyValueAliases.txt
3889  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3890      and in com.ibm.icu.dev.test.lang.TestUScript.java
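
The find/replace patterns above also work outside Eclipse. A sketch with Python's
re.sub, using the same regular expressions; the function names and the sample enum
lines (EXAMPLE, 999) are hypothetical and do not come from uchar.h or uscript.h.

```python
import re


def block_id_line(c_line):
    """uchar.h block enum constant -> UCharacter.UnicodeBlock ID constant."""
    return re.sub(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
                  r"public static final int \g<1>_ID = \g<2>; \g<3>", c_line)


def block_object_line(c_line):
    """uchar.h block enum constant -> UCharacter.UnicodeBlock object."""
    return re.sub(
        r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
        r'public static final UnicodeBlock \g<1> = new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>',
        c_line)


def script_line(c_line):
    """uscript.h enum constant -> com.ibm.icu.lang.UScript constant."""
    return re.sub(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
                  r"public static final int \g<1> = \g<2>;\g<3>", c_line)


# Hypothetical input lines.
sample_block = "UBLOCK_EXAMPLE = 999, /*[0000]*/"
sample_script = "USCRIPT_EXAMPLE       = 999,/* Exmp */"
print(block_id_line(sample_block))
print(block_object_line(sample_block))
print(script_line(sample_script))
```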
3891
3892* UnicodeData.txt changes
3893- the last Unihan code point changes from U+9FCB to U+9FCC
3894  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
3895  + do change gennames.c
3896  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
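
The case-insensitive search for 9FC[BC] can be sketched as follows. The function name
and the sample source lines are hypothetical; in practice the lines would come from
the files named above.

```python
import re

_UNIHAN_END_RE = re.compile(r"9FC[BC]", re.IGNORECASE)


def find_unihan_end(lines):
    """Return (line_number, text) pairs for lines that mention 9FCB or 9FCC
    in any letter case."""
    return [(number, text) for number, text in enumerate(lines, start=1)
            if _UNIHAN_END_RE.search(text)]


# Hypothetical source lines.
sample = [
    "#define CJK_END 0x9fcb",
    "int limit = 0x9FCC;",
    "int other = 0x9FCA;",
]
for number, text in find_unihan_end(sample):
    print(number, text)
```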
3897
3898* DerivedBidiClass.txt changes
3899- 2 new default-AL blocks:
3900#     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
3901#     Arabic Mathematical Alphabetic Symbols:
3902#                       U+1EE00  - U+1EEFF  (was default-R)
3903- 2 new default-R blocks:
3904#     Meroitic Hieroglyphs:
3905#                        U+10980 - U+1099F
3906#     Meroitic Cursive:  U+109A0 - U+109FF
3907  -> should be picked up by the explicit data in the file
3908
3909* NameAliases.txt changes
3910- from
3911    # Each line has two fields
3912    # First field: Code point
3913    # Second field: Alias
3914- to
3915    # Each line has three fields, as described here:
3916    #
3917    # First field:  Code point
3918    # Second field: Alias
3919    # Third field:  Type
- Also, the file format previously allowed multiple aliases per code point,
  but only now does the file actually provide several, even several of the same type.
  For example,
    FEFF;BYTE ORDER MARK;alternate
    FEFF;BOM;abbreviation
    FEFF;ZWNBSP;abbreviation
- This breaks our gennames parser, unames.icu data structure, and API.
  Fix gennames to only pick up "correction" aliases.
  New ticket #8963 for further changes.
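
A sketch of the "correction"-only filter over the three-field format. This is not the
gennames code; the function name is hypothetical, and the 01A2 sample line is included
only as an illustration of a "correction" entry.

```python
def correction_aliases(lines):
    """Map code point -> alias for lines whose third field is "correction"."""
    result = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # comment-only or empty line
        fields = [f.strip() for f in data.split(";")]
        if len(fields) != 3:
            raise ValueError("expected three fields: %r" % line)
        code, alias, alias_type = fields
        if alias_type == "correction":
            result[int(code, 16)] = alias
    return result


sample = [
    "# Third field:  Type",
    "01A2;LATIN CAPITAL LETTER GHA;correction",  # illustrative line
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]
print(correction_aliases(sample))
```

With this filter the three FEFF aliases are ignored, so each code point still has at
most one alias of the kind that the existing data structure expects.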
3928
3929* run genpname/preparse.pl (on Linux)
3930  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3931  + make sure that data.h is writable
3932  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
3933  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
3934
3935* build ICU (make install)
3936  so that the tools build can pick up the new definitions from the installed header files.
3937* build Unicode tools (at least genpname) using CMake+make
3938
3939* run genpname
3940  (builds both pnames.icu and propname_data.h)
3941- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3942- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
3943
3944* build ICU (make install)
3945* build Unicode tools using CMake+make
3946
3947* update source/data/unidata/norm2/nfkc_cf.txt
3948- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
3949
3950* update source/data/unidata/norm2/uts46.txt
3951- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
3952  to ~/svn.icu/tools/trunk/src/unicode/py
3953- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
3954- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
3955- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
3956
3957* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3958  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3959- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3960- Unicode 6.0..6.1: U+2260, U+226E, U+226F
3961- nothing new in 6.1, no test file to update
3962
3963* generate core properties data files
3964- in initial bootstrapping, change the UCA version
3965  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
3966- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3967- rebuild ICU & tools
3968  + if genrb fails to build coll/root.res with an U_INVALID_FORMAT_ERROR,
3969    check if the UCA version in FractionalUCA.txt matches the new Unicode version
3970    (see step above)
3971- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
3972  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3973- rebuild ICU & tools
3974
3975* update Java data files
3976- refresh just the UCD-related files, just to be safe
3977- see (ICU4C)/source/data/icu4j-readme.txt
3978- mkdir /tmp/icu4j
3979- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3980  output:
3981    ...
3982    Unicode .icu files built to ./out/build/icudt49l
3983    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
3984    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
3985    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3986    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
3987    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
3988    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
3989    mkdir -p /tmp/icu4j/main/shared/data
3990    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3991    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
3992    mkdir -p /tmp/icu4j/main/shared/data
3993    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3994    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
3995- copy the big-endian Unicode data files to another location,
3996  separate from the other data files
3997    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3998    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
3999    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
4000    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
4001    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
4002    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4003    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
4004- refresh ICU4J
4005    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
4006
4007* refresh Java test .txt files
4008- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4009
4010* test ICU so far, fix test code where necessary
4011- temporarily ignore collation issues that look like UCA/UCD mismatches,
4012  until UCA data is updated
4013
4014* UCA
4015
4016- get output from Mark's tools; look in
4017    http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
4018- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4019- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4020  (note removing the underscore before "Rules")
4021- update (ICU)/source/test/testdata/CollationTest_*.txt
4022  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4023  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
4024- check test file diffs for previously commented-out, known-failing data lines;
4025  probably need to keep those commented out
4026- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
4027- run makeuca.sh:
4028  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4029- rebuild ICU4C
4030- refresh ICU4J collation data:
4031  (subset of instructions above for properties data refresh, except copies all coll/*)
4032    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4033    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4034    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4035    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
4036- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
4037- note on intltest: if collate/UCAConformanceTest fails, then
4038  utility/MultithreadTest/TestCollators will fail as well;
4039  fix the conformance test before looking into the multi-thread test
4040
4041* When refreshing all of ICU4J data from ICU4C
4042- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4043- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4044or
4045- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4046
4047*** LayoutEngine script information
4048
4049(For details see the Unicode 5.2 change log below.)
4050
4051* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
4052  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
4053  in the working directory.
4054  (It also generates ScriptRunData.cpp, which is no longer needed.)
4055
4056  The generated files have a current copyright date and "@draft" statement.
4057
4058- diff current <icu>/source/layout files vs. generated ones
4059    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
4060  review and manually merge desired changes;
4061  fix gratuitous changes, incorrect @draft and missing aliases;
4062  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
4063- if you just copy the above files, then
4064  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
4065  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4066
4067*** merge the Unicode update branches back onto the trunk
4068- do not merge the icudata.jar and testdata.jar,
4069  instead rebuild them from merged & tested ICU4C
4070
4071---------------------------------------------------------------------------- ***
4072
4073ICU 4.8 (no Unicode update, just new script codes)
4074
4075* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
4076  (added 2010-12-21)
4077    Afak    439     Afaka
4078    Jurc    510     Jurchen
4079    Mroo    199     Mro, Mru
4080    Nshu    499     Nüshu
4081    Shrd    319     Sharada, Śāradā
4082    Sora    398     Sora Sompeng
4083    Takr    321     Takri, Ṭākrī, Ṭāṅkrī
4084    Tang    520     Tangut
4085    Wole    480     Woleai
4086  -> uscript.h
4087  -> com.ibm.icu.lang.UScript
4088    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
4089    replace  public static final int \1 = \2;\3
4090  -> genpname/SyntheticPropertyValueAliases.txt
4091  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
4092      and in com.ibm.icu.dev.test.lang.TestUScript.java
4093
4094* run genpname/preparse.pl (on Linux)
4095  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
4096  + make sure that data.h is writable
4097  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
4098  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
4099
4100* rebuild Unicode tools (at least genpname) using make
4101- You might first need to "make install" ICU so that the tools build can pick
4102  up the new definitions from the installed header files.
4103
4104* run genpname
4105  (builds both pnames.icu and propname_data.h)
4106- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
4107- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
4108- rebuild ICU & tools
4109
4110* run genprops
4111- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
4112- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
4113- rebuild ICU & tools
4114
4115* update Java data files
4116- refresh just the UCD-related files, just to be safe
4117- see (ICU4C)/source/data/icu4j-readme.txt
4118- mkdir /tmp/icu4j
4119- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4120- copy the big-endian Unicode data files to another location,
4121  separate from the other data files
4122    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
4123    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
4124    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
4125- refresh ICU4J
4126    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
4127
4128* should have updated the layout engine script codes but forgot
4129
4130---------------------------------------------------------------------------- ***
4131
4132Unicode 6.0 update
4133
4134*** related ICU Trac tickets
4135
7264 Unicode 6.0 Update
4137
4138*** Unicode version numbers
4139- makedata.mak
4140- uchar.h
4141  (configure.in & configure: have been modified to extract the version from uchar.h)
4142- com.ibm.icu.util.VersionInfo
4143
4144*** data files & enums & parser code
4145
4146* file preparation
4147
4148~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
4149- This now prepares both unidata and testdata files in respective output subfolders.
4150
4151* PropertyAliases.txt changes
4152- new Script_Extensions property defined in the new ScriptExtensions.txt file
4153  but not listed in PropertyAliases.txt; reported to unicode.org;
4154  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
4155    scx; Script_Extensions
4156  -> uchar.h with new UProperty section
4157  -> com.ibm.icu.lang.UProperty, parallel with uchar.h
4158
4159* PropertyValueAliases.txt changes
4160- 12 new block names:
4161  Alchemical_Symbols
4162  Bamum_Supplement
4163  Batak
4164  Brahmi
4165  CJK_Unified_Ideographs_Extension_D
4166  Emoticons
4167  Ethiopic_Extended_A
4168  Kana_Supplement
4169  Mandaic
4170  Miscellaneous_Symbols_And_Pictographs
4171  Playing_Cards
4172  Transport_And_Map_Symbols
4173  -> add to uchar.h
4174  -> add to UCharacter.UnicodeBlock
4175    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
4176            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
4177- Joining_Group (jg) values:
4178  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
4179  -> uchar.h & UCharacter.JoiningGroup
4180- 3 new scripts:
4181  sc ; Batk      ; Batak
4182  sc ; Brah      ; Brahmi
4183  sc ; Mand      ; Mandaic
4184  -> remove these from SyntheticPropertyValueAliases.txt
4185  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
4186  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
4187      and in com.ibm.icu.dev.test.lang.TestUScript.java
4188- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
4189  (added 2009-11-11..2010-07-18)
  Bass        259     Bassa Vah
  Dupl        755     Duployan shorthand
  Elba        226     Elbasan
  Gran        343     Grantha
  Kpel        436     Kpelle
  Loma        437     Loma
  Mend        438     Mende
  Merc        101     Meroitic Cursive
  Narb        106     Old North Arabian
  Nbat        159     Nabataean
  Palm        126     Palmyrene
  Sind        318     Sindhi
  Wara        262     Warang Citi
4203  -> uscript.h
4204  -> com.ibm.icu.lang.UScript
4205    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
4206    replace  public static final int \1 = \2;\3
4207  -> SyntheticPropertyValueAliases.txt
4208  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
4209      and in com.ibm.icu.dev.test.lang.TestUScript.java
4210- ISO 15924 name change
4211  Mero        100     Meroitic Hieroglyphs (was Meroitic)
4212  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
4213- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
4214
4215* UnicodeData.txt changes
4216- new CJK block:
4217  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
4218  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
4219  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
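
A sketch of how the First/Last line pair above describes one range. It pairs up the two
UnicodeData.txt lines and reports the range; it is not the gennames.c code, and the
function name is hypothetical.

```python
def algorithmic_ranges(lines):
    """Pair "<Name, First>" and "<Name, Last>" lines into (name, start, end) tuples."""
    ranges = []
    pending = None  # (name, start) of a range that has been opened but not closed
    for line in lines:
        fields = line.split(";")
        if len(fields) < 2:
            continue
        code, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            pending = (name[1:-len(", First>")], code)
        elif name.endswith(", Last>") and pending is not None:
            ranges.append((pending[0], pending[1], code))
            pending = None
    return ranges


sample = [
    "2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;",
    "2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;",
]
for name, start, end in algorithmic_ranges(sample):
    print("%s: U+%X..U+%X (%d code points)" % (name, start, end, end - start + 1))
```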
4220
4221* build Unicode tools using CMake+make
4222
4223* run genpname/preparse.pl (on Linux)
4224  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
4225  + make sure that data.h is writable
4226  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
4227  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
4228
4229* rebuild Unicode tools (at least genpname) using make
4230- You might first need to "make install" ICU so that the tools build can pick
4231  up the new definitions from the installed header files.
4232
4233* run genpname
4234- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
4235- rebuild ICU & tools
4236
4237* update source/data/unidata/norm2/nfkc_cf.txt
4238- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
4239
4240* update source/data/unidata/norm2/uts46.txt
4241- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
4242  to ~/svn.icu/tools/trunk/src/unicode/py
4243- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
4244- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
4245- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
4246
4247* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
4248  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
4249- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
4250- Unicode 6.0: U+2260, U+226E, U+226F
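The grep step above can also be scripted. The following is a minimal sketch, assuming the usual semicolon-delimited layout of IdnaMappingTable.txt (code point or range, then status, then an optional "#" comment). The function name and the sample lines are illustrative and not part of idna2nrm.py.

```python
def non_ascii_std3_valid(lines):
    """Return code points above U+007F whose status is disallowed_STD3_valid."""
    hits = []
    for line in lines:
        # Drop the trailing comment, then split the data fields.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        hits.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return hits


# Illustrative sample lines (assumed format, not copied from the real file).
sample = [
    "0000..002C    ; disallowed_STD3_valid   # 1.1  <control-0000>..COMMA",
    "2260          ; disallowed_STD3_valid   # 1.1  NOT EQUAL TO",
    "00E0..00F6    ; valid                   # 1.1  LATIN SMALL LETTER A WITH GRAVE..",
]
print([f"U+{cp:04X}" for cp in non_ascii_std3_valid(sample)])
```

Any code point reported here is a candidate for a new test case in uts46test.cpp and UTS46Test.java.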
4251
4252* generate core properties data files
4253- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4254- rebuild ICU & tools
4255- run makeuca.sh so that genuca picks up the new nfc.nrm:
4256  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4257- rebuild ICU & tools
4258
4259* implement new Script_Extensions property (provisional)
4260- parser & generator: genprops & uprops.icu
4261- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
4262- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
4263
4264* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
4265- (one-time change)
4266- genbidi/gencase/genprops tools changes
4267- re-run makeprops.sh (see above)
4268- UCharacterProperty.java, UCharacterTypeIterator.java,
4269  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
4270  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
4271
4272* update Java data files
4273- refresh just the UCD-related files, just to be safe
4274- see (ICU4C)/source/data/icu4j-readme.txt
4275- mkdir /tmp/icu4j
4276- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4277  output:
4278    ...
4279    Unicode .icu files built to ./out/build/icudt45l
4280    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
4281    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
4282    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
4283    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
4284    mkdir -p /tmp/icu4j/main/shared/data
4285    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
4286- copy the big-endian Unicode data files to another location,
4287  separate from the other data files
4288    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4289    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
4290    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
4291    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
4292    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
4293    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4294    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
4295- refresh ICU4J
4296    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
4297
4298* refresh Java test .txt files
4299- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4300
4301* un-hardcode normalization skippable (NF*_Inert) test data
4302- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
4303
4304* copy updated break iterator test files
4305- now handled by early ucdcopy.py and
4306  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
4307  (old instructions:
4308   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
4309   to ~/svn.icu/trunk/src/source/test/testdata)
4310- they are not used in ICU4J
4311
4312* UCA
4313
4314- get output from Mark's tools; look in
4315    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
4316    http://www.macchiato.com/unicode/utc/additional-uca-files
4317    http://www.unicode.org/Public/UCA/6.0.0/
4318    http://www.unicode.org/~mdavis/uca/
4319- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4320- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4321- update Han-implicit ranges for new CJK extensions:
4322  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
4323- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
4324  do not add it into invuca so that tailoring primary-after an ignorable works
4325- genuca: permit space between [variable top] bytes
4326- ucol.cpp: treat noncharacters like unassigned rather than ignorable
4327- run makeuca.sh:
4328  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4329- rebuild ICU4C
4330- refresh ICU4J collation data:
4331  (subset of instructions above for properties data refresh, except copies all coll/*)
4332    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4333    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4334    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4335    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
4336- update (ICU)/source/test/testdata/CollationTest_*.txt
4337  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4338  with output from Mark's Unicode tools
4339- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
4340- note on intltest: if collate/UCAConformanceTest fails, then
4341  utility/MultithreadTest/TestCollators will fail as well;
4342  fix the conformance test before looking into the multi-thread test
4343
4344* When refreshing all of ICU4J data from ICU4C
4345- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4346- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4347or
4348- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4349
4350*** LayoutEngine script information
4351
4352(For details see the Unicode 5.2 change log below.)
4353
4354* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4355ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4356ScriptRunData.cpp, which is no longer needed.)
4357
4358The generated files have a current copyright date and "@draft" statement.
4359
4360* copy the above files into <icu>/source/layout, replacing the old files.
4361* fix mixed line endings
4362* review the diffs and fix incorrect @draft and missing aliases;
4363  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
4364* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4365
4366---------------------------------------------------------------------------- ***
4367
4368Unicode 5.2 update
4369
4370*** related ICU Trac tickets
4371
43727084 Unicode 5.2
4373
43747167 verify collation bytes
43757235 Java test NAME_ALIAS
43767236 Java DerivedCoreProperties.txt test
43777237 Java BidiTest.txt
43787238 UTrie2 in core unidata
43797239 test for tailoring gaps
43807240 Java fix CollationMiscTest
43817243 update layout engine for Unicode 5.2
4382
4383*** Unicode version numbers
4384- makedata.mak
4385- uchar.h
4386- configure.in & configure
4387- update ucdVersion in gennames.c if an algorithmic range changes
4388
4389*** data files & enums & parser code
4390
4391* file preparation
4392
4393python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
4394- includes finding files regardless of version numbers,
4395  copying them, and performing the equivalent processing of the
4396  ucdstrip and ucdmerge tools on the desired set of files
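As a rough illustration of the "equivalent processing" mentioned above, here is a sketch of the two steps. It assumes that ucdstrip removes trailing comments from data lines and that ucdmerge joins adjacent ranges that carry the same value. The function names are hypothetical, and the real tools and ucdcopy.py may differ in detail.

```python
def strip_comments(lines):
    """Remove trailing '#' comments from data lines; keep comment-only lines."""
    out = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            out.append(line.rstrip())
        else:
            out.append(line.split("#", 1)[0].rstrip())
    return out


def merge_ranges(lines):
    """Merge adjacent 'start[..end];value' data lines that share a value."""
    merged = []  # entries are [start, end, value]
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # this sketch only handles data lines
        rng, value = (part.strip() for part in line.split(";", 1))
        start, _, end = rng.partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == first:
            merged[-1][1] = last  # extend the previous range
        else:
            merged.append([first, last, value])
    return [
        f"{s:04X};{v}" if s == e else f"{s:04X}..{e:04X};{v}"
        for s, e, v in merged
    ]


print(strip_comments(["0020;Na # SPACE", "# header"]))
print(merge_ranges(["0020;Na", "0021..0023;Na", "0024;Na", "0028;Na", "0029;W"]))
```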
4397
4398* notes on changes
4399- PropertyAliases.txt
4400  moved from numeric to enumerated:
4401    ccc       ; Canonical_Combining_Class
4402  new string properties:
4403    NFKC_CF   ; NFKC_Casefold
4404    Name_Alias; Name_Alias
4405  new binary properties:
4406    Cased     ; Cased
4407    CI        ; Case_Ignorable
4408    CWCF      ; Changes_When_Casefolded
4409    CWCM      ; Changes_When_Casemapped
4410    CWKCF     ; Changes_When_NFKC_Casefolded
4411    CWL       ; Changes_When_Lowercased
4412    CWT       ; Changes_When_Titlecased
4413    CWU       ; Changes_When_Uppercased
4414  new CJK Unihan properties (not supported by ICU)
4415- PropertyValueAliases.txt
4416  new block names
4417  new scripts
4418  one script code change:
4419    sc ; Qaai      ; Inherited
4420    ->
4421    sc ; Zinh      ; Inherited                        ; Qaai
4422  new Line_Break (lb) value:
4423    lb ; CP        ; Close_Parenthesis
4424  new Joining_Group (jg) values: Farsi_Yeh, Nya
4425  other new values:
4426    ccc; 214; ATA  ; Attached_Above
4427- DerivedBidiClass.txt
4428  new default-R range: U+1E800 - U+1EFFF
4429- UnicodeData.txt
4430  all of the ISO comments are gone
4431  new CJK block end:
4432    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
4433  new CJK block:
4434    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
4435    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
4436
4437* genpname
4438- run preparse.pl
4439  + cd \svn\icuproj\icu\trunk\source\tools\genpname
4440  + make sure that data.h is writable
4441  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
4442  + preparse.pl complains with errors like the following:
4443      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
4444    This is because ICU 4.0 had scripts from ISO 15924 which are now
4445    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
4446    and PropertyValueAliases.txt.
4447    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4448       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
4449  + preparse.pl complains with errors about block names missing from uchar.h; add them
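The conflict described above can be spotted before running preparse.pl. This is a minimal sketch, assuming that SyntheticPropertyValueAliases.txt uses the same "sc ; short ; long" line layout as PropertyValueAliases.txt. The helper names are hypothetical.

```python
def script_codes(lines):
    """Collect the short script codes from 'sc ; Code ; Long_Name' lines."""
    codes = set()
    for line in lines:
        data = line.split("#", 1)[0]
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes


def duplicated_scripts(synthetic_lines, official_lines):
    """Script codes defined in both files; remove these from the synthetic file."""
    return sorted(script_codes(synthetic_lines) & script_codes(official_lines))


# Illustrative input only.
synthetic = ["sc ; Egyp ; Egyp", "sc ; Blis ; Blis"]
official = [
    "sc ; Egyp ; Egyptian_Hieroglyphs",
    "sc ; Zinh ; Inherited ; Qaai",
]
print(duplicated_scripts(synthetic, official))
```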
4450
4451* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4452- new block & script values
4453  + 26 new blocks
4454    copy new blocks from Blocks.txt
4455    MS VC++ 2008 regular expression:
4456      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
4457      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
4458  + several new script values already added in ICU 4.0 for ISO 15924 coverage
4459    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
4460  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
4461  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
4462    (added to SyntheticPropertyValueAliases.txt)
4463- new Joining Group (JG) values: Farsi_Yeh, Nya
4464- new Line_Break (lb) value:
4465    lb ; CP        ; Close_Parenthesis
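The editor regular expression above for turning Blocks.txt lines into uchar.h enum constants can be approximated in a script. This is only a sketch: the identifier normalization (uppercase, non-alphanumerics to underscores) and the starting enum value are assumptions, and the generated numbers must still be checked against uchar.h by hand.

```python
import re

# Same shape as the editor regex: start..end; Block Name
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def block_enum_lines(blocks_lines, first_value):
    """Turn Blocks.txt data lines into UBLOCK_ enum lines, numbered from first_value."""
    out = []
    value = first_value
    for line in blocks_lines:
        m = BLOCK_LINE.match(line.strip())
        if not m:
            continue  # comments, blank lines
        start, _end, name = m.groups()
        ident = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
        out.append(f"    UBLOCK_{ident} = {value}, /*[{start}]*/")
        value += 1
    return out


# The numbering here is a placeholder for illustration.
for text in block_enum_lines(
    ["0800..083F; Samaritan", "# comment", "A960..A97F; Hangul Jamo Extended-A"], 172
):
    print(text)
```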
4466
4467* hardcoded Unihan range end/limit
4468- Unihan range end moves from 9FC3 to 9FCB
4469  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
4470  + do change gennames.c
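The case-insensitive search for the old range end and limit can be sketched as follows. The function name is hypothetical; the default pattern is the regex given above.

```python
import re


def find_range_constants(text, pattern=r"9FC[34]"):
    """Return (line number, line) pairs that mention the old Unihan end or limit."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if rx.search(line)
    ]


# Illustrative source text, not taken from ICU.
source = "if(c<=0x9fc3) {\n    limit=0x9FC4;\n    other=0x9FBB;\n}"
for number, line in find_range_constants(source):
    print(number, line.strip())
```

Each hit still needs manual review, because some hardcoded ranges must stay unchanged for binary compatibility.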
4471
4472* Compare definitions of new binary properties with what we used to use
4473  in algorithms, to see if the definitions changed.
4474- Verified that definitions for Cased and Case_Ignorable are unchanged.
4475  The gencase tool now parses the newly public Case_Ignorable values
4476  in case the definition changes in the future.
4477
4478* uchar.c & uprops.h & uprops.c & genprops
4479- new numeric values that didn't exist in Unicode data before:
4480    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
4481  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
4482  therefore redesign the encoding of numeric types and values for formatVersion 6;
4483  design for simple numbers up to at least 144 ("one gross"),
4484  large values up to at least 10^20,
4485  and fractions with numerators -1..17 and denominators 1..16
4486  to cover current and expected future values
4487  (e.g., more Han numeric values, Meroitic twelfths)
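To make the fraction part of this design concrete, here is one hypothetical way to pack a numerator in -1..17 and a denominator in 1..16 into a small integer. The bit layout is an assumption for illustration and is not claimed to be the actual uprops.icu formatVersion 6 encoding.

```python
from fractions import Fraction

# Ranges named in the notes above.
NUMERATOR_MIN = -1
NUMERATOR_MAX = 17
DENOMINATOR_MAX = 16


def pack_fraction(value):
    """Pack a Fraction into (numerator offset << 4) | (denominator - 1)."""
    n, d = value.numerator, value.denominator
    if not (NUMERATOR_MIN <= n <= NUMERATOR_MAX and 1 <= d <= DENOMINATOR_MAX):
        raise ValueError(f"{value} is outside the supported fraction range")
    return ((n - NUMERATOR_MIN) << 4) | (d - 1)


def unpack_fraction(code):
    """Inverse of pack_fraction()."""
    return Fraction((code >> 4) + NUMERATOR_MIN, (code & 0xF) + 1)


# The new values listed above all round-trip under this layout.
for text in ("1/7", "1/9", "1/10", "3/10", "1/16", "3/16"):
    value = Fraction(text)
    assert unpack_fraction(pack_fraction(value)) == value
```

Four bits hold the denominator minus one, which is why denominators stop at 16 in this sketch.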
4488
4489* reimplement Hangul_Syllable_Type for new Jamo characters
4490- the old code assumed that all Jamo characters are in the 11xx block
4491- Unicode 5.2 fills holes there and adds new Jamo characters in
4492    A960..A97F; Hangul Jamo Extended-A
4493  and in
4494    D7B0..D7FF; Hangul Jamo Extended-B
4495- Hangul_Syllable_Type can be trivially derived from a subset of
4496  Grapheme_Cluster_Break values
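The derivation mentioned above amounts to a small lookup. This sketch uses the short property value names and assumes the Unicode 5.2 data, where the Hangul-related Grapheme_Cluster_Break values line up with Hangul_Syllable_Type; the function name is illustrative.

```python
# Grapheme_Cluster_Break values that carry Hangul syllable information.
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb_value):
    """Map a GCB short value to an hst short value; everything else is NA."""
    return GCB_TO_HST.get(gcb_value, "NA")


print(hangul_syllable_type("LV"))  # LV
print(hangul_syllable_type("EX"))  # NA
```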
4497
4498* build Unicode data source code for hardcoding core data
4499C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
4500
4501ICU data make path is \svn\icuproj\icu\trunk\source\data\
4502ICU root path is \svn\icuproj\icu\trunk
4503Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4504Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
4505Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
4506Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
4507Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
4508Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
4509Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
4510Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
4511Creating data file for Unicode Property Names
4512Creating data file for Unicode Character Properties
4513Creating data file for Unicode Case Mapping Properties
4514Creating data file for Unicode BiDi/Shaping Properties
4515Creating data file for Unicode Normalization
4516Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
4517Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
4518
4519- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
4520  and rebuild the common library
4521
4522*** UCA
4523
4524- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
4525- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
4526- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
4527[ Begin obsolete instructions:
4528  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files, not the *_STUB.txt files.
4529    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
4530      on Windows:
4531        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
4532        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
4533  End obsolete instructions]
4534- run all tests with the *_SHORT.txt or the full files (the full ones have comments),
4535  not just the *_STUB.txt files
4536- note on intltest: if collate/UCAConformanceTest fails, then
4537  utility/MultithreadTest/TestCollators will fail as well;
4538  fix the conformance test before looking into the multi-thread test
4539
4540*** Implement Cased & Case_Ignorable properties
4541- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
4542- Problem: These properties should be disjoint, but aren't
4543- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
4544- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
4545
4546*** Implement Changes_When_Xyz properties
4547- without stored data
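"Without stored data" works because these properties are defined in terms of case mapping and normalization results. A minimal sketch of the idea, using Python's own Unicode data (which is a different Unicode version than the one discussed here) and hypothetical function names:

```python
import unicodedata


def changes_when_lowercased(ch):
    """True if lowercasing the NFD form of ch changes it."""
    nfd = unicodedata.normalize("NFD", ch)
    return nfd.lower() != nfd


def changes_when_uppercased(ch):
    """True if uppercasing the NFD form of ch changes it."""
    nfd = unicodedata.normalize("NFD", ch)
    return nfd.upper() != nfd


def changes_when_casefolded(ch):
    """True if case-folding the NFD form of ch changes it."""
    nfd = unicodedata.normalize("NFD", ch)
    return nfd.casefold() != nfd


for ch in ("A", "a", "1"):
    print(ch, changes_when_lowercased(ch), changes_when_uppercased(ch))
```

The ICU implementation computes the same kind of comparison from ucase.icu and the normalization data rather than storing one bit per property.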
4548
4549*** Implement Name_Alias property
4550- add it as another name field in unames.icu
4551- make it available via u_charName() and UCharNameChoice
4552- consider it in u_charFromName()
4553
4554*** Break iterators
4555
4556* Update break iterator rules to new UAX versions and new property values
4557* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
4558
4559*** new BidiTest file
4560- review format and data
4561- copy BidiTest.txt to source/test/testdata
4562- write test code using this data
4563- fix ICU code where it fails the conformance test
4564
4565*** Java
4566- generally, find and update code corresponding to C/C++
4567- UCharacter.UnicodeBlock constants:
4568  a) add an _ID integer per new block, update COUNT
4569  b) add a class instance per new block
4570     Visual Studio regex:
4571        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
4572        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
4573- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
4574
4575- port test changes to Java
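The Visual Studio find/replace above for generating UnicodeBlock instances translates directly to a standard regular expression. A sketch, with a hypothetical function name and an illustrative input line:

```python
import re

# Visual Studio {…} groups become (…) groups.
ENUM_LINE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
JAVA_TEMPLATE = (
    r'public static final UnicodeBlock \g<1> = '
    r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>'
)


def to_java_constant(c_enum_line):
    """Rewrite one uchar.h UBLOCK_ enum line as a UnicodeBlock declaration."""
    return ENUM_LINE.sub(JAVA_TEMPLATE, c_enum_line.strip())


print(to_java_constant("    UBLOCK_SAMARITAN = 172, /*[0800]*/"))
```

The matching _ID integer constants and the COUNT update from step a) still have to be added separately.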
4576
4577*** LayoutEngine script information
4578
4579(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
4580
4581* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4582ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4583ScriptRunData.cpp, which is no longer needed.)
4584
4585The generated files have a current copyright date and "@draft" statement.
4586
4587-> Eric Mader wrote in email on 20090930:
4588    "I think the tool has been modified to update @draft to @stable for
4589     older scripts and to add @draft for new scripts.
4590     (I worked with an intern on this last year.)
4591     You should check the output after you run it."
4592
4593* copy the above files into <icu>/source/layout, replacing the old files.
4594* fix mixed line endings
4595* review the diffs and fix incorrect @draft and missing aliases
4596* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4597
4598Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4599and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4600
4601-> Eric Mader wrote in email on 20090930:
4602    "This is just a matter of making sure that all the per-script tables have
4603     entries for any new scripts that were added.
4604     If any new Indic characters were added, then the class tables in
4605     IndicClassTables.cpp should be updated to reflect this.
4606     John Emmons should know how to do this if it's required."
4607
4608* rebuild the layout and layoutex libraries.
4609
4610*** Documentation
4611- Update User Guide
4612  + Jamo_Short_Name, sfc->scf, binary property value aliases
4613
4614---------------------------------------------------------------------------- ***
4615
4616Unicode 5.1 update
4617
4618*** related ICU Trac tickets
4619
46205696 Update to Unicode 5.1
4621
4622*** Unicode version numbers
4623- makedata.mak
4624- uchar.h
4625- configure.in & configure
4626- update ucdVersion in gennames.c if an algorithmic range changes
4627
4628*** data files & enums & parser code
4629
4630* file preparation
4631- ucdstrip:
4632    DerivedCoreProperties.txt
4633    DerivedNormalizationProps.txt
4634    NormalizationTest.txt
4635    PropList.txt
4636    Scripts.txt
4637    GraphemeBreakProperty.txt
4638    SentenceBreakProperty.txt
4639    WordBreakProperty.txt
4640- ucdstrip and ucdmerge:
4641    EastAsianWidth.txt
4642    LineBreak.txt
4643
4644* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
4645copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
4646copy 5.1.0\ucd\Blocks.txt ..\unidata\
4647copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
4648copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
4649copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
4650copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
4651copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
4652copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
4653copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
4654copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
4655copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
4656copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
4657copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
4658
4659ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
4660ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
4661ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
4662ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
4663ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
4664ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
4665ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
4666ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
4667ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
4668ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
4669
4670* genpname
4671- run preparse.pl
4672  + cd \svn\icuproj\icu\uni51\source\tools\genpname
4673  + make sure that data.h is writable
4674  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
4675  + preparse.pl complains with errors like the following:
4676      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
4677    This is because ICU 3.8 had scripts from ISO 15924 which are now
4678    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
4679    and PropertyValueAliases.txt.
4680    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4681       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
4682  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
4683      N/Y, No/Yes, F/T, False/True
4684    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
4685       It will use further values from the file if present.
4686
4687* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4688- new block & script values
4689  + 17 new blocks
4690  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
4691    (removed from SyntheticPropertyValueAliases.txt)
4692  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
4693    (added to SyntheticPropertyValueAliases.txt)
4694- uprops.icu (uprops.h) only provides 7 bits for script codes.
4695  ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
4696  None of the codes above 127 is yet the script code of an
4697  assigned Unicode character, so ICU 4.0 uprops.icu does not store any
4698  script code values greater than 127.
4699  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
4700  in a parallel bit field, and that overflows now.
4701  Also, future values >=128 would be incompatible anyway.
4702  uprops.h is modified to move around several of the bit fields
4703  in the properties vector words, and now uses 8 bits for the script code.
4704  Two other bit fields also grow to accommodate future growth:
4705  Block (current count: 172) grows from 8 to 9 bits,
4706  and Word_Break grows from 4 to 5 bits.
4707- renamed property Simple_Case_Folding (sfc->scf)
4708  + nothing to be done: handled as normal alias
4709- new property JSN Jamo_Short_Name
4710  + no new API: only contributes to the Name property
4711- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
4712- new Joining Group (JG) value: Burushashki_Yeh_Barree
4713- new Sentence_Break (SB) values:
4714    SB ; CR        ; CR
4715    SB ; EX        ; Extend
4716    SB ; LF        ; LF
4717    SB ; SC        ; SContinue
4718- new Word_Break (WB) values:
4719    WB ; CR        ; CR
4720    WB ; Extend    ; Extend
4721    WB ; LF        ; LF
4722    WB ; MB        ; MidNumLet
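The bit-field overflow described in the uprops.h note above is simple arithmetic: a field of n bits holds values up to 2^n - 1. A short sketch (the helper name is illustrative) shows why the maximum script value 129 no longer fits in 7 bits:

```python
def bits_needed(max_value):
    """Smallest bit-field width that can hold values 0..max_value."""
    return max(1, max_value.bit_length())


USCRIPT_CODE_LIMIT = 130  # value quoted in the note above

print(bits_needed(127))                     # 7: the old field width still fits
print(bits_needed(USCRIPT_CODE_LIMIT - 1))  # 8: the maximum value 129 overflows 7 bits
print(bits_needed(172 - 1))                 # 8: 172 blocks fit today; 9 bits leave headroom
```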
4723
4724* Further changes in the 2008-02-29 update:
4725- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
4726  because they should not normally be invisible.
4727- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
4728- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
4729- new Word_Break (WB) value: NL=Newline
4730
4731* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
4732- Unihan range end moves from 9FBB to 9FC3
4733  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
4734  + do change gennames.c
4735
4736* build Unicode data source code for hardcoding core data
4737C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
4738
4739ICU data make path is \svn\icuproj\icu\uni51\source\data\
4740ICU root path is \svn\icuproj\icu\uni51
4741Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4742Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
4743Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
4744Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
4745Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
4746Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
4747Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
4748Creating data file for Unicode Character Properties
4749Creating data file for Unicode Case Mapping Properties
4750Creating data file for Unicode BiDi/Shaping Properties
4751Creating data file for Unicode Normalization
4752Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
4753Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
4754
4755- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
4756  and rebuild the common library
4757
4758*** Break iterators
4759
4760* Update break iterator rules to new UAX versions and new property values
4761
4762*** UCA
4763
4764* update FractionalUCA.txt and UCARules.txt with new canonical closure
4765
4766*** Test suites
4767- Test that APIs using Unicode property value aliases (like UnicodeSet)
4768  support all of the boolean values N/Y, No/Yes, F/T, False/True
4769  -> TestBinaryValues() tests in both cintltst and intltest
4770
4771*** LayoutEngine script information
4772* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4773ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4774ScriptRunData.cpp, which is no longer needed.)
4775
4776The generated files have a current copyright date and "@draft" statement.
4777
4778* copy the above files into <icu>/source/layout, replacing the old files.
4779
4780Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4781and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4782
4783* rebuild the layout and layoutex libraries.
4784
4785*** Documentation
4786- Update User Guide
4787  + Jamo_Short_Name, sfc->scf, binary property value aliases
4788
4789---------------------------------------------------------------------------- ***
4790
4791Unicode 5.0 update
4792
4793*** related Jitterbugs
4794
47955084 RFE: Update to Unicode 5.0
4796
4797*** data files & enums & parser code
4798
4799* file preparation
4800- ucdstrip:
4801    DerivedCoreProperties.txt
4802    DerivedNormalizationProps.txt
4803    NormalizationTest.txt
4804    PropList.txt
4805    Scripts.txt
4806    GraphemeBreakProperty.txt
4807    SentenceBreakProperty.txt
4808    WordBreakProperty.txt
4809- ucdstrip and ucdmerge:
4810    EastAsianWidth.txt
4811    LineBreak.txt
4812
4813* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
4814copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
4815copy 5.0.0\ucd\Blocks.txt ..\unidata\
4816copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
4817copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
4818copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
4819copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
4820copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
4821copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
4822copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
4823copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
4824copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
4825copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
4826copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
4827
4828ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
4829ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
4830ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
4831ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
4832ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
4833ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
4834ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
4835ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
4836ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
4837ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
4838
4839* update FractionalUCA.txt and UCARules.txt with new canonical closure
4840
4841* genpname
4842- run preparse.pl
4843  + make sure that data.h is writable
4844  + perl preparse.pl \cvs\oss\icu > out.txt
4845
4846* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4847- new block & script values
4848  + script values already added in ICU 3.6 because all of ISO 15924 is now covered
4849
4850* build Unicode data source code for hardcoding core data
4851C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
4852
4853ICU data make path is \cvs\oss\icu\source\data\
4854ICU root path is \cvs\oss\icu
4855Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4856[etc.]
4857Creating data file for Unicode Character Properties
4858Creating data file for Unicode Case Mapping Properties
4859Creating data file for Unicode BiDi/Shaping Properties
4860Creating data file for Unicode Normalization
4861Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
4862Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
4863
4864- copy the .c source files to C:\cvs\oss\icu\source\common
4865  and rebuild the common library
4866
4867*** Unicode version numbers
4868- makedata.mak
4869- uchar.h
4870- configure.in
4871
4872*** LayoutEngine script information
4873* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4874ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4875ScriptRunData.cpp, which is no longer needed.)
4876
4877The generated files have a current copyright date and "@draft" statement.
4878
4879* copy the above files into <icu>/source/layout, replacing the old files.
4880
4881Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4882and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4883
4884* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt
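
The file preparation above can be pictured with a small Python sketch. It assumes that ucdstrip drops "#" comments and blank lines and that ucdmerge coalesces adjacent code point ranges that carry the same value; the real tools may behave differently, and the function names and sample lines here are made up for illustration.

```python
def strip_comments(lines):
    """Drop '#' comments and blank lines (assumed ucdstrip behaviour)."""
    out = []
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if data:
            out.append(data)
    return out


def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (start, end) pair."""
    start, _, end = field.partition("..")
    return int(start, 16), int(end or start, 16)


def merge_ranges(lines):
    """Coalesce adjacent ranges with equal values (assumed ucdmerge behaviour)."""
    merged = []
    for line in lines:
        field, value = (part.strip() for part in line.split(";", 1))
        start, end = parse_range(field)
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            # Extend the previous range instead of emitting a new line.
            merged[-1] = (merged[-1][0], end, value)
        else:
            merged.append((start, end, value))
    return [
        f"{s:04X};{v}" if s == e else f"{s:04X}..{e:04X};{v}"
        for s, e, v in merged
    ]


# Illustrative lines in the style of EastAsianWidth.txt, not copied from it.
sample = [
    "# sample header comment",
    "0020;Na # SPACE",
    "0021..0023;Na # EXCLAMATION MARK..NUMBER SIGN",
    "",
    "00A1;A # INVERTED EXCLAMATION MARK",
]
result = merge_ranges(strip_comments(sample))
```

Here the first two data lines collapse into one range because they are adjacent and share the value Na.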

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
  + Pattern_Syntax
  + Pattern_White_Space
- new enumerated properties
  + Grapheme_Cluster_Break
  + Sentence_Break
  + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
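
As a rough illustration of the D47a rule quoted above, the following Python sketch tests the general-category half with the standard unicodedata module and the Word_Break half with a small hand-picked set. The set is an assumption (the authoritative list is WordBreakProperty.txt), and Python's bundled Unicode version is newer than 4.1, so this is not what gencase itself does.

```python
import unicodedata

# Assumed sample of Word_Break=MidLetter code points for illustration only;
# the real membership must be read from WordBreakProperty.txt.
MID_LETTER_SAMPLE = {0x0027, 0x00B7, 0x2027}

# General categories named in the D47a definition.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch):
    """Return True if ch is MidLetter (per the sample set) or Mn/Me/Cf/Lm/Sk."""
    if ord(ch) in MID_LETTER_SAMPLE:
        return True
    return unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES
```

For example, is_case_ignorable("\u0301") is true because U+0301 is a nonspacing mark (Mn), while is_case_ignorable("a") is false.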

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties
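
The round-trip check can be sketched in Python as follows. The table is a hypothetical excerpt standing in for the data behind u_charMirror(), and the helper names are made up; the actual ICU test is written against the C API.

```python
# Hypothetical excerpt of a mirroring table; real data comes from the UCD.
MIRROR = {
    0x0028: 0x0029,  # ( <-> )
    0x0029: 0x0028,
    0x003C: 0x003E,  # < <-> >
    0x003E: 0x003C,
    0x005B: 0x005D,  # [ <-> ]
    0x005D: 0x005B,
}


def char_mirror(c):
    """Return the mirrored code point, or c itself if it has no mirror."""
    return MIRROR.get(c, c)


def find_round_trip_failures(mirror, code_points):
    """List code points c for which mirror(mirror(c)) != c."""
    return [c for c in code_points if mirror(mirror(c)) != c]


failures = find_round_trip_failures(char_mirror, range(0x80))
```

A table with a one-directional entry, such as 0x28 mapping to 0x29 without the reverse, would show up in the returned list.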

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
  + do not modify BOCU/BOCSU code because that would change the encoding
    and break binary compatibility!
  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    NamePrepProfile.txt
  + ignore trietest.c: test data is arbitrary
  + ignore tstnorm.cpp: test optimization, not important
  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
  + do change line_th.txt and word_th.txt
    by replacing hardcoded ranges with the new property values
  + do change gennames.c

source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
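
The search described above can be reproduced with a few lines of Python. This is only a sketch of the case-insensitive 9FA[56] scan; the function name and the (path, line number, text) input shape are assumptions, and deciding which hits to change still needs the manual review listed above.

```python
import re

# Matches the old inclusive end (9FA5) and the old exclusive limit (9FA6).
OLD_UNIHAN_END = re.compile(r"9FA[56]", re.IGNORECASE)


def find_hardcoded_unihan_ends(entries):
    """Return (path, line_number) for every entry whose text matches.

    entries is an iterable of (path, line_number, text) tuples.
    """
    return [
        (path, line_number)
        for path, line_number, text in entries
        if OLD_UNIHAN_END.search(text)
    ]


# Sample input: the gennames.c line from the list above, plus two invented
# lines (example.c is hypothetical) to show the limit form and a non-match.
sample = [
    ("gennames.c", 971, "        0x4e00, 0x9fa5,"),
    ("example.c", 10, "if (c < 0x9FA6) { /* limit form */ }"),
    ("example.c", 11, "0x4e00, 0x9fbb,"),
]
hits = find_hardcoded_unihan_ends(sample)
```

The third sample line already uses the new end 9FBB, so it is not reported.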

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
  + put short name comment only on line with new constant
    for genpname perl script parser
- new binary properties
  + STerm
  + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row
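
The two genpname fixes amount to accepting a variable number of names per value and tracking the widest row. The real tool is a Perl script; the Python sketch below only illustrates the idea, with a hypothetical helper name and sample lines modelled on PropertyValueAliases.txt.

```python
def parse_alias_line(line):
    """Split one alias line into stripped fields, or return None if empty.

    Any number of names after the property is accepted, not just two.
    """
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    return [field.strip() for field in data.split(";")]


# Sample lines in the style of PropertyValueAliases.txt (illustrative).
sample = [
    "gc ; Lu ; Uppercase_Letter",
    "gc ; P  ; Punctuation ; punct  # three names for one value",
    "# comment-only line",
    "sc ; Hrkt ; Katakana_Or_Hiragana",
]

rows = [row for row in map(parse_alias_line, sample) if row is not None]

# The maximum is computed over all rows rather than assumed to be 3.
max_fields = max(len(row) for row in rows)
```

With this sample, max_fields is 4 because the Punctuation row carries a property name plus three value names.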

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
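
A parser that tolerates both spellings could look like the Python sketch below. gennorm.c is C and its actual parsing code is not shown here; the function names, the "_MAYBE" mapping (assumed by analogy with "_NO") and the sample lines are all illustrative assumptions.

```python
# Old property name -> new property name.
RENAMED = {"FNC": "FC_NFKC"}

# Old single-field suffix -> new quick-check value.
# "_MAYBE" -> "M" is assumed by analogy with "_NO" -> "N".
OLD_SUFFIXES = {"_NO": "N", "_MAYBE": "M"}


def normalize_fields(fields):
    """Map old-style fields to the new style; pass new-style fields through."""
    name = fields[0]
    if name in RENAMED:
        return [RENAMED[name]] + fields[1:]
    if len(fields) == 1:
        for suffix, value in OLD_SUFFIXES.items():
            if name.endswith(suffix):
                return [name[: -len(suffix)] + "_QC", value]
    return fields


def parse_line(line):
    """Return (range_text, fields) for a data line, or None for comments."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    range_text, *fields = [field.strip() for field in data.split(";")]
    return range_text, normalize_fields(fields)
```

Under these assumptions, "00C0..00C5 ; NFD_NO" and "00C0..00C5 ; NFD_QC; N" both parse to the same property/value pair.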

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
  + assume that the type is always numeric for Han characters,
    and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
