1* Copyright (C) 2016 and later: Unicode, Inc. and others.
2* License & terms of use: http://www.unicode.org/copyright.html
3* Copyright (C) 2004-2016, International Business Machines
4* Corporation and others.  All Rights Reserved.
5*
6*   file name:  changes.txt
7*   encoding:   US-ASCII
8*   tab size:   8 (not used)
9*   indentation:4
10*
11*   created on: 2004may06
12*   created by: Markus W. Scherer
13
14* change log for Unicode updates
15
16For an overview, see https://unicode-org.github.io/icu/processes/unicode-update
17
18Notes:
19
20This log includes several command lines as used in the update process.
21Some of them include a console prompt with the present working directory (pwd) followed by a $ sign.
22Use a console window that is set to that directory, or cd to there,
23and then paste the command that follows the $ sign.
24
25Most command lines use environment variables to make them more portable across versions
26and machine configurations. When you set up a console window, copy & paste the `export` commands
27from near the top of the current section before pasting tool command lines.
28Adjust the environment variables to the current version and your machine setup.
29(The command lines are currently as used on Linux.)
30
31---------------------------------------------------------------------------- ***
32
33* New ISO 15924 script codes
34
35Normally, add new script codes as part of a Unicode update.
36See https://unicode-org.github.io/icu/processes/release/tasks/standards#update-script-code-enums
37and see the change logs below.
38
39---------------------------------------------------------------------------- ***
40
41Unicode 15.0 update for ICU 72
42
43https://www.unicode.org/versions/Unicode15.0.0/
44https://www.unicode.org/versions/beta-15.0.0.html
45https://www.unicode.org/Public/15.0.0/ucd/
46https://www.unicode.org/reports/uax-proposed-updates.html
47https://www.unicode.org/reports/tr44/tr44-29.html
48
49https://unicode-org.atlassian.net/browse/ICU-21980 Unicode 15
50https://unicode-org.atlassian.net/browse/CLDR-15516 Unicode 15
51https://unicode-org.atlassian.net/browse/CLDR-15253 Unicode 15 script metadata (in CLDR 41)
52
53* Command-line environment setup
54
55export UNICODE_DATA=~/unidata/uni15/20220830
56export CLDR_SRC=~/cldr/uni/src
57export ICU_ROOT=~/icu/uni
58export ICU_SRC=$ICU_ROOT/src
59export ICUDT=icudt72b
60export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
61export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
62export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
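
Before pasting tool command lines, it can help to confirm that the variables above are defined in the current console. The following is only a minimal sketch: the helper name `missing_vars` is hypothetical and not part of the ICU tools, and the variable list is copied from the `export` lines of this section.

```python
import os

# Variable names copied from the `export` lines of this section.
REQUIRED_VARS = (
    "UNICODE_DATA", "CLDR_SRC", "ICU_ROOT", "ICU_SRC",
    "ICUDT", "ICU4C_DATA_IN", "ICU4C_UNIDATA", "LD_LIBRARY_PATH",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or empty in env.

    Hypothetical helper for illustration; it only checks for presence,
    not whether the paths actually exist.
    """
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    print("missing: " + ", ".join(missing) if missing else "all set")
```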
63
64*** Unicode version numbers
65- makedata.mak
66- uchar.h
67- com.ibm.icu.util.VersionInfo
68- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
69
70- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
71    so that the makefiles see the new version number.
72  cd $ICU_ROOT/dbg/icu4c
73  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
74
75*** data files & enums & parser code
76
77* download files
78- same as for the early Unicode Tools setup and data refresh:
79  https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
80  https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
81- mkdir -p $UNICODE_DATA
82- download Unicode files into $UNICODE_DATA
83  + subfolders: emoji, idna, security, ucd, uca
84  + old way of fetching files: from the "Public" area on unicode.org
85    ~ inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
86    ~ split Unihan into single-property files
87      ~/unitools/mine/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
88  + new way of fetching files, if available:
89    copy the files from a Unicode Tools workspace that is up to date with
90    https://github.com/unicode-org/unicodetools
91    and which might at this point be *ahead* of "Public"
92    ~ before the Unicode release copy files from "dev" subfolders, for example
93      https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/ucd/dev
94  + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
95    or from the UCD/cldr/ output folder of the Unicode Tools
96    (since Unicode 12/CLDR 35/ICU 64, CLDR uses modified break rules):
97  cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
98    or
99  cp ~/unitools/mine/Generated/UCD/15.0.0/cldr/GraphemeBreakTest-cldr.txt icu4c/source/test/testdata/GraphemeBreakTest.txt
100
101* for manual diffs and for Unicode Tools input data updates:
102  remove version suffixes from the file names
103    ~$ unidata/desuffixucd.py $UNICODE_DATA
104  (see https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md)
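
The renaming step can be pictured with the sketch below. It is not the actual desuffixucd.py; it assumes that draft files carry a suffix of the form "-<major>.<minor>.<update>" with an optional "d<number>" draft marker right before the extension, and the function name `strip_version_suffix` is made up for illustration.

```python
import re

# Assumed naming pattern: "<name>-<major>.<minor>.<update>[d<draft>].<ext>"
# The lookahead keeps the file extension in place.
_VERSION_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def strip_version_suffix(file_name):
    """Return file_name without a trailing version suffix, if it has one."""
    return _VERSION_SUFFIX.sub("", file_name)


if __name__ == "__main__":
    for name in ("Blocks-15.0.0d3.txt", "UnicodeData.txt"):
        print(name, "->", strip_version_suffix(name))
```

Files that do not match the assumed pattern are returned unchanged, so running the helper twice is harmless.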
105
106* process and/or copy files
107- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
108  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
109  + For debugging, and tweaking how ppucd.txt is written,
110    the tool has an --only_ppucd option:
111    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
112
113- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
114
115* new constants for new property values
116- preparseucd.py error:
117    ValueError: missing uchar.h enum constants for some property values: [('blk', {'Nag_Mundari', 'CJK_Ext_H', 'Kawi', 'Kaktovik_Numerals', 'Devanagari_Ext_A', 'Arabic_Ext_C', 'Cyrillic_Ext_D'}), ('sc', {'Nagm', 'Kawi'})]
118  = PropertyValueAliases.txt new property values (diff old & new .txt files)
119    ~/unidata$ diff -u uni14/20210922/ucd/PropertyValueAliases.txt uni15/beta/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]'
120    +age; 15.0                             ; V15_0
121    +blk; Arabic_Ext_C                     ; Arabic_Extended_C
122    +blk; CJK_Ext_H                        ; CJK_Unified_Ideographs_Extension_H
123    +blk; Cyrillic_Ext_D                   ; Cyrillic_Extended_D
124    +blk; Devanagari_Ext_A                 ; Devanagari_Extended_A
125    +blk; Kaktovik_Numerals                ; Kaktovik_Numerals
126    +blk; Kawi                             ; Kawi
127    +blk; Nag_Mundari                      ; Nag_Mundari
128    +sc ; Kawi                             ; Kawi
129    +sc ; Nagm                             ; Nag_Mundari
130  -> add new blocks to uchar.h before UBLOCK_COUNT
131    use long property names for enum constants,
132    for the trailing comment get the block start code point: diff old & new Blocks.txt
133    ~/unidata$ diff -u uni14/20210922/ucd/Blocks.txt uni15/beta/ucd/Blocks.txt | egrep '^[-+][0-9A-Z]'
134    +10EC0..10EFF; Arabic Extended-C
135    +11B00..11B5F; Devanagari Extended-A
136    +11F00..11F5F; Kawi
137    -13430..1343F; Egyptian Hieroglyph Format Controls
138    +13430..1345F; Egyptian Hieroglyph Format Controls
139    +1D2C0..1D2DF; Kaktovik Numerals
140    +1E030..1E08F; Cyrillic Extended-D
141    +1E4D0..1E4FF; Nag Mundari
142    +31350..323AF; CJK Unified Ideographs Extension H
143    (ignore blocks whose end code point changed)
144  -> add new blocks to UCharacter.UnicodeBlock IDs
145    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
146            replace  public static final int \1_ID = \2; \3
147  -> add new blocks to UCharacter.UnicodeBlock objects
148    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
149            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
150  -> add new scripts to uscript.h & com.ibm.icu.lang.UScript
151    Eclipse find     USCRIPT_([^ ]+) *= ([0-9]+),(/.+)
152            replace  public static final int \1 = \2; \3
153  -> for new scripts: fix expectedLong names in cintltst/cucdapi.c/TestUScriptCodeAPI()
154      and in com.ibm.icu.dev.test.lang.TestUScript.java
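
As an alternative to reading the Blocks.txt diff by eye, the new blocks and their start code points can be listed with a small script. This is a sketch under the assumption that each data line has the form "START..END; Block Name"; the function names are hypothetical and not part of preparseucd.py. Blocks that exist in both files are skipped even when their end code point changed, matching the note above.

```python
def parse_blocks(text):
    """Map block name -> (start, end) from Blocks.txt-style text."""
    blocks = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_range, name = line.split(";", 1)
        start, end = code_range.strip().split("..")
        blocks[name.strip()] = (int(start, 16), int(end, 16))
    return blocks


def new_blocks(old_text, new_text):
    """Return sorted (start code point, name) pairs for blocks only in new_text."""
    old = parse_blocks(old_text)
    new = parse_blocks(new_text)
    return sorted((rng[0], name) for name, rng in new.items() if name not in old)


if __name__ == "__main__":
    old = "13430..1343F; Egyptian Hieroglyph Format Controls\n"
    new = ("11F00..11F5F; Kawi\n"
           "13430..1345F; Egyptian Hieroglyph Format Controls\n")
    for start, name in new_blocks(old, new):
        print(f"{start:04X} {name}")
```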
155
156* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
157    (not strictly necessary for NOT_ENCODED scripts)
158  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
159
160* build ICU
161  to make sure that there are no syntax errors
162
163  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 tests &> out.txt ; tail -n 30 out.txt ; date
164
165* update spoof checker UnicodeSet initializers:
166    inclusionPat & recommendedPat in i18n/uspoof.cpp
167    INCLUSION & RECOMMENDED in SpoofChecker.java
168- make sure that the Unicode Tools tree contains the latest security data files
169- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
170- run the tool (no special environment variables needed)
171- copy & paste from the Console output into the .cpp & .java files
172
173* Bazel build process
174
175See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
176for an overview and for setup instructions.
177
178Consider running `bazelisk --version` outside of the $ICU_SRC folder
179to find out the latest `bazel` version, and
180copying that version number into the $ICU_SRC/.bazeliskrc config file.
181(Revert if you find incompatibilities, or, better, update our build & config files.)
182
183* generate data files
184
185- remember to define the environment variables
186  (see the start of the section for this Unicode version)
187- cd $ICU_SRC
188- optional (normally not necessary):
189    bazelisk clean
190- build/bootstrap/generate new files:
191    icu4c/source/data/unidata/generate.sh
192
193* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
194  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
195- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
196    ~/unitools/mine/src$ grep disallowed_STD3_valid unicodetools/data/idna/dev/IdnaMappingTable.txt
197- Unicode 6.0..15.0: U+2260, U+226E, U+226F
198- nothing new in this Unicode version, no test file to update
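
The grep above also reports ASCII characters, which then have to be ignored by hand. A sketch of the same check that keeps only non-ASCII entries follows; it assumes the usual semicolon-separated layout "code point(s) ; status ; ..." with "#" comments, and the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) code point ranges that are disallowed_STD3_valid
    and lie above U+007F."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:
            found.append((first, last))
    return found


if __name__ == "__main__":
    # Illustrative sample lines, not a copy of the real data file.
    sample = [
        "003D          ; disallowed_STD3_valid   # EQUALS SIGN",
        "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    ]
    for first, last in non_ascii_std3_valid(sample):
        print(f"U+{first:04X}..U+{last:04X}")
```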
199
200* run & fix ICU4C tests
201- Note: Some of the collation data and test data will be updated below,
202  so at this time we might get some collation test failures.
203  Ignore these for now.
204- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
205  (no rule changes in Unicode 15)
206- update CLDR GraphemeBreakTest.txt
207    cd ~/unitools/mine/Generated
208    cp UCD/15.0.0/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
209    cp UCD/15.0.0/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
210    cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt $ICU_SRC/icu4c/source/test/testdata
211- Andy helps with RBBI & spoof check test failures
212
213* collation: CLDR collation root, UCA DUCET
214
215- UCA DUCET goes into Mark's Unicode tools,
216  and a tool-tailored version goes into CLDR, see
217    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
218
219- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
220    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
221- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
222    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
223    (note removing the underscore before "Rules")
224    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
225- restore TODO diffs in UCARules.txt
226    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
227- update (ICU4C)/source/test/testdata/CollationTest_*.txt
228  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
229  from the CLDR root files (..._CLDR_..._SHORT.txt)
230    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
231    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
232    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
233- if CLDR common/uca/unihan-index.txt changes, then update
234  CLDR common/collation/root.xml <collation type="private-unihan">
235  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
236
237- generate data files, as above (generate.sh), now to pick up new collation data
238- update CollationFCD.java:
239  copy & paste the initializers of lcccIndex[] etc. from
240    ICU4C/source/i18n/collationfcd.cpp to
241    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
242- rebuild ICU4C (make clean, make check, as usual)
243
244* Unihan collators
245    https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
246- run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
247  check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
248- generate ICU zh collation data
249    instructions inspired by
250    https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
251    https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
252  + setup:
253    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
254        (didn't work without setting JAVA_HOME,
255         nor with the Google default of /usr/local/buildtools/java/jdk
256         [Google security limitations in the XML parser])
257    export TOOLS_ROOT=~/icu/uni/src/tools
258    export CLDR_DIR=~/cldr/uni/src
259    export CLDR_DATA_DIR=~/cldr/uni/src
260        (pointing to the "raw" data, not cldr-staging/.../production, should be ok for the relevant files)
261    cd "$TOOLS_ROOT/cldr/lib"
262    ./install-cldr-jars.sh "$CLDR_DIR"
263  + generate the files we need
264    cd "$TOOLS_ROOT/cldr/cldr-to-icu"
265    ant -f build-icu-data.xml -DoutDir=/tmp/icu -DoutputTypes=coll,transforms -DlocaleIdFilter='zh.*'
266  + diff
267    cd $ICU_SRC
268    meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
269    meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
270  + copy into the source tree
271    cd $ICU_SRC
272    cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
273    cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
274- rebuild ICU4C
275
276* run & fix ICU4C tests, now with new CLDR collation root data
277- run all tests with the collation test data *_SHORT.txt or the full files
278  (the full ones have comments, useful for debugging)
279- note on intltest: if collate/UCAConformanceTest fails, then
280  utility/MultithreadTest/TestCollators will fail as well;
281  fix the conformance test before looking into the multi-thread test
282
283* update Java data files
284- refresh just the UCD/UCA-related/derived files, just to be safe
285- see (ICU4C)/source/data/icu4j-readme.txt
286- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
287- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
288    NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
289    you need to reconfigure with unicore data; see the "configure" line above.
290  output:
291    ...
292    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
293    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt72b
294    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt72b
295    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt72l.dat ./out/icu4j/icudt72b.dat -s ./out/build/icudt72l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt72b
296    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt72b"
297    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt72b/
298    mkdir -p /tmp/icu4j/main/shared/data
299    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
300    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt72b/
301    mkdir -p /tmp/icu4j/main/shared/data
302    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
303    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
304- copy the big-endian Unicode data files to another location,
305  separate from the other data files,
306  and then refresh ICU4J
307    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
308    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
309    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
310    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
311    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
312    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
313    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
314    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
315    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
316    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
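
The selection made by the cp/rm commands above can be summarized as a filter over paths relative to com/ibm/icu/impl/data/$ICUDT. The sketch below only mirrors those commands; the function and constant names are hypothetical, and it does not copy anything itself.

```python
from fnmatch import fnmatchcase

# Top-level files copied by the commands above, minus cnvalias.icu.
TOP_LEVEL_PATTERNS = ("confusables.cfu", "*.icu", "*.nrm")
EXCLUDED_NAMES = ("cnvalias.icu",)
# Subfolders whose direct contents are copied as a whole.
SUBFOLDERS = ("coll", "brkitr")


def select_unicode_data_files(relative_paths):
    """Return the paths that the copy commands above would pick up."""
    selected = []
    for path in relative_paths:
        folder, _, name = path.rpartition("/")
        if folder:
            keep = folder in SUBFOLDERS
        else:
            keep = name not in EXCLUDED_NAMES and any(
                fnmatchcase(name, pattern) for pattern in TOP_LEVEL_PATTERNS)
        if keep:
            selected.append(path)
    return selected


if __name__ == "__main__":
    print(select_unicode_data_files(["uprops.icu", "cnvalias.icu", "root.res"]))
```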
317
318* When refreshing all of ICU4J data from ICU4C
319- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
320- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
321or
322- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
323
324* refresh Java test .txt files
325- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
326    cd $ICU_SRC/icu4c/source/data/unidata
327    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
328    cd ../../test/testdata
329    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
330    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
331
332* run & fix ICU4J tests
333
334*** API additions
335- send notice to icu-design about new born-@stable API (enum constants etc.)
336
337*** CLDR numbering systems
338- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
339  for example:
340    ~/icu/mine/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-14.txt
341    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-15.txt
342    ~/icu/uni/src$ diff -u /tmp/icu/nv4-14.txt /tmp/icu/nv4-15.txt
343    -->
344    +cp;11F54;-Alpha;gc=Nd;InSC=Number;lb=NU;na=KAWI DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
345    +cp;1E4F4;-Alpha;gc=Nd;-IDS;lb=NU;na=NAG MUNDARI DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
346  or:
347    ~/unitools/mine/src$ diff -u unicodetools/data/ucd/14.0.0-Update/extracted/DerivedGeneralCategory.txt unicodetools/data/ucd/dev/extracted/DerivedGeneralCategory.txt | grep '; Nd' | egrep '^\+'
348    -->
349    +11F50..11F59  ; Nd #  [10] KAWI DIGIT ZERO..KAWI DIGIT NINE
350    +1E4F0..1E4F9  ; Nd #  [10] NAG MUNDARI DIGIT ZERO..NAG MUNDARI DIGIT NINE
351  Unicode 15:
352    kawi 11F50..11F59 Kawi
353    nagm 1E4F0..1E4F9 Nag Mundari
354    https://github.com/unicode-org/cldr/pull/2041
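
The egrep-and-diff steps above can also be sketched as one script. It assumes that ppucd.txt lists each digit FOUR on its own "cp;..." line with explicit gc=Nd and nv=4 fields (as in the output shown above), and that every set of decimal digits is a contiguous run from ZERO to NINE, so the range is derived as FOUR-4..FOUR+5. The function names are hypothetical.

```python
def digit_four_code_points(ppucd_lines):
    """Return the code points of single-code-point lines with gc=Nd and nv=4."""
    found = set()
    for line in ppucd_lines:
        fields = line.strip().split(";")
        if len(fields) < 2 or fields[0] != "cp" or ".." in fields[1]:
            continue
        if "gc=Nd" in fields and "nv=4" in fields:
            found.add(int(fields[1], 16))
    return found


def new_digit_ranges(old_lines, new_lines):
    """Return (zero, nine) code point pairs for digit sets only in new_lines."""
    added = digit_four_code_points(new_lines) - digit_four_code_points(old_lines)
    return [(cp - 4, cp + 5) for cp in sorted(added)]


if __name__ == "__main__":
    new = ["cp;11F54;-Alpha;gc=Nd;InSC=Number;lb=NU;na=KAWI DIGIT FOUR;"
           "nt=De;nv=4;SB=NU;WB=NU;-XIDS"]
    for zero, nine in new_digit_ranges([], new):
        print(f"{zero:04X}..{nine:04X}")
```

Unlike the egrep pattern, the field comparison is exact, so a value such as nv=40 would not match.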
355
356*** merge the Unicode update branches back onto the trunk
357- do not merge the icudata.jar and testdata.jar,
358  instead rebuild them from merged & tested ICU4C
359- if there is a merge conflict in icudata.jar, here is one way to deal with it:
360  +   remove icudata.jar from the commit so that rebasing is trivial
361  + ~/icu/uni/src$ git restore --source=main icu4j/main/shared/data/icudata.jar
362  + ~/icu/uni/src$ git commit -a --amend
363  +   switch to main, pull updates, switch back to the dev branch
364  + ~/icu/uni/src$ git rebase main
365  +   rebuild icudata.jar
366  + ~/icu/uni/src$ git commit -a --amend
367  + ~/icu/uni/src$ git push -f
368- make sure that changes to Unicode tools are checked in:
369  https://github.com/unicode-org/unicodetools
370
371---------------------------------------------------------------------------- ***
372
373Unicode 14.0 update for ICU 70
374
375https://www.unicode.org/versions/Unicode14.0.0/
376https://www.unicode.org/versions/beta-14.0.0.html
377https://www.unicode.org/Public/14.0.0/ucd/
378https://www.unicode.org/reports/uax-proposed-updates.html
379https://www.unicode.org/reports/tr44/tr44-27.html
380
381https://unicode-org.atlassian.net/browse/CLDR-14801
382https://unicode-org.atlassian.net/browse/ICU-21635
383
384* Command-line environment setup
385
386export UNICODE_DATA=~/unidata/uni14/20210903
387export CLDR_SRC=~/cldr/uni/src
388export ICU_ROOT=~/icu/uni
389export ICU_SRC=$ICU_ROOT/src
390export ICUDT=icudt70b
391export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
392export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
393export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
394
395*** Unicode version numbers
396- makedata.mak
397- uchar.h
398- com.ibm.icu.util.VersionInfo
399- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
400
401- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
402    so that the makefiles see the new version number.
403  cd $ICU_ROOT/dbg/icu4c
404  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
405
406*** data files & enums & parser code
407
408* download files
409- same as for the early Unicode Tools setup and data refresh:
410  https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
411  https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
412- mkdir -p $UNICODE_DATA
413- download Unicode files into $UNICODE_DATA
414  + subfolders: emoji, idna, security, ucd, uca
415  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
416  + split Unihan into single-property files
417    ~/unitools/mine/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
418  + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
419    or from the UCD/cldr/ output folder of the Unicode Tools
420    (since Unicode 12/CLDR 35/ICU 64, CLDR uses modified break rules):
421  cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
422    or
423  cp ~/unitools/mine/Generated/UCD/d19/cldr/GraphemeBreakTest-cldr-14.0.0d19.txt icu4c/source/test/testdata/GraphemeBreakTest.txt
424
425* for manual diffs and for Unicode Tools input data updates:
426  remove version suffixes from the file names
427    ~$ unidata/desuffixucd.py $UNICODE_DATA
428  (see https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md)
429
430* process and/or copy files
431- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
432  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
433  + For debugging, and tweaking how ppucd.txt is written,
434    the tool has an --only_ppucd option:
435    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
436
437- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
438
439* new constants for new property values
440- preparseucd.py error:
441    ValueError: missing uchar.h enum constants for some property values:
442    [(u'blk', set([u'Toto', u'Tangsa', u'Cypro_Minoan', u'Arabic_Ext_B', u'Vithkuqi', u'Old_Uyghur', u'Latin_Ext_F', u'UCAS_Ext_A', u'Kana_Ext_B', u'Ethiopic_Ext_B', u'Latin_Ext_G', u'Znamenny_Music'])),
443    (u'jg', set([u'Vertical_Tail', u'Thin_Yeh'])),
444    (u'sc', set([u'Toto', u'Ougr', u'Vith', u'Tnsa', u'Cpmn']))]
445  = PropertyValueAliases.txt new property values (diff old & new .txt files)
446    ~/unidata$ diff -u uni13/20200304/ucd/PropertyValueAliases.txt uni14/20210609/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]'
447    +age; 14.0                             ; V14_0
448    +blk; Arabic_Ext_B                     ; Arabic_Extended_B
449    +blk; Cypro_Minoan                     ; Cypro_Minoan
450    +blk; Ethiopic_Ext_B                   ; Ethiopic_Extended_B
451    +blk; Kana_Ext_B                       ; Kana_Extended_B
452    +blk; Latin_Ext_F                      ; Latin_Extended_F
453    +blk; Latin_Ext_G                      ; Latin_Extended_G
454    +blk; Old_Uyghur                       ; Old_Uyghur
455    +blk; Tangsa                           ; Tangsa
456    +blk; Toto                             ; Toto
457    +blk; UCAS_Ext_A                       ; Unified_Canadian_Aboriginal_Syllabics_Extended_A
458    +blk; Vithkuqi                         ; Vithkuqi
459    +blk; Znamenny_Music                   ; Znamenny_Musical_Notation
460    +jg ; Thin_Yeh                         ; Thin_Yeh
461    +jg ; Vertical_Tail                    ; Vertical_Tail
462    +sc ; Cpmn                             ; Cypro_Minoan
463    +sc ; Ougr                             ; Old_Uyghur
464    +sc ; Tnsa                             ; Tangsa
465    +sc ; Toto                             ; Toto
466    +sc ; Vith                             ; Vithkuqi
467  -> add new blocks to uchar.h before UBLOCK_COUNT
468    use long property names for enum constants,
469    for the trailing comment get the block start code point: diff old & new Blocks.txt
470    ~/unidata$ diff -u uni13/20200304/ucd/Blocks.txt uni14/20210609/ucd/Blocks.txt | egrep '^[-+][0-9A-Z]'
471    +0870..089F; Arabic Extended-B
472    +10570..105BF; Vithkuqi
473    +10780..107BF; Latin Extended-F
474    +10F70..10FAF; Old Uyghur
475    -11700..1173F; Ahom
476    +11700..1174F; Ahom
477    +11AB0..11ABF; Unified Canadian Aboriginal Syllabics Extended-A
478    +12F90..12FFF; Cypro-Minoan
479    +16A70..16ACF; Tangsa
480    -18D00..18D8F; Tangut Supplement
481    +18D00..18D7F; Tangut Supplement
482    +1AFF0..1AFFF; Kana Extended-B
483    +1CF00..1CFCF; Znamenny Musical Notation
484    +1DF00..1DFFF; Latin Extended-G
485    +1E290..1E2BF; Toto
486    +1E7E0..1E7FF; Ethiopic Extended-B
487    (ignore blocks whose end code point changed)
488  -> add new blocks to UCharacter.UnicodeBlock IDs
489    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
490            replace  public static final int \1_ID = \2; \3
491  -> add new blocks to UCharacter.UnicodeBlock objects
492    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
493            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
494  -> add new scripts to uscript.h & com.ibm.icu.lang.UScript
495    Eclipse find     USCRIPT_([^ ]+) *= ([0-9]+),(/.+)
496            replace  public static final int \1 = \2; \3
497  -> for new scripts: fix expectedLong names in cintltst/cucdapi.c/TestUScriptCodeAPI()
498      and in com.ibm.icu.dev.test.lang.TestUScript.java
499  -> add new joining groups to uchar.h & UCharacter.JoiningGroup
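
The Eclipse find/replace patterns above can be tried outside the IDE as well. The sketch below applies the same regular expressions with Python's `re` module; the function names are hypothetical, and the enum values in the usage example are illustrative only, not copied from uchar.h or uscript.h. For the UnicodeBlock objects it reuses the first block pattern, so the trailing comment is group 3 rather than group 2.

```python
import re

# Same patterns as in the Eclipse "find" fields above.
BLOCK_ENUM = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
SCRIPT_ENUM = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(/.+)")


def block_id_line(c_line):
    """Turn a uchar.h UBLOCK_ enum line into a UnicodeBlock *_ID constant."""
    return BLOCK_ENUM.sub(r"public static final int \1_ID = \2; \3", c_line)


def block_object_line(c_line):
    """Turn a uchar.h UBLOCK_ enum line into a UnicodeBlock object."""
    return BLOCK_ENUM.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \3',
        c_line)


def script_line(c_line):
    """Turn a uscript.h USCRIPT_ enum line into a UScript constant."""
    return SCRIPT_ENUM.sub(r"public static final int \1 = \2; \3", c_line)


if __name__ == "__main__":
    # Illustrative input lines; the numeric values are assumptions.
    print(block_id_line("    UBLOCK_TOTO = 317, /*[1E290]*/"))
    print(block_object_line("    UBLOCK_TOTO = 317, /*[1E290]*/"))
    print(script_line("      USCRIPT_TOTO = 196,/* Toto */"))
```

Lines that do not match a pattern are returned unchanged, so the helpers can be run over a whole pasted enum fragment line by line.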
500
501* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
502    (not strictly necessary for NOT_ENCODED scripts)
503  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
504
505* build ICU
506  to make sure that there are no syntax errors
507
508  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 tests &> out.txt ; tail -n 30 out.txt ; date
509
510* update spoof checker UnicodeSet initializers:
511    inclusionPat & recommendedPat in i18n/uspoof.cpp
512    INCLUSION & RECOMMENDED in SpoofChecker.java
513- make sure that the Unicode Tools tree contains the latest security data files
514- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
515- run the tool (no special environment variables needed)
516- copy & paste from the Console output into the .cpp & .java files
517
518* Bazel build process
519
520See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
521for an overview and for setup instructions.
522
523Consider running `bazelisk --version` outside of the $ICU_SRC folder
524to find out the latest `bazel` version, and
525copying that version number into the $ICU_SRC/.bazeliskrc config file.
526(Revert if you find incompatibilities, or, better, update our build & config files.)
527
528* generate data files
529
530- remember to define the environment variables
531  (see the start of the section for this Unicode version)
532- cd $ICU_SRC
533- optional (normally not necessary):
534    bazelisk clean
535- build/bootstrap/generate new files:
536    icu4c/source/data/unidata/generate.sh
537
538* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
539  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
540- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
541- Unicode 6.0..14.0: U+2260, U+226E, U+226F
542- nothing new in this Unicode version, no test file to update
543
544* run & fix ICU4C tests
545- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
546- update CLDR GraphemeBreakTest.txt
547    cd ~/unitools/mine/Generated
548    cp UCD/d22d/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
549    cp UCD/d22d/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
550    cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt $ICU_SRC/icu4c/source/test/testdata
551- Andy helps with RBBI & spoof check test failures
552
553* collation: CLDR collation root, UCA DUCET
554
555- UCA DUCET goes into Mark's Unicode tools,
556  and a tool-tailored version goes into CLDR, see
557    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
558
559- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
560    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
561- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
562    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
563    (note removing the underscore before "Rules")
564    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
565- restore TODO diffs in UCARules.txt
566    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
567- update (ICU4C)/source/test/testdata/CollationTest_*.txt
568  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
569  from the CLDR root files (..._CLDR_..._SHORT.txt)
570    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
571    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
572    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
573- if CLDR common/uca/unihan-index.txt changes, then update
574  CLDR common/collation/root.xml <collation type="private-unihan">
575  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
576
577- generate data files, as above (generate.sh), now to pick up new collation data
578- update CollationFCD.java:
579  copy & paste the initializers of lcccIndex[] etc. from
580    ICU4C/source/i18n/collationfcd.cpp to
581    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
582- rebuild ICU4C (make clean, make check, as usual)
583
584* Unihan collators
585    https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
586- run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
587  check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
588- generate ICU zh collation data
589    instructions inspired by
590    https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
591    https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
592  + setup:
593    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
594        (didn't work without setting JAVA_HOME,
595         nor with the Google default of /usr/local/buildtools/java/jdk
596         [Google security limitations in the XML parser])
597    export TOOLS_ROOT=~/icu/uni/src/tools
598    export CLDR_DIR=~/cldr/uni/src
599    export CLDR_DATA_DIR=~/cldr/uni/src
599        (pointing to the "raw" data instead of cldr-staging/.../production should be OK for the relevant files)
601    cd "$TOOLS_ROOT/cldr/lib"
602    ./install-cldr-jars.sh "$CLDR_DIR"
603  + generate the files we need
604    cd "$TOOLS_ROOT/cldr/cldr-to-icu"
605    ant -f build-icu-data.xml -DoutDir=/tmp/icu -DoutputTypes=coll,transforms -DlocaleIdFilter='zh.*'
606  + diff
607    cd $ICU_SRC
608    meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
609    meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
610  + copy into the source tree
611    cd $ICU_SRC
612    cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
613    cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
614- rebuild ICU4C
615
616* run & fix ICU4C tests, now with new CLDR collation root data
617- run all tests with the collation test data *_SHORT.txt or the full files
618  (the full ones have comments, useful for debugging)
619- note on intltest: if collate/UCAConformanceTest fails, then
620  utility/MultithreadTest/TestCollators will fail as well;
621  fix the conformance test before looking into the multi-thread test
622
623* update Java data files
624- refresh just the UCD/UCA-related/derived files, just to be safe
625- see (ICU4C)/source/data/icu4j-readme.txt
626- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
627- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
628    NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
629    you need to reconfigure with unicore data; see the "configure" line above.
630  output:
631    ...
632    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
633    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt70b
634    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt70b
635    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt70l.dat ./out/icu4j/icudt70b.dat -s ./out/build/icudt70l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt70b
636    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt70b"
637    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt70b/
638    mkdir -p /tmp/icu4j/main/shared/data
639    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
640    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt70b/
641    mkdir -p /tmp/icu4j/main/shared/data
642    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
643    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
644- copy the big-endian Unicode data files to another location,
645  separate from the other data files,
646  and then refresh ICU4J
647    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
648    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
649    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
650    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
651    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
652    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
653    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
654    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
655    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
656    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
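
The data directory names in the output above follow a simple pattern:
"icudt", the ICU major version, and a one-letter type suffix. The log shows "l"
for the little-endian build output (icudt70l) and "b" for the big-endian data
that ICU4J uses (icudt70b). A minimal sketch for deriving the $ICUDT value;
the function name icudt_name is hypothetical.

```shell
# Hypothetical helper: build an ICU data directory name such as icudt70b.
#   $1 = ICU major version (for example 70)
#   $2 = type letter: l (little-endian) or b (big-endian, used for ICU4J)
icudt_name() {
    case $2 in
        l|b) printf 'icudt%s%s\n' "$1" "$2" ;;
        *)   echo "unexpected type letter: $2" >&2; return 1 ;;
    esac
}
```

Possible use:
    ICUDT=$(icudt_name 70 b)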
657
658* When refreshing all of ICU4J data from ICU4C
659- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
660- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
661or
662- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
663
664* refresh Java test .txt files
665- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
666    cd $ICU_SRC/icu4c/source/data/unidata
667    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
668    cd ../../test/testdata
669    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
670    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
671
672* run & fix ICU4J tests
673
674*** API additions
675- send notice to icu-design about new born-@stable API (enum constants etc.)
676
677*** CLDR numbering systems
678- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
679  for example:
680    ~/icu/mine/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-13.txt
681    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-14.txt
682    ~/icu/uni/src$ diff -u /tmp/icu/nv4-13.txt /tmp/icu/nv4-14.txt
683    -->
684    +cp;16AC4;-Alpha;gc=Nd;-IDS;lb=NU;na=TANGSA DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
685  Unicode 14:
686    tnsa 16AC0..16AC9 Tangsa
687    https://github.com/unicode-org/cldr/pull/1326
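
Each hit from the egrep above is the DIGIT FOUR of a decimal digit set, so the
full range for the CLDR numbering system can be derived from it. The sketch
below assumes that the ten digits are encoded contiguously in ascending order,
as in the Tangsa example; digit_range is a made-up name.

```shell
# Hypothetical helper: given the code point (hex) of a DIGIT FOUR,
# print the range of the whole digit set, zero through nine.
# Assumes the ten digits are contiguous and in ascending order.
digit_range() {
    four=$((0x$1))
    printf '%X..%X\n' $((four - 4)) $((four + 5))
}
```

Possible use:
    digit_range 16AC4
which prints 16AC0..16AC9 for the Tangsa digits.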
688
689*** merge the Unicode update branches back onto the trunk
690- do not merge the icudata.jar and testdata.jar,
691  instead rebuild them from merged & tested ICU4C
692- make sure that changes to Unicode tools are checked in:
693  https://github.com/unicode-org/unicodetools
694
695---------------------------------------------------------------------------- ***
696
697Unicode 13.0 update for ICU 66
698
699https://www.unicode.org/versions/Unicode13.0.0/
700https://www.unicode.org/versions/beta-13.0.0.html
701https://www.unicode.org/Public/13.0.0/ucd/
702https://www.unicode.org/reports/uax-proposed-updates.html
703https://www.unicode.org/reports/tr44/tr44-25.html
704
705https://unicode-org.atlassian.net/browse/CLDR-13387
706https://unicode-org.atlassian.net/browse/ICU-20893
707
708* Command-line environment setup
709
710UNICODE_DATA=~/unidata/uni13/20200212
711CLDR_SRC=~/cldr/uni/src
712ICU_ROOT=~/icu/uni
713ICU_SRC=$ICU_ROOT/src
714ICUDT=icudt66b
715ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
716ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
717export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
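
Later command lines expand unset variables to empty strings without complaint,
so it can help to check the setup before pasting them. A minimal sketch,
assuming a POSIX shell; require_vars is a hypothetical helper, not something
the ICU tools provide.

```shell
# Hypothetical helper: fail if any of the named shell variables
# is unset or empty, and report the first one that is missing.
require_vars() {
    for name in "$@"; do
        eval "value=\${$name-}"
        if [ -z "$value" ]; then
            echo "not set: $name" >&2
            return 1
        fi
    done
}
```

Possible use:
    require_vars UNICODE_DATA CLDR_SRC ICU_ROOT ICU_SRC ICUDT ICU4C_DATA_IN ICU4C_UNIDATA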
718
719*** Unicode version numbers
720- makedata.mak
721- uchar.h
722- com.ibm.icu.util.VersionInfo
723- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
724
725- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
726    so that the makefiles see the new version number.
727  cd $ICU_ROOT/dbg/icu4c
728  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
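
Before re-running configure, it is easy to confirm which version number the
header currently carries. A sketch, assuming that uchar.h defines the version
as a quoted string on a "#define U_UNICODE_VERSION" line; the function name is
made up.

```shell
# Hypothetical helper: print the Unicode version string found in a header.
# Assumes a line of the form:  #define U_UNICODE_VERSION "13.0"
unicode_version_in_header() {
    sed -n 's/^#define U_UNICODE_VERSION "\([^"]*\)".*$/\1/p' "$1"
}
```

Possible use:
    unicode_version_in_header $ICU_SRC/icu4c/source/common/unicode/uchar.h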
729
730*** data files & enums & parser code
731
732* download files
733- mkdir -p $UNICODE_DATA
734- download Unicode files into $UNICODE_DATA
735  + subfolders: emoji, idna, security, ucd, uca
736  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
737  + split Unihan into single-property files
738    ~/unitools/trunk/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
739  + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
740    or from the ucd/cldr/ output folder of the Unicode Tools.
741    (Since Unicode 12/CLDR 35/ICU 64, CLDR uses modified break rules.)
742  cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
743
744* for manual diffs and for Unicode Tools input data updates:
745  remove version suffixes from the file names
746    ~$ unidata/desuffixucd.py $UNICODE_DATA
747  (see https://sites.google.com/site/unicodetools/inputdata)
748
749* process and/or copy files
750- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
751  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
752  + For debugging, and tweaking how ppucd.txt is written,
753    the tool has an --only_ppucd option:
754    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
755
756- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
757
758* new constants for new property values
759- preparseucd.py error:
760    ValueError: missing uchar.h enum constants for some property values:
761    [(u'blk', set([u'Symbols_For_Legacy_Computing', u'Dives_Akuru', u'Yezidi',
762        u'Tangut_Sup', u'CJK_Ext_G', u'Khitan_Small_Script', u'Chorasmian', u'Lisu_Sup'])),
763    (u'sc', set([u'Chrs', u'Diak', u'Kits', u'Yezi'])),
764    (u'InPC', set([u'Top_And_Bottom_And_Left']))]
765  = PropertyValueAliases.txt new property values (diff old & new .txt files)
766    blk; Chorasmian                       ; Chorasmian
767    blk; CJK_Ext_G                        ; CJK_Unified_Ideographs_Extension_G
768    blk; Dives_Akuru                      ; Dives_Akuru
769    blk; Khitan_Small_Script              ; Khitan_Small_Script
770    blk; Lisu_Sup                         ; Lisu_Supplement
771    blk; Symbols_For_Legacy_Computing     ; Symbols_For_Legacy_Computing
772    blk; Tangut_Sup                       ; Tangut_Supplement
773    blk; Yezidi                           ; Yezidi
774  -> add to uchar.h before UBLOCK_COUNT
775    use long property names for enum constants,
776    for the trailing comment get the block start code point: diff old & new Blocks.txt
777  -> add to UCharacter.UnicodeBlock IDs
778    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
779            replace  public static final int \1_ID = \2; \3
780  -> add to UCharacter.UnicodeBlock objects
781    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
782            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
783
784    sc ; Chrs                             ; Chorasmian
785    sc ; Diak                             ; Dives_Akuru
786    sc ; Kits                             ; Khitan_Small_Script
787    sc ; Yezi                             ; Yezidi
788  -> uscript.h & com.ibm.icu.lang.UScript
789  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
790      and in com.ibm.icu.dev.test.lang.TestUScript.java
791
792    InPC; Top_And_Bottom_And_Left         ; Top_And_Bottom_And_Left
793  -> uchar.h enum UIndicPositionalCategory & UCharacter.java IndicPositionalCategory
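
The two Eclipse find/replace patterns for the block constants above can also be
applied on the command line. A sketch using sed -E with the same regular
expressions; the function names are made up, and the sample line in the usage
note only illustrates the uchar.h enum line format.

```shell
# Hypothetical helpers, reading uchar.h enum lines on stdin.

# Turn block enum lines into UCharacter.UnicodeBlock ID constants.
ublock_to_java_ids() {
    sed -E 's|UBLOCK_([^ ]+) = ([0-9]+), (/.+)|public static final int \1_ID = \2; \3|'
}

# Turn the same lines into UCharacter.UnicodeBlock object declarations.
ublock_to_java_objects() {
    sed -E 's|UBLOCK_([^ ]+) = [0-9]+, (/.+)|public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2|'
}
```

Possible use (illustrative input line):
    echo 'UBLOCK_CHORASMIAN = 301, /*[10FB0]*/' | ublock_to_java_ids
Lines that do not match the pattern pass through unchanged, so the new enum
lines can be piped through as copied from uchar.h.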
794
795* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
796    (not strictly necessary for NOT_ENCODED scripts)
797  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
798
799* build ICU (make install)
800  to make sure that there are no syntax errors, and
801  so that the tools build can pick up the new definitions from the installed header files.
802
803  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
804
805* update spoof checker UnicodeSet initializers:
806    inclusionPat & recommendedPat in i18n/uspoof.cpp
807    INCLUSION & RECOMMENDED in SpoofChecker.java
808- make sure that the Unicode Tools tree contains the latest security data files
809- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
810- update the hardcoded version number there in the DIRECTORY path
811- run the tool (no special environment variables needed)
812- copy & paste from the Console output into the .cpp & .java files
813
814* generate normalization data files
815  cd $ICU_ROOT/dbg/icu4c
816  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
817  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
818  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
819  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
820  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
821
822* build ICU (make install)
823  so that the tools build can pick up the new definitions from the installed header files.
824
825  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
826
827* build Unicode tools using CMake+make
828
829$ICU_SRC/tools/unicode/c/icudefs.txt:
830
831# Location (--prefix) of where ICU was installed.
832set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
833# Location of the ICU4C source tree.
834set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
835
836  $ICU_ROOT/dbg$
837    mkdir -p tools/unicode/c
838    cd tools/unicode/c
839
840  $ICU_ROOT/dbg/tools/unicode/c$
841    cmake ../../../../src/tools/unicode/c
842    make
843
844* generate core properties data files
845  $ICU_ROOT/dbg/tools/unicode/c$
846    genprops/genprops $ICU_SRC/icu4c
847- tool failure:
848    genprops: Script_Extensions indexes overflow bit field
849    genprops: error parsing or setting values from ppucd.txt line 32696 - U_BUFFER_OVERFLOW_ERROR
850  -> uprops.icu data file format:
851     add two more bits to store a script code or Script_Extensions index
852  -> generator code, C++ & Java runtime, uprops.icu format version 7.7
853- rebuild ICU (make install) & tools
854
855* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
856  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
857- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
858- Unicode 6.0..13.0: U+2260, U+226E, U+226F
859- nothing new in this Unicode version, no test file to update
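
The grep step above can be narrowed to non-ASCII characters mechanically.
A sketch, assuming that each data line starts with a hexadecimal code point or
range as in IdnaMappingTable.txt; the function name is hypothetical.

```shell
# Hypothetical helper: read IdnaMappingTable.txt-style lines on stdin and
# print those with status disallowed_STD3_valid whose (first) code point
# is above U+007F. Comment lines have no leading hex digits and are skipped.
non_ascii_std3_valid() {
    grep 'disallowed_STD3_valid' | while IFS= read -r line; do
        first=${line%%[!0-9A-Fa-f]*}     # leading hex digits = start of range
        [ -n "$first" ] || continue
        if [ $((0x$first)) -gt $((0x7F)) ]; then
            printf '%s\n' "$line"
        fi
    done
}
```

Possible use:
    non_ascii_std3_valid < $UNICODE_DATA/idna/IdnaMappingTable.txt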
860
861* run & fix ICU4C tests
862- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
863- Andy helps with RBBI & spoof check test failures
864
865* collation: CLDR collation root, UCA DUCET
866
867- UCA DUCET goes into Mark's Unicode tools, see
868    https://sites.google.com/site/unicodetools/home#TOC-UCA
869  diff the main mapping file, look for bad changes
870  (for example, more bytes per weight for common characters)
871    ~/svn.unitools/trunk$ sed -r -f ~/cldr/uni/src/tools/scripts/uca/blankweights.sed ../Generated/UCA/13.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-13.0.txt
872    ~/svn.unitools/trunk$ meld ../frac-12.1.txt ../frac-13.0.txt
873
874- CLDR root data files are checked into $CLDR_SRC/common/uca/
875    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
876
877- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
878    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
879- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
880    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
881    (note removing the underscore before "Rules")
882    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
883- restore TODO diffs in UCARules.txt
884    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
885- update (ICU4C)/source/test/testdata/CollationTest_*.txt
886  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
887  from the CLDR root files (..._CLDR_..._SHORT.txt)
888    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
889    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
890    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
891- if CLDR common/uca/unihan-index.txt changes, then update
892  CLDR common/collation/root.xml <collation type="private-unihan">
893  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
894
895- run genuca
896  $ICU_ROOT/dbg/tools/unicode/c$
897    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
898    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
899- rebuild ICU4C
900
901* Unihan collators
902    https://sites.google.com/site/unicodetools/unihan
903- run Unicode Tools
904    org.unicode.draft.GenerateUnihanCollators
905  with VM arguments
906    -ea
907    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
908    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
909    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
910    -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
911    -DUVERSION=13.0.0
912- run Unicode Tools
913    org.unicode.draft.GenerateUnihanCollatorFiles
914  with the same arguments
915- check CLDR diffs
916    cd $CLDR_SRC
917    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
918    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
919- copy to CLDR
920    cd $CLDR_SRC
921    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
922    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
923- run CLDR unit tests, commit to CLDR
924- generate ICU zh collation data: run CLDR
925    org.unicode.cldr.icu.NewLdml2IcuConverter
926  with program arguments
927    -t collation
928    -s /usr/local/google/home/mscherer/cldr/uni/src/common/collation
929    -m /usr/local/google/home/mscherer/cldr/uni/src/common/supplemental
930    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
931    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
932    zh
933  and VM arguments
934    -ea
935    -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
936- rebuild ICU4C
937
938* run & fix ICU4C tests, now with new CLDR collation root data
939- run all tests with the collation test data *_SHORT.txt or the full files
940  (the full ones have comments, useful for debugging)
941- note on intltest: if collate/UCAConformanceTest fails, then
942  utility/MultithreadTest/TestCollators will fail as well;
943  fix the conformance test before looking into the multi-thread test
944
945* update Java data files
946- refresh just the UCD/UCA-related/derived files, just to be safe
947- see (ICU4C)/source/data/icu4j-readme.txt
948- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
949- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
950  output:
951    ...
952    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
953    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt66b
954    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b
955    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt66l.dat ./out/icu4j/icudt66b.dat -s ./out/build/icudt66l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt66b
956    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b"
957    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt66b/
958    mkdir -p /tmp/icu4j/main/shared/data
959    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
960    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt66b/
961    mkdir -p /tmp/icu4j/main/shared/data
962    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
963    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
964- copy the big-endian Unicode data files to another location,
965  separate from the other data files,
966  and then refresh ICU4J
967    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
968    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
969    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
970    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
971    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
972    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
973    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
974    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
975    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
976    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
977
978* When refreshing all of ICU4J data from ICU4C
979- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
980- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
981or
982- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
983
984* update CollationFCD.java
985  + copy & paste the initializers of lcccIndex[] etc. from
986    ICU4C/source/i18n/collationfcd.cpp to
987    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
988
989* refresh Java test .txt files
990- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
991    cd $ICU_SRC/icu4c/source/data/unidata
992    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
993    cd ../../test/testdata
994    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
995    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
996
997* run & fix ICU4J tests
998
999*** API additions
1000- send notice to icu-design about new born-@stable API (enum constants etc.)
1001
1002*** CLDR numbering systems
1003- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
1004  for example, look for
1005    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
1006    in new blocks (Blocks.txt)
1007  Unicode 13:
1008    diak 11950..11959 Dives_Akuru
1009
1010*** merge the Unicode update branches back onto the trunk
1011- do not merge the icudata.jar and testdata.jar,
1012  instead rebuild them from merged & tested ICU4C
1013- make sure that changes to Unicode tools are checked in:
1014  http://www.unicode.org/utility/trac/log/trunk/unicodetools
1015
1016---------------------------------------------------------------------------- ***
1017
1018Unicode 12.1 update for ICU 64.2
1019
1020** This is an abbreviated update with one new character for the new
1021** Japanese era expected to start on 2019-May-01: U+32FF SQUARE ERA NAME REIWA
1022https://en.wikipedia.org/wiki/Reiwa_period
1023
1024http://www.unicode.org/versions/Unicode12.1.0/
1025
1026ICU-20497 Unicode 12.1
1027
1028cldrbug 11978: Unicode 12.1
1029
1030* Command-line environment setup
1031
1032UNICODE_DATA=~/unidata/uni121/20190403
1033CLDR_SRC=~/svn.cldr/uni
1034ICU_ROOT=~/icu/uni
1035ICU_SRC=$ICU_ROOT/src
1036ICUDT=icudt64b
1037ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1038ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1039export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1040
1041*** Unicode version numbers
1042- makedata.mak
1043- uchar.h
1044- com.ibm.icu.util.VersionInfo
1045- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1046
1047- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1048    so that the makefiles see the new version number.
1049  cd $ICU_ROOT/dbg/icu4c
1050  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
1051
1052*** data files & enums & parser code
1053
1054* download files
1055- mkdir -p $UNICODE_DATA
1056- download Unicode files into $UNICODE_DATA
1057  + subfolders: emoji, idna, security, ucd, uca
1058  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1059
1060* for manual diffs and for Unicode Tools input data updates:
1061  remove version suffixes from the file names
1062    ~$ unidata/desuffixucd.py $UNICODE_DATA
1063  (see https://sites.google.com/site/unicodetools/inputdata)
1064
1065* process and/or copy files
1066- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1067  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1068  + For debugging, and tweaking how ppucd.txt is written,
1069    the tool has an --only_ppucd option:
1070    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1071
1072- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1073
1074* build ICU (make install)
1075  so that the tools build can pick up the new definitions from the installed header files.
1076
1077  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
1078
1079* update spoof checker UnicodeSet initializers:
1080    inclusionPat & recommendedPat in uspoof.cpp
1081    INCLUSION & RECOMMENDED in SpoofChecker.java
1082- make sure that the Unicode Tools tree contains the latest security data files
1083- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
1084- update the hardcoded version number there in the DIRECTORY path
1085- run the tool (no special environment variables needed)
1086- copy & paste from the Console output into the .cpp & .java files
1087
1088* generate normalization data files
1089  cd $ICU_ROOT/dbg/icu4c
1090  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1091  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
1092  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1093  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1094  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1095
1096* build ICU (make install)
1097  so that the tools build can pick up the new definitions from the installed header files.
1098
1099  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
1100
1101* build Unicode tools using CMake+make
1102
1103$ICU_SRC/tools/unicode/c/icudefs.txt:
1104
1105# Location (--prefix) of where ICU was installed.
1106set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
1107# Location of the ICU4C source tree.
1108set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
1109
1110  $ICU_ROOT/dbg$
1111    mkdir -p tools/unicode/c
1112    cd tools/unicode/c
1113
1114  $ICU_ROOT/dbg/tools/unicode/c$
1115    cmake ../../../../src/tools/unicode/c
1116    make
1117
1118* generate core properties data files
1119  $ICU_ROOT/dbg/tools/unicode/c$
1120    genprops/genprops $ICU_SRC/icu4c
1121    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
1122    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1123- rebuild ICU (make install) & tools
1124
1125* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1126  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1127- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1128- Unicode 6.0..12.1: U+2260, U+226E, U+226F
1129- nothing new in this Unicode version, no test file to update
1130
1131* run & fix ICU4C tests
1132- Andy handles RBBI & spoof check test failures
1133
1134* collation: CLDR collation root, UCA DUCET
1135
1136- UCA DUCET goes into Mark's Unicode tools, see
1137    https://sites.google.com/site/unicodetools/home#TOC-UCA
1138  diff the main mapping file, look for bad changes
1139  (for example, more bytes per weight for common characters)
1140    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.1.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.1.txt
1141    ~/svn.unitools/trunk$ meld ../frac-12.txt ../frac-12.1.txt
1142
1143- CLDR root data files are checked into $CLDR_SRC/common/uca/
1144    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1145
1146- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1147    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1148- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1149    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1150    (note removing the underscore before "Rules")
1151    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1152- restore TODO diffs in UCARules.txt
1153    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1154- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1155  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1156  from the CLDR root files (..._CLDR_..._SHORT.txt)
1157    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1158    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1159    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1160- if CLDR common/uca/unihan-index.txt changes, then update
1161  CLDR common/collation/root.xml <collation type="private-unihan">
1162  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1163
1164- run genuca, see command line above
1165- rebuild ICU4C
1166
1167* Unihan collators
1168    https://sites.google.com/site/unicodetools/unihan
1169- run Unicode Tools
1170    org.unicode.draft.GenerateUnihanCollators
1171  with VM arguments
1172    -ea
1173    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1174    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1175    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1176    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1177    -DUVERSION=12.1.0
1178- run Unicode Tools
1179    org.unicode.draft.GenerateUnihanCollatorFiles
1180  with the same arguments
1181- check CLDR diffs
1182    cd $CLDR_SRC
1183    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1184    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1185- copy to CLDR
1186    cd $CLDR_SRC
1187    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1188    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1189- run CLDR unit tests, commit to CLDR
1190- generate ICU zh collation data: run CLDR
1191    org.unicode.cldr.icu.NewLdml2IcuConverter
1192  with program arguments
1193    -t collation
1194    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
1195    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
1196    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
1197    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
1198    zh
1199  and VM arguments
1200    -ea
1201    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1202- rebuild ICU4C
1203
1204* run & fix ICU4C tests, now with new CLDR collation root data
1205- run all tests with the collation test data *_SHORT.txt or the full files
1206  (the full ones have comments, useful for debugging)
1207- note on intltest: if collate/UCAConformanceTest fails, then
1208  utility/MultithreadTest/TestCollators will fail as well;
1209  fix the conformance test before looking into the multi-thread test
1210
1211* update Java data files
1212- refresh just the UCD/UCA-related/derived files, just to be safe
1213- see (ICU4C)/source/data/icu4j-readme.txt
1214- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1215- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1216  output:
1217    ...
1218    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
1219    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt64b
1220    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b
1221    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt64l.dat ./out/icu4j/icudt64b.dat -s ./out/build/icudt64l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt64b
1222    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b"
1223    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt64b/
1224    mkdir -p /tmp/icu4j/main/shared/data
1225    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1226    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt64b/
1227    mkdir -p /tmp/icu4j/main/shared/data
1228    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1229    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
1230- copy the big-endian Unicode data files to another location,
1231  separate from the other data files,
1232  and then refresh ICU4J
1233    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1234    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1235    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1236    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1237    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1238    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1239    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1240    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1241    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1242    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1243
1244* When refreshing all of ICU4J data from ICU4C
1245- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1246- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1247or
1248- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1249
1250* update CollationFCD.java
1251  + copy & paste the initializers of lcccIndex[] etc. from
1252    ICU4C/source/i18n/collationfcd.cpp to
1253    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1254
1255* refresh Java test .txt files
1256- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1257    cd $ICU_SRC/icu4c/source/data/unidata
1258    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1259    cd ../../test/testdata
1260    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1261    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1262
1263* run & fix ICU4J tests
1264
1265*** API additions
1266- send notice to icu-design about new born-@stable API (enum constants etc.)
1267
1268*** CLDR numbering systems
1269- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
1270  for example, look for
1271    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
1272    in new blocks (Blocks.txt)
1273  Unicode 12: using Unicode 12 CLDR ticket #11478
1274    hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
1275    wcho 1E2F0..1E2F9 Wancho
1276  Unicode 11: using Unicode 11 CLDR ticket #10978
1277    rohg 10D30..10D39 Hanifi_Rohingya
1278    gong 11DA0..11DA9 Gunjala_Gondi
1279  Earlier: CLDR tickets specific to adding new numbering systems.
1280  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1281  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
1282
1283*** merge the Unicode update branches back onto the trunk
1284- do not merge the icudata.jar and testdata.jar,
1285  instead rebuild them from merged & tested ICU4C
1286- make sure that changes to Unicode tools are checked in:
1287  http://www.unicode.org/utility/trac/log/trunk/unicodetools
1288
1289---------------------------------------------------------------------------- ***
1290
1291Unicode 12.0 update for ICU 64
1292
1293http://www.unicode.org/versions/Unicode12.0.0/
1294http://unicode.org/versions/beta-12.0.0.html
1295https://www.unicode.org/review/pri389/
1296http://www.unicode.org/reports/uax-proposed-updates.html
1297http://www.unicode.org/reports/tr44/tr44-23.html
1298
1299ICU-20203 Unicode 12
1300
1301ICU-20111 move text layout properties data into a data file
1302
1303cldrbug 11478: Unicode 12
1304Accidentally used ^/trunk instead of ^/branches/markus/uni12
1305
1306* Command-line environment setup
1307
1308UNICODE_DATA=~/unidata/uni12/20190309
1309CLDR_SRC=~/svn.cldr/uni
1310ICU_ROOT=~/icu/uni
1311ICU_SRC=$ICU_ROOT/src
1312ICUDT=icudt63b
1313ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1314ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1315export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1316
1317*** Unicode version numbers
1318- makedata.mak
1319- uchar.h
1320- com.ibm.icu.util.VersionInfo
1321- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1322
1323- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1324  so that the makefiles see the new version number.
1325
1326*** data files & enums & parser code
1327
1328* download files
1329- mkdir -p $UNICODE_DATA
1330- download Unicode files into $UNICODE_DATA
1331  + subfolders: emoji, idna, security, ucd, uca
1332  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1333
1334* for manual diffs and for Unicode Tools input data updates:
1335  remove version suffixes from the file names
1336    ~$ unidata/desuffixucd.py $UNICODE_DATA
1337  (see https://sites.google.com/site/unicodetools/inputdata)
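
The suffix removal can be pictured with a minimal Python sketch. It is not the
actual desuffixucd.py; it assumes that a version suffix looks like "-12.0.0",
optionally followed by a draft marker such as "d5", directly before the file
extension. The function name strip_version_suffix is made up for illustration.

```python
import re

# Assumed suffix shape: "-<major>.<minor>[.<update>]" plus an optional draft
# marker such as "d5", right before the extension. The real desuffixucd.py
# may apply different rules.
_VERSION_SUFFIX = re.compile(r"^(.+?)-\d+(?:\.\d+)+(?:d\d+)?(\.[A-Za-z0-9]+)$")


def strip_version_suffix(filename):
    """Return filename without a version suffix; other names stay unchanged."""
    match = _VERSION_SUFFIX.match(filename)
    if match is None:
        return filename
    return match.group(1) + match.group(2)


if __name__ == "__main__":
    for name in ("UnicodeData-12.0.0d5.txt", "Blocks-12.0.0.txt", "ReadMe.txt"):
        print(name, "->", strip_version_suffix(name))
```

Names without a numeric suffix (for example ReadMe.txt) pass through unchanged,
so the sketch can be applied to a whole folder listing.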
1338
1339* process and/or copy files
1340- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1341  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1342  + For debugging, and tweaking how ppucd.txt is written,
1343    the tool has an --only_ppucd option:
1344    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1345
1346- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1347
1348* build ICU (make install)
1349  so that the tools build can pick up the new definitions from the installed header files.
1350
1351  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
1352
1353* new constants for new property values
1354- preparseucd.py error:
1355    ValueError: missing uchar.h enum constants for some property values:
1356    [(u'blk', set([u'Symbols_And_Pictographs_Ext_A', u'Elymaic',
1357        u'Ottoman_Siyaq_Numbers', u'Nandinagari', u'Nyiakeng_Puachue_Hmong',
1358        u'Small_Kana_Ext', u'Egyptian_Hieroglyph_Format_Controls', u'Wancho', u'Tamil_Sup'])),
1359    (u'sc', set([u'Nand', u'Wcho', u'Elym', u'Hmnp']))]
1360  = PropertyValueAliases.txt new property values (diff old & new .txt files)
1361    blk; Egyptian_Hieroglyph_Format_Controls; Egyptian_Hieroglyph_Format_Controls
1362    blk; Elymaic                          ; Elymaic
1363    blk; Nandinagari                      ; Nandinagari
1364    blk; Nyiakeng_Puachue_Hmong           ; Nyiakeng_Puachue_Hmong
1365    blk; Ottoman_Siyaq_Numbers            ; Ottoman_Siyaq_Numbers
1366    blk; Small_Kana_Ext                   ; Small_Kana_Extension
1367    blk; Symbols_And_Pictographs_Ext_A    ; Symbols_And_Pictographs_Extended_A
1368    blk; Tamil_Sup                        ; Tamil_Supplement
1369    blk; Wancho                           ; Wancho
1370  -> add to uchar.h
1371    use long property names for enum constants,
1372    for the trailing comment get the block start code point: diff old & new Blocks.txt
1373  -> add to UCharacter.UnicodeBlock IDs
1374    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1375            replace  public static final int \1_ID = \2; \3
1376  -> add to UCharacter.UnicodeBlock objects
1377    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1378            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1379
1380    sc ; Elym                             ; Elymaic
1381    sc ; Hmnp                             ; Nyiakeng_Puachue_Hmong
1382    sc ; Nand                             ; Nandinagari
1383    sc ; Wcho                             ; Wancho
1384  -> uscript.h & com.ibm.icu.lang.UScript
1385  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1386      and in com.ibm.icu.dev.test.lang.TestUScript.java
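
For anyone not using Eclipse, the two find/replace steps for UnicodeBlock can be
tried out with Python's re module. This is only a sketch: the sample enum line
and its values (SAMPLE_BLOCK, 999, 12345) are made up, and the helper names
to_java_id and to_java_object are hypothetical.

```python
import re

# Patterns as in the Eclipse find/replace steps for UCharacter.UnicodeBlock.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"

# The object pattern captures only two groups (name, trailing comment),
# so the comment is group 2 here.
OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_java_id(uchar_h_line):
    """Turn one uchar.h UBLOCK_ enum line into a Java int ID constant."""
    return ID_FIND.sub(ID_REPLACE, uchar_h_line.strip())


def to_java_object(uchar_h_line):
    """Turn one uchar.h UBLOCK_ enum line into a Java UnicodeBlock object."""
    return OBJ_FIND.sub(OBJ_REPLACE, uchar_h_line.strip())


if __name__ == "__main__":
    sample = "    UBLOCK_SAMPLE_BLOCK = 999, /*[12345]*/"  # made-up enum line
    print(to_java_id(sample))
    print(to_java_object(sample))
```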
1387
1388* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1389    (not strictly necessary for NOT_ENCODED scripts)
1390  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1391
1392* update spoof checker UnicodeSet initializers:
1393    inclusionPat & recommendedPat in uspoof.cpp
1394    INCLUSION & RECOMMENDED in SpoofChecker.java
1395- make sure that the Unicode Tools tree contains the latest security data files
1396- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
1397- update the hardcoded version number there in the DIRECTORY path
1398- run the tool (no special environment variables needed)
1399- copy & paste from the Console output into the .cpp & .java files
1400
1401* generate normalization data files
1402  cd $ICU_ROOT/dbg/icu4c
1403  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1404  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
1405  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1406  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1407  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
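
The four .nrm command lines differ only in the output name and the list of input
files, each building on nfc.txt. A small Python sketch makes that layering
explicit. It only assembles the argument lists and does not run gennorm2; the
table NORM_DATA and the function gennorm2_commands are illustrative names, and
the --csource invocation for norm2_nfc_data.h is left out.

```python
import posixpath

# (output file, input files) as in the gennorm2 command lines above.
NORM_DATA = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(data_in, unidata):
    """Build one argument list per .nrm file.

    data_in stands for $ICU4C_DATA_IN, unidata for $ICU4C_UNIDATA.
    """
    source_dir = posixpath.join(unidata, "norm2")
    commands = []
    for output, inputs in NORM_DATA:
        commands.append(
            ["bin/gennorm2", "-o", posixpath.join(data_in, output), "-s", source_dir]
            + inputs
        )
    return commands


if __name__ == "__main__":
    for command in gennorm2_commands("$ICU4C_DATA_IN", "$ICU4C_UNIDATA"):
        print(" ".join(command))
```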
1408
1409* build ICU (make install)
1410  so that the tools build can pick up the new definitions from the installed header files.
1411
1412  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
1413
1414* build Unicode tools using CMake+make
1415
1416$ICU_SRC/tools/unicode/c/icudefs.txt:
1417
1418# Location (--prefix) of where ICU was installed.
1419set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
1420# Location of the ICU4C source tree.
1421set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
1422
1423  $ICU_ROOT/dbg$
1424    mkdir -p tools/unicode/c
1425    cd tools/unicode/c
1426
1427  $ICU_ROOT/dbg/tools/unicode/c$
1428    cmake ../../../../src/tools/unicode/c
1429    make
1430
1431* generate core properties data files
1432  $ICU_ROOT/dbg/tools/unicode/c$
1433    genprops/genprops $ICU_SRC/icu4c
1434    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
1435    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1436- rebuild ICU (make install) & tools
1437
1438* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1439  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1440- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1441- Unicode 6.0..12.0: U+2260, U+226E, U+226F
1442- nothing new in this Unicode version, no test file to update
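
The grep for "disallowed_STD3_valid" on non-ASCII characters can be sketched in
Python as follows. The sketch assumes the usual semicolon-separated layout with
"#" comments; the sample lines under __main__ are abbreviated for illustration
and are not copied from IdnaMappingTable.txt. The function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return first non-ASCII code points of disallowed_STD3_valid entries."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        bounds = fields[0].split("..")  # single code point or start..end
        start = int(bounds[0], 16)
        end = int(bounds[-1], 16)
        if end > 0x7F:
            # Report where the non-ASCII part of the entry begins.
            found.append(max(start, 0x80))
    return found


if __name__ == "__main__":
    sample = [
        "0000..002C    ; disallowed_STD3_valid   # ASCII, ignored",
        "0041          ; mapped ; 0061           # other status, ignored",
        "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
        "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
    ]
    print(["U+%04X" % cp for cp in non_ascii_std3_valid(sample)])
```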
1443
1444* run & fix ICU4C tests
1445- update test of default bidi classes:
1446  Bidi range \U0001ED00-\U0001ED4F changes default from R to AL,
1447  see diffs in DerivedBidiClass.txt
1448  + /tsutil/cucdtst/TestUnicodeData enumDefaultsRange() defaultBidi[]
1449  + UCharacterTest.java TestIteration() defaultBidi[]
1450- Andy handles RBBI & spoof check test failures
1451
1452* collation: CLDR collation root, UCA DUCET
1453
1454- UCA DUCET goes into Mark's Unicode tools, see
1455    https://sites.google.com/site/unicodetools/home#TOC-UCA
1456  diff the main mapping file, look for bad changes
1457  (for example, more bytes per weight for common characters)
1458    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.txt
1459    ~/svn.unitools/trunk$ meld ../frac-11.txt ../frac-12.txt
1460
1461- CLDR root data files are checked into $CLDR_SRC/common/uca/
1462    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1463
1464- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1465    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1466- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1467    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1468    (note removing the underscore before "Rules")
1469    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1470- restore TODO diffs in UCARules.txt
1471    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1472- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1473  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1474  from the CLDR root files (..._CLDR_..._SHORT.txt)
1475    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1476    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1477    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1478- if CLDR common/uca/unihan-index.txt changes, then update
1479  CLDR common/collation/root.xml <collation type="private-unihan">
1480  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1481
1482- run genuca, see command line above;
1483  deal with
1484    Error: Unknown script for first-primary sample character U+119CE on line 29233 of /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
1485    FDD1 119CE;	[71 CD 02, 05, 05]	# Nandinagari first primary (compressible)
1486        (add the character to genuca.cpp sampleCharsToScripts[])
1487  + This time, I added code to genuca.cpp to use uscript_getSampleUnicodeString(script)
1488    and cache its values.
1489    Works as long as the script metadata is updated before the collation data.
1490- rebuild ICU4C
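
The genuca.cpp change noted above (look up sample characters via
uscript_getSampleUnicodeString(script) and cache the values) amounts to a cached
reverse map from sample character to script. The following Python sketch shows
the idea only. SAMPLE_STRINGS is a tiny stand-in table for illustration (U+119CE
for Nandinagari is taken from the error message above), and get_sample_string is
a hypothetical substitute for the ICU function.

```python
from functools import lru_cache

# Illustrative stand-in for the script metadata; genuca.cpp gets these values
# from uscript_getSampleUnicodeString(script).
SAMPLE_STRINGS = {"Latn": "L", "Nand": "\U000119CE"}


def get_sample_string(script):
    """Hypothetical substitute for uscript_getSampleUnicodeString()."""
    return SAMPLE_STRINGS.get(script, "")


@lru_cache(maxsize=None)
def sample_chars_to_scripts():
    """Build the reverse map (sample code point -> script) once, then reuse it."""
    mapping = {}
    for script in SAMPLE_STRINGS:
        sample = get_sample_string(script)
        if sample:
            mapping[ord(sample[0])] = script
    return mapping


def script_for_sample(code_point):
    """Return the script whose sample character is code_point."""
    script = sample_chars_to_scripts().get(code_point)
    if script is None:
        raise ValueError(
            "Unknown script for first-primary sample character U+%04X" % code_point
        )
    return script


if __name__ == "__main__":
    print(script_for_sample(0x119CE))
```

As the note above says, this only works when the script metadata is updated
before the collation data; otherwise the lookup fails for a new script.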
1491
1492* Unihan collators
1493    https://sites.google.com/site/unicodetools/unihan
1494- run Unicode Tools
1495    org.unicode.draft.GenerateUnihanCollators
1496  with VM arguments
1497    -ea
1498    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1499    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1500    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1501    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1502    -DUVERSION=12.0.0
1503- run Unicode Tools
1504    org.unicode.draft.GenerateUnihanCollatorFiles
1505  with the same arguments
1506- check CLDR diffs
1507    cd $CLDR_SRC
1508    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1509    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1510- copy to CLDR
1511    cd $CLDR_SRC
1512    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1513    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1514- run CLDR unit tests, commit to CLDR
1515- generate ICU zh collation data: run CLDR
1516    org.unicode.cldr.icu.NewLdml2IcuConverter
1517  with program arguments
1518    -t collation
1519    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
1520    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
1521    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
1522    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
1523    zh
1524  and VM arguments
1525    -ea
1526    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1527- rebuild ICU4C
1528
1529* run & fix ICU4C tests, now with new CLDR collation root data
1530- run all tests with the collation test data *_SHORT.txt or the full files
1531  (the full ones have comments, useful for debugging)
1532- note on intltest: if collate/UCAConformanceTest fails, then
1533  utility/MultithreadTest/TestCollators will fail as well;
1534  fix the conformance test before looking into the multi-thread test
1535
1536* update Java data files
1537- refresh just the UCD/UCA-related/derived files, just to be safe
1538- see (ICU4C)/source/data/icu4j-readme.txt
1539- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1540- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1541  output:
1542    ...
1543    Unicode .icu files built to ./out/build/icudt63l
1544    echo timestamp > uni-core-data
1545    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt63b
1546    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b
1547    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1548    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt63l.dat ./out/icu4j/icudt63b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt63l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt63b
1549    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b"
1550    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt63b/
1551    mkdir -p /tmp/icu4j/main/shared/data
1552    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1553    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt63b/
1554    mkdir -p /tmp/icu4j/main/shared/data
1555    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1556    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
1557- copy the big-endian Unicode data files to another location,
1558  separate from the other data files,
1559  and then refresh ICU4J
1560    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1561    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1562    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1563    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1564    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1565    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1566    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1567    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1568    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1569    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1570
1571* When refreshing all of ICU4J data from ICU4C
1572- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1573- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1574or
1575- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1576
1577* update CollationFCD.java
1578  + copy & paste the initializers of lcccIndex[] etc. from
1579    ICU4C/source/i18n/collationfcd.cpp to
1580    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1581
1582* refresh Java test .txt files
1583- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1584    cd $ICU_SRC/icu4c/source/data/unidata
1585    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1586    cd ../../test/testdata
1587    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1588    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1589
1590* run & fix ICU4J tests
1591
1592*** API additions
1593- send notice to icu-design about new born-@stable API (enum constants etc.)
1594
1595*** CLDR numbering systems
1596- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
1597  for example, look for
1598    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
1599    in new blocks (Blocks.txt)
1600  Unicode 12: using Unicode 12 CLDR ticket #11478
1601    hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
1602    wcho 1E2F0..1E2F9 Wancho
1603  Unicode 11: using Unicode 11 CLDR ticket #10978
1604    rohg 10D30..10D39 Hanifi_Rohingya
1605    gong 11DA0..11DA9 Gunjala_Gondi
1606  Earlier: CLDR tickets specific to adding new numbering systems.
1607  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1608  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
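
The egrep above finds the digit-four character of each decimal digit set. A
Python sketch can apply the same pattern and derive the ten-digit range from
each hit. It assumes that every digit set is contiguous and starts with zero;
the ppucd-style lines under __main__ are simplified for illustration, and the
function name decimal_digit_ranges is hypothetical.

```python
import re

# Same pattern as the egrep command line above.
DIGIT_FOUR = re.compile(r";gc=Nd.+;nv=4")


def decimal_digit_ranges(ppucd_lines):
    """Return (first, last) code points of each decimal digit set found."""
    ranges = []
    for line in ppucd_lines:
        fields = line.split(";")
        if len(fields) < 2 or fields[0] != "cp" or ".." in fields[1]:
            continue  # not a single-code point line
        if not DIGIT_FOUR.search(line):
            continue
        four = int(fields[1], 16)
        # Assumption: digits 0..9 are contiguous, so 0 is at four-4 and 9 at four+5.
        ranges.append((four - 4, four + 5))
    return ranges


if __name__ == "__main__":
    sample = [
        "cp;1E144;gc=Nd;na=NYIAKENG PUACHUE HMONG DIGIT FOUR;nt=De;nv=4",
        "cp;1E2F4;gc=Nd;na=WANCHO DIGIT FOUR;nt=De;nv=4",
    ]
    for first, last in decimal_digit_ranges(sample):
        print("%04X..%04X" % (first, last))
```

The resulting ranges still need to be checked against Blocks.txt to see which
ones are in new blocks.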
1609
1610*** merge the Unicode update branches back onto the trunk
1611- do not merge the icudata.jar and testdata.jar,
1612  instead rebuild them from merged & tested ICU4C
1613- make sure that changes to Unicode tools are checked in:
1614  http://www.unicode.org/utility/trac/log/trunk/unicodetools
1615
1616---------------------------------------------------------------------------- ***
1617
1618ICU 63: added ICU support for the text layout properties InPC, InSC, vo
1619
1620* Command-line environment setup
1621
1622UNICODE_DATA=~/unidata/uni11/20180609
1623CLDR_SRC=~/svn.cldr/uni
1624ICU_ROOT=~/icu/mine
1625ICU_SRC=$ICU_ROOT/src
1626ICUDT=icudt62b
1627ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1628ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1629export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1630
1631*** Links
1632
1633https://unicode-org.atlassian.net/browse/ICU-8966 InPC & InSC
1634https://unicode-org.atlassian.net/browse/ICU-12850 vo
1635
1636*** data files & enums & parser code
1637
1638* API additions
1639- for each of the three new enumerated properties
1640  + uchar.h: add the enum UProperty constant UCHAR_<long prop name>
1641  + uchar.h: update UCHAR_INT_LIMIT
1642  + uchar.h: add the enum U<long prop name>
1643    with constants U_<short prop name>_<long value name>
1644  + UProperty.java: add the constant <long prop name>
1645  + UProperty.java: update INT_LIMIT
1646  + UCharacter.java: add the interface <long prop name>
1647    with constants <long value name>
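
The naming scheme in the list above can be spelled out mechanically. This Python
sketch derives the C names from a property's short name, long name, and one long
value name. It only illustrates the pattern; the function c_api_names is
hypothetical, and the actual constants should be checked against uchar.h.

```python
def c_api_names(short_prop, long_prop, long_value):
    """Derive C API names following the scheme in the list above."""
    return {
        # enum UProperty constant: UCHAR_<long prop name>
        "property_constant": "UCHAR_" + long_prop.upper(),
        # enum type: U<long prop name>, written without underscores
        "enum_type": "U" + long_prop.replace("_", ""),
        # value constant: U_<short prop name>_<long value name>
        "value_constant": "U_%s_%s" % (short_prop.upper(), long_value.upper()),
    }


if __name__ == "__main__":
    for args in (
        ("InPC", "Indic_Positional_Category", "Top"),
        ("InSC", "Indic_Syllabic_Category", "Consonant_Dead"),
        ("vo", "Vertical_Orientation", "Rotated"),
    ):
        print(c_api_names(*args))
```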
1648
1649* process and/or copy files
1650- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1651  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1652  + It also writes tools/unicode/c/genprops/pnames_data.h with property and value
1653    names and aliases.
1654  + For debugging, and tweaking how ppucd.txt is written,
1655    the tool has an --only_ppucd option:
1656    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1657
1658* preparseucd.py changes
1659- add new property short names (uppercase) to _prop_and_value_re
1660  so that ParseUCharHeader() parses the new enum constants
1661
1662* build ICU (make install)
1663  so that the tools build can pick up the new definitions from the installed header files.
1664
1665  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1666
1667* build Unicode tools using CMake+make
1668
1669$ICU_SRC/tools/unicode/c/icudefs.txt:
1670
1671# Location (--prefix) of where ICU was installed.
1672set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
1673# Location of the ICU4C source tree.
1674set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/mine/src/icu4c)
1675
1676  $ICU_ROOT/dbg$
1677    mkdir -p tools/unicode/c
1678    cd tools/unicode/c
1679
1680  $ICU_ROOT/dbg/tools/unicode/c$
1681    cmake ../../../../src/tools/unicode/c
1682    make
1683
1684* generate core properties data files
1685  $ICU_ROOT/dbg/tools/unicode/c$
1686    genprops/genprops $ICU_SRC/icu4c
1687- rebuild ICU (make install) & tools
1688
1689* write data for runtime, hardcoded for now
1690- add genprops/layoutpropsbuilder.cpp with pieces from sibling files
1691- generate new icu4c/source/common/ulayout_props_data.h
1692- for each of the three new enumerated properties
1693  + int property max value
1694  + small, 8-bit UCPTrie
1695    (A small 16-bit trie with bit fields for these three properties
1696    is very nearly the same size as the sum of the three.)
1697
1698* wire into C++
1699- uprops.cpp: #include ulayout_props_data.h
1700- uprops.cpp: add getInPC() etc. functions
1701- uprops.cpp: add lines to intProps[], include max values
1702- uprops.h: add UPropertySource constants
1703- uprops.cpp: add uprops_addPropertyStarts(src)
1704- uniset_props.cpp: add to UnicodeSet_initInclusion()
1705- intltest/ucdtest.cpp: write unit tests
1706
1707* update Java data files
1708- refresh just the pnames.icu file with the new property [value] names, just to be safe
1709- see $ICU_SRC/icu4c/source/data/icu4j-readme.txt
1710- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1711- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1712- copy the big-endian Unicode data files to another location,
1713  separate from the other data files,
1714  and then refresh ICU4J
1715    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1716    cp com/ibm/icu/impl/data/$ICUDT/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1717    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1718
1719* wire into Java
1720- UCharacterProperty.java: add new SRC_INPC etc. constants as in C++
1721- UCharacterProperty.java: for each new property
1722  + create a nested class to hold its CodePointTrie
1723  + initialize it from a string literal
1724  + paste in the initializer printed by genprops
1725  + add a new IntProperty object to the intProps[] array
1726  + use the correct max int value for each property, also printed by genprops
1727- UCharacterProperty.java: add ulayout_addPropertyStarts(src, set)
1728- UnicodeSet.java: add to getInclusions()
1729- UCharacterTest.java: write unit tests
1730
1731---------------------------------------------------------------------------- ***
1732
1733Unicode 11.0 update for ICU 62
1734
1735http://www.unicode.org/versions/Unicode11.0.0/
1736http://unicode.org/versions/beta-11.0.0.html
1737https://www.unicode.org/review/pri372/
1738http://www.unicode.org/reports/uax-proposed-updates.html
1739http://www.unicode.org/reports/tr44/tr44-21.html
1740
1741* Command-line environment setup
1742
1743UNICODE_DATA=~/unidata/uni11/20180521
1744CLDR_SRC=~/svn.cldr/uni
1745ICU_ROOT=~/svn.icu/uni
1746ICU_SRC=$ICU_ROOT/src
1747ICUDT=icudt61b
1748ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1749ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1750export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1751
1752*** ICU Trac
1753
1754- ticket:13630: Unicode 11
1755- ^/branches/markus/uni11
1756
1757*** CLDR Trac
1758
1759- cldrbug 10978: Unicode 11
1760- ^/branches/markus/uni11
1761
1762*** Unicode version numbers
1763- makedata.mak
1764- uchar.h
1765- com.ibm.icu.util.VersionInfo
1766- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1767
1768- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1769  so that the makefiles see the new version number.
1770
1771*** data files & enums & parser code
1772
1773* download files
1774- mkdir -p $UNICODE_DATA
1775- download Unicode files into $UNICODE_DATA
1776  + subfolders: emoji, idna, security, ucd, uca
1777  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1778
1779* for manual diffs and for Unicode Tools input data updates:
1780  remove version suffixes from the file names
1781    ~$ unidata/desuffixucd.py $UNICODE_DATA
1782  (see https://sites.google.com/site/unicodetools/inputdata)
1783
1784* process and/or copy files
1785- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1786  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1787  + For debugging, and tweaking how ppucd.txt is written,
1788    the tool has an --only_ppucd option:
1789    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1790
1791- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1792
1793* build ICU (make install)
1794  so that the tools build can pick up the new definitions from the installed header files.
1795
1796  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1797
1798* preparseucd.py changes
1799- fix other errors
1800    NameError: unknown property Extended_Pictographic
1801  -> add Extended_Pictographic binary property
1802  -> add new short names for all Emoji properties
1803
1804* new constants for new property values
1805- preparseucd.py error:
1806    ValueError: missing uchar.h enum constants for some property values:
1807    [(u'blk', set([u'Georgian_Ext', u'Hanifi_Rohingya', u'Medefaidrin', u'Sogdian', u'Makasar',
1808                   u'Old_Sogdian', u'Dogra', u'Gunjala_Gondi', u'Chess_Symbols', u'Mayan_Numerals',
1809                   u'Indic_Siyaq_Numbers'])),
1810     (u'jg', set([u'Hanifi_Rohingya_Kinna_Ya', u'Hanifi_Rohingya_Pa'])),
1811     (u'sc', set([u'Medf', u'Sogd', u'Dogr', u'Rohg', u'Maka', u'Sogo', u'Gong'])),
1812     (u'GCB', set([u'LinkC', u'Virama'])),
1813     (u'WB', set([u'WSegSpace']))]
1814  = PropertyValueAliases.txt new property values (diff old & new .txt files)
1815    blk; Chess_Symbols                    ; Chess_Symbols
1816    blk; Dogra                            ; Dogra
1817    blk; Georgian_Ext                     ; Georgian_Extended
1818    blk; Gunjala_Gondi                    ; Gunjala_Gondi
1819    blk; Hanifi_Rohingya                  ; Hanifi_Rohingya
1820    blk; Indic_Siyaq_Numbers              ; Indic_Siyaq_Numbers
1821    blk; Makasar                          ; Makasar
1822    blk; Mayan_Numerals                   ; Mayan_Numerals
1823    blk; Medefaidrin                      ; Medefaidrin
1824    blk; Old_Sogdian                      ; Old_Sogdian
1825    blk; Sogdian                          ; Sogdian
1826  -> add to uchar.h
1827    use long property names for enum constants,
1828    for the trailing comment get the block start code point: diff old & new Blocks.txt
1829  -> add to UCharacter.UnicodeBlock IDs
1830    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1831            replace  public static final int \1_ID = \2; \3
1832  -> add to UCharacter.UnicodeBlock objects
1833    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1834            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1835
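The two Eclipse find/replace pairs above can also be applied outside the IDE.
The following is a minimal sketch in Python, assuming the same regular
expressions as given above; the sample input line, including its numeric value,
is made up for illustration and is not copied from uchar.h.

```python
import re

# Hypothetical sample line in the style of the uchar.h block enum;
# the numeric value 999 is a placeholder, not the real constant.
C_LINE = "    UBLOCK_GEORGIAN_EXTENDED = 999, /*[1C90]*/"

# Same patterns as the Eclipse find/replace pairs in this log.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"

OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJ_REPLACE = (
    r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'
)


def to_java_id(line):
    """Turn a C enum line into a Java UnicodeBlock *_ID constant."""
    return ID_FIND.sub(ID_REPLACE, line)


def to_java_object(line):
    """Turn a C enum line into a Java UnicodeBlock object declaration."""
    return OBJ_FIND.sub(OBJ_REPLACE, line)


if __name__ == "__main__":
    print(to_java_id(C_LINE))
    print(to_java_object(C_LINE))
```

The leading indentation of the input line is preserved because the patterns
only match from UBLOCK_ onward.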
1836    GCB; LinkC                            ; LinkingConsonant
1837    GCB; Virama                           ; Virama
1838  -> uchar.h & UCharacter.GraphemeClusterBreak
1839  -> these two were later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
1840
1841    InSC; Consonant_Initial_Postfixed     ; Consonant_Initial_Postfixed
1842  -> ignore: ICU does not yet support this property
1843
1844    jg ; Hanifi_Rohingya_Kinna_Ya         ; Hanifi_Rohingya_Kinna_Ya
1845    jg ; Hanifi_Rohingya_Pa               ; Hanifi_Rohingya_Pa
1846  -> uchar.h & UCharacter.JoiningGroup
1847
1848    sc ; Dogr                             ; Dogra
1849    sc ; Gong                             ; Gunjala_Gondi
1850    sc ; Maka                             ; Makasar
1851    sc ; Medf                             ; Medefaidrin
1852    sc ; Rohg                             ; Hanifi_Rohingya
1853    sc ; Sogd                             ; Sogdian
1854    sc ; Sogo                             ; Old_Sogdian
1855  -> uscript.h & com.ibm.icu.lang.UScript
1856  -> Nushu had been added already
1857  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1858      and in com.ibm.icu.dev.test.lang.TestUScript.java
1859
1860    WB ; WSegSpace                        ; WSegSpace
1861  -> uchar.h & UCharacter.WordBreak
1862
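The "diff old & new .txt files" step for PropertyValueAliases.txt can be
approximated with a set difference. This is only a sketch: the helper names
parse_aliases() and new_values() are hypothetical, and the two inline excerpts
stand in for the old and new files.

```python
def parse_aliases(text):
    """Return a set of (property, short name, long name) tuples."""
    values = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) >= 3:
            values.add((fields[0], fields[1], fields[2]))
    return values


def new_values(old_text, new_text):
    """Return the sorted property values present only in new_text."""
    return sorted(parse_aliases(new_text) - parse_aliases(old_text))


# Short sample excerpts, not the full files.
OLD = """\
# sample excerpt
WB ; XX                               ; Other
sc ; Zyyy                             ; Common
"""
NEW = OLD + """\
sc ; Dogr                             ; Dogra
WB ; WSegSpace                        ; WSegSpace
"""

if __name__ == "__main__":
    for prop, short, long_name in new_values(OLD, NEW):
        print(f"{prop}; {short}; {long_name}")
```

Every value reported this way still needs the manual follow-up listed above
(uchar.h, uscript.h, and the matching Java classes).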
1863* New short names for emoji properties
1864- see UTS #51
1865- short names set in preparseucd.py
1866
1867* New properties
1868- boolean emoji property Extended_Pictographic
1869  -> added in preparseucd.py
1870  -> uchar.h & UProperty.java
1871- misc. property Equivalent_Unified_Ideograph (EqUIdeo)
1872  as shown in PropertyValueAliases.txt
1873  -> ignore for now
1874
1875* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1876    (not strictly necessary for NOT_ENCODED scripts)
1877  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1878
1879* update spoof checker UnicodeSet initializers:
1880    inclusionPat & recommendedPat in uspoof.cpp
1881    INCLUSION & RECOMMENDED in SpoofChecker.java
1882- make sure that the Unicode Tools tree contains the latest security data files
1883- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
1884- update the hardcoded version number there in the DIRECTORY path
1885- run the tool (no special environment variables needed)
1886- copy & paste from the Console output into the .cpp & .java files
1887
1888* generate normalization data files
1889  cd $ICU_ROOT/dbg/icu4c
1890  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1891  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
1892  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1893  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1894  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1895
1896* build ICU (make install)
1897  so that the tools build can pick up the new definitions from the installed header files.
1898
1899  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1900
1901* build Unicode tools using CMake+make
1902
1903$ICU_SRC/tools/unicode/c/icudefs.txt:
1904
1905# Location (--prefix) of where ICU was installed.
1906set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1907# Location of the ICU4C source tree.
1908set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c)
1909
1910  $ICU_ROOT/dbg$
1911    mkdir -p tools/unicode/c
1912    cd tools/unicode/c
1913
1914  $ICU_ROOT/dbg/tools/unicode/c$
1915    cmake ../../../../src/tools/unicode/c
1916    make
1917
1918* generate core properties data files
1919  $ICU_ROOT/dbg/tools/unicode/c$
1920    genprops/genprops $ICU_SRC/icu4c
1921    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
1922    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1923- rebuild ICU (make install) & tools
1924
1925* Fix case props
1926    genprops error: casepropsbuilder: too many exceptions words
1927    genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
1928- With the addition of Georgian Mtavruli capital letters,
1929  there are now too many simple case mappings with big mapping deltas
1930  that yield uncompressible exceptions.
1931- Changed the data structure (now formatVersion 4):
1932  added one bit for no-simple-case-folding (for Cherokee), and
1933  one optional slot for a big delta (for most faraway mappings),
1934  together with another bit for whether that delta is negative.
1935  This makes most Cherokee & Georgian etc. case mappings compressible,
1936  reducing the number of exceptions words.
1937- Made further changes to gain one more bit for the exceptions index,
1938  for future growth. For details, see casepropsbuilder.cpp.
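To make the "big delta" idea concrete, here is a small arithmetic sketch. It is
not the casepropsbuilder.cpp implementation: the inline delta range used below
is an assumption chosen for illustration, and encode_delta()/decode_delta() are
hypothetical helpers. It only shows how a faraway mapping can be stored as a
magnitude plus a "negative" flag.

```python
# Assumed range of deltas that fit inline; illustrative only.
INLINE_DELTA_MIN = -256
INLINE_DELTA_MAX = 255


def encode_delta(from_cp, to_cp):
    """Encode a simple case mapping as an inline delta or a big-delta slot."""
    delta = to_cp - from_cp
    if INLINE_DELTA_MIN <= delta <= INLINE_DELTA_MAX:
        return ("inline", delta)
    # Big delta: store the magnitude, plus one bit for the sign.
    return ("big-delta slot", abs(delta), delta < 0)


def decode_delta(from_cp, encoded):
    """Recover the mapped code point from an encode_delta() result."""
    if encoded[0] == "inline":
        return from_cp + encoded[1]
    _, magnitude, negative = encoded
    return from_cp - magnitude if negative else from_cp + magnitude


if __name__ == "__main__":
    samples = [
        (0x0061, 0x0041),  # LATIN SMALL LETTER A -> LATIN CAPITAL LETTER A
        (0x10D0, 0x1C90),  # GEORGIAN LETTER AN -> MTAVRULI CAPITAL LETTER AN
        (0xAB70, 0x13A0),  # CHEROKEE SMALL LETTER A -> CHEROKEE LETTER A
    ]
    for from_cp, to_cp in samples:
        print(f"U+{from_cp:04X} -> U+{to_cp:04X}: {encode_delta(from_cp, to_cp)}")
```

The Georgian sample has a delta of +0xBC0 and the Cherokee sample a delta of
-0x97D0; both are far outside the assumed inline range.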
1939
1940* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1941  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1942- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1943- Unicode 6.0..11.0: U+2260, U+226E, U+226F
1944- nothing new in this Unicode version, no test file to update
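The grep step above can be sketched in Python as follows. The function name
non_ascii_std3_valid() is hypothetical, and the sample lines only imitate the
IdnaMappingTable.txt layout (code point or range; status; optional mapping;
comment). They are not copied from the data file.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) ranges above U+007F with status disallowed_STD3_valid."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:
            # Clip ranges that begin in ASCII to their non-ASCII part.
            hits.append((max(first, 0x80), last))
    return hits


# Sample lines in the style of IdnaMappingTable.txt, for illustration.
SAMPLE = [
    "0000..002C    ; disallowed_STD3_valid                  # <control-0000>..COMMA",
    "00C0          ; mapped                 ; 00E0          # LATIN CAPITAL LETTER A WITH GRAVE",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
]

if __name__ == "__main__":
    for first, last in non_ascii_std3_valid(SAMPLE):
        print(f"U+{first:04X}..U+{last:04X}")
```

Any range that is not yet covered by uts46test.cpp and UTS46Test.java is a
candidate for a new test case.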
1945
1946* run & fix ICU4C tests
1947- Andy handles RBBI & spoof check test failures
1948
1949- Errors in char.txt, word.txt, word_POSIX.txt like
1950    createRuleBasedBreakIterator: ICU Error "U_BRK_RULE_EMPTY_SET"  at line 46, column 16
1951  because \p{Grapheme_Cluster_Break = EBG} and \p{Word_Break = EBG} are empty.
1952  -> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
1953     not empty, just to get ICU building.
1954  -> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
1955     and properties together with the rules that used them (GB 10, WB 14).
1956  -> Andy adjusts the rule sets further to sync with
1957     Unicode 11 grapheme, word, and line break spec changes.
1958
1959* collation: CLDR collation root, UCA DUCET
1960
1961- UCA DUCET goes into Mark's Unicode tools, see
1962    https://sites.google.com/site/unicodetools/home#TOC-UCA
1963  diff the main mapping file, look for bad changes
1964  (for example, more bytes per weight for common characters)
1965    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-11.txt
1966    ~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
1967
1968- CLDR root data files are checked into $CLDR_SRC/common/uca/
1969    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1970
1971- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1972    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1973- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1974    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1975    (note removing the underscore before "Rules")
1976    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1977- restore TODO diffs in UCARules.txt
1978    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1979- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1980  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1981  from the CLDR root files (..._CLDR_..._SHORT.txt)
1982    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1983    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1984    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1985- if CLDR common/uca/unihan-index.txt changes, then update
1986  CLDR common/collation/root.xml <collation type="private-unihan">
1987  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1988
1989- run genuca, see command line above;
1990  deal with
1991    Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
1992    FDD1 1180B;	[71 CC 02, 05, 05]	# Dogra first primary (compressible)
1993        (add the character to genuca.cpp sampleCharsToScripts[])
1994  + look up the USCRIPT_ code for the new sample characters
1995    (should be obvious from the comment in the error output)
1996  + *add* mappings to sampleCharsToScripts[], do not replace them
1997    (in case the script sample characters flip-flop)
1998  + insert new scripts in DUCET script order, see the top_byte table
1999    at the beginning of FractionalUCA.txt
2000- rebuild ICU4C
2001
2002* Unihan collators
2003    https://sites.google.com/site/unicodetools/unihan
2004- run Unicode Tools
2005    org.unicode.draft.GenerateUnihanCollators
2006  with VM arguments
2007    -ea
2008    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
2009    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
2010    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
2011    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
2012    -DUVERSION=11.0.0
2013- run Unicode Tools
2014    org.unicode.draft.GenerateUnihanCollatorFiles
2015  with the same arguments
2016- check CLDR diffs
2017    cd $CLDR_SRC
2018    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
2019    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
2020- copy to CLDR
2021    cd $CLDR_SRC
2022    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
2023    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
2024- run CLDR unit tests, commit to CLDR
2025- generate ICU zh collation data: run CLDR
2026    org.unicode.cldr.icu.NewLdml2IcuConverter
2027  with program arguments
2028    -t collation
2029    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
2030    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
2031    -d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
2032    -p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
2033    zh
2034  and VM arguments
2035    -ea
2036    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
2037- rebuild ICU4C
2038
2039* run & fix ICU4C tests, now with new CLDR collation root data
2040- run all tests with the collation test data *_SHORT.txt or the full files
2041  (the full ones have comments, useful for debugging)
2042- note on intltest: if collate/UCAConformanceTest fails, then
2043  utility/MultithreadTest/TestCollators will fail as well;
2044  fix the conformance test before looking into the multi-thread test
2045
2046* update Java data files
2047- refresh just the UCD/UCA-related/derived files, just to be safe
2048- see (ICU4C)/source/data/icu4j-readme.txt
2049- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2050- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2051  output:
2052    ...
2053    Unicode .icu files built to ./out/build/icudt61l
2054    echo timestamp > uni-core-data
2055    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
2056    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
2057    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2058    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt61l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt61b
2059    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b"
2060    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
2061    mkdir -p /tmp/icu4j/main/shared/data
2062    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2063    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
2064    mkdir -p /tmp/icu4j/main/shared/data
2065    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2066    make[1]: Leaving directory '/usr/local/google/home/mscherer/svn.icu/uni/dbg/icu4c/data'
2067- copy the big-endian Unicode data files to another location,
2068  separate from the other data files,
2069  and then refresh ICU4J
2070    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
2071    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2072    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2073    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2074    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2075    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2076    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2077    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2078    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2079    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2080
2081* When refreshing all of ICU4J data from ICU4C
2082- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2083- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
2084or
2085- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
2086
2087* update CollationFCD.java
2088  + copy & paste the initializers of lcccIndex[] etc. from
2089    ICU4C/source/i18n/collationfcd.cpp to
2090    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2091
2092* refresh Java test .txt files
2093- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2094    cd $ICU_SRC/icu4c/source/data/unidata
2095    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2096    cd ../../test/testdata
2097    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2098    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2099
2100* run & fix ICU4J tests
2101
2102*** API additions
2103- send notice to icu-design about new born-@stable API (enum constants etc.)
2104
2105*** CLDR numbering systems
2106- look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
2107  Unicode 11: using Unicode 11 CLDR ticket #10978
2108    rohg 10D30..10D39 Hanifi_Rohingya
2109    gong 11DA0..11DA9 Gunjala_Gondi
2110  Earlier: CLDR tickets specific to adding new numbering systems.
2111  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
2112  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
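The search for new decimal digit sets can be sketched as a scan of
UnicodeData.txt for characters with gc=Nd and a numeric value of 4. The sketch
assumes that each digit set is a contiguous run from zero to nine, so the set
is reported as (four - 4)..(four + 5). The function name and the known_zeros
parameter are hypothetical, and the sample lines only follow the
UnicodeData.txt field layout.

```python
def decimal_digit_ranges(unicode_data_lines, known_zeros=frozenset()):
    """Return (zero, nine) code point pairs for digit sets not in known_zeros."""
    ranges = []
    for line in unicode_data_lines:
        fields = line.strip().split(";")
        if len(fields) < 9:
            continue
        # Field 2 is General_Category, field 8 is Numeric_Value.
        if fields[2] == "Nd" and fields[8] == "4":
            zero = int(fields[0], 16) - 4
            if zero not in known_zeros:
                ranges.append((zero, zero + 9))
    return ranges


# Sample lines in UnicodeData.txt format, for illustration.
SAMPLE = [
    "0034;DIGIT FOUR;Nd;0;EN;;4;4;4;N;;;;;",
    "2074;SUPERSCRIPT FOUR;No;0;EN;<super> 0034;;4;4;N;SUPERSCRIPT DIGIT FOUR;;;;",
    "10D34;HANIFI ROHINGYA DIGIT FOUR;Nd;0;AN;;4;4;4;N;;;;;",
    "11DA4;GUNJALA GONDI DIGIT FOUR;Nd;0;L;;4;4;4;N;;;;;",
]

if __name__ == "__main__":
    # ASCII digits are already a CLDR numbering system, so skip them here.
    for zero, nine in decimal_digit_ranges(SAMPLE, known_zeros={0x0030}):
        print(f"{zero:04X}..{nine:04X}")
```

With the sample lines this prints 10D30..10D39 and 11DA0..11DA9, matching the
two Unicode 11 entries listed above.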
2113
2114*** merge the Unicode update branches back onto the trunk
2115- do not merge the icudata.jar and testdata.jar,
2116  instead rebuild them from merged & tested ICU4C
2117- make sure that changes to Unicode tools are checked in:
2118  http://www.unicode.org/utility/trac/log/trunk/unicodetools
2119
2120---------------------------------------------------------------------------- ***
2121
2122Unicode 10.0 update for ICU 60
2123
2124http://www.unicode.org/versions/Unicode10.0.0/
2125http://www.unicode.org/versions/beta-10.0.0.html
2126http://blog.unicode.org/2017/03/unicode-100-beta-review.html
2127http://www.unicode.org/review/pri350/
2128http://www.unicode.org/reports/uax-proposed-updates.html
2129http://www.unicode.org/reports/tr44/tr44-19.html
2130
2131* Command-line environment setup
2132
2133UNICODE_DATA=~/unidata/uni10/20170605
2134CLDR_SRC=~/svn.cldr/uni10
2135ICU_ROOT=~/svn.icu/uni10
2136ICU_SRC=$ICU_ROOT/src
2137ICUDT=icudt60b
2138ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
2139ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
2140export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
2141
2142*** ICU Trac
2143
2144- ticket:12985: Unicode 10
2145- ticket:13061: undo hacks from emoji 5.0 update
2146- ticket:13062: add Emoji_Component property
2147- ^/branches/markus/uni10
2148
2149*** CLDR Trac
2150
2151- cldrbug 10055: Unicode 10
2152- cldrbug 9882: Unicode 10 script metadata
2153- cldrbug 10219: numbering systems for Unicode 10
2154
2155*** Unicode version numbers
2156- makedata.mak
2157- uchar.h
2158- com.ibm.icu.util.VersionInfo
2159- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2160
2161- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2162  so that the makefiles see the new version number.
2163
2164*** data files & enums & parser code
2165
2166* download files
2167- mkdir -p $UNICODE_DATA
2168- download Unicode 10.0 files into $UNICODE_DATA
2169  + subfolders: ucd, uca, idna, security
2170  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2171- download emoji 5.0 files into $UNICODE_DATA/emoji
2172
2173* for manual diffs: remove version suffixes from the file names
2174  ~$ unidata/desuffixucd.py $UNICODE_DATA
2175  (see https://sites.google.com/site/unicodetools/inputdata)
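For orientation, removing a version suffix from a file name might look like the
sketch below. This is not the code of desuffixucd.py: the suffix pattern
(a hyphen, a three-part version, and an optional draft marker such as "d5"
before the extension) is an assumption, and strip_version_suffix() is a
hypothetical helper.

```python
import re

# Assumed suffix shape: "-10.0.0" or "-10.0.0d5" right before the extension.
_VERSION_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def strip_version_suffix(file_name):
    """Return file_name without a trailing version suffix, if it has one."""
    return _VERSION_SUFFIX.sub("", file_name)


if __name__ == "__main__":
    for name in ["LineBreak-10.0.0.txt", "DerivedAge-10.0.0d5.txt", "UnicodeData.txt"]:
        print(name, "->", strip_version_suffix(name))
```

Names without a version suffix pass through unchanged, so the helper is safe to
apply to every file in a folder.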
2176
2177* process and/or copy files
2178- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
2179  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2180  + For debugging, and tweaking how ppucd.txt is written,
2181    the tool has an --only_ppucd option:
2182    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
2183
2184- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
2185
2186* build ICU (make install)
2187  so that the tools build can pick up the new definitions from the installed header files.
2188
2189  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
2190
2191* preparseucd.py changes
2192- remove or add new Unicode scripts from/to the
2193  only-in-ISO-15924 list according to the error messages:
2194    ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
2195  -> adjust _scripts_only_in_iso15924 as indicated
2196- fix other errors
2197    Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
2198  -> add vo=Vertical_Orientation to _ignored_properties
2199  -> later removed again: the file is now parsed, even though we do not yet store its data for runtime use
2200
2201* new constants for new property values
2202- preparseucd.py error:
2203    ValueError: missing uchar.h enum constants for some property values:
2204    [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
2205                   u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
2206     (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
2207                  u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
2208                  u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
2209     (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
2210  = PropertyValueAliases.txt new property values (diff old & new .txt files)
2211    blk; CJK_Ext_F                        ; CJK_Unified_Ideographs_Extension_F
2212    blk; Kana_Ext_A                       ; Kana_Extended_A
2213    blk; Masaram_Gondi                    ; Masaram_Gondi
2214    blk; Nushu                            ; Nushu
2215    blk; Soyombo                          ; Soyombo
2216    blk; Syriac_Sup                       ; Syriac_Supplement
2217    blk; Zanabazar_Square                 ; Zanabazar_Square
2218  -> add to uchar.h
2219    use long property names for enum constants,
2220    for the trailing comment get the block start code point: diff old & new Blocks.txt
2221  -> add to UCharacter.UnicodeBlock IDs
2222    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2223            replace  public static final int \1_ID = \2; \3
2224  -> add to UCharacter.UnicodeBlock objects
2225    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
2226            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2227
2228    jg ; Malayalam_Bha                    ; Malayalam_Bha
2229    jg ; Malayalam_Ja                     ; Malayalam_Ja
2230    jg ; Malayalam_Lla                    ; Malayalam_Lla
2231    jg ; Malayalam_Llla                   ; Malayalam_Llla
2232    jg ; Malayalam_Nga                    ; Malayalam_Nga
2233    jg ; Malayalam_Nna                    ; Malayalam_Nna
2234    jg ; Malayalam_Nnna                   ; Malayalam_Nnna
2235    jg ; Malayalam_Nya                    ; Malayalam_Nya
2236    jg ; Malayalam_Ra                     ; Malayalam_Ra
2237    jg ; Malayalam_Ssa                    ; Malayalam_Ssa
2238    jg ; Malayalam_Tta                    ; Malayalam_Tta
2239  -> uchar.h & UCharacter.JoiningGroup
2240
2241    sc ; Gonm                             ; Masaram_Gondi
2242    sc ; Nshu                             ; Nushu
2243    sc ; Soyo                             ; Soyombo
2244    sc ; Zanb                             ; Zanabazar_Square
2245  -> uscript.h & com.ibm.icu.lang.UScript
2246  -> Nushu had been added already
2247  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2248      and in com.ibm.icu.dev.test.lang.TestUScript.java
2249
2250* New properties as shown in PropertyValueAliases.txt changes
2251- boolean Emoji_Component from emoji 5
2252  -> uchar.h & UProperty.java
2253- boolean
2254    # Regional_Indicator (RI)
2255
2256    RI ; N                                ; No                               ; F                                ; False
2257    RI ; Y                                ; Yes                              ; T                                ; True
2258  -> uchar.h & UProperty.java
2259  -> single immutable range, to be hardcoded
2260- boolean
2261    # Prepended_Concatenation_Mark (PCM)
2262
2263    PCM; N                                ; No                               ; F                                ; False
2264    PCM; Y                                ; Yes                              ; T                                ; True
2265  -> was new in Unicode 9
2266  -> uchar.h & UProperty.java
2267- enumerated
2268    # Vertical_Orientation (vo)
2269
2270    vo ; R                                ; Rotated
2271    vo ; Tr                               ; Transformed_Rotated
2272    vo ; Tu                               ; Transformed_Upright
2273    vo ; U                                ; Upright
2274  -> only pre-parsed for now, but not yet stored for runtime use
2275
2276* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2277    (not strictly necessary for NOT_ENCODED scripts)
2278  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
2279
2280* generate normalization data files
2281  cd $ICU_ROOT/dbg/icu4c
2282  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
2283  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
2284  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
2285  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2286  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
2287
2288* build ICU (make install)
2289  so that the tools build can pick up the new definitions from the installed header files.
2290
2291  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
2292
2293* build Unicode tools using CMake+make
2294
2295$ICU_SRC/tools/unicode/c/icudefs.txt:
2296
2297# Location (--prefix) of where ICU was installed.
2298set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
2299# Location of the ICU4C source tree.
2300set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
2301
2302  $ICU_ROOT/dbg/tools/unicode/c$
2303    cmake ../../../../src/tools/unicode/c
2304    make
2305
2306* generate core properties data files
2307  $ICU_ROOT/dbg/tools/unicode/c$
2308    genprops/genprops $ICU_SRC/icu4c
2309    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
2310    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
2311- rebuild ICU (make install) & tools
2312
2313* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2314  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2315- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2316- Unicode 6.0..10.0: U+2260, U+226E, U+226F
2317- nothing new in this Unicode version, no test file to update
2318
2319* run & fix ICU4C tests
2320- Andy handles RBBI & spoof check test failures
2321
2322* collation: CLDR collation root, UCA DUCET
2323
2324- UCA DUCET goes into Mark's Unicode tools, see
2325  https://sites.google.com/site/unicodetools/home#TOC-UCA
2326- CLDR root data files are checked into $CLDR_SRC/common/uca/
2327    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
2328
2329- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2330    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
2331- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2332    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
2333    (note removing the underscore before "Rules")
2334    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
2335- restore TODO diffs in UCARules.txt
2336    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
2337- update (ICU4C)/source/test/testdata/CollationTest_*.txt
2338  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2339  from the CLDR root files (..._CLDR_..._SHORT.txt)
2340    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2341    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2342    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
2343- if CLDR common/uca/unihan-index.txt changes, then update
2344  CLDR common/collation/root.xml <collation type="private-unihan">
2345  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
2346
2347- run genuca, see command line above;
2348  deal with
2349    Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
2350    FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)
2351        (add the character to genuca.cpp sampleCharsToScripts[])
2352  + look up the USCRIPT_ code for the new sample characters
2353    (should be obvious from the comment in the error output)
2354  + *add* mappings to sampleCharsToScripts[], do not replace them
2355    (in case the script sample characters flip-flop)
2356  + insert new scripts in DUCET script order, see the top_byte table
2357    at the beginning of FractionalUCA.txt
2358- rebuild ICU4C
2359
2360* Unihan collators
2361    https://sites.google.com/site/unicodetools/unihan
2362- run Unicode Tools
2363    org.unicode.draft.GenerateUnihanCollators
2364  with VM arguments
2365    -ea
2366    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
2367    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
2368    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
2369    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
2370    -DUVERSION=10.0.0
2371- run Unicode Tools
2372    org.unicode.draft.GenerateUnihanCollatorFiles
2373  with the same arguments
2374- check CLDR diffs
2375    cd $CLDR_SRC
2376    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
2377    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
2378- copy to CLDR
2379    cd $CLDR_SRC
2380    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
2381    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
2382- run CLDR unit tests, commit to CLDR
2383- generate ICU zh collation data: run CLDR
2384    org.unicode.cldr.icu.NewLdml2IcuConverter
2385  with program arguments
2386    -t collation
2387    -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
2388    -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
2389    -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
2390    -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
2391    zh
2392  and VM arguments
2393    -ea
2394    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
2395- rebuild ICU4C
2396
2397* run & fix ICU4C tests, now with new CLDR collation root data
2398- run all tests with the collation test data *_SHORT.txt or the full files
2399  (the full ones have comments, useful for debugging)
2400- note on intltest: if collate/UCAConformanceTest fails, then
2401  utility/MultithreadTest/TestCollators will fail as well;
2402  fix the conformance test before looking into the multi-thread test
2403
2404* update Java data files
2405- refresh just the UCD/UCA-related/derived files, just to be safe
2406- see (ICU4C)/source/data/icu4j-readme.txt
2407- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2408- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2409  output:
2410    ...
2411    Unicode .icu files built to ./out/build/icudt60l
2412    echo timestamp > uni-core-data
2413    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
2414    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
2415    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2416    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
2417    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
2418    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
2419    mkdir -p /tmp/icu4j/main/shared/data
2420    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2421    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
2422    mkdir -p /tmp/icu4j/main/shared/data
2423    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2424    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
2425- copy the big-endian Unicode data files to another location,
2426  separate from the other data files,
2427  and then refresh ICU4J
2428    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
2429    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2430    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2431    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2432    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2433    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2434    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2435    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2436    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2437    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
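
Sketch (not part of the logged procedure): the selective copy above, written in Python.
The function name copy_unicode_data and its parameters are hypothetical.
It assumes the directory layout shown in the commands and leaves out the final `jar uvf` step.

```python
import shutil
from pathlib import Path


def copy_unicode_data(src_root, dst_root, icudt):
    """Copy only the Unicode-related data files, as the cp/rm commands above do.

    src_root: the .../data/out/icu4j folder; dst_root: e.g. /tmp/icu4j.
    Returns the sorted relative paths (POSIX style) of the files copied.
    """
    rel = Path("com/ibm/icu/impl/data") / icudt
    src = Path(src_root) / rel
    dst = Path(dst_root) / rel

    # Top level: confusables.cfu, *.icu, *.nrm; subfolders: coll/*, brkitr/*
    candidates = [src / "confusables.cfu"]
    for pattern in ("*.icu", "*.nrm", "coll/*", "brkitr/*"):
        candidates.extend(src.glob(pattern))

    copied = []
    for f in candidates:
        # The commands above copy cnvalias.icu and then rm it; skip it here instead.
        if not f.is_file() or f.name == "cnvalias.icu":
            continue
        target = dst / f.relative_to(src)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)
        copied.append(f.relative_to(src).as_posix())
    return sorted(copied)
```

Example call, with paths as assumptions: copy_unicode_data("out/icu4j", "/tmp/icu4j", "icudt60b")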
2438
2439* When refreshing all of ICU4J data from ICU4C
2440- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2441- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
2442or
2443- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
2444
2445* update CollationFCD.java
2446  + copy & paste the initializers of lcccIndex[] etc. from
2447    ICU4C/source/i18n/collationfcd.cpp to
2448    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2449
2450* refresh Java test .txt files
2451- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2452    cd $ICU_SRC/icu4c/source/data/unidata
2453    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2454    cd ../../test/testdata
2455    cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2456    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2457
2458* run & fix ICU4J tests
2459
2460*** API additions
2461- send notice to icu-design about new born-@stable API (enum constants etc.)
2462
2463*** CLDR numbering systems
2464- look for new sets of decimal digits (gc=ND & nv=4) and submit a CLDR ticket
2465  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
2466  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
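
Sketch: one way to list the digit sets is to collect every character with gc=Nd and nv=4,
since each set of decimal digits contains exactly one digit four.
This uses Python's unicodedata module, which reflects the Unicode version bundled with the
interpreter (see unicodedata.unidata_version) and not necessarily the version being integrated,
so treat it as an illustration of the query rather than the tool actually used.
The function name is hypothetical.

```python
import sys
import unicodedata


def decimal_digit_fours():
    """Return the code points with General_Category=Nd and Numeric_Value=4.

    There is one such character per set of decimal digits.
    """
    result = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # numeric() returns the given default (None) for non-numeric characters.
        if unicodedata.category(ch) == "Nd" and unicodedata.numeric(ch, None) == 4:
            result.append(cp)
    return result
```

Comparing the result lists from two interpreters with different Unicode versions
shows which digit sets are new.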
2467
2468*** merge the Unicode update branches back onto the trunk
2469- do not merge the icudata.jar and testdata.jar,
2470  instead rebuild them from merged & tested ICU4C
2471- make sure that changes to Unicode tools are checked in:
2472  http://www.unicode.org/utility/trac/log/trunk/unicodetools
2473
2474---------------------------------------------------------------------------- ***
2475
Emoji 5.0 update for ICU 59
- ICU 59 mostly remains on Unicode 9.0
- except that it updates bidi and segmentation data to Unicode 10 beta
2479
2480First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
2481
2482* Command-line environment setup
2483
ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
ICUDT=icudt59b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
2491
2492*** ICU Trac
2493
2494- ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
2495- changes directly on trunk
2496
2497*** data files & enums & parser code
2498
2499* download files
2500
2501- download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
2502- download emoji 5.0 beta files into the same uni90e50 folder
2503- download Unicode 10.0 beta files: ucd
2504  + copy Unicode 10 bidi files to the uni90e50/ucd folder:
2505    BidiBrackets.txt
2506    BidiCharacterTest.txt
2507    BidiMirroring.txt
2508    BidiTest.txt
2509    extracted/DerivedBidiClass.txt
2510  + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
2511    LineBreak.txt
2512    auxiliary/*
2513
2514* preparseucd.py changes
2515- adjust for combined trunks
2516- write new copyright lines
2517- ignore new Emoji_Component property for now
2518
2519* process and/or copy files
2520- ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
2521  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2522
2523- cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
2524
2525* build ICU (make install)
2526  so that the tools build can pick up the new definitions from the installed header files.
2527
2528  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
2529
2530* build Unicode tools using CMake+make
2531
~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
# Location of the ICU4C source tree.
set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)

  ~/svn.icu/trunk/dbg/tools/unicode/c$
    cmake ../../../../src/tools/unicode/c
    make
2542
2543* generate core properties data files
2544  ~/svn.icu/trunk/dbg/tools/unicode/c$
2545    genprops/genprops $ICU4C_SRC_DIR
2546- rebuild ICU (make install) & tools
2547
2548* run & fix ICU4C tests
2549- Andy handles RBBI & spoof check test failures
2550
2551* update Java data files
2552- refresh just the UCD/UCA-related/derived files, just to be safe
2553- see (ICU4C)/source/data/icu4j-readme.txt
2554- mkdir /tmp/icu4j
2555- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2556  output:
2557    ...
2558    Unicode .icu files built to ./out/build/icudt59l
2559    echo timestamp > uni-core-data
2560    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
2561    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
2562    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2563    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
2564    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
2565    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
2566    mkdir -p /tmp/icu4j/main/shared/data
2567    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2568    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
2569    mkdir -p /tmp/icu4j/main/shared/data
2570    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2571    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
2572- copy the big-endian Unicode data files to another location,
2573  separate from the other data files,
2574  and then refresh ICU4J
2575    cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
2576    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2577    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2578    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2579    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2580    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2581    jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2582
2583* When refreshing all of ICU4J data from ICU4C
2584- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2585- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
2586or
2587- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
2588
2589* refresh Java test .txt files
2590- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2591    cd $ICU4C_SRC_DIR/source/data/unidata
2592    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2593    cd ../../test/testdata
2594    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2595    cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2596
2597* run & fix ICU4J tests
2598
2599---------------------------------------------------------------------------- ***
2600
2601Unicode 9.0 update for ICU 58
2602
2603* Command-line environment setup
2604
ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt58b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata
2611
2612http://www.unicode.org/review/pri323/  -- beta review
2613http://www.unicode.org/reports/uax-proposed-updates.html
2614http://www.unicode.org/versions/beta-9.0.0.html
2615http://www.unicode.org/versions/Unicode9.0.0/
2616http://www.unicode.org/reports/tr44/tr44-17.html
2617
2618*** ICU Trac
2619
2620- ticket:12526: integrate Unicode 9
2621- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
2622- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
2623
2624*** CLDR Trac
2625
2626- cldrbug 9414: UCA 9
2627- ^/branches/markus/uni90 at r11518 from trunk at r11517
2628
2629- cldrbug 8745: Unicode 9.0 script metadata
2630
2631*** Unicode version numbers
2632- makedata.mak
2633- uchar.h
2634- com.ibm.icu.util.VersionInfo
2635- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2636
2637- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2638  so that the makefiles see the new version number.
2639
2640*** data files & enums & parser code
2641
2642* file preparation
2643
2644- download UCD & IDNA files
2645- make sure that the Unicode data folder passed into preparseucd.py
2646  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2647- only for manual diffs: remove version suffixes from the file names
2648  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2649  (see https://sites.google.com/site/unicodetools/inputdata)
2650- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2651- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2652- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2653
2654- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
2655  and copy to $UNIDATA
2656    cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
2657
2658* preparseucd.py changes
2659- remove or add new Unicode scripts from/to the
2660  only-in-ISO-15924 list according to the error messages:
2661    ValueError: remove ['Tang'] from _scripts_only_in_iso15924
2662    ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
2663    ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
2664    ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
2665  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2666      and in com.ibm.icu.dev.test.lang.TestUScript.java
2667- DerivedNumericValues.txt new numeric values
2668    0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
2669    0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH
2670    0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS
2671    0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH
2672    0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS
2673  -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
2674     uchar.c, UCharacterProperty.java
2675     to support a new series of values
2676- adjust preparseucd.py for Tangut algorithmic names
2677  in ppucd.txt:
2678    algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
2679  ->
2680    algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
2681- avoid block-compressing most String/Miscellaneous property values,
2682  triggered by genprops not coping with a multi-code point Case_Folding on
2683    block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
2684  keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
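
Sketch: a quick arithmetic check of the new Malayalam fraction values listed above.
It confirms that each decimal value equals its rational form, and it observes that all five
fit the shape numerator/(20 * 2^k) with an odd numerator.
That shape is an observation about these rows only; the actual encoding chosen in uprops.h
and encodeNumericValue() is not reproduced here. The function names are hypothetical.

```python
from fractions import Fraction

# Rows copied from the DerivedNumericValues.txt excerpt above.
ROWS = """\
0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH
0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS
0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH
0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS
"""


def parse_numeric_rows(text):
    """Return (code point, decimal value, rational value) per data row."""
    rows = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        rows.append((int(fields[0], 16), Fraction(fields[1]), Fraction(fields[3])))
    return rows


def as_twentieth_series(value, max_shift=8):
    """Return (numerator, k) with value == numerator / (20 << k) and numerator odd.

    Returns None if no such pair exists for k < max_shift (an arbitrary bound).
    """
    for k in range(max_shift):
        scaled = value * (20 << k)
        if scaled.denominator == 1 and scaled.numerator % 2 == 1:
            return scaled.numerator, k
    return None
```

For example, 3/80 comes out as (3, 2), that is 3 / (20 * 2^2).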
2685
2686* PropertyAliases.txt changes
2687- 1 new property PCM=Prepended_Concatenation_Mark
2688  Ignore: Only useful for layout engines.
2689  Ok to list in ppucd.txt.
2690
2691* PropertyValueAliases.txt new property values
2692    blk; Adlam                            ; Adlam
2693    blk; Bhaiksuki                        ; Bhaiksuki
2694    blk; Cyrillic_Ext_C                   ; Cyrillic_Extended_C
2695    blk; Glagolitic_Sup                   ; Glagolitic_Supplement
2696    blk; Ideographic_Symbols              ; Ideographic_Symbols_And_Punctuation
2697    blk; Marchen                          ; Marchen
2698    blk; Mongolian_Sup                    ; Mongolian_Supplement
2699    blk; Newa                             ; Newa
2700    blk; Osage                            ; Osage
2701    blk; Tangut                           ; Tangut
2702    blk; Tangut_Components                ; Tangut_Components
2703  -> add to uchar.h
2704    use long property names for enum constants
2705  -> add to UCharacter.UnicodeBlock IDs
2706    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2707            replace  public static final int \1_ID = \2; \3
2708  -> add to UCharacter.UnicodeBlock objects
2709    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
2710            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
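
Sketch: the two Eclipse find/replace pairs above, applied with Python's re.sub
for anyone not using Eclipse. The patterns are the ones from the log.
The sample input line is illustrative only (its number and comment are not copied from uchar.h),
and the function name is hypothetical.

```python
import re

# Find/replace pairs as given above for UCharacter.UnicodeBlock.
FIND_ID = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
REPLACE_ID = r"public static final int \1_ID = \2; \3"
FIND_OBJECT = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
REPLACE_OBJECT = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def block_enum_to_java(enum_line):
    """Return (ID constant line, UnicodeBlock object line) for one C enum line."""
    id_line = re.sub(FIND_ID, REPLACE_ID, enum_line)
    object_line = re.sub(FIND_OBJECT, REPLACE_OBJECT, enum_line)
    return id_line, object_line


# Illustrative input only, in the shape the patterns expect.
SAMPLE = "UBLOCK_ADLAM = 263, /*[1E900]*/"
```

Both patterns require a single space around "=" and a comment starting with "/" after the comma;
lines formatted differently pass through unchanged.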
2711
2712    GCB; EB                               ; E_Base
2713    GCB; EBG                              ; E_Base_GAZ
2714    GCB; EM                               ; E_Modifier
2715    GCB; GAZ                              ; Glue_After_Zwj
2716    GCB; ZWJ                              ; ZWJ
2717  -> uchar.h & UCharacter.GraphemeClusterBreak
2718
2719    jg ; African_Feh                      ; African_Feh
2720    jg ; African_Noon                     ; African_Noon
2721    jg ; African_Qaf                      ; African_Qaf
2722  -> uchar.h & UCharacter.JoiningGroup
2723
2724    lb ; EB                               ; E_Base
2725    lb ; EM                               ; E_Modifier
2726    lb ; ZWJ                              ; ZWJ
2727  -> uchar.h & UCharacter.LineBreak
2728
2729    sc ; Adlm                             ; Adlam
2730    sc ; Bhks                             ; Bhaiksuki
2731    sc ; Marc                             ; Marchen
2732    sc ; Newa                             ; Newa
2733    sc ; Osge                             ; Osage
2734    sc ; Tang                             ; Tangut
2735  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
2736
2737    WB ; EB                               ; E_Base
2738    WB ; EBG                              ; E_Base_GAZ
2739    WB ; EM                               ; E_Modifier
2740    WB ; GAZ                              ; Glue_After_Zwj
2741    WB ; ZWJ                              ; ZWJ
2742  -> uchar.h & UCharacter.WordBreak
2743
2744* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2745    (not strictly necessary for NOT_ENCODED scripts)
2746  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2747
2748* generate normalization data files
2749  cd $ICU_ROOT/dbg
2750  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2751  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
2752  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
2753  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2754  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
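
Sketch: the gennorm2 command lines above, generated from one table.
It makes the layering visible: every data file starts from nfc.txt and adds its own inputs.
Only the options that appear above (-o, -s, --csource) are used.
The function name and the table are hypothetical, and nothing is executed.

```python
# Output file name -> norm2 input files, as in the command lines above.
NRM_FILES = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(icu_src_dir, src_data_in, unidata):
    """Return the gennorm2 argument lists, to be run from $ICU_ROOT/dbg."""
    norm2 = unidata + "/norm2"
    # The C source version of the NFC data comes first, as above.
    commands = [["bin/gennorm2", "-o", icu_src_dir + "/source/common/norm2_nfc_data.h",
                 "-s", norm2, "nfc.txt", "--csource"]]
    for out_name, inputs in NRM_FILES:
        commands.append(["bin/gennorm2", "-o", src_data_in + "/" + out_name,
                         "-s", norm2] + inputs)
    return commands


if __name__ == "__main__":
    # Print the commands with the shell variables left unexpanded.
    for command in gennorm2_commands("$ICU_SRC_DIR", "$SRC_DATA_IN", "$UNIDATA"):
        print(" ".join(command))
```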
2755
2756* build ICU (make install)
2757  so that the tools build can pick up the new definitions from the installed header files.
2758
2759  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
2760
2761* build Unicode tools using CMake+make
2762
~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

  # Location (--prefix) of where ICU was installed.
  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
  # Location of the ICU source tree.
  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)

  ~/svn.icutools/trunk/dbg/unicode/c$
    cmake ../../../src/unicode/c
    make
2773
2774* generate core properties data files
2775  ~/svn.icutools/trunk/dbg/unicode/c$
2776    genprops/genprops $ICU_SRC_DIR
2777    genuca/genuca --hanOrder implicit $ICU_SRC_DIR
2778    genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
2779- rebuild ICU (make install) & tools
2780
2781* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2782  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2783- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2784- Unicode 6.0..9.0: U+2260, U+226E, U+226F
2785- nothing new in 9.0, no test file to update
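
Sketch: the grep described above as a small Python filter, which also expands ranges and
drops the ASCII entries automatically.
It assumes semicolon-separated fields (code point or range, then status) with # comments.
The SAMPLE lines are an illustrative excerpt, not the real file; the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return the non-ASCII code points whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        found.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return found


# Illustrative excerpt only; the comments and the ASCII rows are made up.
SAMPLE = """\
0000..002C    ; disallowed_STD3_valid                  # sample ASCII range
0061..007A    ; valid                                  # sample
2260          ; disallowed_STD3_valid                  # NOT EQUAL TO
226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN
""".splitlines()
```

On this excerpt the filter returns U+2260, U+226E and U+226F, matching the list above;
any additional code point in a new version would call for a test update.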
2786
2787* run & fix ICU4C tests
2788- Andy handles RBBI & spoof check test failures
2789
2790* collation: CLDR collation root, UCA DUCET
2791
2792- UCA DUCET goes into Mark's Unicode tools, see
2793  https://sites.google.com/site/unicodetools/home#TOC-UCA
2794- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
2795    cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
2796
2797- cd (CLDR UCA branch)/common/uca/
2798- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2799    cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
2800- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2801    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
2802    (note removing the underscore before "Rules")
2803    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2804- restore TODO diffs in UCARules.txt
2805    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2806- update (ICU4C)/source/test/testdata/CollationTest_*.txt
2807  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2808  from the CLDR root files (..._CLDR_..._SHORT.txt)
2809    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2810    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2811    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
2812- if CLDR common/uca/unihan-index.txt changes, then update
2813  CLDR common/collation/root.xml <collation type="private-unihan">
2814  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
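
Sketch: the file renames implied by the cp commands above, collected in one place.
The function name is hypothetical; the mappings are only the ones that appear in this log.

```python
def icu_name_for_cldr_uca_file(cldr_name):
    """Map a file name in CLDR common/uca/ to the name it gets in ICU.

    Returns None for files that the steps above do not copy.
    """
    renamed = {
        "FractionalUCA_SHORT.txt": "FractionalUCA.txt",
        # Note: the underscore before "Rules" is removed as well.
        "UCA_Rules_SHORT.txt": "UCARules.txt",
    }
    if cldr_name in renamed:
        return renamed[cldr_name]
    prefix = "CollationTest_CLDR_"
    if cldr_name.startswith(prefix) and cldr_name.endswith("_SHORT.txt"):
        # The conformance test files drop the "CLDR_" part.
        return "CollationTest_" + cldr_name[len(prefix):]
    return None
```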
2815
2816- run genuca, see command line above;
2817  deal with
2818    Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
2819    FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)
2820        (add the character to genuca.cpp sampleCharsToScripts[])
2821  + look up the USCRIPT_ code for the new sample characters
2822    (should be obvious from the comment in the error output)
2823  + *add* mappings to sampleCharsToScripts[], do not replace them
2824    (in case the script sample characters flip-flop)
2825  + insert new scripts in DUCET script order, see the top_byte table
2826    at the beginning of FractionalUCA.txt
2827- rebuild ICU4C
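
Sketch: pulling the sample character and script name out of a first-primary line like the
one in the error message above, as a starting point for the new sampleCharsToScripts[] entry.
The regular expression is fitted to that one line and is an assumption about the general format.
The USCRIPT_ name is only a guess derived from the comment and must be checked against uscript.h.
Both function names are hypothetical.

```python
import re

# Fitted to the FractionalUCA.txt line quoted in the error message above.
FIRST_PRIMARY = re.compile(
    r"^FDD1 ([0-9A-F]{4,6});\s*\[[^\]]*\]\s*#\s*(.+?) first primary")


def parse_first_primary(line):
    """Return (sample code point, script name from the comment), or None."""
    match = FIRST_PRIMARY.match(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)


def uscript_guess(script_name):
    """Guess a USCRIPT_ constant name; verify it against uscript.h."""
    return "USCRIPT_" + script_name.upper().replace(" ", "_")


LINE = "FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)"
```

For the line above this yields code point 0x104B5 with script name "Osage",
and the guessed constant USCRIPT_OSAGE.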
2828
2829* Unihan collators
2830- run Unicode Tools
2831    org.unicode.draft.GenerateUnihanCollators
2832  with VM arguments
2833    -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
2834    -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
2835    -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
2836    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
2837    -DUVERSION=9.0.0
2838    -ea
2839- run Unicode Tools
2840    org.unicode.draft.GenerateUnihanCollatorFiles
2841  with the same arguments
2842- check CLDR diffs
2843    cd ~/svn.cldr/trunk
2844    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
2845    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
2846- copy to CLDR
2847    cd ~/svn.cldr/trunk
2848    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
2849    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
2850- commit to CLDR
2851- generate ICU zh collation data: run CLDR
2852    org.unicode.cldr.icu.NewLdml2IcuConverter
2853  with program arguments
2854    -t collation
2855    -s /home/mscherer/svn.cldr/trunk/common/collation
2856    -m /home/mscherer/svn.cldr/trunk/common/supplemental
2857    -d /home/mscherer/svn.icu/trunk/src/source/data/coll
2858    -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
2859    zh
2860  and VM arguments
2861    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
2862- rebuild ICU4C
2863
2864* run & fix ICU4C tests, now with new CLDR collation root data
2865- run all tests with the collation test data *_SHORT.txt or the full files
2866  (the full ones have comments, useful for debugging)
2867- note on intltest: if collate/UCAConformanceTest fails, then
2868  utility/MultithreadTest/TestCollators will fail as well;
2869  fix the conformance test before looking into the multi-thread test
2870
2871* update Java data files
2872- refresh just the UCD/UCA-related/derived files, just to be safe
2873- see (ICU4C)/source/data/icu4j-readme.txt
2874- mkdir /tmp/icu4j
2875- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2876  output:
2877    ...
2878    Unicode .icu files built to ./out/build/icudt58l
2879    echo timestamp > uni-core-data
2880    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
2881    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
2882    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2883    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
2884    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
2885    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
2886    mkdir -p /tmp/icu4j/main/shared/data
2887    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2888    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
2889    mkdir -p /tmp/icu4j/main/shared/data
2890    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2891    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
2892- copy the big-endian Unicode data files to another location,
2893  separate from the other data files,
2894  and then refresh ICU4J
2895    cd ~/svn.icu/trunk/dbg/data/out/icu4j
2896    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2897    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2898    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2899    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2900    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2901    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2902    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2903    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2904    jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2905
2906* When refreshing all of ICU4J data from ICU4C
2907- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2908- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2909or
2910- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2911
2912* update CollationFCD.java
2913  + copy & paste the initializers of lcccIndex[] etc. from
2914    ICU4C/source/i18n/collationfcd.cpp to
2915    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2916
2917* refresh Java test .txt files
2918- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2919    cd $ICU_SRC_DIR/source/data/unidata
2920    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2921    cd ../../test/testdata
2922    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2923    cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2924
2925* run & fix ICU4J tests
2926
2927*** LayoutEngine script information
2928
2929* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2930  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2931  in the working directory.
2932
2933  (It also generates ScriptRunData.cpp, which is no longer needed.)
2934
2935  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2936  (a plain text file)
2937  which maps ICU versions to the numbers of script/language constants
2938  that were added then.
2939  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2940
2941  The generated files have a current copyright date and "@deprecated" statement.
2942
2943* Review changes, fix Java tool if necessary, and copy to ICU4C
2944  cd ~/svn.icu4j/trunk/src
2945  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2946  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2947  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2948
2949*** API additions
2950- send notice to icu-design about new born-@stable API (enum constants etc.)
2951
2952*** merge the Unicode update branches back onto the trunk
2953- do not merge the icudata.jar and testdata.jar,
2954  instead rebuild them from merged & tested ICU4C
2955- make sure that changes to Unicode tools & ICU tools are checked in
2956  http://www.unicode.org/utility/trac/log/trunk/unicodetools
2957  http://bugs.icu-project.org/trac/log/tools/trunk
2958
2959---------------------------------------------------------------------------- ***
2960
2961New script codes early in ICU 58: https://unicode-org.atlassian.net/browse/ICU-11764
2962
2963Adding
2964- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
2965- new combination/alias codes: Hanb, Jamo
2966  - used in CLDR 29 and in spoof checker
2967- new Z* code: Zsye
2968
2969Add new codes to uscript.h & UScript.java, see Unicode update logs.
2970  -> com.ibm.icu.lang.UScript
2971    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2972    replace  public static final int \1 = \2; \3
2973
2974Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
2975add new script codes.
2976"Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
2977
2978Note: If we have to run preparseucd.py again before the Unicode 9 update,
2979then we need to manually keep/restore the new script codes.
2980
ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt57b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata
2987
2988Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
2989see https://unicode-org.atlassian.net/browse/ICU-12141
2990
2991make install, then icutools cmake & make, then
2992~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
2993
2994Generate Java data as usual, only update pnames.icu & uprops.icu.
2995
2996*** LayoutEngine script information
2997
2998* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2999  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3000  in the working directory.
3001
3002  (It also generates ScriptRunData.cpp, which is no longer needed.)
3003
3004  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
3005  (a plain text file)
3006  which maps ICU versions to the numbers of script/language constants
3007  that were added then.
3008  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
3009
3010  The generated files have a current copyright date and "@deprecated" statement.
3011
3012* Review changes, fix Java tool if necessary, and copy to ICU4C
3013  cd ~/svn.icu4j/trunk/src
3014  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3015  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
3016  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
3017
3018---------------------------------------------------------------------------- ***
3019
3020Emoji properties added in ICU 57: https://unicode-org.atlassian.net/browse/ICU-11802
3021
3022Edit preparseucd.py to add & parse new properties.
3023They share the UCD property namespace but are not listed in PropertyAliases.txt.
3024
3025Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
3026Initial data from emoji/2.0/
3027
ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt56b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata
3034
3035Add binary-property constants to uchar.h enum UProperty & UProperty.java.
3036
3037~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
3038(Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
3039
3040Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
3041
3042make install, then icutools cmake & make, then
3043~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
3044
3045Generate Java data as usual, only update pnames.icu & uprops.icu.
3046
3047---------------------------------------------------------------------------- ***
3048
3049Unicode 8.0 update for ICU 56
3050
3051* Command-line environment setup
3052
ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt56b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata
3059
3060http://www.unicode.org/review/pri297/  -- beta review
3061http://www.unicode.org/reports/uax-proposed-updates.html
3062http://unicode.org/versions/beta-8.0.0.html
3063http://www.unicode.org/versions/Unicode8.0.0/
3064http://www.unicode.org/reports/tr44/tr44-15.html
3065
3066*** ICU Trac
3067
3068- ticket:11574: Unicode 8
3069- C++ branches/markus/uni80 at r37351 from trunk at r37343
3070- Java branches/markus/uni80 at r37352 from trunk at r37338
3071
3072*** CLDR Trac
3073
3074- cldrbug 8311: UCA 8
3075- branches/markus/uni80 at r11518 from trunk at r11517
3076
3077- cldrbug 8109: Unicode 8.0 script metadata
3078- cldrbug 8418: Updated segmentation for Unicode 8.0
3079
3080*** Unicode version numbers
3081- makedata.mak
3082- uchar.h
3083- com.ibm.icu.util.VersionInfo
3084- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3085
3086- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
3087  so that the makefiles see the new version number.
3088
3089*** data files & enums & parser code
3090
3091* file preparation
3092
3093- download UCD & IDNA files
3094- make sure that the Unicode data folder passed into preparseucd.py
3095  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3096- only for manual diffs: remove version suffixes from the file names
3097  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
3098  (see https://sites.google.com/site/unicodetools/inputdata)
3099- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
3100- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
3101- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3102
3103- also: from http://unicode.org/Public/security/8.0.0/ download new
3104  confusables.txt & confusablesWholeScript.txt
3105  and copy to $UNIDATA
3106    ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
3107    ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
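
The version-suffix removal used for manual diffs can be pictured with a short Python sketch. This is not the actual desuffixucd.py. The suffix pattern (a hyphen, a dotted version, and an optional draft tag such as d5 before the extension) is an assumption about how the downloaded files are named, and strip_version_suffix and the sample file names are hypothetical.

```python
import re

# Assumed naming pattern: Name-<major>.<minor>.<update>[d<draft>].<ext>
_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")

def strip_version_suffix(filename):
    """Return filename without a version suffix before the extension, if any."""
    return _SUFFIX.sub("", filename)

# Hypothetical file names, for illustration only.
for name in ("UnicodeData-8.0.0d5.txt", "Unihan-8.0.0.zip", "ReadMe.txt"):
    print(name, "->", strip_version_suffix(name))
```

Names without a matching suffix pass through unchanged, so the function is safe to apply to a whole folder listing.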
3108
3109* initial preparseucd.py changes
3110- remove new Unicode scripts from the
3111  only-in-ISO-15924 list according to the error message:
3112    ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
3113    from _scripts_only_in_iso15924
3114  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3115      and in com.ibm.icu.dev.test.lang.TestUScript.java
3116- property and file name change:
3117    IndicMatraCategory -> IndicPositionalCategory
- UnicodeData.txt unusual numeric values (fractions not in lowest terms)
    109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
    109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
    109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
    109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
    109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
    109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
    109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
    109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
    109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
    109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
  -> change preparseucd.py to map them to reduced fractions (e.g., 2/12 -> 1/6)
     which are listed in DerivedNumericValues.txt;
     keeps storage in data file simple
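
The fraction mapping described above can be sketched with Python's standard fractions module. This is a minimal illustration, not the code in preparseucd.py; reduce_numeric_value is a hypothetical helper name.

```python
from fractions import Fraction

def reduce_numeric_value(nv):
    """Return a numeric-value string with any fraction in lowest terms.

    Values without a '/' are returned unchanged.
    """
    if "/" not in nv:
        return nv
    f = Fraction(nv)  # Fraction normalizes to lowest terms
    if f.denominator == 1:
        return str(f.numerator)
    return "%d/%d" % (f.numerator, f.denominator)

# Numeric values from the UnicodeData.txt lines quoted above.
for nv in ("1/12", "2/12", "6/12", "10/12"):
    print(nv, "->", reduce_numeric_value(nv))
```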
3132
3133* PropertyValueAliases.txt changes
3134- 10 new Block (blk) values:
3135    blk; Ahom                             ; Ahom
3136    blk; Anatolian_Hieroglyphs            ; Anatolian_Hieroglyphs
3137    blk; Cherokee_Sup                     ; Cherokee_Supplement
3138    blk; CJK_Ext_E                        ; CJK_Unified_Ideographs_Extension_E
3139    blk; Early_Dynastic_Cuneiform         ; Early_Dynastic_Cuneiform
3140    blk; Hatran                           ; Hatran
3141    blk; Multani                          ; Multani
3142    blk; Old_Hungarian                    ; Old_Hungarian
3143    blk; Sup_Symbols_And_Pictographs      ; Supplemental_Symbols_And_Pictographs
3144    blk; Sutton_SignWriting               ; Sutton_SignWriting
3145  -> add to uchar.h
3146    use long property names for enum constants
3147  -> add to UCharacter.UnicodeBlock IDs
3148    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
3149            replace  public static final int \1_ID = \2; \3
3150  -> add to UCharacter.UnicodeBlock objects
3151    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
3152            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3153- 6 new Script (sc) values:
3154    sc ; Ahom                             ; Ahom
3155    sc ; Hatr                             ; Hatran
3156    sc ; Hluw                             ; Anatolian_Hieroglyphs
3157    sc ; Hung                             ; Old_Hungarian
3158    sc ; Mult                             ; Multani
3159    sc ; Sgnw                             ; SignWriting
3160  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
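
The two Eclipse find/replace steps for UCharacter.UnicodeBlock can also be tried outside the IDE. The sketch below applies the same patterns with Python's re.sub. The sample input line only imitates the shape of a uchar.h enum entry; its numeric value and comment are assumptions, not taken from this log.

```python
import re

# Patterns and replacements copied from the Eclipse steps above.
ID_FIND = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
ID_REPLACE = r"public static final int \1_ID = \2; \3"
OBJ_FIND = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'

def to_java_id(line):
    """Turn a uchar.h-style UBLOCK_ enum line into a Java _ID constant."""
    return re.sub(ID_FIND, ID_REPLACE, line)

def to_java_object(line):
    """Turn a uchar.h-style UBLOCK_ enum line into a Java UnicodeBlock object."""
    return re.sub(OBJ_FIND, OBJ_REPLACE, line)

sample = "    UBLOCK_AHOM = 253, /*[11700]*/"  # illustrative input line
print(to_java_id(sample))
print(to_java_object(sample))
```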
3161
3162* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
3163    (not strictly necessary for NOT_ENCODED scripts)
3164  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
3165
3166* generate normalization data files
3167  cd $ICU_ROOT/dbg
3168  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
3169  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3170  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3171  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3172  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
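
The .nrm command lines above differ only in the output file and in the list of input files, where each derived form builds on nfc.txt. A small Python sketch can make that layering explicit. It only assembles argument lists and does not run gennorm2. The --csource variant is left out, and gennorm2_commands and the directory arguments in the example are hypothetical.

```python
import shlex

# Output file -> input files, as in the command lines above.
NORM_DATA = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]

def gennorm2_commands(src_data_in, unidata):
    """Return one argument list per .nrm file, mirroring the log's commands."""
    return [
        ["bin/gennorm2", "-o", "%s/%s" % (src_data_in, out),
         "-s", "%s/norm2" % unidata] + inputs
        for out, inputs in NORM_DATA
    ]

# Placeholder directories; substitute $SRC_DATA_IN and $UNIDATA.
for cmd in gennorm2_commands("SRC_DATA_IN", "UNIDATA"):
    print(shlex.join(cmd))
```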
3173
3174* build ICU (make install)
3175  so that the tools build can pick up the new definitions from the installed header files.
3176
3177  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
3178
3179* build Unicode tools using CMake+make
3180
3181~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
3182
3183  # Location (--prefix) of where ICU was installed.
3184  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
3185  # Location of the ICU source tree.
3186  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
3187
3188  ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
3189  ~/svn.icutools/trunk/dbg/unicode/c$ make
3190
3191* generate core properties data files
3192- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
3193- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
3194- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
3195- rebuild ICU (make install) & tools
3196- run genuca again (see step above) so that it picks up the new nfc.nrm
3197- rebuild ICU (make install) & tools
3198
3199* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3200  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3201- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3202- Unicode 6.0..8.0: U+2260, U+226E, U+226F
3203- nothing new in 8.0, no test file to update
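
The grep step can be approximated with a small Python filter, sketched here under the assumption that each data line has the form "code point(s) ; status ; ... # comment". The sample lines are written in that shape for illustration and are not copied from a real IdnaMappingTable.txt; non_ascii_std3_valid is a hypothetical helper.

```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) ranges above U+007F whose status is disallowed_STD3_valid."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            yield (max(start, 0x80), end)

SAMPLE = """\
0021..002C    ; disallowed_STD3_valid           # sample ASCII range
00A0          ; disallowed_STD3_mapped ; 0020   # sample mapped entry
2260          ; disallowed_STD3_valid           # sample non-ASCII entry
226E..226F    ; disallowed_STD3_valid           # sample non-ASCII range
""".splitlines()

for start, end in non_ascii_std3_valid(SAMPLE):
    print("U+%04X..U+%04X" % (start, end))
```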
3204
3205* run & fix ICU4C tests
3206- bad Cherokee case folding due to difference in fallbacks:
3207  UCD case folding falls back to no mapping,
3208  ICU runtime case folding falls back to lowercasing;
3209  fixed casepropsbuilder.cpp to generate scf mappings to self
3210  when there is an slc mapping but no scf
3211- Andy handles RBBI & spoof check test failures
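
The Cherokee case-folding fix can be modeled in a few lines of Python. This is a sketch of the idea only, not the casepropsbuilder.cpp code. The dictionaries stand in for the slc and scf mappings, and the code point pair is chosen for illustration.

```python
def runtime_fold(c, slc, scf):
    """Model of the runtime fallback: use scf if present, else lowercase."""
    if c in scf:
        return scf[c]
    return slc.get(c, c)

def add_self_foldings(slc, scf):
    """Return a copy of scf with c -> c wherever c has an slc mapping but no scf."""
    fixed = dict(scf)
    for c in slc:
        if c not in fixed:
            fixed[c] = c
    return fixed

# Illustrative pair: an uppercase letter that lowercases, while the
# case folding goes from the lowercase letter back to the uppercase one.
slc = {0x13A0: 0xAB70}
scf = {0xAB70: 0x13A0}

print(hex(runtime_fold(0x13A0, slc, scf)))                          # falls back to lowercasing
print(hex(runtime_fold(0x13A0, slc, add_self_foldings(slc, scf))))  # folds to itself
```

With the explicit self-mapping in place, the runtime result matches the UCD fallback of "no mapping" without changing the runtime lookup logic.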
3212
3213* collation: CLDR collation root, UCA DUCET
3214
3215- UCA DUCET goes into Mark's Unicode tools, see
3216  https://sites.google.com/site/unicodetools/home#TOC-UCA
3217- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
3218- cd (CLDR UCA branch)/common/uca/
3219- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3220  cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    (note removing the underscore before "Rules")
    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
3225- restore TODO diffs in UCARules.txt
3226    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
3227- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3228  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3229  from the CLDR root files (..._CLDR_..._SHORT.txt)
3230    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
3231    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
3232    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
3233- if CLDR common/uca/unihan-index.txt changes, then update
3234  CLDR common/collation/root.xml <collation type="private-unihan">
3235  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
3236- run genuca, see command line above;
3237  deal with
3238    Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
3239        (add the character to genuca.cpp sampleCharsToScripts[])
3240  + look up the script for the new sample characters
3241    (e.g., in FractionalUCA.txt)
3242  + *add* mappings to sampleCharsToScripts[], do not replace them
3243    (in case the script sample characters flip-flop)
3244  + insert new scripts in DUCET script order, see the top_byte table
3245    at the beginning of FractionalUCA.txt
3246- rebuild ICU4C
3247
3248* run & fix ICU4C tests, now with new CLDR collation root data
3249- run all tests with the collation test data *_SHORT.txt or the full files
3250  (the full ones have comments, useful for debugging)
3251- note on intltest: if collate/UCAConformanceTest fails, then
3252  utility/MultithreadTest/TestCollators will fail as well;
3253  fix the conformance test before looking into the multi-thread test
3254- fixed bug in CollationWeights::getWeightRanges()
3255  exposed by new data and CollationTest::TestRootElements
3256
3257* update Java data files
3258- refresh just the UCD/UCA-related/derived files, just to be safe
3259- see (ICU4C)/source/data/icu4j-readme.txt
3260- mkdir /tmp/icu4j
3261- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3262  output:
3263    ...
3264    Unicode .icu files built to ./out/build/icudt56l
3265    echo timestamp > uni-core-data
3266    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
3267    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
3268    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
3269    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
3270    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
3271    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
3272    mkdir -p /tmp/icu4j/main/shared/data
3273    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3274    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
3275    mkdir -p /tmp/icu4j/main/shared/data
3276    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3277    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
3278- copy the big-endian Unicode data files to another location,
3279  separate from the other data files,
3280  and then refresh ICU4J
3281    cd ~/svn.icu/trunk/dbg/data/out/icu4j
3282    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3283    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
3284    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3285    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3286    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
3287    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3288    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3289    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
3290    jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
3291
3292* When refreshing all of ICU4J data from ICU4C
3293- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3294- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3295or
3296- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3297
3298* update CollationFCD.java
3299  + copy & paste the initializers of lcccIndex[] etc. from
3300    ICU4C/source/i18n/collationfcd.cpp to
3301    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
3302
3303* refresh Java test .txt files
3304- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3305    cd $ICU_SRC_DIR/source/data/unidata
3306    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3307    cd ../../test/testdata
3308    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3309    cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3310
3311* run & fix ICU4J tests
3312
3313*** LayoutEngine script information
3314
3315* ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
3316  because the layout engine was deprecated in ICU 54.
3317  Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
3318  to write lines that we used to add manually.
3319
3320* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3321  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3322  in the working directory.
3323
3324  (It also generates ScriptRunData.cpp, which is no longer needed.)
3325
3326  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
3327  (a plain text file)
3328  which maps ICU versions to the numbers of script/language constants
3329  that were added then.
3330  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
3331
3332  The generated files have a current copyright date and "@deprecated" statement.
3333
3334* Review changes, fix Java tool if necessary, and copy to ICU4C
3335  cd ~/svn.icu4j/trunk/src
3336  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3337  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
3338  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
3339
3340*** API additions
3341- send notice to icu-design about new born-@stable API (enum constants etc.)
3342
3343*** merge the Unicode update branches back onto the trunk
3344- do not merge the icudata.jar and testdata.jar,
3345  instead rebuild them from merged & tested ICU4C
3346- make sure that changes to Unicode tools & ICU tools are checked in
3347  http://www.unicode.org/utility/trac/log/trunk/unicodetools
3348  http://bugs.icu-project.org/trac/log/tools/trunk
3349
3350---------------------------------------------------------------------------- ***
3351
3352Unicode 7.0 update for ICU 54
3353
3354http://www.unicode.org/review/pri271/  -- beta review
3355http://www.unicode.org/reports/uax-proposed-updates.html
3356http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
3357http://www.unicode.org/reports/tr44/tr44-13.html
3358
3359*** ICU Trac
3360
3361- ticket 10821: Unicode 7.0, UCA 7.0
3362- C++ branches/markus/uni70 at r35584 from trunk at r35580
3363- Java branches/markus/uni70 at r35587 from trunk at r35545
3364
3365*** CLDR Trac
3366
3367- ticket 7195: UCA 7.0 CLDR root collation
3368- branches/markus/uni70 at r10062 from trunk at r10061
3369
3370- ticket 6762: script metadata for Unicode 7.0 new scripts
3371
3372*** Unicode version numbers
3373- makedata.mak
3374- uchar.h
3375- com.ibm.icu.util.VersionInfo
3376- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3377
3378- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
3379  so that the makefiles see the new version number.
3380
3381*** data files & enums & parser code
3382
3383* file preparation
3384
3385- download UCD & IDNA files
3386- make sure that the Unicode data folder passed into preparseucd.py
3387  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3388- only for manual diffs: remove version suffixes from the file names
3389  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
3390  (see https://sites.google.com/site/unicodetools/inputdata)
3391- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
3392- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
3393- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3394- Restore TODO diffs in source/data/unidata/UCARules.txt
3395    cd $ICU_SRC_DIR
3396    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
3397- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
3398
3399- also: from http://unicode.org/Public/security/7.0.0/ download new
3400  confusables.txt & confusablesWholeScript.txt
3401  and copy to $ICU_ROOT/src/source/data/unidata/
3402
3403* initial preparseucd.py changes
3404- remove new Unicode scripts from the
3405  only-in-ISO-15924 list according to the error message:
3406    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
3407                        'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
3408                        'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
3409    from _scripts_only_in_iso15924
3410  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3411      and in com.ibm.icu.dev.test.lang.TestUScript.java
3412- NamesList.txt now has a heading with a non-ASCII character
3413  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
3414  + escape non-ASCII characters in heading comments
- preparseucd.py gets the Unicode copyright line from PropertyAliases.txt, which is currently still at 2013
  + get the copyright from the first file whose copyright line contains the current year
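
The ValueError quoted above comes from a consistency check between the only-in-ISO-15924 list and the scripts that Unicode now defines. A check of that kind might look like the following Python sketch. It is an assumption about the shape of the check, not the code in preparseucd.py, and the function name is hypothetical.

```python
def check_scripts_only_in_iso15924(only_in_iso, unicode_scripts):
    """Raise ValueError if a script listed as ISO-only is now in Unicode."""
    overlap = sorted(only_in_iso & unicode_scripts)
    if overlap:
        raise ValueError(
            "remove %s from _scripts_only_in_iso15924" % overlap)

# Example: Ahom and Hatr have been added to Unicode, Mult has not yet.
try:
    check_scripts_only_in_iso15924({"Ahom", "Hatr", "Mult"},
                                   {"Latn", "Ahom", "Hatr"})
except ValueError as e:
    print("ValueError:", e)
```

The sketch sorts the overlap so that the message is stable from run to run; the order in the real tool's message may differ.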
3417
3418* PropertyValueAliases.txt changes
3419- 32 new Block (blk) values:
3420    blk; Bassa_Vah                        ; Bassa_Vah
3421    blk; Caucasian_Albanian               ; Caucasian_Albanian
3422    blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
3423    blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
3424    blk; Duployan                         ; Duployan
3425    blk; Elbasan                          ; Elbasan
3426    blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
3427    blk; Grantha                          ; Grantha
3428    blk; Khojki                           ; Khojki
3429    blk; Khudawadi                        ; Khudawadi
3430    blk; Latin_Ext_E                      ; Latin_Extended_E
3431    blk; Linear_A                         ; Linear_A
3432    blk; Mahajani                         ; Mahajani
3433    blk; Manichaean                       ; Manichaean
3434    blk; Mende_Kikakui                    ; Mende_Kikakui
3435    blk; Modi                             ; Modi
3436    blk; Mro                              ; Mro
3437    blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
3438    blk; Nabataean                        ; Nabataean
3439    blk; Old_North_Arabian                ; Old_North_Arabian
3440    blk; Old_Permic                       ; Old_Permic
3441    blk; Ornamental_Dingbats              ; Ornamental_Dingbats
3442    blk; Pahawh_Hmong                     ; Pahawh_Hmong
3443    blk; Palmyrene                        ; Palmyrene
3444    blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
3445    blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
3446    blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
3447    blk; Siddham                          ; Siddham
3448    blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
3449    blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
3450    blk; Tirhuta                          ; Tirhuta
3451    blk; Warang_Citi                      ; Warang_Citi
3452  -> add to uchar.h
3453    use long property names for enum constants
3454  -> add to UCharacter.UnicodeBlock IDs
3455    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
3456            replace  public static final int \1_ID = \2; \3
3457  -> add to UCharacter.UnicodeBlock objects
3458    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
3459            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3460- 28 new Joining_Group (jg) values:
3461    jg ; Manichaean_Aleph                 ; Manichaean_Aleph
3462    jg ; Manichaean_Ayin                  ; Manichaean_Ayin
3463    jg ; Manichaean_Beth                  ; Manichaean_Beth
3464    jg ; Manichaean_Daleth                ; Manichaean_Daleth
3465    jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
3466    jg ; Manichaean_Five                  ; Manichaean_Five
3467    jg ; Manichaean_Gimel                 ; Manichaean_Gimel
3468    jg ; Manichaean_Heth                  ; Manichaean_Heth
3469    jg ; Manichaean_Hundred               ; Manichaean_Hundred
3470    jg ; Manichaean_Kaph                  ; Manichaean_Kaph
3471    jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
3472    jg ; Manichaean_Mem                   ; Manichaean_Mem
3473    jg ; Manichaean_Nun                   ; Manichaean_Nun
3474    jg ; Manichaean_One                   ; Manichaean_One
3475    jg ; Manichaean_Pe                    ; Manichaean_Pe
3476    jg ; Manichaean_Qoph                  ; Manichaean_Qoph
3477    jg ; Manichaean_Resh                  ; Manichaean_Resh
3478    jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
3479    jg ; Manichaean_Samekh                ; Manichaean_Samekh
3480    jg ; Manichaean_Taw                   ; Manichaean_Taw
3481    jg ; Manichaean_Ten                   ; Manichaean_Ten
3482    jg ; Manichaean_Teth                  ; Manichaean_Teth
3483    jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
3484    jg ; Manichaean_Twenty                ; Manichaean_Twenty
3485    jg ; Manichaean_Waw                   ; Manichaean_Waw
3486    jg ; Manichaean_Yodh                  ; Manichaean_Yodh
3487    jg ; Manichaean_Zayin                 ; Manichaean_Zayin
3488    jg ; Straight_Waw                     ; Straight_Waw
3489  -> uchar.h & UCharacter.JoiningGroup
3490- 23 new Script (sc) values:
3491    sc ; Aghb                             ; Caucasian_Albanian
3492    sc ; Bass                             ; Bassa_Vah
3493    sc ; Dupl                             ; Duployan
3494    sc ; Elba                             ; Elbasan
3495    sc ; Gran                             ; Grantha
3496    sc ; Hmng                             ; Pahawh_Hmong
3497    sc ; Khoj                             ; Khojki
3498    sc ; Lina                             ; Linear_A
3499    sc ; Mahj                             ; Mahajani
3500    sc ; Mani                             ; Manichaean
3501    sc ; Mend                             ; Mende_Kikakui
3502    sc ; Modi                             ; Modi
3503    sc ; Mroo                             ; Mro
3504    sc ; Narb                             ; Old_North_Arabian
3505    sc ; Nbat                             ; Nabataean
3506    sc ; Palm                             ; Palmyrene
3507    sc ; Pauc                             ; Pau_Cin_Hau
3508    sc ; Perm                             ; Old_Permic
3509    sc ; Phlp                             ; Psalter_Pahlavi
3510    sc ; Sidd                             ; Siddham
3511    sc ; Sind                             ; Khudawadi
3512    sc ; Tirh                             ; Tirhuta
3513    sc ; Wara                             ; Warang_Citi
3514  -> uscript.h (many were added before)
3515    comment "Mende Kikakui" for USCRIPT_MENDE
3516    add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
3517  -> com.ibm.icu.lang.UScript
3518    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3519    replace  public static final int \1 = \2; \3
3520- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3521  (added 2012-11-01)
3522    Ahom        338     Ahom
3523    Hatr        127     Hatran
3524    Mult        323     Multani
3525  (added 2013-10-12)
3526    Modi        324     Modi
3527    Pauc        263     Pau Cin Hau
3528    Sidd        302     Siddham
3529  -> uscript.h (some overlap with additions from Unicode)
3530  -> com.ibm.icu.lang.UScript
3531    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3532    replace  public static final int \1 = \2; \3
3533  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
3534  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3535      and in com.ibm.icu.dev.test.lang.TestUScript.java
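
The lists of new blk, jg and sc values above can be reproduced by comparing the old and new PropertyValueAliases.txt. The Python sketch below shows one way to do that, assuming lines of the form "property ; short alias ; long name". parse_value_aliases and new_values are hypothetical helpers, and the OLD and NEW inputs are tiny stand-ins for the real files.

```python
def parse_value_aliases(lines, prop):
    """Map short alias -> long name for one property (e.g. 'sc', 'blk', 'jg')."""
    values = {}
    for line in lines:
        data = line.split("#", 1)[0]  # drop comments
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3 and fields[0] == prop:
            values[fields[1]] = fields[2]
    return values

def new_values(old_lines, new_lines, prop):
    """Return the aliases for prop that appear only in new_lines."""
    old = parse_value_aliases(old_lines, prop)
    new = parse_value_aliases(new_lines, prop)
    return {short: long for short, long in new.items() if short not in old}

OLD = ["sc ; Latn                             ; Latin"]
NEW = OLD + [
    "sc ; Modi                             ; Modi",
    "sc ; Mroo                             ; Mro",
    "blk; Mro                              ; Mro",
]
print(new_values(OLD, NEW, "sc"))
```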
3536
3537* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
3538    (not strictly necessary for NOT_ENCODED scripts)
3539  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
3540
3541* generate normalization data files
3542- cd $ICU_ROOT/dbg
3543- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
3544- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
3545- UNIDATA=$ICU_SRC_DIR/source/data/unidata
3546- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
3547- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3548- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3549- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3550- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
3551
3552* build ICU (make install)
3553  so that the tools build can pick up the new definitions from the installed header files.
3554
3555~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
3556
3557* build Unicode tools using CMake+make
3558
3559~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
3560
3561# Location (--prefix) of where ICU was installed.
3562set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
3563# Location of the ICU source tree.
3564set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
3565
3566~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
3567~/svn.icutools/trunk/dbg/unicode/c$ make
3568
3569* genprops work
3570- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
3571  + add second array of Joining_Group values for at most 10800..10FFF
3572    icutools: unicode/c/genprops/bidipropsbuilder.cpp
3573    icu: source/common/ubidi_props.h/.c/_data.h
3574    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
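
The second Joining_Group array amounts to a two-range lookup: try the original range first, then the new one, and otherwise report no joining group. A minimal Python sketch of that idea follows. It is not the ubidi_props code. The start of the first range and all stored values are made-up placeholders; only the 10AC0 start of the second range comes from the note above.

```python
NO_JOINING_GROUP = 0  # placeholder for "no joining group"

def joining_group(c, start, values, start2, values2):
    """Look up c in the first array, then in the second, else return the default."""
    if start <= c < start + len(values):
        return values[c - start]
    if start2 <= c < start2 + len(values2):
        return values2[c - start2]
    return NO_JOINING_GROUP

# Placeholder data: small integers stand in for joining group constants.
FIRST_START, FIRST = 0x0620, [1, 2, 3]
SECOND_START, SECOND = 0x10AC0, [7, 8]

for c in (0x0621, 0x10AC1, 0x0041):
    print("U+%04X ->" % c, joining_group(c, FIRST_START, FIRST, SECOND_START, SECOND))
```

Keeping two short arrays avoids one long, mostly empty array spanning both ranges.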
3575
3576* generate core properties data files
3577- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
3578- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
3579- rebuild ICU (make install) & tools
3580- run genuca again (see step above) so that it picks up the new nfc.nrm
3581- rebuild ICU (make install) & tools
3582
3583* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3584  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3585- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3586- Unicode 6.0..7.0: U+2260, U+226E, U+226F
3587- nothing new in 7.0, no test file to update
3588
3589* run & fix ICU4C tests
3590
3591* update Java data files
3592- refresh just the UCD-related files, just to be safe
3593- see (ICU4C)/source/data/icu4j-readme.txt
3594- mkdir /tmp/icu4j
3595- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3596  output:
3597    ...
3598    Unicode .icu files built to ./out/build/icudt53l
3599    echo timestamp > uni-core-data
3600    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
3601    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
3602    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3603    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
3604    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
3605    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
3606    mkdir -p /tmp/icu4j/main/shared/data
3607    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3608    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
3609    mkdir -p /tmp/icu4j/main/shared/data
3610    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3611    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
3612- copy the big-endian Unicode data files to another location,
3613  separate from the other data files
3614    ICUDT=icudt54b
3615    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3616    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
3617    cd ~/svn.icu/uni70/dbg/data/out/icu4j
3618    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3619    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3620    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
3621    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3622    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3623    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
3624- refresh ICU4J
3625    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
3626
3627* update CollationFCD.java
3628  + copy & paste the initializers of lcccIndex[] etc. from
3629    ICU4C/source/i18n/collationfcd.cpp to
3630    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
3631
3632* refresh Java test .txt files
3633- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3634    cd $ICU_SRC_DIR/source/data/unidata
3635    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3636    cd ../../test/testdata
3637    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3638    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3639
3640* UCA
3641
3642- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
3643- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
3644- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
3645- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
3646- output files are in ~/svn.unitools/Generated/uca/7.0.0/
3647- review data; compare files, use blankweights.sed or similar
3648  ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
3649- cd ~/svn.unitools/Generated/uca/7.0.0/
3650- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3651  cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
3652- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3653    (note removing the underscore before "Rules")
3654    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
3655- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3656  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3657  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3658    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
3659    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
3660    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
3661- run genuca, see command line above
3662- rebuild ICU4C
3663- refresh ICU4J collation data:
3664  (subset of instructions above for properties data refresh, except copies all coll/*)
3665    ICUDT=icudt54b
3666    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3667    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3668    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3669    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
3670- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3671- note on intltest: if collate/UCAConformanceTest fails, then
3672  utility/MultithreadTest/TestCollators will fail as well;
3673  fix the conformance test before looking into the multi-thread test
3674- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
3675- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
3676  ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
3677
3678* When refreshing all of ICU4J data from ICU4C
3679- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3680- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3681or
3682- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3683
3684* run & fix ICU4J tests
3685
3686*** LayoutEngine script information
3687
3688(For details see the Unicode 5.2 change log below.)
3689
3690* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3691  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3692  in the working directory.
3693  (It also generates ScriptRunData.cpp, which is no longer needed.)
3694
3695  The generated files have a current copyright date and "@stable" statement.
3696  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
3697  for "born stable" Unicode API constants, and to stop parsing ICU version numbers
3698  which may not contain dots any more.
3699
3700- diff current <icu>/source/layout files vs. generated ones
3701    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3702  review and manually merge desired changes;
3703  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
3704  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3705- if you just copy the above files, then
3706  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
3707  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3708
3709*** API additions
3710- send notice to icu-design about new born-@stable API (enum constants etc.)
3711
3712*** merge the Unicode update branches back onto the trunk
3713- do not merge the icudata.jar and testdata.jar,
3714  instead rebuild them from merged & tested ICU4C
3715
3716---------------------------------------------------------------------------- ***
3717
3718Unicode 6.3 update
3719
3720http://www.unicode.org/review/pri249/  -- beta review
3721http://www.unicode.org/reports/uax-proposed-updates.html
3722http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
3723http://www.unicode.org/reports/tr44/tr44-11.html
3724
3725*** ICU Trac
3726
3727- ticket 10128: update ICU to Unicode 6.3 beta
3728- ticket 10168: update ICU to Unicode 6.3 final
3729- C++ branches/markus/uni63 at r33552 from trunk at r33551
3730- Java branches/markus/uni63 at r33550 from trunk at r33553
3731
3732- ticket 10142: implement Unicode 6.3 bidi algorithm additions
3733
3734*** Unicode version numbers
3735- makedata.mak
3736- uchar.h
3737  (configure.in & configure: have been modified to extract the version from uchar.h)
3738- com.ibm.icu.util.VersionInfo
3739- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3740
3741- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
3742  so that the makefiles see the new version number.
3743
3744*** data files & enums & parser code
3745
3746* file preparation
3747
3748- download UCD, UCA & IDNA files
3749- make sure that the Unicode data folder passed into preparseucd.py
3750  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3751- modify preparseucd.py:
3752  parse new file BidiBrackets.txt
3753  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
3754- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
3755- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3756- Check test file diffs for previously commented-out, known-failing data lines;
3757  probably need to keep those commented out.
3758
3759* PropertyAliases.txt changes
3760- 1 new Enumerated Property
3761  bpt                      ; Bidi_Paired_Bracket_Type
3762  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
3763  -> ubidi_props.h & .c & UBiDiProps.java
3764  -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
3765  -> uprops.cpp
3766  -> change ubidi.icu format version from 2.0 to 2.1
3767- 1 new Miscellaneous Property
3768  bpb                      ; Bidi_Paired_Bracket
3769  -> uchar.h & UProperty.java
3770  -> ppucd.h & .cpp
3771
3772* PropertyValueAliases.txt changes
3773- 3 Bidi_Paired_Bracket_Type (bpt) values:
3774  bpt; c                                ; Close
3775  bpt; n                                ; None
3776  bpt; o                                ; Open
3777  -> uchar.h & UCharacter.BidiPairedBracketType
3778  -> ubidi_props.h & .c & UBiDiProps.java
3779  -> change ubidi.icu format version from 2.0 to 2.1
3780- 4 new Bidi_Class (bc) values:
3781  bc ; FSI                              ; First_Strong_Isolate
3782  bc ; LRI                              ; Left_To_Right_Isolate
3783  bc ; RLI                              ; Right_To_Left_Isolate
3784  bc ; PDI                              ; Pop_Directional_Isolate
3785  -> uchar.h & UCharacterEnums.ECharacterDirection
3786  -> until the bidi code gets updated,
3787     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
3788- 3 new Word_Break (WB) values:
3789  WB ; HL                               ; Hebrew_Letter
3790  WB ; SQ                               ; Single_Quote
3791  WB ; DQ                               ; Double_Quote
3792  -> uchar.h & UCharacter.WordBreak
3793  -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
3794- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3795  (added 2012-10-16)
3796  Aghb  239     Caucasian Albanian
3797  Mahj  314     Mahajani
3798  -> uscript.h
3799  -> com.ibm.icu.lang.UScript
3800    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3801    replace  public static final int \1 = \2;\3
3802  -> preparseucd.py _scripts_only_in_iso15924
3803  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3804      and in com.ibm.icu.dev.test.lang.TestUScript.java
3805  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
3806     (not strictly necessary for NOT_ENCODED scripts)
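
Outside of an editor, the USCRIPT_ find/replace pair above could also be applied with sed. A minimal sketch, assuming a sed that supports -E (GNU and BSD sed do); uscript_to_java is a hypothetical name, and the input would be the new enum lines copied from uscript.h.

```shell
# Hypothetical helper: turn uscript.h enum lines read from stdin into
# UScript.java constants, using the find/replace pair from the notes above.
uscript_to_java() {
    sed -E 's/USCRIPT_([^ ]+) *= ([0-9]+),(.+)/public static final int \1 = \2;\3/'
}
```

Leading indentation is kept as-is because the pattern is not anchored. Lines without a trailing comment after the comma are left unchanged, since (.+) needs at least one character.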
3807
3808* generate normalization data files
3809- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
3810- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
3811- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
3812- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3813- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3814- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3815- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
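
The four gennorm2 calls above differ only in the output name and the list of input files, so they could be generated from a small table. A sketch, assuming the same options as above; gennorm2_commands is a hypothetical name, and it only prints the command lines so that they can be reviewed first.

```shell
# Hypothetical helper: print the four gennorm2 command lines.
#   $1 = output folder (e.g. $SRC_DATA_IN), $2 = norm2 source folder (e.g. $UNIDATA/norm2)
gennorm2_commands() {
    norm_out_dir=$1
    norm_src_dir=$2
    # Each table row: output name, then the input files in the order gennorm2 reads them.
    while read -r norm_name norm_inputs; do
        echo "bin/gennorm2 -o $norm_out_dir/$norm_name.nrm -s $norm_src_dir $norm_inputs"
    done <<EOF
nfc nfc.txt
nfkc nfc.txt nfkc.txt
nfkc_cf nfc.txt nfkc.txt nfkc_cf.txt
uts46 nfc.txt uts46.txt
EOF
}
```

Possible use from the build folder: gennorm2_commands $SRC_DATA_IN $UNIDATA/norm2 | sh (this breaks if the paths contain spaces).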
3816
3817* build ICU (make install)
3818  so that the tools build can pick up the new definitions from the installed header files.
3819
3820~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
3821
3822* build Unicode tools using CMake+make
3823
3824~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
3825
3826# Location (--prefix) of where ICU was installed.
3827set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
3828# Location of the ICU source tree.
3829set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
3830
3831~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
3832~/svn.icutools/trunk/dbg/unicode/c$ make
3833
3834* generate core properties data files
3835- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
3836- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
3837- rebuild ICU (make install) & tools
3838- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
3839- rebuild ICU (make install) & tools
3840
3841* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3842  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3843- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3844- Unicode 6.0..6.3: U+2260, U+226E, U+226F
3845- nothing new in 6.3, no test file to update
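
The grep step above could look like the following. This is a sketch that assumes each data line of IdnaMappingTable.txt starts with an uppercase or lowercase hex code point or range; std3_valid_non_ascii is a hypothetical name.

```shell
# Hypothetical helper: list "disallowed_STD3_valid" entries whose first code
# point is above U+007F. Comment lines and entries starting at 0000..007F are dropped.
std3_valid_non_ascii() {
    grep 'disallowed_STD3_valid' "$1" |
        grep -Ev '^(#|00[0-7][0-9A-Fa-f]([^0-9A-Fa-f]|$))'
}
```

Compare the output with the list in the notes (U+2260, U+226E, U+226F); anything else is a new character that the UTS #46 tests should cover.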
3846
3847* update Java data files
3848- refresh just the UCD-related files, just to be safe
3849- see (ICU4C)/source/data/icu4j-readme.txt
3850- mkdir /tmp/icu4j
3851- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3852  output:
3853    ...
3854    Unicode .icu files built to ./out/build/icudt52l
3855    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
3856    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
3857    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3858    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
3859    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
3860    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
3861    mkdir -p /tmp/icu4j/main/shared/data
3862    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3863    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
3864    mkdir -p /tmp/icu4j/main/shared/data
3865    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3866    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
3867- copy the big-endian Unicode data files to another location,
3868  separate from the other data files
3869    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3870    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
3871    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
3872    ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
3873    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
3874    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3875    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
3876- refresh ICU4J
3877    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
3878
3879* refresh Java test .txt files
3880- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3881
3882* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
3883
3884- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
3885- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
3886- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3887- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3888  (note removing the underscore before "Rules")
3889- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3890  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3891  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3892- check test file diffs for previously commented-out, known-failing data lines;
3893  probably need to keep those commented out
3894- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3895- run genuca, see command line above
3896- rebuild ICU4C
3897- refresh ICU4J collation data:
3898  (subset of instructions above for properties data refresh, except copies all coll/*)
3899    ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3900    ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3901    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3902    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
3903- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3904- note on intltest: if collate/UCAConformanceTest fails, then
3905  utility/MultithreadTest/TestCollators will fail as well;
3906  fix the conformance test before looking into the multi-thread test
3907
3908* test ICU, fix test code where necessary
3909
3910* When refreshing all of ICU4J data from ICU4C
3911- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3912- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3913or
3914- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3915
3916*** LayoutEngine script information
3917- skipped for Unicode 6.3: no new scripts
3918
3919*** merge the Unicode update branches back onto the trunk
3920- do not merge the icudata.jar and testdata.jar,
3921  instead rebuild them from merged & tested ICU4C
3922
3923---------------------------------------------------------------------------- ***
3924
3925Unicode 6.2 update
3926
3927http://www.unicode.org/review/pri230/
3928http://www.unicode.org/versions/beta-6.2.0.html
3929http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
3930http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
3931http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
3932http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
3933http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
3934http://unicode.org/Public/idna/6.2.0/
3935
3936*** ICU Trac
3937
3938- ticket 9515: Unicode 6.2: final ICU update
3939
3940- ticket 9514: UCA 6.2: fix UCARules.txt
3941
3942- ticket 9437: update ICU to Unicode 6.2
3943- C++ branches/markus/uni62 at r32050 from trunk at r32041
3944- Java branches/markus/uni62 at r32068 from trunk at r32066
3945
3946*** Unicode version numbers
3947- makedata.mak
3948- uchar.h
3949  (configure.in & configure: have been modified to extract the version from uchar.h)
3950- com.ibm.icu.util.VersionInfo
3951- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3952
3953*** data files & enums & parser code
3954
3955* file preparation
3956
3957- download UCD, UCA & IDNA files
3958- make sure that the Unicode data folder passed into preparseucd.py
3959  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3960- modify preparseucd.py: NamesList.txt is now in UTF-8
3961- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
3962- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3963- Check test file diffs for previously commented-out, known-failing data lines;
3964  probably need to keep those commented out.
3965
3966* PropertyValueAliases.txt changes
3967- 1 new Line_Break (lb) value:
3968  lb ; RI                               ; Regional_Indicator
3969  -> uchar.h & UCharacter.LineBreak
3970- 1 new Word_Break (WB) value:
3971  WB ; RI                               ; Regional_Indicator
3972  -> uchar.h & UCharacter.WordBreak
3973- 1 new Grapheme_Cluster_Break (GCB) value:
3974  GCB; RI                               ; Regional_Indicator
3975  -> uchar.h & UCharacter.GraphemeClusterBreak
3976
3977* 3 new numeric values
3978  The new value -1, which was really supposed to be NaN but that would have required
3979  new UnicodeData.txt syntax, can already be represented as a "fraction" of -1/1,
3980  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
3981    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
3982    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
3983  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
3984    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
3985    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
3986  -> uprops.h, uchar.c & UCharacterProperty.java
3987  -> cucdtst.c & UCharacterTest.java
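
For orientation, both large values are small multiples of a power of 60: 216000 = 1 * 60^3 and 432000 = 2 * 60^3. The sketch below only shows that arithmetic; base60_decompose is a hypothetical name, and this is not the actual bit layout in uprops.h.

```shell
# Hypothetical helper: write N as MANTISSA * 60^EXPONENT with the largest
# possible exponent, and print "MANTISSA EXPONENT".
base60_decompose() {
    b60_mant=$1
    b60_exp=0
    # Divide out factors of 60 for as long as the remainder is zero.
    while [ "$b60_mant" -ne 0 ] && [ $((b60_mant % 60)) -eq 0 ]; do
        b60_mant=$((b60_mant / 60))
        b60_exp=$((b60_exp + 1))
    done
    echo "$b60_mant $b60_exp"
}
```

base60_decompose 216000 prints "1 3" and base60_decompose 432000 prints "2 3".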
3988
3989* generate normalization data files
3990- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
3991- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
3992- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
3993- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3994- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3995- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3996- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
3997
3998* build ICU (make install)
3999  so that the tools build can pick up the new definitions from the installed header files.
4000* build Unicode tools using CMake+make
4001
4002* generate core properties data files
4003- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
4004- in initial bootstrapping, change the UCA version
4005  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
4006- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
4007- rebuild ICU (make install) & tools
4008  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
4009    check if the UCA version in FractionalUCA.txt matches the new Unicode version
4010    (see step above)
4011- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
4012- rebuild ICU (make install) & tools
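
The UCA version check mentioned above could be scripted before running genrb. A sketch, under the assumption that FractionalUCA.txt carries the version in a line of the form "[UCA version = 6.2.0]"; uca_version_of is a hypothetical name.

```shell
# Hypothetical helper: print the UCA version found in a FractionalUCA.txt file.
# Assumes a line like "[UCA version = 6.2.0]"; prints nothing if there is none.
uca_version_of() {
    sed -n 's/^\[UCA version *= *\([0-9.]*\)].*/\1/p' "$1" | head -n 1
}
```

Possible use: [ "$(uca_version_of source/data/unidata/FractionalUCA.txt)" = 6.2.0 ] || echo "UCA version mismatch"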
4013
4014* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
4015  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
4016- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
4017- Unicode 6.0..6.2: U+2260, U+226E, U+226F
4018- nothing new in 6.2, no test file to update
4019
4020* update Java data files
4021- refresh just the UCD-related files, just to be safe
4022- see (ICU4C)/source/data/icu4j-readme.txt
4023- mkdir /tmp/icu4j
4024- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4025  output:
4026    ...
4027    Unicode .icu files built to ./out/build/icudt50l
4028    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
4029    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
4030    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
4031    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
4032    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
4033    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
4034    mkdir -p /tmp/icu4j/main/shared/data
4035    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
4036    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
4037    mkdir -p /tmp/icu4j/main/shared/data
4038    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
4039    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
4040- copy the big-endian Unicode data files to another location,
4041  separate from the other data files
4042    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
4043    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
4044    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
4045    ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
4046    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
4047    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
4048    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
4049- refresh ICU4J
4050    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
4051
4052* refresh Java test .txt files
4053- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4054
4055* UCA
4056
4057- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
4058- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
4059- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4060- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4061  (note removing the underscore before "Rules")
4062- update (ICU4C)/source/test/testdata/CollationTest_*.txt
4063  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4064  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
4065- check test file diffs for previously commented-out, known-failing data lines;
4066  probably need to keep those commented out
4067- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
4068- run genuca, see command line above
4069- rebuild ICU4C
4070- refresh ICU4J collation data:
4071  (subset of instructions above for properties data refresh, except copies all coll/*)
4072    ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4073    ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
4074    ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
4075    ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
4076- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
4077- note on intltest: if collate/UCAConformanceTest fails, then
4078  utility/MultithreadTest/TestCollators will fail as well;
4079  fix the conformance test before looking into the multi-thread test
4080
4081* test ICU, fix test code where necessary
4082
4083* When refreshing all of ICU4J data from ICU4C
4084- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4085- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4086or
4087- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4088
4089*** LayoutEngine script information
4090- skipped for Unicode 6.2: no new scripts
4091
4092*** merge the Unicode update branches back onto the trunk
4093- do not merge the icudata.jar and testdata.jar,
4094  instead rebuild them from merged & tested ICU4C
4095
4096---------------------------------------------------------------------------- ***
4097
4098Future Unicode update
4099
4100Tools simplified since the Unicode 6.1 update. See
4101- https://icu.unicode.org/design/props/ppucd
4102- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
4103
4104* Unicode version numbers
4105- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
4106
4107* file preparation
4108- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
4109- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
4110- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
4111- Check test file diffs for previously commented-out, known-failing data lines;
4112  probably need to keep those commented out.
4113
4114* PropertyValueAliases.txt changes
4115- Script codes that are in ISO 15924 but not in Unicode are now listed in
4116  preparseucd.py, in the _scripts_only_in_iso15924 variable.
4117  If there are new ISO codes, then add them.
4118  If Unicode adds some of them, then remove them from the .py variable.
4119
4120* UnicodeData.txt changes
4121- No more manual changes for CJK ranges for algorithmic names;
4122  those are now written to ppucd.txt and genprops reads them from there.
4123
4124* generate core properties data files (makeprops.sh was deleted)
4125- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
4126
4127* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
4128- it is now generated by preparseucd.py
4129
4130* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
4131- it is now generated by preparseucd.py
4132- make sure that the Unicode data folder passed into preparseucd.py
4133  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
4134  (can be in some subfolder)
4135
4136* generate normalization data files
4137- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
4138- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
4139- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
4140- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
4141- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
4142- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
4143- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
4144
4145* build ICU (make install)
4146* build Unicode tools using CMake+make
4147
4148* new way to call genuca (makeuca.sh was deleted)
4149- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
4150
4151---------------------------------------------------------------------------- ***
4152
4153Unicode 6.1 update
4154
4155*** ICU Trac
4156
4157- ticket 8995 final update to Unicode 6.1
4158- ticket 8994 regenerate source/layout/CanonData.cpp
4159
4160- ticket 8961 support Unicode "Age" value *names*
4161- ticket 8963 support multiple character name aliases & types
4162
4163- ticket 8827 "update ICU to Unicode 6.1"
4164- C++ branches/markus/uni61 at r30864 from trunk at r30843
4165- Java branches/markus/uni61 at r30865 from trunk at r30863
4166
4167*** Unicode version numbers
4168- makedata.mak
4169- uchar.h
4170  (configure.in & configure: have been modified to extract the version from uchar.h)
4171- com.ibm.icu.util.VersionInfo
4172- icutools/unicode/makedefs.sh
4173  + also review & update other definitions in that file,
4174    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
4175
4176*** data files & enums & parser code
4177
4178* file preparation
4179
4180~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
4181- This prepares both unidata and testdata files in respective output subfolders.
4182- Check test file diffs for previously commented-out, known-failing data lines;
4183  probably need to keep those commented out.
4184
4185* PropertyValueAliases.txt changes
4186- 11 new block names:
4187  Arabic_Extended_A
4188  Arabic_Mathematical_Alphabetic_Symbols
4189  Chakma
4190  Meetei_Mayek_Extensions
4191  Meroitic_Cursive
4192  Meroitic_Hieroglyphs
4193  Miao
4194  Sharada
4195  Sora_Sompeng
4196  Sundanese_Supplement
4197  Takri
4198  -> add to uchar.h
4199  -> add to UCharacter.UnicodeBlock IDs
4200    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
4201            replace  public static final int \1_ID = \2; \3
4202  -> add to UCharacter.UnicodeBlock objects
4203    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
4204            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
4205- 1 new Joining_Group (jg) value:
4206  Rohingya_Yeh
4207  -> uchar.h & UCharacter.JoiningGroup
4208- 2 new Line_Break (lb) values:
4209  CJ=Conditional_Japanese_Starter
4210  HL=Hebrew_Letter
4211  -> uchar.h & UCharacter.LineBreak
4212- 7 new scripts:
4213  sc ; Cakm      ; Chakma
4214  sc ; Merc      ; Meroitic_Cursive
4215  sc ; Mero      ; Meroitic_Hieroglyphs
4216  sc ; Plrd      ; Miao
4217  sc ; Shrd      ; Sharada
4218  sc ; Sora      ; Sora_Sompeng
4219  sc ; Takr      ; Takri
4220  -> remove these from SyntheticPropertyValueAliases.txt
4221  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
4222      and in com.ibm.icu.dev.test.lang.TestUScript.java
4223- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
4224  (added 2011-06-21)
4225  Khoj        322     Khojki
4226  Tirh        326     Tirhuta
4227    and another one added 2011-12-09
4228  Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
4229  -> uscript.h
4230  -> com.ibm.icu.lang.UScript
4231    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
4232    replace  public static final int \1 = \2;\3
4233  -> SyntheticPropertyValueAliases.txt
4234  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
4235      and in com.ibm.icu.dev.test.lang.TestUScript.java
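
The find/replace pairs above can also be applied outside an IDE. A minimal sketch, assuming Python's re module accepts the same patterns and backreferences as the Eclipse dialog; the two input lines are illustrative samples in the style of uchar.h and uscript.h, not copied from the headers.

```python
import re

# (pattern, replacement) pairs, transcribed from the find/replace notes above.
BLOCK_ID = (re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"),
            r"public static final int \1_ID = \2; \3")
BLOCK_OBJECT = (re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"),
                r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
SCRIPT_CODE = (re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)"),
               r"public static final int \1 = \2;\3")


def convert(line, rule):
    """Rewrite one C enum line into the matching Java declaration."""
    pattern, replacement = rule
    return pattern.sub(replacement, line)


# Illustrative input lines (assumed values, for demonstration only).
c_block = "UBLOCK_CHAKMA = 212, /*[11100]*/"
c_script = "USCRIPT_KHOJKI = 157,/* Khoj */"

print(convert(c_block, BLOCK_ID))
print(convert(c_block, BLOCK_OBJECT))
print(convert(c_script, SCRIPT_CODE))
```

Leading indentation is preserved because the patterns do not match it; review the generated Java lines by hand before committing.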
4236
4237* UnicodeData.txt changes
4238- the last Unihan code point changes from U+9FCB to U+9FCC
4239  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
4240  + do change gennames.c
4241  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
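
The search for hardcoded range ends can be scripted. A minimal sketch, assuming the source text has already been read into a string; find_range_constants and the sample snippet are hypothetical and only illustrate the case-insensitive regex 9FC[BC] named above.

```python
import re

# Matches the old end (9FCB) and the new end / old limit (9FCC), any letter case.
RANGE_CONSTANT = re.compile(r"9FC[BC]", re.IGNORECASE)


def find_range_constants(text):
    """Return (line_number, line) pairs for lines that mention 9FCB or 9FCC."""
    return [(number, line)
            for number, line in enumerate(text.splitlines(), 1)
            if RANGE_CONSTANT.search(line)]


# Hypothetical source snippet, not taken from gennames.c or ucol.cpp.
sample = """\
#define CJK_BASE 0x4E00
#define CJK_LIMIT (0x9fcb + 1)
static const UChar32 lastUnihan = 0x9FCB;
"""

for number, line in find_range_constants(sample):
    print(number, line)
```

Every hit still needs a manual decision, because an "end" constant and a "limit" constant change to different values.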
4242
4243* DerivedBidiClass.txt changes
4244- 2 new default-AL blocks:
4245#     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
4246#     Arabic Mathematical Alphabetic Symbols:
4247#                       U+1EE00  - U+1EEFF  (was default-R)
4248- 2 new default-R blocks:
4249#     Meroitic Hieroglyphs:
4250#                        U+10980 - U+1099F
4251#     Meroitic Cursive:  U+109A0 - U+109FF
4252  -> should be picked up by the explicit data in the file
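
To spot-check that the new defaults arrived in the generated data, the four ranges listed above can be expressed as a small lookup table. This is a sketch covering only those four ranges; default_bidi_class is a hypothetical helper, not an ICU function.

```python
# New default Bidi_Class ranges quoted above from DerivedBidiClass.txt (Unicode 6.1).
NEW_DEFAULT_RANGES = [
    (0x08A0, 0x08FF, "AL"),    # Arabic Extended-A
    (0x1EE00, 0x1EEFF, "AL"),  # Arabic Mathematical Alphabetic Symbols
    (0x10980, 0x1099F, "R"),   # Meroitic Hieroglyphs
    (0x109A0, 0x109FF, "R"),   # Meroitic Cursive
]


def default_bidi_class(code_point):
    """Return the default class if code_point is in one of the new ranges, else None."""
    for start, end, bidi_class in NEW_DEFAULT_RANGES:
        if start <= code_point <= end:
            return bidi_class
    return None


print(default_bidi_class(0x08A0), default_bidi_class(0x109FF), default_bidi_class(0x0041))
```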
4253
4254* NameAliases.txt changes
4255- from
4256    # Each line has two fields
4257    # First field: Code point
4258    # Second field: Alias
4259- to
4260    # Each line has three fields, as described here:
4261    #
4262    # First field:  Code point
4263    # Second field: Alias
4264    # Third field:  Type
4265- Also, the file format already allowed multiple aliases per code point, but only now does
4266  the file actually contain several, including several of the same type. For example,
4267    FEFF;BYTE ORDER MARK;alternate
4268    FEFF;BOM;abbreviation
4269    FEFF;ZWNBSP;abbreviation
4270- This breaks our gennames parser, unames.icu data structure, and API.
4271  Fix gennames to only pick up "correction" aliases.
4272  New ticket #8963 for further changes.
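
The interim rule ("only pick up correction aliases") can be sketched as follows. This is an illustration in Python, not the gennames C parser; the function names are hypothetical, and the sample lines follow the three-field layout quoted above.

```python
def parse_name_aliases(lines):
    """Yield (code_point, alias, type) from three-field NameAliases.txt lines."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) != 3:
            raise ValueError("expected 3 fields: " + line)
        yield int(fields[0], 16), fields[1], fields[2]


def correction_aliases(lines):
    """Keep only aliases of type "correction", one per code point."""
    return {code_point: alias
            for code_point, alias, alias_type in parse_name_aliases(lines)
            if alias_type == "correction"}


# Sample input in the new format.
sample = [
    "# First field:  Code point",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]

print(correction_aliases(sample))
```

With this filter U+FEFF contributes no alias at all, which sidesteps the multiple-aliases problem until the data structure and API are extended.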
4273
4274* run genpname/preparse.pl (on Linux)
4275  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
4276  + make sure that data.h is writable
4277  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
4278  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
4279
4280* build ICU (make install)
4281  so that the tools build can pick up the new definitions from the installed header files.
4282* build Unicode tools (at least genpname) using CMake+make
4283
4284* run genpname
4285  (builds both pnames.icu and propname_data.h)
4286- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
4287- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
4288
4289* build ICU (make install)
4290* build Unicode tools using CMake+make
4291
4292* update source/data/unidata/norm2/nfkc_cf.txt
4293- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
4294
4295* update source/data/unidata/norm2/uts46.txt
4296- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
4297  to ~/svn.icu/tools/trunk/src/unicode/py
4298- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
4299- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
4300- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
4301
4302* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
4303  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
4304- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
4305- Unicode 6.0..6.1: U+2260, U+226E, U+226F
4306- nothing new in 6.1, no test file to update
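
The grep step can be made more precise with a few lines of Python. A minimal sketch, assuming the usual "code point or range ; status ; ... # comment" layout of IdnaMappingTable.txt; non_ascii_std3_valid is a hypothetical helper and the sample lines are illustrative.

```python
def non_ascii_std3_valid(lines):
    """Return (start, end) ranges above U+007F with status disallowed_STD3_valid."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0]  # strip the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start_text, _, end_text = fields[0].partition("..")
        start = int(start_text, 16)
        end = int(end_text, 16) if end_text else start
        if end > 0x7F:
            result.append((max(start, 0x80), end))
    return result


# Illustrative lines in the style of IdnaMappingTable.txt.
sample = [
    "0000..002C    ; disallowed_STD3_valid                  # 1.1  <control-0000>..COMMA",
    "00A0          ; disallowed_STD3_mapped ; 0020          # 1.1  NO-BREAK SPACE",
    "2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # 1.1  NOT LESS-THAN..NOT GREATER-THAN",
]

for start, end in non_ascii_std3_valid(sample):
    print("U+%04X..U+%04X" % (start, end))
```

Comparing this list between two Unicode versions shows whether the UTS #46 tests need new characters.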
4307
4308* generate core properties data files
4309- in initial bootstrapping, change the UCA version
4310  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
4311- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4312- rebuild ICU & tools
4313  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
4314    check if the UCA version in FractionalUCA.txt matches the new Unicode version
4315    (see step above)
4316- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
4317  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4318- rebuild ICU & tools
4319
4320* update Java data files
4321- refresh just the UCD-related files, just to be safe
4322- see (ICU4C)/source/data/icu4j-readme.txt
4323- mkdir /tmp/icu4j
4324- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4325  output:
4326    ...
4327    Unicode .icu files built to ./out/build/icudt49l
4328    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
4329    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
4330    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
4331    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
4332    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
4333    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
4334    mkdir -p /tmp/icu4j/main/shared/data
4335    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
4336    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
4337    mkdir -p /tmp/icu4j/main/shared/data
4338    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
4339    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
4340- copy the big-endian Unicode data files to another location,
4341  separate from the other data files
4342    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4343    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
4344    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
4345    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
4346    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
4347    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4348    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
4349- refresh ICU4J
4350    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
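
The cp/rm sequence above selects a fixed subset of the big-endian files. The same selection can be written as a filter, shown here as a sketch; unicode_data_files is a hypothetical helper, and paths are assumed to be relative to the icudt49b directory.

```python
from fnmatch import fnmatchcase


def unicode_data_files(paths):
    """Pick the files that the manual steps copy: top-level *.icu (except
    cnvalias.icu) and *.nrm, coll/*.icu, and everything under brkitr/."""
    selected = []
    for path in paths:
        if path == "cnvalias.icu":
            continue  # converter alias table is not Unicode data
        top_level = "/" not in path
        if top_level and (fnmatchcase(path, "*.icu") or fnmatchcase(path, "*.nrm")):
            selected.append(path)
        elif fnmatchcase(path, "coll/*.icu") or path.startswith("brkitr/"):
            selected.append(path)
    return selected


# Illustrative directory listing.
listing = ["uprops.icu", "cnvalias.icu", "nfc.nrm", "root.res",
           "coll/ucadata.icu", "coll/root.res", "brkitr/char.brk", "zoneinfo64.res"]
print(unicode_data_files(listing))
```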
4351
4352* refresh Java test .txt files
4353- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4354
4355* test ICU so far, fix test code where necessary
4356- temporarily ignore collation issues that look like UCA/UCD mismatches,
4357  until UCA data is updated
4358
4359* UCA
4360
4361- get output from Mark's tools; look in
4362    http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
4363- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4364- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4365  (note removing the underscore before "Rules")
4366- update (ICU)/source/test/testdata/CollationTest_*.txt
4367  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4368  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
4369- check test file diffs for previously commented-out, known-failing data lines;
4370  probably need to keep those commented out
4371- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
4372- run makeuca.sh:
4373  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4374- rebuild ICU4C
4375- refresh ICU4J collation data:
4376  (subset of instructions above for properties data refresh, except copies all coll/*)
4377    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4378    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4379    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4380    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
4381- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
4382- note on intltest: if collate/UCAConformanceTest fails, then
4383  utility/MultithreadTest/TestCollators will fail as well;
4384  fix the conformance test before looking into the multi-thread test
4385
4386* When refreshing all of ICU4J data from ICU4C
4387- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4388- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4389or
4390- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4391
4392*** LayoutEngine script information
4393
4394(For details see the Unicode 5.2 change log below.)
4395
4396* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
4397  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
4398  in the working directory.
4399  (It also generates ScriptRunData.cpp, which is no longer needed.)
4400
4401  The generated files have a current copyright date and "@draft" statement.
4402
4403- diff current <icu>/source/layout files vs. generated ones
4404    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
4405  review and manually merge desired changes;
4406  fix gratuitous changes, incorrect @draft and missing aliases;
4407  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
4408- if you just copy the above files, then
4409  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
4410  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4411
4412*** merge the Unicode update branches back onto the trunk
4413- do not merge the icudata.jar and testdata.jar,
4414  instead rebuild them from merged & tested ICU4C
4415
4416---------------------------------------------------------------------------- ***
4417
4418ICU 4.8 (no Unicode update, just new script codes)
4419
4420* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
4421  (added 2010-12-21)
4422    Afak    439     Afaka
4423    Jurc    510     Jurchen
4424    Mroo    199     Mro, Mru
4425    Nshu    499     Nüshu
4426    Shrd    319     Sharada, Śāradā
4427    Sora    398     Sora Sompeng
4428    Takr    321     Takri, Ṭākrī, Ṭāṅkrī
4429    Tang    520     Tangut
4430    Wole    480     Woleai
4431  -> uscript.h
4432  -> com.ibm.icu.lang.UScript
4433    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
4434    replace  public static final int \1 = \2;\3
4435  -> genpname/SyntheticPropertyValueAliases.txt
4436  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
4437      and in com.ibm.icu.dev.test.lang.TestUScript.java
4438
4439* run genpname/preparse.pl (on Linux)
4440  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
4441  + make sure that data.h is writable
4442  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
4443  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
4444
4445* rebuild Unicode tools (at least genpname) using make
4446- You might first need to "make install" ICU so that the tools build can pick
4447  up the new definitions from the installed header files.
4448
4449* run genpname
4450  (builds both pnames.icu and propname_data.h)
4451- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
4452- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
4453- rebuild ICU & tools
4454
4455* run genprops
4456- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
4457- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
4458- rebuild ICU & tools
4459
4460* update Java data files
4461- refresh just the UCD-related files, just to be safe
4462- see (ICU4C)/source/data/icu4j-readme.txt
4463- mkdir /tmp/icu4j
4464- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4465- copy the big-endian Unicode data files to another location,
4466  separate from the other data files
4467    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
4468    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
4469    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
4470- refresh ICU4J
4471    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
4472
4473* should have updated the layout engine script codes but forgot
4474
4475---------------------------------------------------------------------------- ***
4476
4477Unicode 6.0 update
4478
4479*** related ICU Trac tickets
4480
4481- ticket 7264 Unicode 6.0 Update
4482
4483*** Unicode version numbers
4484- makedata.mak
4485- uchar.h
4486  (configure.in & configure: have been modified to extract the version from uchar.h)
4487- com.ibm.icu.util.VersionInfo
4488
4489*** data files & enums & parser code
4490
4491* file preparation
4492
4493~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
4494- This now prepares both unidata and testdata files in respective output subfolders.
4495
4496* PropertyAliases.txt changes
4497- new Script_Extensions property defined in the new ScriptExtensions.txt file
4498  but not listed in PropertyAliases.txt; reported to unicode.org;
4499  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
4500    scx; Script_Extensions
4501  -> uchar.h with new UProperty section
4502  -> com.ibm.icu.lang.UProperty, parallel with uchar.h
4503
4504* PropertyValueAliases.txt changes
4505- 12 new block names:
4506  Alchemical_Symbols
4507  Bamum_Supplement
4508  Batak
4509  Brahmi
4510  CJK_Unified_Ideographs_Extension_D
4511  Emoticons
4512  Ethiopic_Extended_A
4513  Kana_Supplement
4514  Mandaic
4515  Miscellaneous_Symbols_And_Pictographs
4516  Playing_Cards
4517  Transport_And_Map_Symbols
4518  -> add to uchar.h
4519  -> add to UCharacter.UnicodeBlock
4520    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
4521            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
4522- Joining_Group (jg) values:
4523  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
4524  -> uchar.h & UCharacter.JoiningGroup
4525- 3 new scripts:
4526  sc ; Batk      ; Batak
4527  sc ; Brah      ; Brahmi
4528  sc ; Mand      ; Mandaic
4529  -> remove these from SyntheticPropertyValueAliases.txt
4530  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
4531  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
4532      and in com.ibm.icu.dev.test.lang.TestUScript.java
4533- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
4534  (added 2009-11-11..2010-07-18)
4535  Bass        259     Bassa Vah
4536  Dupl        755     Duployan shorthand
4537  Elba        226     Elbasan
4538  Gran        343     Grantha
4539  Kpel        436     Kpelle
4540  Loma        437     Loma
4541  Mend        438     Mende
4542  Merc        101     Meroitic Cursive
4543  Narb        106     Old North Arabian
4544  Nbat        159     Nabataean
4545  Palm        126     Palmyrene
4546  Sind        318     Sindhi
4547  Wara        262     Warang Citi
4548  -> uscript.h
4549  -> com.ibm.icu.lang.UScript
4550    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
4551    replace  public static final int \1 = \2;\3
4552  -> SyntheticPropertyValueAliases.txt
4553  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
4554      and in com.ibm.icu.dev.test.lang.TestUScript.java
4555- ISO 15924 name change
4556  Mero        100     Meroitic Hieroglyphs (was Meroitic)
4557  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
4558- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
4559
4560* UnicodeData.txt changes
4561- new CJK block:
4562  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
4563  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
4564  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
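
New algorithmic ranges are easy to miss when reading UnicodeData.txt diffs. The sketch below pairs up "<..., First>" and "<..., Last>" lines so the ranges of two versions can be compared; algorithmic_ranges is a hypothetical helper, not part of gennames.

```python
import re

RANGE_NAME = re.compile(r"<(.+), (First|Last)>")


def algorithmic_ranges(lines):
    """Return (name, first, last) for each First/Last pair in UnicodeData.txt lines."""
    ranges = []
    pending = None  # (name, first code point) while waiting for the Last line
    for line in lines:
        fields = line.split(";")
        if len(fields) < 2:
            continue
        match = RANGE_NAME.fullmatch(fields[1])
        if not match:
            continue
        code_point = int(fields[0], 16)
        name, marker = match.groups()
        if marker == "First":
            pending = (name, code_point)
        elif pending and pending[0] == name:
            ranges.append((name, pending[1], code_point))
            pending = None
    return ranges


sample = [
    "2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;",
    "2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;",
]
print(algorithmic_ranges(sample))
```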
4565
4566* build Unicode tools using CMake+make
4567
4568* run genpname/preparse.pl (on Linux)
4569  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
4570  + make sure that data.h is writable
4571  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
4572  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
4573
4574* rebuild Unicode tools (at least genpname) using make
4575- You might first need to "make install" ICU so that the tools build can pick
4576  up the new definitions from the installed header files.
4577
4578* run genpname
4579- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
4580- rebuild ICU & tools
4581
4582* update source/data/unidata/norm2/nfkc_cf.txt
4583- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
4584
4585* update source/data/unidata/norm2/uts46.txt
4586- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
4587  to ~/svn.icu/tools/trunk/src/unicode/py
4588- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
4589- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
4590- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
4591
4592* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
4593  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
4594- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
4595- Unicode 6.0: U+2260, U+226E, U+226F
4596
4597* generate core properties data files
4598- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4599- rebuild ICU & tools
4600- run makeuca.sh so that genuca picks up the new nfc.nrm:
4601  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4602- rebuild ICU & tools
4603
4604* implement new Script_Extensions property (provisional)
4605- parser & generator: genprops & uprops.icu
4606- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
4607- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
4608
4609* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
4610- (one-time change)
4611- genbidi/gencase/genprops tools changes
4612- re-run makeprops.sh (see above)
4613- UCharacterProperty.java, UCharacterTypeIterator.java,
4614  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
4615  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
4616
4617* update Java data files
4618- refresh just the UCD-related files, just to be safe
4619- see (ICU4C)/source/data/icu4j-readme.txt
4620- mkdir /tmp/icu4j
4621- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4622  output:
4623    ...
4624    Unicode .icu files built to ./out/build/icudt45l
4625    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
4626    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
4627    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
4628    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
4629    mkdir -p /tmp/icu4j/main/shared/data
4630    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
4631- copy the big-endian Unicode data files to another location,
4632  separate from the other data files
4633    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4634    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
4635    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
4636    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
4637    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
4638    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4639    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
4640- refresh ICU4J
4641    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
4642
4643* refresh Java test .txt files
4644- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4645
4646* un-hardcode normalization skippable (NF*_Inert) test data
4647- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
4648
4649* copy updated break iterator test files
4650- now handled by early ucdcopy.py and
4651  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
4652  (old instructions:
4653   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
4654   to ~/svn.icu/trunk/src/source/test/testdata)
4655- they are not used in ICU4J
4656
4657* UCA
4658
4659- get output from Mark's tools; look in
4660    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
4661    http://www.macchiato.com/unicode/utc/additional-uca-files
4662    http://www.unicode.org/Public/UCA/6.0.0/
4663    http://www.unicode.org/~mdavis/uca/
4664- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4665- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4666- update Han-implicit ranges for new CJK extensions:
4667  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
4668- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
4669  do not add it into invuca so that tailoring primary-after an ignorable works
4670- genuca: permit space between [variable top] bytes
4671- ucol.cpp: treat noncharacters like unassigned rather than ignorable
4672- run makeuca.sh:
4673  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4674- rebuild ICU4C
4675- refresh ICU4J collation data:
4676  (subset of instructions above for properties data refresh, except copies all coll/*)
4677    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4678    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4679    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4680    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
4681- update (ICU)/source/test/testdata/CollationTest_*.txt
4682  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4683  with output from Mark's Unicode tools
4684- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
4685- note on intltest: if collate/UCAConformanceTest fails, then
4686  utility/MultithreadTest/TestCollators will fail as well;
4687  fix the conformance test before looking into the multi-thread test
4688
4689* When refreshing all of ICU4J data from ICU4C
4690- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4691- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4692or
4693- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4694
4695*** LayoutEngine script information
4696
4697(For details see the Unicode 5.2 change log below.)
4698
4699* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4700ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4701ScriptRunData.cpp, which is no longer needed.)
4702
4703The generated files have a current copyright date and "@draft" statement.
4704
4705* copy the above files into <icu>/source/layout, replacing the old files.
4706* fix mixed line endings
4707* review the diffs and fix incorrect @draft and missing aliases;
4708  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
4709* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4710
4711---------------------------------------------------------------------------- ***
4712
4713Unicode 5.2 update
4714
4715*** related ICU Trac tickets
4716
4717- ticket 7084 Unicode 5.2
4718
4719- ticket 7167 verify collation bytes
4720- ticket 7235 Java test NAME_ALIAS
4721- ticket 7236 Java DerivedCoreProperties.txt test
4722- ticket 7237 Java BidiTest.txt
4723- ticket 7238 UTrie2 in core unidata
4724- ticket 7239 test for tailoring gaps
4725- ticket 7240 Java fix CollationMiscTest
4726- ticket 7243 update layout engine for Unicode 5.2
4727
4728*** Unicode version numbers
4729- makedata.mak
4730- uchar.h
4731- configure.in & configure
4732- update ucdVersion in gennames.c if an algorithmic range changes
4733
4734*** data files & enums & parser code
4735
4736* file preparation
4737
4738python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
4739- includes finding files regardless of version numbers,
4740  copying them, and performing the equivalent processing of the
4741  ucdstrip and ucdmerge tools on the desired set of files
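
"Finding files regardless of version numbers" boils down to removing a version suffix from the file name. The following is a sketch of one way to do that, not the actual ucdcopy.py logic; the suffix pattern (digits, dots, optional draft number such as d7) is an assumption.

```python
import re

# Assumed suffix shapes: -5.2.0, -6.0.0d12, and similar, directly before the extension.
VERSION_SUFFIX = re.compile(r"-[0-9]+(?:\.[0-9]+)*(?:d[0-9]+)?(?=\.[A-Za-z]+$)")


def strip_version(file_name):
    """Return the file name without its version suffix, if it has one."""
    return VERSION_SUFFIX.sub("", file_name)


for name in ["UnicodeData-5.2.0d7.txt", "LineBreakTest-6.0.0.txt", "Blocks.txt"]:
    print(name, "->", strip_version(name))
```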
4742
4743* notes on changes
4744- PropertyAliases.txt
4745  moved from numeric to enumerated:
4746    ccc       ; Canonical_Combining_Class
4747  new string properties:
4748    NFKC_CF   ; NFKC_Casefold
4749    Name_Alias; Name_Alias
4750  new binary properties:
4751    Cased     ; Cased
4752    CI        ; Case_Ignorable
4753    CWCF      ; Changes_When_Casefolded
4754    CWCM      ; Changes_When_Casemapped
4755    CWKCF     ; Changes_When_NFKC_Casefolded
4756    CWL       ; Changes_When_Lowercased
4757    CWT       ; Changes_When_Titlecased
4758    CWU       ; Changes_When_Uppercased
4759  new CJK Unihan properties (not supported by ICU)
4760- PropertyValueAliases.txt
4761  new block names
4762  new scripts
4763  one script code change:
4764    sc ; Qaai      ; Inherited
4765    ->
4766    sc ; Zinh      ; Inherited                        ; Qaai
4767  new Line_Break (lb) value:
4768    lb ; CP        ; Close_Parenthesis
4769  new Joining_Group (jg) values: Farsi_Yeh, Nya
4770  other new values:
4771    ccc; 214; ATA  ; Attached_Above
4772- DerivedBidiClass.txt
4773  new default-R range: U+1E800 - U+1EFFF
4774- UnicodeData.txt
4775  all of the ISO comments are gone
4776  new CJK block end:
4777    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
4778  new CJK block:
4779    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
4780    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
4781
4782* genpname
4783- run preparse.pl
4784  + cd \svn\icuproj\icu\trunk\source\tools\genpname
4785  + make sure that data.h is writable
4786  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
4787  + preparse.pl complains with errors like the following:
4788      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
4789    This is because ICU 4.0 had scripts from ISO 15924 which are now
4790    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
4791    and PropertyValueAliases.txt.
4792    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4793       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
4794  + preparse.pl complains with errors about block names missing from uchar.h; add them
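
The script-code conflicts can be found before running preparse.pl by intersecting the two alias files. A minimal sketch, assuming both files use the "sc ; short ; long" layout; the helper names and the sample entries are hypothetical.

```python
def script_aliases(lines):
    """Map script short names to long names from 'sc ; short ; long' lines."""
    aliases = {}
    for line in lines:
        fields = [field.strip() for field in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            aliases[fields[1]] = fields[2]
    return aliases


def duplicate_scripts(official_lines, synthetic_lines):
    """Return script codes defined in both files, sorted."""
    official = script_aliases(official_lines)
    return sorted(code for code in script_aliases(synthetic_lines) if code in official)


# Hypothetical excerpts of PropertyValueAliases.txt and the synthetic file.
official = ["sc ; Egyp      ; Egyptian_Hieroglyphs", "sc ; Java      ; Javanese"]
synthetic = ["sc ; Egyp      ; Egyp", "sc ; Java      ; Java", "sc ; Blis      ; Blis"]
print(duplicate_scripts(official, synthetic))
```

Every code this reports is a candidate for removal from SyntheticPropertyValueAliases.txt.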
4795
4796* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4797- new block & script values
4798  + 26 new blocks
4799    copy new blocks from Blocks.txt
4800    MS VC++ 2008 regular expression:
4801      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
4802      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
4803  + several new script values already added in ICU 4.0 for ISO 15924 coverage
4804    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
4805  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
4806  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
4807    (added to SyntheticPropertyValueAliases.txt)
4808- new Joining_Group (jg) values: Farsi_Yeh, Nya
4809- new Line_Break (lb) value:
4810    lb ; CP        ; Close_Parenthesis
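
The Blocks.txt find/replace above leaves some manual cleanup, since every line gets the value 172 and the block name is copied verbatim. A variation in Python that also numbers the constants and upper-cases the names is sketched below; block_enum_lines is a hypothetical helper and the starting value is passed in by the caller.

```python
import re

# Same shape as the find pattern above: start..end; Block Name
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def block_enum_lines(blocks_txt_lines, first_value):
    """Turn Blocks.txt lines into uchar.h-style enum lines, numbered from first_value."""
    output = []
    value = first_value
    for line in blocks_txt_lines:
        match = BLOCK_LINE.match(line)
        if not match:
            continue  # comments and blank lines
        start, _end, name = match.groups()
        constant = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
        output.append("    UBLOCK_%s = %d, /*[%s]*/" % (constant, value, start))
        value += 1
    return output


sample = ["# Blocks", "A960..A97F; Hangul Jamo Extended-A", "D7B0..D7FF; Hangul Jamo Extended-B"]
print("\n".join(block_enum_lines(sample, 172)))
```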
4811
4812* hardcoded Unihan range end/limit
4813- Unihan range end moves from 9FC3 to 9FCB
4814  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
4815  + do change gennames.c
4816
4817* Compare definitions of new binary properties with what we used to use
4818  in algorithms, to see if the definitions changed.
4819- Verified that definitions for Cased and Case_Ignorable are unchanged.
4820  The gencase tool now parses the newly public Case_Ignorable values
4821  in case the definition changes in the future.
4822
4823* uchar.c & uprops.h & uprops.c & genprops
4824- new numeric values that didn't exist in Unicode data before:
4825    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
4826  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
4827  therefore redesign the encoding of numeric types and values for formatVersion 6;
4828  design for simple numbers up to at least 144 ("one gross"),
4829  large values up to at least 10^20,
4830  and fractions with numerators -1..17 and denominators 1..16
4831  to cover current and expected future values
4832  (e.g., more Han numeric values, Meroitic twelfths)
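
Sketch: one way to pack fractions with numerators -1..17 and denominators 1..16 into a small integer is to give the denominator 4 bits and store the biased numerator above it. The layout below is purely illustrative and is not claimed to be the actual uprops.icu formatVersion 6 encoding.

```python
MIN_NUMERATOR, MAX_NUMERATOR = -1, 17
MIN_DENOMINATOR, MAX_DENOMINATOR = 1, 16


def pack_fraction(numerator, denominator):
    """Pack a fraction into an int: biased numerator above 4 denominator bits."""
    if not MIN_NUMERATOR <= numerator <= MAX_NUMERATOR:
        raise ValueError("numerator out of range: %d" % numerator)
    if not MIN_DENOMINATOR <= denominator <= MAX_DENOMINATOR:
        raise ValueError("denominator out of range: %d" % denominator)
    return ((numerator - MIN_NUMERATOR) << 4) | (denominator - MIN_DENOMINATOR)


def unpack_fraction(packed):
    """Inverse of pack_fraction; returns (numerator, denominator)."""
    return (packed >> 4) + MIN_NUMERATOR, (packed & 0xF) + MIN_DENOMINATOR


# The new values listed above all survive a round trip.
new_values = [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]
round_trips = [unpack_fraction(pack_fraction(n, d)) for n, d in new_values]
print(round_trips)
```

With this layout the largest packed value is 303 (for 17/16), so 9 bits would be enough for the fraction part.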
4833
4834* reimplement Hangul_Syllable_Type for new Jamo characters
4835- the old code assumed that all Jamo characters are in the 11xx block
4836- Unicode 5.2 fills holes there and adds new Jamo characters in
4837    A960..A97F; Hangul Jamo Extended-A
4838  and in
4839    D7B0..D7FF; Hangul Jamo Extended-B
4840- Hangul_Syllable_Type can be trivially derived from a subset of
4841  Grapheme_Cluster_Break values
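
Sketch: the derivation amounts to a small lookup, shown here in Python with short property value names. It illustrates the idea only and is not the ICU implementation.

```python
# Grapheme_Cluster_Break values that correspond to a Hangul_Syllable_Type value.
HST_FROM_GCB = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb_value):
    """Map a GCB short value to the HST short value; NA for everything else."""
    return HST_FROM_GCB.get(gcb_value, "NA")


print(hangul_syllable_type("LV"), hangul_syllable_type("EX"))
```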
4842
4843* build Unicode data source code for hardcoding core data
4844C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
4845
4846ICU data make path is \svn\icuproj\icu\trunk\source\data\
4847ICU root path is \svn\icuproj\icu\trunk
4848Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4849Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
4850Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
4851Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
4852Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
4853Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
4854Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
4855Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
4856Creating data file for Unicode Property Names
4857Creating data file for Unicode Character Properties
4858Creating data file for Unicode Case Mapping Properties
4859Creating data file for Unicode BiDi/Shaping Properties
4860Creating data file for Unicode Normalization
4861Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
4862Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
4863
4864- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
4865  and rebuild the common library
4866
4867*** UCA
4868
4869- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
4870- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
4871- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
4872[ Begin obsolete instructions:
4873  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
4874    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
4875      on Windows:
4876        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
4877        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
4878  End obsolete instructions]
- run all tests with the *_SHORT.txt or the full files (the full ones have comments),
  not just the *_STUB.txt files
4881- note on intltest: if collate/UCAConformanceTest fails, then
4882  utility/MultithreadTest/TestCollators will fail as well;
4883  fix the conformance test before looking into the multi-thread test
4884
4885*** Implement Cased & Case_Ignorable properties
4886- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
4887- Problem: These properties should be disjoint, but aren't
4888- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
4889- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
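
Sketch: storing "any combination" needs two independent bits rather than one mutually exclusive type value. The flag values below are hypothetical and do not reflect the real ucase.icu bit layout.

```python
# Hypothetical flag bits; the real ucase.icu layout may differ.
CASED = 1
CASE_IGNORABLE = 2


def pack_case_flags(cased, case_ignorable):
    """Combine the two independent properties into one small integer."""
    return (CASED if cased else 0) | (CASE_IGNORABLE if case_ignorable else 0)


def is_cased(flags):
    return bool(flags & CASED)


def should_skip(flags):
    """Per the decision above: skip Case_Ignorable whether or not it is Cased."""
    return bool(flags & CASE_IGNORABLE)


both = pack_case_flags(True, True)
print(both, is_cased(both), should_skip(both))
```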
4890
4891*** Implement Changes_When_Xyz properties
4892- without stored data
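
Sketch: "without stored data" means computing the answer from the case mappings at lookup time. The Python below assumes the usual derived-property shape, comparing the NFD form with its case-mapped result; it relies on Python's own Unicode data rather than ICU's, so results for individual characters can differ between versions.

```python
import unicodedata


def changes_when_lowercased(ch):
    """True if lowercasing the NFD form of ch changes it."""
    nfd = unicodedata.normalize("NFD", ch)
    return nfd.lower() != nfd


def changes_when_uppercased(ch):
    """True if uppercasing the NFD form of ch changes it."""
    nfd = unicodedata.normalize("NFD", ch)
    return nfd.upper() != nfd


def changes_when_casefolded(ch):
    """True if case folding the NFD form of ch changes it."""
    nfd = unicodedata.normalize("NFD", ch)
    return nfd.casefold() != nfd


print(changes_when_lowercased("A"), changes_when_lowercased("a"))
```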
4893
4894*** Implement Name_Alias property
4895- add it as another name field in unames.icu
- make it available via u_charName() and UCharNameChoice
- consider it in u_charFromName()
4898
4899*** Break iterators
4900
4901* Update break iterator rules to new UAX versions and new property values
4902* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
4903
4904*** new BidiTest file
4905- review format and data
4906- copy BidiTest.txt to source/test/testdata
4907- write test code using this data
4908- fix ICU code where it fails the conformance test
4909
4910*** Java
4911- generally, find and update code corresponding to C/C++
4912- UCharacter.UnicodeBlock constants:
4913  a) add an _ID integer per new block, update COUNT
4914  b) add a class instance per new block
4915     Visual Studio regex:
4916        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
4917        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
4918- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
4919
4920- port test changes to Java
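
Sketch: the Visual Studio find/replace for the UnicodeBlock instances translates directly to Python's re module. Visual Studio 2008 writes groups as {...}, Python as (...). The sample input line is illustrative.

```python
import re

# Same pattern as the Visual Studio regex above, in Python group syntax.
ENUM_LINE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
JAVA_TEMPLATE = (
    r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'
)


def java_block_instance(enum_line):
    """Turn one C enum line into the matching Java class-instance line."""
    return ENUM_LINE.sub(JAVA_TEMPLATE, enum_line)


converted = java_block_instance("    UBLOCK_SAMARITAN = 172, /*[0800]*/")
print(converted)
```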
4921
4922*** LayoutEngine script information
4923
4924(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
4925
4926* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4927ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4928ScriptRunData.cpp, which is no longer needed.)
4929
4930The generated files have a current copyright date and "@draft" statement.
4931
4932-> Eric Mader wrote in email on 20090930:
4933    "I think the tool has been modified to update @draft to @stable for
4934     older scripts and to add @draft for new scripts.
4935     (I worked with an intern on this last year.)
4936     You should check the output after you run it."
4937
4938* copy the above files into <icu>/source/layout, replacing the old files.
4939* fix mixed line endings
4940* review the diffs and fix incorrect @draft and missing aliases
4941* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4942
4943Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4944and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4945
4946-> Eric Mader wrote in email on 20090930:
4947    "This is just a matter of making sure that all the per-script tables have
4948     entries for any new scripts that were added.
4949     If any new Indic characters were added, then the class tables in
4950     IndicClassTables.cpp should be updated to reflect this.
4951     John Emmons should know how to do this if it's required."
4952
4953* rebuild the layout and layoutex libraries.
4954
4955*** Documentation
4956- Update User Guide
4957  + Jamo_Short_Name, sfc->scf, binary property value aliases
4958
4959---------------------------------------------------------------------------- ***
4960
4961Unicode 5.1 update
4962
4963*** related ICU Trac tickets
4964
5696 Update to Unicode 5.1
4966
4967*** Unicode version numbers
4968- makedata.mak
4969- uchar.h
4970- configure.in & configure
4971- update ucdVersion in gennames.c if an algorithmic range changes
4972
4973*** data files & enums & parser code
4974
4975* file preparation
4976- ucdstrip:
4977    DerivedCoreProperties.txt
4978    DerivedNormalizationProps.txt
4979    NormalizationTest.txt
4980    PropList.txt
4981    Scripts.txt
4982    GraphemeBreakProperty.txt
4983    SentenceBreakProperty.txt
4984    WordBreakProperty.txt
4985- ucdstrip and ucdmerge:
4986    EastAsianWidth.txt
4987    LineBreak.txt
4988
4989* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
4990copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
4991copy 5.1.0\ucd\Blocks.txt ..\unidata\
4992copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
4993copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
4994copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
4995copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
4996copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
4997copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
4998copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
4999copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
5000copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
5001copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
5002copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
5003
5004ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
5005ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
5006ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
5007ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
5008ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
5009ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
5010ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
5011ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
5012ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
5013ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
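
Sketch: since the batch file has to be edited for every UCD version, its command lines could instead be generated from the version string. The Python below reproduces the commands above; the function and list names are mine.

```python
COPY_FILES = [
    "BidiMirroring.txt", "Blocks.txt", "CaseFolding.txt", "DerivedAge.txt",
    "extracted\\DerivedBidiClass.txt", "extracted\\DerivedJoiningGroup.txt",
    "extracted\\DerivedJoiningType.txt", "extracted\\DerivedNumericValues.txt",
    "NormalizationCorrections.txt", "PropertyAliases.txt",
    "PropertyValueAliases.txt", "SpecialCasing.txt", "UnicodeData.txt",
]
STRIP_FILES = [
    "DerivedCoreProperties.txt", "DerivedNormalizationProps.txt",
    "NormalizationTest.txt", "PropList.txt", "Scripts.txt",
    "auxiliary\\GraphemeBreakProperty.txt",
    "auxiliary\\SentenceBreakProperty.txt",
    "auxiliary\\WordBreakProperty.txt",
]
STRIP_AND_MERGE_FILES = ["EastAsianWidth.txt", "LineBreak.txt"]


def base_name(path):
    """File name without any leading subdirectory."""
    return path.rsplit("\\", 1)[-1]


def batch_commands(version):
    """Return the copy/ucdstrip/ucdmerge command lines for one UCD version."""
    src = version + "\\ucd\\"
    dst = "..\\unidata\\"
    commands = ["copy %s%s %s" % (src, name, dst) for name in COPY_FILES]
    commands += ["ucdstrip < %s%s > %s%s" % (src, name, dst, base_name(name))
                 for name in STRIP_FILES]
    commands += ["ucdstrip < %s%s | ucdmerge > %s%s" % (src, name, dst, name)
                 for name in STRIP_AND_MERGE_FILES]
    return commands


commands = batch_commands("5.1.0")
print("\n".join(commands))
```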
5014
5015* genpname
5016- run preparse.pl
5017  + cd \svn\icuproj\icu\uni51\source\tools\genpname
5018  + make sure that data.h is writable
5019  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
5020  + preparse.pl complains with errors like the following:
5021      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    This is because ICU 3.8 already had script codes from ISO 15924 that Unicode 5.1 now adds,
    so preparse.pl reports a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
5025    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
5026       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
5027  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
5028      N/Y, No/Yes, F/T, False/True
5029    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
5030       It will use further values from the file if present.
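
Sketch: the duplicate script entries can be found before running preparse.pl by intersecting the script codes of the two alias files. This assumes both files use the "sc ; short ; long" line layout; the sample data is abbreviated and partly made up.

```python
def script_codes(alias_text):
    """Collect short script codes from lines like 'sc ; Cari ; Carian'."""
    codes = set()
    for line in alias_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes


def duplicate_script_codes(official_text, synthetic_text):
    """Script codes defined in both files, i.e. candidates for removal."""
    return sorted(script_codes(official_text) & script_codes(synthetic_text))


official = "sc ; Cari ; Carian\nsc ; Latn ; Latin\n"
synthetic = "# synthetic\nsc ; Cari ; Cari\nsc ; Egyp ; Egyptian_Hieroglyphs\n"
duplicates = duplicate_script_codes(official, synthetic)
print(duplicates)
```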
5031
5032* uchar.h & uscript.h & uprops.h & uprops.c & genprops
5033- new block & script values
5034  + 17 new blocks
5035  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
5036    (removed from SyntheticPropertyValueAliases.txt)
5037  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
5038    (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
  No script code above 127 is assigned to any Unicode character yet,
  so ICU 4.0 uprops.icu does not store any script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that overflows now.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields also grow to leave room for future values:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
5052- renamed property Simple_Case_Folding (sfc->scf)
5053  + nothing to be done: handled as normal alias
5054- new property JSN Jamo_Short_Name
5055  + no new API: only contributes to the Name property
5056- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
5057- new Joining Group (JG) value: Burushashki_Yeh_Barree
5058- new Sentence_Break (SB) values:
5059    SB ; CR        ; CR
5060    SB ; EX        ; Extend
5061    SB ; LF        ; LF
5062    SB ; SC        ; SContinue
5063- new Word_Break (WB) values:
5064    WB ; CR        ; CR
5065    WB ; Extend    ; Extend
5066    WB ; LF        ; LF
5067    WB ; MB        ; MidNumLet
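
Sketch: the script code overflow described in the uprops.icu note above is plain bit-width arithmetic. The helper name is mine; the numbers come from that note.

```python
def bits_needed(max_value):
    """Number of bits needed to store unsigned values 0..max_value."""
    return max(1, max_value.bit_length())


USCRIPT_CODE_LIMIT = 130                   # from the note above
max_script_value = USCRIPT_CODE_LIMIT - 1  # 129, stored in the parallel field

seven_bit_max = (1 << 7) - 1               # 127, largest value in 7 bits
script_bits = bits_needed(max_script_value)

print(seven_bit_max, max_script_value, script_bits)
```

So 129 no longer fits into 7 bits, while an 8-bit field holds values up to 255.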
5068
5069* Further changes in the 2008-02-29 update:
5070- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
5071  because they should not normally be invisible.
5072- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
5073- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
5074- new Word_Break (WB) value: NL=Newline
5075
5076* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
5077- Unihan range end moves from 9FBB to 9FC3
5078  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
5079  + do change gennames.c
5080
5081* build Unicode data source code for hardcoding core data
5082C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
5083
5084ICU data make path is \svn\icuproj\icu\uni51\source\data\
5085ICU root path is \svn\icuproj\icu\uni51
5086Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
5087Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
5088Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
5089Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
5090Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
5091Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
5092Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
5093Creating data file for Unicode Character Properties
5094Creating data file for Unicode Case Mapping Properties
5095Creating data file for Unicode BiDi/Shaping Properties
5096Creating data file for Unicode Normalization
5097Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
5098Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
5099
5100- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
5101  and rebuild the common library
5102
5103*** Break iterators
5104
5105* Update break iterator rules to new UAX versions and new property values
5106
5107*** UCA
5108
5109* update FractionalUCA.txt and UCARules.txt with new canonical closure
5110
5111*** Test suites
5112- Test that APIs using Unicode property value aliases (like UnicodeSet)
5113  support all of the boolean values N/Y, No/Yes, F/T, False/True
5114  -> TestBinaryValues() tests in both cintltst and intltest
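
Sketch: the behavior these tests check can be pictured as a lookup over all eight aliases. Matching case-insensitively is an assumption of this sketch; the real tests exercise the ICU APIs, not this code.

```python
BOOLEAN_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(alias):
    """Map a binary property value alias to a bool (assumed case-insensitive)."""
    key = alias.strip().lower()
    if key not in BOOLEAN_ALIASES:
        raise ValueError("not a binary property value alias: %r" % alias)
    return BOOLEAN_ALIASES[key]


parsed = [parse_binary_value(a) for a in ["N", "Yes", "F", "True"]]
print(parsed)
```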
5115
5116*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)
5120
5121The generated files have a current copyright date and "@draft" statement.
5122
5123* copy the above files into <icu>/source/layout, replacing the old files.
5124
5125Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
5126and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
5127
5128* rebuild the layout and layoutex libraries.
5129
5130*** Documentation
5131- Update User Guide
5132  + Jamo_Short_Name, sfc->scf, binary property value aliases
5133
5134---------------------------------------------------------------------------- ***
5135
5136Unicode 5.0 update
5137
5138*** related Jitterbugs
5139
5084 RFE: Update to Unicode 5.0
5141
5142*** data files & enums & parser code
5143
5144* file preparation
5145- ucdstrip:
5146    DerivedCoreProperties.txt
5147    DerivedNormalizationProps.txt
5148    NormalizationTest.txt
5149    PropList.txt
5150    Scripts.txt
5151    GraphemeBreakProperty.txt
5152    SentenceBreakProperty.txt
5153    WordBreakProperty.txt
5154- ucdstrip and ucdmerge:
5155    EastAsianWidth.txt
5156    LineBreak.txt
5157
5158* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
5159copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
5160copy 5.0.0\ucd\Blocks.txt ..\unidata\
5161copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
5162copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
5163copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
5164copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
5165copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
5166copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
5167copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
5168copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
5169copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
5170copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
5171copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
5172
5173ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
5174ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
5175ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
5176ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
5177ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
5178ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
5179ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
5180ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
5181ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
5182ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
5183
5184* update FractionalUCA.txt and UCARules.txt with new canonical closure
5185
5186* genpname
5187- run preparse.pl
5188  + make sure that data.h is writable
5189  + perl preparse.pl \cvs\oss\icu > out.txt
5190
5191* uchar.h & uscript.h & uprops.h & uprops.c & genprops
5192- new block & script values
5193  + script values already added in ICU 3.6 because all of ISO 15924 is now covered
5194
5195* build Unicode data source code for hardcoding core data
5196C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
5197
5198ICU data make path is \cvs\oss\icu\source\data\
5199ICU root path is \cvs\oss\icu
5200Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
5201[etc.]
5202Creating data file for Unicode Character Properties
5203Creating data file for Unicode Case Mapping Properties
5204Creating data file for Unicode BiDi/Shaping Properties
5205Creating data file for Unicode Normalization
5206Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
5207Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
5208
5209- copy the .c source files to C:\cvs\oss\icu\source\common
5210  and rebuild the common library
5211
5212*** Unicode version numbers
5213- makedata.mak
5214- uchar.h
5215- configure.in
5216
5217*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)
5221
5222The generated files have a current copyright date and "@draft" statement.
5223
5224* copy the above files into <icu>/source/layout, replacing the old files.
5225
5226Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
5227and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
5228
5229* rebuild the layout and layoutex libraries.
5230
5231---------------------------------------------------------------------------- ***
5232
5233Unicode 4.1 update
5234
5235*** related Jitterbugs
5236
4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates
5239
5240*** data files & enums & parser code
5241
5242* file preparation
5243- ucdstrip:
5244    DerivedCoreProperties.txt
5245    DerivedNormalizationProps.txt
5246    NormalizationTest.txt
5247    GraphemeBreakProperty.txt
5248    SentenceBreakProperty.txt
5249    WordBreakProperty.txt
5250- ucdstrip and ucdmerge:
5251    EastAsianWidth.txt
5252    LineBreak.txt
5253
5254* add new files to the repository
5255    GraphemeBreakProperty.txt
5256    SentenceBreakProperty.txt
5257    WordBreakProperty.txt
5258
5259* update FractionalUCA.txt and UCARules.txt with new canonical closure
5260
5261* genpname
5262- handle new enumerated properties in sub read_uchar
5263- run preparse.pl
5264
5265* uchar.h & uscript.h & uprops.h & uprops.c & genprops
5266- new binary properties
5267  + Pattern_Syntax
5268  + Pattern_White_Space
5269- new enumerated properties
5270  + Grapheme_Cluster_Break
5271  + Sentence_Break
5272  + Word_Break
5273- new block & script & line break values
5274
5275* gencase
5276- case-ignorable changes
5277  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
5278  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
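
Sketch: the D47a definition quoted above is a union of one Word_Break value and five general categories. Python's unicodedata module does not expose Word_Break, so the MidLetter set below is a small placeholder; the real members come from WordBreakProperty.txt. General categories are taken from Python's Unicode data, not from Unicode 4.1.

```python
import unicodedata

IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

# Placeholder only; read the real Word_Break=MidLetter set from the UCD.
SAMPLE_MID_LETTER = {"'", "\u00B7"}


def is_case_ignorable(ch, mid_letter=SAMPLE_MID_LETTER):
    """D47a as quoted above: Word_Break=MidLetter or gc in Mn, Me, Cf, Lm, Sk."""
    return ch in mid_letter or unicodedata.category(ch) in IGNORABLE_CATEGORIES


print(is_case_ignorable("\u0301"), is_case_ignorable("a"))
```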
5279
5280*** Unicode version numbers
5281- makedata.mak
5282- uchar.h
5283- configure.in
5284
5285*** tests
5286- verify that u_charMirror() round-trips
5287- test all new properties and some new values of old properties
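
Sketch: the round-trip property is that mirroring a character twice gives back the original. The Python below checks this on BidiMirroring.txt-style "code; mirror" lines, treating a character without an entry as its own mirror. The sample is deliberately incomplete to show a failure; the real test calls u_charMirror().

```python
def parse_mirroring(text):
    """Parse 'XXXX; YYYY # comment' lines into a {code: mirror} dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        source, target = (int(field.strip(), 16) for field in line.split(";"))
        mapping[source] = target
    return mapping


def round_trip_failures(mapping):
    """Code points c for which mirror(mirror(c)) != c."""
    return [c for c, m in sorted(mapping.items()) if mapping.get(m, m) != c]


sample = (
    "0028; 0029 # LEFT PARENTHESIS\n"
    "0029; 0028 # RIGHT PARENTHESIS\n"
    "003C; 003E # LESS-THAN SIGN\n"  # reverse entry left out on purpose
)
failures = round_trip_failures(parse_mirroring(sample))
print(["%04X" % c for c in failures])
```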
5288
5289*** other code
5290
5291* hardcoded Unihan range end/limit
5292- Unihan range end moves from 9FA5 to 9FBB
5293  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
5294  + do not modify BOCU/BOCSU code because that would change the encoding
5295    and break binary compatibility!
5296  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
5297    NamePrepProfile.txt
5298  + ignore trietest.c: test data is arbitrary
5299  + ignore tstnorm.cpp: test optimization, not important
5300  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
5301  + do change line_th.txt and word_th.txt
5302    by replacing hardcoded ranges with the new property values
5303  + do change gennames.c
5304
5305source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
5306source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
5307source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
5308
5309* case mappings
5310- compare new special casing context conditions with previous ones
5311  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
5312
5313* genpname
5314- consider storing only the short name if it is the same as the long name
5315
5316*** other reviews
5317- UAX #29 changes (grapheme/word/sentence breaks)
5318- UAX #14 changes (line breaks)
5319- Pattern_Syntax & Pattern_White_Space
5320
5321---------------------------------------------------------------------------- ***
5322
5323Unicode 4.0.1 update
5324
5325*** related Jitterbugs
5326
3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration
5330
5331*** data files & enums & parser code
5332
5333* file preparation
5334- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
5335- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
5336
5337* file fixes
5338- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
5339  according to PRI #26
5340  http://www.unicode.org/review/resolved-pri.html#pri26
5341- undone again because no corrigendum in sight;
5342  instead modified tests to not check consistency on this for Unicode 4.0.1
5343
5344* ucdterms.txt
5345- update from http://www.unicode.org/copyright.html
5346  formatted for plain text
5347
5348* uchar.h & uprops.h & uprops.c & genprops
5349- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
5350- add U_LB_INSEPARABLE due to a spelling fix
5351  + put short name comment only on line with new constant
5352    for genpname perl script parser
5353- new binary properties
5354  + STerm
5355  + Variation_Selector
5356
5357* genpname
5358- fix genpname perl script so that it doesn't choke on more than 2 names per property value
5359- perl script: correctly calculate the maximum number of fields per row
5360
5361* uscript.h
5362- new script code Hrkt=Katakana_Or_Hiragana
5363
5364* gennorm.c track changes in DerivedNormalizationProps.txt
5365- "FNC" -> "FC_NFKC"
5366- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
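
Sketch: the two format changes can be expressed as a conversion from the old field to the new one. The log only gives "NFD_NO" as an example; mapping MAYBE to M and applying the same pattern to the other normalization forms is my reading of the "etc.".

```python
ANSWERS = {"NO": "N", "MAYBE": "M"}  # MAYBE -> M is assumed from the "etc."


def to_new_format(old_field):
    """Convert an old DerivedNormalizationProps.txt field to the new form."""
    if old_field == "FNC":
        return "FC_NFKC"
    form, _, answer = old_field.rpartition("_")
    if not form or answer not in ANSWERS:
        raise ValueError("unrecognized field: %r" % old_field)
    return "%s_QC; %s" % (form, ANSWERS[answer])


converted_fields = [to_new_format(f) for f in ["FNC", "NFD_NO", "NFKC_MAYBE"]]
print(converted_fields)
```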
5367
5368* genprops/props2.c track changes in DerivedNumericValues.txt
5369- changed from 3 columns to 2, dropping the numeric type
5370  + assume that the type is always numeric for Han characters,
5371    and that only those are added in addition to what UnicodeData.txt lists
5372
5373*** Unicode version numbers
5374- makedata.mak
5375- uchar.h
5376- configure.in
5377
5378*** tests
5379- update test of default bidi classes according to PRI #28
5380  /tsutil/cucdtst/TestUnicodeData
5381  http://www.unicode.org/review/resolved-pri.html#pri28
5382- bidi tests: change exemplar character for ES depending on Unicode version
5383- change hardcoded expected property values where they change
5384
5385*** other code
5386
5387* name matching
5388- read UCD.html
5389
5390* scripts
5391- use new Hrkt=Katakana_Or_Hiragana
5392
5393* ZWJ & ZWNJ
5394- are now part of combining character sequences
5395- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
5396