1* Copyright (C) 2016 and later: Unicode, Inc. and others.
2* License & terms of use: http://www.unicode.org/copyright.html
3* Copyright (C) 2004-2016, International Business Machines
4* Corporation and others.  All Rights Reserved.
5*
6*   file name:  changes.txt
7*   encoding:   US-ASCII
8*   tab size:   8 (not used)
9*   indentation:4
10*
11*   created on: 2004may06
12*   created by: Markus W. Scherer
13
14* change log for Unicode updates
15
16For an overview, see https://unicode-org.github.io/icu/processes/unicode-update
17
18Notes:
19
20This log includes several command lines as used in the update process.
21Some of them include a console prompt with the present working directory (pwd) followed by a $ sign.
22Use a console window that is set to that directory, or cd to there,
23and then paste the command that follows the $ sign.
24
25Most command lines use environment variables to make them more portable across versions
26and machine configurations. When you set up a console window, copy & paste the `export` commands
27from near the top of the current section before pasting tool command lines.
28Adjust the environment variables to the current version and your machine setup.
29(The command lines are currently as used on Linux.)
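
A minimal sketch of a guard for the setup described above: it checks that the environment variables used by the command lines in this log are set before anything is pasted. The function name require_vars is hypothetical and not part of ICU; the variable names are the ones exported in the setup sections below.

```shell
# Hypothetical helper, not part of ICU: report any listed variable
# that is unset or empty, and return non-zero if at least one is missing.
require_vars() {
    rv_missing=0
    for rv_name in "$@"; do
        # Indirect lookup that also works in plain POSIX sh.
        eval "rv_value=\${$rv_name:-}"
        if [ -z "$rv_value" ]; then
            echo "missing environment variable: $rv_name" >&2
            rv_missing=1
        fi
    done
    return "$rv_missing"
}

# Variable names as exported in the setup sections of this log.
if require_vars UNICODE_DATA CLDR_SRC ICU_ROOT ICU_SRC ICUDT ICU4C_DATA_IN ICU4C_UNIDATA; then
    echo "environment looks complete"
else
    echo "copy the export commands from the current section first" >&2
fi
```

Running this in a fresh console window should list every variable that still needs an `export` line.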
30
31---------------------------------------------------------------------------- ***
32
33* New ISO 15924 script codes
34
35Normally, add new script codes as part of a Unicode update.
36See https://unicode-org.github.io/icu/processes/release/tasks/standards#update-script-code-enums
37and see the change logs below.
38
39---------------------------------------------------------------------------- ***
40
41CLDR 43 root collation update for ICU 73
42
43Partial update only for the root collation.
44See
45- https://unicode-org.atlassian.net/browse/CLDR-15946
46  Treat quote marks as equivalent when strength=UCOL_PRIMARY
47- https://github.com/unicode-org/cldr/pull/2691
48  CLDR-15946 make fancy quotes primary-equal to ASCII fallbacks
49- https://github.com/unicode-org/cldr/pull/2833
50  CLDR-15946 make fancy quotes secondary-different from each other
51
52The related changes to tailorings were already integrated in an earlier PR for
53https://unicode-org.atlassian.net/browse/ICU-22220 ICU 73rc BRS.
54
55This update is for the root collation,
56which is handled by different tools than the locale data updates.
57
58* Command-line environment setup
59
60export UNICODE_DATA=~/unidata/uni15/20220830
61export CLDR_SRC=~/cldr/uni/src
62export ICU_ROOT=~/icu/uni
63export ICU_SRC=$ICU_ROOT/src
64export ICUDT=icudt73b
65export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
66export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
67export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
68
69*** Configure: Build Unicode data for ICU4J
70  cd $ICU_ROOT/dbg/icu4c
71  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
72
73* Bazel build process
74
75See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
76for an overview and for setup instructions.
77
78Consider running `bazelisk --version` outside of the $ICU_SRC folder
79to find out the latest `bazel` version, and
80copying that version number into the $ICU_SRC/.bazeliskrc config file.
81(Revert if you find incompatibilities, or, better, update our build & config files.)
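
A small sketch of the version-copying step above. It assumes that `bazelisk --version` prints a line of the form "bazel <version>" and that .bazeliskrc uses the USE_BAZEL_VERSION key that Bazelisk documents. The helper name and the sample version number are illustrative only.

```shell
# Format one .bazeliskrc line from a version line such as "bazel 6.0.0".
# The helper name and the sample version below are illustrative only.
bazeliskrc_line() {
    # Strip the leading "bazel " and keep the version number.
    printf 'USE_BAZEL_VERSION=%s\n' "${1#bazel }"
}

bazeliskrc_line "bazel 6.0.0"

# With bazelisk installed, the real input would come from a command such as
#   (cd /tmp && bazelisk --version)
# and the resulting line would replace the one in $ICU_SRC/.bazeliskrc.
```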
82
83* generate data files
84
85- remember to define the environment variables
86  (see the start of the section for this Unicode version)
87- cd $ICU_SRC
- optional:
89    bazelisk clean
90      or even
91    bazelisk clean --expunge
92- build/bootstrap/generate new files:
93    icu4c/source/data/unidata/generate.sh
94
95* collation: CLDR collation root, UCA DUCET
96
97- UCA DUCET goes into Mark's Unicode tools,
98  and a tool-tailored version goes into CLDR, see
99    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
100
101- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
102    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
103- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
104    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
105    (note removing the underscore before "Rules")
106    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
107- restore TODO diffs in UCARules.txt
108    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
109- update (ICU4C)/source/test/testdata/CollationTest_*.txt
110  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
111  from the CLDR root files (..._CLDR_..._SHORT.txt)
112    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
113    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
114    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
115- if CLDR common/uca/unihan-index.txt changes, then update
116  CLDR common/collation/root.xml <collation type="private-unihan">
117  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
118
119- generate data files, as above (generate.sh), now to pick up new collation data
120- rebuild ICU4C (make clean, make check, as usual)
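
The renaming in the CollationTest copy step above (CLDR file names contain "CLDR_", the ICU copies do not) can be sketched as a loop. This is only an illustration in a temporary sandbox: the three folder variables stand in for the real CLDR, ICU4C and ICU4J folders, and the sketch copies only the *_SHORT.txt files.

```shell
# Sandbox stand-ins for $CLDR_SRC/common/uca, the ICU4C testdata folder,
# and the ICU4J collation test data folder.
sandbox=$(mktemp -d)
cldr_uca=$sandbox/cldr/common/uca
icu4c_testdata=$sandbox/icu4c/source/test/testdata
icu4j_data=$sandbox/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
mkdir -p "$cldr_uca" "$icu4c_testdata" "$icu4j_data"
echo "sample" > "$cldr_uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt"
echo "sample" > "$cldr_uca/CollationTest_CLDR_SHIFTED_SHORT.txt"

for src in "$cldr_uca"/CollationTest_CLDR_*_SHORT.txt; do
    # Drop the "CLDR_" part: ICU keeps the files under the shorter name.
    name=$(basename "$src" | sed 's/_CLDR_/_/')
    cp "$src" "$icu4c_testdata/$name"
    cp "$icu4c_testdata/$name" "$icu4j_data/$name"
done

ls "$icu4j_data"
```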
121
122* run & fix ICU4C tests, now with new CLDR collation root data
123- run all tests with the collation test data *_SHORT.txt or the full files
124  (the full ones have comments, useful for debugging)
125- note on intltest: if collate/UCAConformanceTest fails, then
126  utility/MultithreadTest/TestCollators will fail as well;
127  fix the conformance test before looking into the multi-thread test
128
129* update Java data files
130- refresh just the UCD/UCA-related/derived files, just to be safe
131- see (ICU4C)/source/data/icu4j-readme.txt
132- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
133- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    NOTE: If you get an error like "No rule to make target 'out/build/icudt70l/uprops.icu'"
    (with the current data version in place of icudt70l),
    you need to reconfigure with ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data; see the "Configure" line above.
136  output:
137    ...
138    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
139    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt73b
140    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt73b
141    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt73l.dat ./out/icu4j/icudt73b.dat -s ./out/build/icudt73l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt73b
142    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt73b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt73b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt73b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt73b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt73b"
143    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt73b/
144    mkdir -p /tmp/icu4j/main/shared/data
145    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
146    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt73b/
147    mkdir -p /tmp/icu4j/main/shared/data
148    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
149    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
150- copy the big-endian Unicode data files to another location,
151  separate from the other data files,
152  and then refresh ICU4J
153    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
154    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
155    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
156    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
157- new for ICU 73: also copy the binary data files directly into the ICU4J tree
158    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* $ICU_SRC/icu4j/maven-build/maven-icu4j-datafiles/src/main/resources/com/ibm/icu/impl/data/$ICUDT/coll
159
160* When refreshing all of ICU4J data from ICU4C
161- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
162- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
163or
164- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
165
166* refresh Java test .txt files
167- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
168    cd $ICU_SRC/icu4c/source/data/unidata
169    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
170    cd ../../test/testdata
171    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
172    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
173
174* run & fix ICU4J tests
175
176*** merge the Unicode update branch back onto the main branch
177- do not merge the icudata.jar and testdata.jar,
178  instead rebuild them from merged & tested ICU4C
179- if there is a merge conflict in icudata.jar, here is one way to deal with it:
180  +   remove icudata.jar from the commit so that rebasing is trivial
181  + ~/icu/uni/src$ git restore --source=main icu4j/main/shared/data/icudata.jar
182  + ~/icu/uni/src$ git commit -a --amend
183  +   switch to main, pull updates, switch back to the dev branch
184  + ~/icu/uni/src$ git rebase main
185  +   rebuild icudata.jar
186  + ~/icu/uni/src$ git commit -a --amend
187  + ~/icu/uni/src$ git push -f
188- make sure that changes to Unicode tools are checked in:
189  https://github.com/unicode-org/unicodetools
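
The icudata.jar conflict recipe above, written out as one dry-run sequence. The `run` wrapper only prints each command, so nothing is executed. The `git switch` and `git pull` lines are one possible reading of "switch to main, pull updates, switch back to the dev branch" and are not taken from the log.

```shell
# Dry-run wrapper: print each command instead of executing it.
# Replace the body with "$@" to really run the steps.
run() { echo "+ $*"; }

jar_path=icu4j/main/shared/data/icudata.jar

run git restore --source=main "$jar_path"   # take icudata.jar from main
run git commit -a --amend                   # drop the jar change from the commit
run git switch main                         # assumption: plain switch/pull
run git pull
run git switch -                            # back to the previous (dev) branch
run git rebase main
echo "# rebuild icudata.jar here, from the rebased and tested ICU4C build"
run git commit -a --amend
run git push -f
```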
190
191---------------------------------------------------------------------------- ***
192
193Unicode 15.0 update for ICU 72
194
195https://www.unicode.org/versions/Unicode15.0.0/
196https://www.unicode.org/versions/beta-15.0.0.html
197https://www.unicode.org/Public/15.0.0/ucd/
198https://www.unicode.org/reports/uax-proposed-updates.html
199https://www.unicode.org/reports/tr44/tr44-29.html
200
201https://unicode-org.atlassian.net/browse/ICU-21980 Unicode 15
202https://unicode-org.atlassian.net/browse/CLDR-15516 Unicode 15
203https://unicode-org.atlassian.net/browse/CLDR-15253 Unicode 15 script metadata (in CLDR 41)
204
205* Command-line environment setup
206
207export UNICODE_DATA=~/unidata/uni15/20220830
208export CLDR_SRC=~/cldr/uni/src
209export ICU_ROOT=~/icu/uni
210export ICU_SRC=$ICU_ROOT/src
211export ICUDT=icudt72b
212export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
213export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
214export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
215
216*** Unicode version numbers
217- makedata.mak
218- uchar.h
219- com.ibm.icu.util.VersionInfo
220- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
221
222- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
223    so that the makefiles see the new version number.
224  cd $ICU_ROOT/dbg/icu4c
225  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
226
227*** data files & enums & parser code
228
229* download files
230- same as for the early Unicode Tools setup and data refresh:
231  https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
232  https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
233- mkdir -p $UNICODE_DATA
234- download Unicode files into $UNICODE_DATA
235  + subfolders: emoji, idna, security, ucd, uca
236  + old way of fetching files: from the "Public" area on unicode.org
237    ~ inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
238    ~ split Unihan into single-property files
239      ~/unitools/mine/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
240  + new way of fetching files, if available:
241    copy the files from a Unicode Tools workspace that is up to date with
242    https://github.com/unicode-org/unicodetools
243    and which might at this point be *ahead* of "Public"
244    ~ before the Unicode release copy files from "dev" subfolders, for example
245      https://github.com/unicode-org/unicodetools/tree/main/unicodetools/data/ucd/dev
  + get the CLDR version of GraphemeBreakTest.txt, either from
    $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
    or from the UCD/cldr/ output folder of the Unicode Tools (GraphemeBreakTest-cldr.txt).
    (Since Unicode 12/CLDR 35/ICU 64, CLDR uses modified break rules.)
    Run one of the following from $ICU_SRC:
      cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
        or
      cp ~/unitools/mine/Generated/UCD/15.0.0/cldr/GraphemeBreakTest-cldr.txt icu4c/source/test/testdata/GraphemeBreakTest.txt
252
253* for manual diffs and for Unicode Tools input data updates:
254  remove version suffixes from the file names
255    ~$ unidata/desuffixucd.py $UNICODE_DATA
256  (see https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md)
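
To illustrate what "removing version suffixes" means, here is a hypothetical stand-in that rewrites a single file name. It is not the logic of desuffixucd.py, which does the real work. It assumes suffixes of the form -<version> or -<version>d<N> just before the file extension, as in the draft file names that appear elsewhere in this log.

```shell
# Hypothetical stand-in for what "removing version suffixes" means;
# the real work is done by unidata/desuffixucd.py.
desuffix_name() {
    # Remove "-<major>.<minor>.<update>" plus an optional draft marker "d<N>"
    # that sits just before the file extension.
    printf '%s\n' "$1" | sed -E 's/-[0-9]+\.[0-9]+\.[0-9]+(d[0-9]+)?(\.[A-Za-z0-9]+)$/\2/'
}

desuffix_name "GraphemeBreakTest-cldr-14.0.0d19.txt"
desuffix_name "UnicodeData-15.0.0.txt"     # illustrative name
desuffix_name "confusables.txt"            # no suffix: left unchanged
```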
257
258* process and/or copy files
259- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
260  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
261  + For debugging, and tweaking how ppucd.txt is written,
262    the tool has an --only_ppucd option:
263    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
264
265- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
266
267* new constants for new property values
268- preparseucd.py error:
269    ValueError: missing uchar.h enum constants for some property values: [('blk', {'Nag_Mundari', 'CJK_Ext_H', 'Kawi', 'Kaktovik_Numerals', 'Devanagari_Ext_A', 'Arabic_Ext_C', 'Cyrillic_Ext_D'}), ('sc', {'Nagm', 'Kawi'})]
270  = PropertyValueAliases.txt new property values (diff old & new .txt files)
271    ~/unidata$ diff -u uni14/20210922/ucd/PropertyValueAliases.txt uni15/beta/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]'
272    +age; 15.0                             ; V15_0
273    +blk; Arabic_Ext_C                     ; Arabic_Extended_C
274    +blk; CJK_Ext_H                        ; CJK_Unified_Ideographs_Extension_H
275    +blk; Cyrillic_Ext_D                   ; Cyrillic_Extended_D
276    +blk; Devanagari_Ext_A                 ; Devanagari_Extended_A
277    +blk; Kaktovik_Numerals                ; Kaktovik_Numerals
278    +blk; Kawi                             ; Kawi
279    +blk; Nag_Mundari                      ; Nag_Mundari
280    +sc ; Kawi                             ; Kawi
281    +sc ; Nagm                             ; Nag_Mundari
282  -> add new blocks to uchar.h before UBLOCK_COUNT
283    use long property names for enum constants,
284    for the trailing comment get the block start code point: diff old & new Blocks.txt
285    ~/unidata$ diff -u uni14/20210922/ucd/Blocks.txt uni15/beta/ucd/Blocks.txt | egrep '^[-+][0-9A-Z]'
286    +10EC0..10EFF; Arabic Extended-C
287    +11B00..11B5F; Devanagari Extended-A
288    +11F00..11F5F; Kawi
289    -13430..1343F; Egyptian Hieroglyph Format Controls
290    +13430..1345F; Egyptian Hieroglyph Format Controls
291    +1D2C0..1D2DF; Kaktovik Numerals
292    +1E030..1E08F; Cyrillic Extended-D
293    +1E4D0..1E4FF; Nag Mundari
294    +31350..323AF; CJK Unified Ideographs Extension H
295    (ignore blocks whose end code point changed)
296  -> add new blocks to UCharacter.UnicodeBlock IDs
297    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
298            replace  public static final int \1_ID = \2; \3
299  -> add new blocks to UCharacter.UnicodeBlock objects
300    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
301            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
302  -> add new scripts to uscript.h & com.ibm.icu.lang.UScript
303    Eclipse find     USCRIPT_([^ ]+) *= ([0-9]+),(/.+)
304            replace  public static final int \1 = \2; \3
305  -> for new scripts: fix expectedLong names in cintltst/cucdapi.c/TestUScriptCodeAPI()
306      and in com.ibm.icu.dev.test.lang.TestUScript.java
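
The Eclipse find/replace patterns above can also be tried on the command line. The sketch below uses the same regular expressions with `sed -E`. The two sample enum lines, including their numeric values, are illustrative and should be replaced with the real lines from uchar.h and uscript.h.

```shell
# Sample uchar.h / uscript.h enum lines; the numeric values are illustrative only.
block_line='UBLOCK_KAWI = 326, /*[11F00]*/'
script_line='USCRIPT_KAWI                          = 198,/* Kawi */'

# uchar.h block constant -> UCharacter.UnicodeBlock ID
block_id() {
    sed -E 's#UBLOCK_([^ ]+) = ([0-9]+), (/.+)#public static final int \1_ID = \2; \3#'
}
# uchar.h block constant -> UCharacter.UnicodeBlock object
block_object() {
    sed -E 's#UBLOCK_([^ ]+) = [0-9]+, (/.+)#public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2#'
}
# uscript.h script constant -> com.ibm.icu.lang.UScript constant
script_constant() {
    sed -E 's#USCRIPT_([^ ]+) *= ([0-9]+),(/.+)#public static final int \1 = \2; \3#'
}

printf '%s\n' "$block_line" | block_id
printf '%s\n' "$block_line" | block_object
printf '%s\n' "$script_line" | script_constant
```

Each function reads lines on standard input, so a whole block of new enum constants can be piped through in one go.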
307
308* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
309    (not strictly necessary for NOT_ENCODED scripts)
310  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
311
312* build ICU
313  to make sure that there are no syntax errors
314
315  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 tests &> out.txt ; tail -n 30 out.txt ; date
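
The build command above follows a simple pattern: print timestamps, send all output to a log file, and show only the end of the log. A portable sketch of that pattern follows. The helper name run_logged is hypothetical, and unlike the one-liner it also returns the exit status of the command. The `sh -c` call is a stand-in for the real make command.

```shell
# Hypothetical helper: run a command with stdout and stderr captured in a log,
# print timestamps around it, and show the last lines of the log.
run_logged() {
    rl_log=$1
    shift
    date
    rl_status=0
    "$@" > "$rl_log" 2>&1 || rl_status=$?
    tail -n 30 "$rl_log"
    date
    return "$rl_status"
}

# Stand-in for "make -j7 tests"; in a build folder one would call
#   run_logged out.txt make -j7 tests
log=$(mktemp)
run_logged "$log" sh -c 'echo compiling; echo "tests built"'
```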
316
317* update spoof checker UnicodeSet initializers:
318    inclusionPat & recommendedPat in i18n/uspoof.cpp
319    INCLUSION & RECOMMENDED in SpoofChecker.java
320- make sure that the Unicode Tools tree contains the latest security data files
321- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
322- run the tool (no special environment variables needed)
323- copy & paste from the Console output into the .cpp & .java files
324
325* Bazel build process
326
327See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
328for an overview and for setup instructions.
329
330Consider running `bazelisk --version` outside of the $ICU_SRC folder
331to find out the latest `bazel` version, and
332copying that version number into the $ICU_SRC/.bazeliskrc config file.
333(Revert if you find incompatibilities, or, better, update our build & config files.)
334
335* generate data files
336
337- remember to define the environment variables
338  (see the start of the section for this Unicode version)
339- cd $ICU_SRC
- optional:
341    bazelisk clean
342- build/bootstrap/generate new files:
343    icu4c/source/data/unidata/generate.sh
344
345* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
346  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
347- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
348    ~/unitools/mine/src$ grep disallowed_STD3_valid unicodetools/data/idna/dev/IdnaMappingTable.txt
349- Unicode 6.0..15.0: U+2260, U+226E, U+226F
350- nothing new in this Unicode version, no test file to update
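
The grep above also returns ASCII ranges, which are not of interest here. A sketch of a follow-up filter that keeps only non-ASCII entries is shown below. The sample lines are abbreviated and written in the style of IdnaMappingTable.txt, so treat their exact layout as an assumption. The non-ASCII code points are the ones listed above.

```shell
# Abbreviated sample in the style of IdnaMappingTable.txt; the three
# non-ASCII code points are the ones listed in the log above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
0000..002C    ; disallowed_STD3_valid                  # 1.1  <control-0000>..COMMA
002D..002E    ; valid                                  # 1.1  HYPHEN-MINUS..FULL STOP
2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO
226E..226F    ; disallowed_STD3_valid                  # 1.1  NOT LESS-THAN..NOT GREATER-THAN
EOF

# Keep disallowed_STD3_valid lines, then drop those that start in U+0000..U+007F.
non_ascii=$(grep disallowed_STD3_valid "$sample" | grep -Ev '^00[0-7][0-9A-F]([^0-9A-F]|$)')
printf '%s\n' "$non_ascii"
```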
351
352* run & fix ICU4C tests
353- Note: Some of the collation data and test data will be updated below,
354  so at this time we might get some collation test failures.
355  Ignore these for now.
356- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
357  (no rule changes in Unicode 15)
358- update CLDR GraphemeBreakTest.txt
359    cd ~/unitools/mine/Generated
360    cp UCD/15.0.0/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
361    cp UCD/15.0.0/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
362    cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt $ICU_SRC/icu4c/source/test/testdata
363- Andy helps with RBBI & spoof check test failures
364
365* collation: CLDR collation root, UCA DUCET
366
367- UCA DUCET goes into Mark's Unicode tools,
368  and a tool-tailored version goes into CLDR, see
369    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
370
371- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
372    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
373- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
374    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
375    (note removing the underscore before "Rules")
376    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
377- restore TODO diffs in UCARules.txt
378    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
379- update (ICU4C)/source/test/testdata/CollationTest_*.txt
380  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
381  from the CLDR root files (..._CLDR_..._SHORT.txt)
382    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
383    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
384    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
385- if CLDR common/uca/unihan-index.txt changes, then update
386  CLDR common/collation/root.xml <collation type="private-unihan">
387  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
388
389- generate data files, as above (generate.sh), now to pick up new collation data
390- update CollationFCD.java:
391  copy & paste the initializers of lcccIndex[] etc. from
392    ICU4C/source/i18n/collationfcd.cpp to
393    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
394- rebuild ICU4C (make clean, make check, as usual)
395
396* Unihan collators
397    https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
398- run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
399  check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
400- generate ICU zh collation data
401    instructions inspired by
402    https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
403    https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
404  + setup:
405    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
406        (didn't work without setting JAVA_HOME,
407         nor with the Google default of /usr/local/buildtools/java/jdk
408         [Google security limitations in the XML parser])
409    export TOOLS_ROOT=~/icu/uni/src/tools
410    export CLDR_DIR=~/cldr/uni/src
411    export CLDR_DATA_DIR=~/cldr/uni/src
412        (pointing to the "raw" data, not cldr-staging/.../production should be ok for the relevant files)
413    cd "$TOOLS_ROOT/cldr/lib"
414    ./install-cldr-jars.sh "$CLDR_DIR"
415  + generate the files we need
416    cd "$TOOLS_ROOT/cldr/cldr-to-icu"
417    ant -f build-icu-data.xml -DoutDir=/tmp/icu -DoutputTypes=coll,transforms -DlocaleIdFilter='zh.*'
418  + diff
419    cd $ICU_SRC
420    meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
421    meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
422  + copy into the source tree
423    cd $ICU_SRC
424    cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
425    cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
426- rebuild ICU4C
427
428* run & fix ICU4C tests, now with new CLDR collation root data
429- run all tests with the collation test data *_SHORT.txt or the full files
430  (the full ones have comments, useful for debugging)
431- note on intltest: if collate/UCAConformanceTest fails, then
432  utility/MultithreadTest/TestCollators will fail as well;
433  fix the conformance test before looking into the multi-thread test
434
435* update Java data files
436- refresh just the UCD/UCA-related/derived files, just to be safe
437- see (ICU4C)/source/data/icu4j-readme.txt
438- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
439- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    NOTE: If you get an error like "No rule to make target 'out/build/icudt70l/uprops.icu'"
    (with the current data version in place of icudt70l),
    you need to reconfigure with ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data; see the "configure" line above.
442  output:
443    ...
444    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
445    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt72b
446    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt72b
447    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt72l.dat ./out/icu4j/icudt72b.dat -s ./out/build/icudt72l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt72b
448    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt72b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt72b"
449    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt72b/
450    mkdir -p /tmp/icu4j/main/shared/data
451    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
452    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt72b/
453    mkdir -p /tmp/icu4j/main/shared/data
454    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
455    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
456- copy the big-endian Unicode data files to another location,
457  separate from the other data files,
458  and then refresh ICU4J
459    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
460    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
461    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
462    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
463    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
464    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
465    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
466    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
467    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
468    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
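
To see which files the staging commands above select, the same copy rules can be replayed in a temporary sandbox with empty sample files. The file names and the data folder name are illustrative, and the final `jar uvf` step is left out so that the sketch runs without a JDK.

```shell
# Sandbox: fake ICU4J output folder with empty sample files.
# The data folder name and the file names are illustrative;
# the selection rules follow the log above.
sandbox=$(mktemp -d)
ICUDT_SAMPLE=icudt72b
src=$sandbox/out/icu4j/com/ibm/icu/impl/data/$ICUDT_SAMPLE
dst=$sandbox/stage/com/ibm/icu/impl/data/$ICUDT_SAMPLE
mkdir -p "$src/coll" "$src/brkitr" "$dst/coll" "$dst/brkitr"
touch "$src/uprops.icu" "$src/cnvalias.icu" "$src/nfc.nrm" "$src/confusables.cfu" \
      "$src/root.res" "$src/coll/ucadata.icu" "$src/brkitr/char.brk"

cp "$src/confusables.cfu" "$dst"
cp "$src"/*.icu "$dst"
rm "$dst/cnvalias.icu"          # excluded again, as in the log
cp "$src"/*.nrm "$dst"
cp "$src"/coll/* "$dst/coll"
cp "$src"/brkitr/* "$dst/brkitr"

# List what was staged, relative to the staging root.
(cd "$sandbox/stage" && find . -type f | sort)
```

In this sandbox, root.res and cnvalias.icu stay out of the staging folder, so a later `jar uvf` would update only the selected data files.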
469
470* When refreshing all of ICU4J data from ICU4C
471- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
472- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
473or
474- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
475
476* refresh Java test .txt files
477- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
478    cd $ICU_SRC/icu4c/source/data/unidata
479    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
480    cd ../../test/testdata
481    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
482    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
483
484* run & fix ICU4J tests
485
486*** API additions
487- send notice to icu-design about new born-@stable API (enum constants etc.)
488
489*** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and add them to CLDR
491  for example:
492    ~/icu/mine/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-14.txt
493    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-15.txt
494    ~/icu/uni/src$ diff -u /tmp/icu/nv4-14.txt /tmp/icu/nv4-15.txt
495    -->
496    +cp;11F54;-Alpha;gc=Nd;InSC=Number;lb=NU;na=KAWI DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
497    +cp;1E4F4;-Alpha;gc=Nd;-IDS;lb=NU;na=NAG MUNDARI DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
498  or:
499    ~/unitools/mine/src$ diff -u unicodetools/data/ucd/14.0.0-Update/extracted/DerivedGeneralCategory.txt unicodetools/data/ucd/dev/extracted/DerivedGeneralCategory.txt | grep '; Nd' | egrep '^\+'
500    -->
501    +11F50..11F59  ; Nd #  [10] KAWI DIGIT ZERO..KAWI DIGIT NINE
502    +1E4F0..1E4F9  ; Nd #  [10] NAG MUNDARI DIGIT ZERO..NAG MUNDARI DIGIT NINE
503  Unicode 15:
504    kawi 11F50..11F59 Kawi
505    nagm 1E4F0..1E4F9 Nag Mundari
506    https://github.com/unicode-org/cldr/pull/2041
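
The digit search above can be tried on small sample files. In the sketch below, the Kawi line is copied from the diff output above, while the other ppucd.txt-style lines are shortened for illustration. It uses `grep -E`, which is equivalent to `egrep`.

```shell
work=$(mktemp -d)
# Simplified ppucd.txt-style samples. The Kawi line is copied from the log above;
# the other lines are shortened for illustration.
cat > "$work/ppucd-old.txt" <<'EOF'
cp;0034;AHex;gc=Nd;na=DIGIT FOUR;nt=De;nv=4
cp;0041;gc=Lu;na=LATIN CAPITAL LETTER A
EOF
cat > "$work/ppucd-new.txt" <<'EOF'
cp;0034;AHex;gc=Nd;na=DIGIT FOUR;nt=De;nv=4
cp;0041;gc=Lu;na=LATIN CAPITAL LETTER A
cp;11F54;-Alpha;gc=Nd;InSC=Number;lb=NU;na=KAWI DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
EOF

# One digit per decimal-digit set is enough, so look only at the digit four.
grep -E ';gc=Nd.+;nv=4' "$work/ppucd-old.txt" > "$work/nv4-old.txt"
grep -E ';gc=Nd.+;nv=4' "$work/ppucd-new.txt" > "$work/nv4-new.txt"

# diff exits with status 1 when the files differ, so do not treat that as an error.
new_digits=$(diff -u "$work/nv4-old.txt" "$work/nv4-new.txt" | grep '^+cp' || true)
printf '%s\n' "$new_digits"
```

Each line that starts with "+cp" marks a digit four from a set of decimal digits that is new in the newer file, which is a candidate for a new CLDR numbering system.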
507
508*** merge the Unicode update branches back onto the trunk
509- do not merge the icudata.jar and testdata.jar,
510  instead rebuild them from merged & tested ICU4C
511- if there is a merge conflict in icudata.jar, here is one way to deal with it:
512  +   remove icudata.jar from the commit so that rebasing is trivial
513  + ~/icu/uni/src$ git restore --source=main icu4j/main/shared/data/icudata.jar
514  + ~/icu/uni/src$ git commit -a --amend
515  +   switch to main, pull updates, switch back to the dev branch
516  + ~/icu/uni/src$ git rebase main
517  +   rebuild icudata.jar
518  + ~/icu/uni/src$ git commit -a --amend
519  + ~/icu/uni/src$ git push -f
520- make sure that changes to Unicode tools are checked in:
521  https://github.com/unicode-org/unicodetools
522
523---------------------------------------------------------------------------- ***
524
525Unicode 14.0 update for ICU 70
526
527https://www.unicode.org/versions/Unicode14.0.0/
528https://www.unicode.org/versions/beta-14.0.0.html
529https://www.unicode.org/Public/14.0.0/ucd/
530https://www.unicode.org/reports/uax-proposed-updates.html
531https://www.unicode.org/reports/tr44/tr44-27.html
532
533https://unicode-org.atlassian.net/browse/CLDR-14801
534https://unicode-org.atlassian.net/browse/ICU-21635
535
536* Command-line environment setup
537
538export UNICODE_DATA=~/unidata/uni14/20210903
539export CLDR_SRC=~/cldr/uni/src
540export ICU_ROOT=~/icu/uni
541export ICU_SRC=$ICU_ROOT/src
542export ICUDT=icudt70b
543export ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
544export ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
545export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
546
547*** Unicode version numbers
548- makedata.mak
549- uchar.h
550- com.ibm.icu.util.VersionInfo
551- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
552
553- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
554    so that the makefiles see the new version number.
555  cd $ICU_ROOT/dbg/icu4c
556  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
557
558*** data files & enums & parser code
559
560* download files
561- same as for the early Unicode Tools setup and data refresh:
562  https://github.com/unicode-org/unicodetools/blob/main/docs/index.md
563  https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md
564- mkdir -p $UNICODE_DATA
565- download Unicode files into $UNICODE_DATA
566  + subfolders: emoji, idna, security, ucd, uca
567  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
568  + split Unihan into single-property files
569    ~/unitools/mine/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
  + get the CLDR version of GraphemeBreakTest.txt, either from
    $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
    or from the UCD/cldr/ output folder of the Unicode Tools (GraphemeBreakTest-cldr.txt).
    (Since Unicode 12/CLDR 35/ICU 64, CLDR uses modified break rules.)
    Run one of the following from $ICU_SRC:
      cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
        or
      cp ~/unitools/mine/Generated/UCD/d19/cldr/GraphemeBreakTest-cldr-14.0.0d19.txt icu4c/source/test/testdata/GraphemeBreakTest.txt
576
577* for manual diffs and for Unicode Tools input data updates:
578  remove version suffixes from the file names
579    ~$ unidata/desuffixucd.py $UNICODE_DATA
580  (see https://github.com/unicode-org/unicodetools/blob/main/docs/inputdata.md)
581
582* process and/or copy files
583- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
584  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
585  + For debugging, and tweaking how ppucd.txt is written,
586    the tool has an --only_ppucd option:
587    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
588
589- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
590
591* new constants for new property values
592- preparseucd.py error:
593    ValueError: missing uchar.h enum constants for some property values:
594    [(u'blk', set([u'Toto', u'Tangsa', u'Cypro_Minoan', u'Arabic_Ext_B', u'Vithkuqi', u'Old_Uyghur', u'Latin_Ext_F', u'UCAS_Ext_A', u'Kana_Ext_B', u'Ethiopic_Ext_B', u'Latin_Ext_G', u'Znamenny_Music'])),
595    (u'jg', set([u'Vertical_Tail', u'Thin_Yeh'])),
596    (u'sc', set([u'Toto', u'Ougr', u'Vith', u'Tnsa', u'Cpmn']))]
597  = PropertyValueAliases.txt new property values (diff old & new .txt files)
598    ~/unidata$ diff -u uni13/20200304/ucd/PropertyValueAliases.txt uni14/20210609/ucd/PropertyValueAliases.txt | egrep '^[-+][a-zA-Z]'
599    +age; 14.0                             ; V14_0
600    +blk; Arabic_Ext_B                     ; Arabic_Extended_B
601    +blk; Cypro_Minoan                     ; Cypro_Minoan
602    +blk; Ethiopic_Ext_B                   ; Ethiopic_Extended_B
603    +blk; Kana_Ext_B                       ; Kana_Extended_B
604    +blk; Latin_Ext_F                      ; Latin_Extended_F
605    +blk; Latin_Ext_G                      ; Latin_Extended_G
606    +blk; Old_Uyghur                       ; Old_Uyghur
607    +blk; Tangsa                           ; Tangsa
608    +blk; Toto                             ; Toto
609    +blk; UCAS_Ext_A                       ; Unified_Canadian_Aboriginal_Syllabics_Extended_A
610    +blk; Vithkuqi                         ; Vithkuqi
611    +blk; Znamenny_Music                   ; Znamenny_Musical_Notation
612    +jg ; Thin_Yeh                         ; Thin_Yeh
613    +jg ; Vertical_Tail                    ; Vertical_Tail
614    +sc ; Cpmn                             ; Cypro_Minoan
615    +sc ; Ougr                             ; Old_Uyghur
616    +sc ; Tnsa                             ; Tangsa
617    +sc ; Toto                             ; Toto
618    +sc ; Vith                             ; Vithkuqi
619  -> add new blocks to uchar.h before UBLOCK_COUNT
620    use long property names for enum constants,
621    for the trailing comment get the block start code point: diff old & new Blocks.txt
622    ~/unidata$ diff -u uni13/20200304/ucd/Blocks.txt uni14/20210609/ucd/Blocks.txt | egrep '^[-+][0-9A-Z]'
623    +0870..089F; Arabic Extended-B
624    +10570..105BF; Vithkuqi
625    +10780..107BF; Latin Extended-F
626    +10F70..10FAF; Old Uyghur
627    -11700..1173F; Ahom
628    +11700..1174F; Ahom
629    +11AB0..11ABF; Unified Canadian Aboriginal Syllabics Extended-A
630    +12F90..12FFF; Cypro-Minoan
631    +16A70..16ACF; Tangsa
632    -18D00..18D8F; Tangut Supplement
633    +18D00..18D7F; Tangut Supplement
634    +1AFF0..1AFFF; Kana Extended-B
635    +1CF00..1CFCF; Znamenny Musical Notation
636    +1DF00..1DFFF; Latin Extended-G
637    +1E290..1E2BF; Toto
638    +1E7E0..1E7FF; Ethiopic Extended-B
639    (ignore blocks whose end code point changed)
640  -> add new blocks to UCharacter.UnicodeBlock IDs
641    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
642            replace  public static final int \1_ID = \2; \3
643  -> add new blocks to UCharacter.UnicodeBlock objects
644    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
645            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
646  -> add new scripts to uscript.h & com.ibm.icu.lang.UScript
647    Eclipse find     USCRIPT_([^ ]+) *= ([0-9]+),(/.+)
648            replace  public static final int \1 = \2; \3
649  -> for new scripts: fix expectedLong names in cintltst/cucdapi.c/TestUScriptCodeAPI()
650      and in com.ibm.icu.dev.test.lang.TestUScript.java
651  -> add new joining groups to uchar.h & UCharacter.JoiningGroup
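
As an alternative to reading the Blocks.txt diff by eye, the new blocks and their start code points could be listed with a short Python sketch. The function names and the abbreviated sample data are assumptions for illustration. Blocks are keyed by start code point, so blocks whose end code point changed (Ahom, Tangut Supplement) are ignored, as noted above.

```python
def parse_blocks(text):
    """Map block start code point -> (end code point, block name)."""
    blocks = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        code_range, name = (field.strip() for field in line.split(";", 1))
        start, end = (int(cp, 16) for cp in code_range.split(".."))
        blocks[start] = (end, name)
    return blocks


def new_blocks(old_text, new_text):
    """Return (start, name) for blocks whose start is not in the old file."""
    old = parse_blocks(old_text)
    new = parse_blocks(new_text)
    return [(start, name)
            for start, (_end, name) in sorted(new.items())
            if start not in old]


# Abbreviated sample data taken from the diff above.
OLD = """\
11700..1173F; Ahom
18D00..18D8F; Tangut Supplement
"""
NEW = """\
0870..089F; Arabic Extended-B
11700..1174F; Ahom
18D00..18D7F; Tangut Supplement
1E290..1E2BF; Toto
"""

for start, name in new_blocks(OLD, NEW):
    print(f"{start:04X} {name}")
```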
652
653* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
654    (not strictly necessary for NOT_ENCODED scripts)
655  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
656
657* build ICU
658  to make sure that there are no syntax errors
659
660  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 tests &> out.txt ; tail -n 30 out.txt ; date
661
662* update spoof checker UnicodeSet initializers:
663    inclusionPat & recommendedPat in i18n/uspoof.cpp
664    INCLUSION & RECOMMENDED in SpoofChecker.java
665- make sure that the Unicode Tools tree contains the latest security data files
666- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
667- run the tool (no special environment variables needed)
668- copy & paste from the Console output into the .cpp & .java files
669
670* Bazel build process
671
672See https://unicode-org.github.io/icu/processes/unicode-update#bazel-build-process
673for an overview and for setup instructions.
674
675Consider running `bazelisk --version` outside of the $ICU_SRC folder
676to find out the latest `bazel` version, and
677copying that version number into the $ICU_SRC/.bazeliskrc config file.
678(Revert if you find incompatibilities, or, better, update our build & config files.)
679
680* generate data files
681
682- remember to define the environment variables
683  (see the start of the section for this Unicode version)
684- cd $ICU_SRC
685- optional, usually not necessary:
686    bazelisk clean
687- build/bootstrap/generate new files:
688    icu4c/source/data/unidata/generate.sh
689
690* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
691  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
692- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
693- Unicode 6.0..14.0: U+2260, U+226E, U+226F
694- nothing new in this Unicode version, no test file to update
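
The grep step can also be sketched in Python. The sketch assumes the semicolon-separated layout of IdnaMappingTable.txt (code point or range, then status, then an optional "#" comment); the function name and the sample lines are illustrative, not copied from a data file.

```python
def non_ascii_std3_valid(lines):
    """Return (start, end) ranges above U+007F with status disallowed_STD3_valid."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            result.append((max(start, 0x80), end))
    return result


# Illustrative lines in the assumed file layout.
SAMPLE = [
    "003D          ; disallowed_STD3_valid   # EQUALS SIGN",
    "0041          ; mapped ; 0061           # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]

for start, end in non_ascii_std3_valid(SAMPLE):
    print(f"U+{start:04X}..U+{end:04X}")
```

Comparing the output between the old and new data files shows whether the test files need new characters.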
695
696* run & fix ICU4C tests
697- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
698- update CLDR GraphemeBreakTest.txt
699    cd ~/unitools/mine/Generated
700    cp UCD/d22d/cldr/GraphemeBreakTest-cldr.txt $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
701    cp UCD/d22d/cldr/GraphemeBreakTest-cldr.html $CLDR_SRC/common/properties/segments/GraphemeBreakTest.html
702    cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt $ICU_SRC/icu4c/source/test/testdata
703- Andy helps with RBBI & spoof check test failures
704
705* collation: CLDR collation root, UCA DUCET
706
707- UCA DUCET goes into Mark's Unicode tools,
708  and a tool-tailored version goes into CLDR, see
709    https://github.com/unicode-org/unicodetools/blob/main/docs/uca/index.md
710
711- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
712    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
713- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
714    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
715    (note that the copy below removes the underscore before "Rules" in the file name)
716    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
717- restore TODO diffs in UCARules.txt
718    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
719- update (ICU4C)/source/test/testdata/CollationTest_*.txt
720  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
721  from the CLDR root files (..._CLDR_..._SHORT.txt)
722    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
723    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
724    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
725- if CLDR common/uca/unihan-index.txt changes, then update
726  CLDR common/collation/root.xml <collation type="private-unihan">
727  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
728
729- generate data files, as above (generate.sh), now to pick up new collation data
730- update CollationFCD.java:
731  copy & paste the initializers of lcccIndex[] etc. from
732    ICU4C/source/i18n/collationfcd.cpp to
733    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
734- rebuild ICU4C (make clean, make check, as usual)
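
The renaming done by the cp commands above (CLDR common/uca file name to ICU file name) can be summarized in a small Python sketch. The function name is hypothetical; the mapping itself is read off the commands in this section.

```python
import re


def icu_name_for_cldr_uca_file(cldr_name):
    """Return the ICU file name for a CLDR common/uca file, or None if not copied."""
    if cldr_name == "FractionalUCA_SHORT.txt":
        return "FractionalUCA.txt"
    if cldr_name == "UCA_Rules_SHORT.txt":
        # The underscore before "Rules" is dropped on the ICU side.
        return "UCARules.txt"
    match = re.fullmatch(r"CollationTest_CLDR_(\w+)_SHORT\.txt", cldr_name)
    if match:
        return f"CollationTest_{match.group(1)}_SHORT.txt"
    return None


for name in ("FractionalUCA_SHORT.txt",
             "UCA_Rules_SHORT.txt",
             "CollationTest_CLDR_NON_IGNORABLE_SHORT.txt",
             "CollationTest_CLDR_SHIFTED_SHORT.txt"):
    print(name, "->", icu_name_for_cldr_uca_file(name))
```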
735
736* Unihan collators
737    https://github.com/unicode-org/unicodetools/blob/main/docs/unihan.md
738- run Unicode Tools GenerateUnihanCollators & GenerateUnihanCollatorFiles,
739  check CLDR diffs, copy to CLDR, test CLDR, ... as documented there
740- generate ICU zh collation data
741    instructions inspired by
742    https://github.com/unicode-org/icu/blob/main/tools/cldr/cldr-to-icu/README.txt and
743    https://github.com/unicode-org/icu/blob/main/icu4c/source/data/cldr-icu-readme.txt
744  + setup:
745    export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
746        (didn't work without setting JAVA_HOME,
747         nor with the Google default of /usr/local/buildtools/java/jdk
748         [Google security limitations in the XML parser])
749    export TOOLS_ROOT=~/icu/uni/src/tools
750    export CLDR_DIR=~/cldr/uni/src
751    export CLDR_DATA_DIR=~/cldr/uni/src
752        (pointing to the "raw" data instead of cldr-staging/.../production should be OK for the relevant files)
753    cd "$TOOLS_ROOT/cldr/lib"
754    ./install-cldr-jars.sh "$CLDR_DIR"
755  + generate the files we need
756    cd "$TOOLS_ROOT/cldr/cldr-to-icu"
757    ant -f build-icu-data.xml -DoutDir=/tmp/icu -DoutputTypes=coll,transforms -DlocaleIdFilter='zh.*'
758  + diff
759    cd $ICU_SRC
760    meld icu4c/source/data/coll/zh.txt /tmp/icu/coll/zh.txt
761    meld icu4c/source/data/translit/Hani_Latn.txt /tmp/icu/translit/Hani_Latn.txt
762  + copy into the source tree
763    cd $ICU_SRC
764    cp /tmp/icu/coll/zh.txt icu4c/source/data/coll/zh.txt
765    cp /tmp/icu/translit/Hani_Latn.txt icu4c/source/data/translit/Hani_Latn.txt
766- rebuild ICU4C
767
768* run & fix ICU4C tests, now with new CLDR collation root data
769- run all tests with the collation test data *_SHORT.txt or the full files
770  (the full ones have comments, useful for debugging)
771- note on intltest: if collate/UCAConformanceTest fails, then
772  utility/MultithreadTest/TestCollators will fail as well;
773  fix the conformance test before looking into the multi-thread test
774
775* update Java data files
776- refresh just the UCD/UCA-related/derived files, just to be safe
777- see (ICU4C)/source/data/icu4j-readme.txt
778- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
779- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
780    NOTE: If you get the error "No rule to make target 'out/build/icudt70l/uprops.icu'",
781    you need to reconfigure with unicore data; see the "configure" line above.
782  output:
783    ...
784    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
785    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt70b
786    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt70b
787    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt70l.dat ./out/icu4j/icudt70b.dat -s ./out/build/icudt70l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt70b
788    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt70b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt70b"
789    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt70b/
790    mkdir -p /tmp/icu4j/main/shared/data
791    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
792    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt70b/
793    mkdir -p /tmp/icu4j/main/shared/data
794    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
795    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
796- copy the big-endian Unicode data files to another location,
797  separate from the other data files,
798  and then refresh ICU4J
799    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
800    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
801    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
802    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
803    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
804    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
805    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
806    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
807    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
808    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
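
Which files the cp/rm commands above select can be expressed as a small Python predicate. This is a sketch of the selection only, not a replacement for the commands; the names are hypothetical, and the sample paths are illustrative and relative to com/ibm/icu/impl/data/$ICUDT.

```python
import posixpath
from fnmatch import fnmatchcase

# Selection rules read off the cp/rm commands above.
TOP_LEVEL_PATTERNS = ("confusables.cfu", "*.icu", "*.nrm")
WHOLE_DIRECTORIES = ("coll", "brkitr")
EXCLUDED = ("cnvalias.icu",)


def is_refreshed(relative_path):
    """Return True if the file is among those copied into /tmp/icu4j."""
    directory, name = posixpath.split(relative_path)
    if directory in WHOLE_DIRECTORIES:
        return True
    if directory or name in EXCLUDED:
        return False
    return any(fnmatchcase(name, pattern) for pattern in TOP_LEVEL_PATTERNS)


# Illustrative file list.
SAMPLE = ["uprops.icu", "cnvalias.icu", "nfc.nrm", "confusables.cfu",
          "coll/ucadata.icu", "brkitr/char.brk", "zone/en.res", "root.res"]

print([path for path in SAMPLE if is_refreshed(path)])
```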
809
810* When refreshing all of ICU4J data from ICU4C
811- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
812- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
813or
814- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
815
816* refresh Java test .txt files
817- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
818    cd $ICU_SRC/icu4c/source/data/unidata
819    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
820    cd ../../test/testdata
821    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
822    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
823
824* run & fix ICU4J tests
825
826*** API additions
827- send notice to icu-design about new born-@stable API (enum constants etc.)
828
829*** CLDR numbering systems
830- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
831  for example:
832    ~/icu/mine/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-13.txt
833    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt > /tmp/icu/nv4-14.txt
834    ~/icu/uni/src$ diff -u /tmp/icu/nv4-13.txt /tmp/icu/nv4-14.txt
835    -->
836    +cp;16AC4;-Alpha;gc=Nd;-IDS;lb=NU;na=TANGSA DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS
837  Unicode 14:
838    tnsa 16AC0..16AC9 Tangsa
839    https://github.com/unicode-org/cldr/pull/1326
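
The step from a "digit four" line in ppucd.txt to the full digit range can be sketched in Python. It assumes that decimal digits (gc=Nd) are encoded as contiguous runs 0..9, so the range is the code point of digit four minus 4 through plus 5. The function name is hypothetical; the sample line is the one from the diff above.

```python
def decimal_digit_ranges(ppucd_lines):
    """Return (zero, nine) code point pairs for each gc=Nd line with nv=4."""
    ranges = []
    for line in ppucd_lines:
        fields = line.strip().split(";")
        if fields[0] == "cp" and "gc=Nd" in fields and "nv=4" in fields:
            four = int(fields[1], 16)
            # Assumption: digits 0..9 are contiguous around digit four.
            ranges.append((four - 4, four + 5))
    return ranges


SAMPLE = [
    "cp;16AC4;-Alpha;gc=Nd;-IDS;lb=NU;na=TANGSA DIGIT FOUR;nt=De;nv=4;SB=NU;WB=NU;-XIDS",
]

for zero, nine in decimal_digit_ranges(SAMPLE):
    print(f"{zero:04X}..{nine:04X}")
```

For the sample line this yields the Tangsa range listed above.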
840
841*** merge the Unicode update branches back onto the trunk
842- do not merge the icudata.jar and testdata.jar,
843  instead rebuild them from merged & tested ICU4C
844- make sure that changes to Unicode tools are checked in:
845  https://github.com/unicode-org/unicodetools
846
847---------------------------------------------------------------------------- ***
848
849Unicode 13.0 update for ICU 66
850
851https://www.unicode.org/versions/Unicode13.0.0/
852https://www.unicode.org/versions/beta-13.0.0.html
853https://www.unicode.org/Public/13.0.0/ucd/
854https://www.unicode.org/reports/uax-proposed-updates.html
855https://www.unicode.org/reports/tr44/tr44-25.html
856
857https://unicode-org.atlassian.net/browse/CLDR-13387
858https://unicode-org.atlassian.net/browse/ICU-20893
859
860* Command-line environment setup
861
862UNICODE_DATA=~/unidata/uni13/20200212
863CLDR_SRC=~/cldr/uni/src
864ICU_ROOT=~/icu/uni
865ICU_SRC=$ICU_ROOT/src
866ICUDT=icudt66b
867ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
868ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
869export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
870
871*** Unicode version numbers
872- makedata.mak
873- uchar.h
874- com.ibm.icu.util.VersionInfo
875- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
876
877- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
878    so that the makefiles see the new version number.
879  cd $ICU_ROOT/dbg/icu4c
880  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
881
882*** data files & enums & parser code
883
884* download files
885- mkdir -p $UNICODE_DATA
886- download Unicode files into $UNICODE_DATA
887  + subfolders: emoji, idna, security, ucd, uca
888  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
889  + split Unihan into single-property files
890    ~/unitools/trunk/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
891  + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
892    or from the ucd/cldr/ output folder of the Unicode Tools:
893    Since Unicode 12/CLDR 35/ICU 64, CLDR uses modified break rules.
894  cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
895
896* for manual diffs and for Unicode Tools input data updates:
897  remove version suffixes from the file names
898    ~$ unidata/desuffixucd.py $UNICODE_DATA
899  (see https://sites.google.com/site/unicodetools/inputdata)
900
901* process and/or copy files
902- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
903  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
904  + For debugging, and tweaking how ppucd.txt is written,
905    the tool has an --only_ppucd option:
906    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
907
908- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
909
910* new constants for new property values
911- preparseucd.py error:
912    ValueError: missing uchar.h enum constants for some property values:
913    [(u'blk', set([u'Symbols_For_Legacy_Computing', u'Dives_Akuru', u'Yezidi',
914        u'Tangut_Sup', u'CJK_Ext_G', u'Khitan_Small_Script', u'Chorasmian', u'Lisu_Sup'])),
915    (u'sc', set([u'Chrs', u'Diak', u'Kits', u'Yezi'])),
916    (u'InPC', set([u'Top_And_Bottom_And_Left']))]
917  = PropertyValueAliases.txt new property values (diff old & new .txt files)
918    blk; Chorasmian                       ; Chorasmian
919    blk; CJK_Ext_G                        ; CJK_Unified_Ideographs_Extension_G
920    blk; Dives_Akuru                      ; Dives_Akuru
921    blk; Khitan_Small_Script              ; Khitan_Small_Script
922    blk; Lisu_Sup                         ; Lisu_Supplement
923    blk; Symbols_For_Legacy_Computing     ; Symbols_For_Legacy_Computing
924    blk; Tangut_Sup                       ; Tangut_Supplement
925    blk; Yezidi                           ; Yezidi
926  -> add to uchar.h before UBLOCK_COUNT
927    use long property names for enum constants,
928    for the trailing comment get the block start code point: diff old & new Blocks.txt
929  -> add to UCharacter.UnicodeBlock IDs
930    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
931            replace  public static final int \1_ID = \2; \3
932  -> add to UCharacter.UnicodeBlock objects
933    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
934            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
935
936    sc ; Chrs                             ; Chorasmian
937    sc ; Diak                             ; Dives_Akuru
938    sc ; Kits                             ; Khitan_Small_Script
939    sc ; Yezi                             ; Yezidi
940  -> uscript.h & com.ibm.icu.lang.UScript
941  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
942      and in com.ibm.icu.dev.test.lang.TestUScript.java
943
944    InPC; Top_And_Bottom_And_Left         ; Top_And_Bottom_And_Left
945  -> uchar.h enum UIndicPositionalCategory & UCharacter.java IndicPositionalCategory
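
The two Eclipse find/replace patterns for UCharacter.UnicodeBlock translate directly to Python's re.sub, which may help when checking the result outside of Eclipse. The function names are hypothetical, and the numeric value and trailing comment in the sample line are illustrative rather than copied from uchar.h.

```python
import re


def block_id_constant(c_enum_line):
    """Turn a uchar.h UBLOCK_ enum line into a Java _ID constant."""
    return re.sub(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
                  r"public static final int \1_ID = \2; \3",
                  c_enum_line)


def block_object_constant(c_enum_line):
    """Turn a uchar.h UBLOCK_ enum line into a Java UnicodeBlock object."""
    return re.sub(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
                  r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
                  c_enum_line)


# Illustrative input line; value and comment are assumptions.
LINE = "UBLOCK_YEZIDI = 308, /*[10E80]*/"

print(block_id_constant(LINE))
print(block_object_constant(LINE))
```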
946
947* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
948    (not strictly necessary for NOT_ENCODED scripts)
949  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
950
951* build ICU (make install)
952  to make sure that there are no syntax errors, and
953  so that the tools build can pick up the new definitions from the installed header files.
954
955  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
956
957* update spoof checker UnicodeSet initializers:
958    inclusionPat & recommendedPat in i18n/uspoof.cpp
959    INCLUSION & RECOMMENDED in SpoofChecker.java
960- make sure that the Unicode Tools tree contains the latest security data files
961- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
962- update the hardcoded version number there in the DIRECTORY path
963- run the tool (no special environment variables needed)
964- copy & paste from the Console output into the .cpp & .java files
965
966* generate normalization data files
967  cd $ICU_ROOT/dbg/icu4c
968  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
969  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
970  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
971  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
972  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
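
The four .nrm command lines above share one pattern: each output is built from nfc.txt plus the additional input files layered on top of it. A Python sketch that builds (but does not run) these command lines makes the layering explicit. The table and function names are hypothetical; the flags are the ones used above, and the --csource variant is left out.

```python
import shlex

# Output file -> input files, in the order given on the command lines above.
NORM2_OUTPUTS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(data_in, unidata):
    """Return one argument list per .nrm file; nothing is executed."""
    return [["bin/gennorm2", "-o", f"{data_in}/{output}",
             "-s", f"{unidata}/norm2", *inputs]
            for output, inputs in NORM2_OUTPUTS]


for command in gennorm2_commands("$ICU4C_DATA_IN", "$ICU4C_UNIDATA"):
    print(" ".join(command))

# With real paths, shlex.join() gives a safely quoted command line.
print(shlex.join(gennorm2_commands("/tmp/in", "/tmp/unidata")[0]))
```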
973
974* build ICU (make install)
975  so that the tools build can pick up the new definitions from the installed header files.
976
977  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
978
979* build Unicode tools using CMake+make
980
981$ICU_SRC/tools/unicode/c/icudefs.txt:
982
983# Location (--prefix) of where ICU was installed.
984set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
985# Location of the ICU4C source tree.
986set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
987
988  $ICU_ROOT/dbg$
989    mkdir -p tools/unicode/c
990    cd tools/unicode/c
991
992  $ICU_ROOT/dbg/tools/unicode/c$
993    cmake ../../../../src/tools/unicode/c
994    make
995
996* generate core properties data files
997  $ICU_ROOT/dbg/tools/unicode/c$
998    genprops/genprops $ICU_SRC/icu4c
999- tool failure:
1000    genprops: Script_Extensions indexes overflow bit field
1001    genprops: error parsing or setting values from ppucd.txt line 32696 - U_BUFFER_OVERFLOW_ERROR
1002  -> uprops.icu data file format:
1003     add two more bits to store a script code or Script_Extensions index
1004  -> generator code, C++ & Java runtime, uprops.icu format version 7.7
1005- rebuild ICU (make install) & tools
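
Why two more bits resolve the overflow can be shown with a tiny Python sketch. The field widths of 8 and 10 bits are assumptions for illustration (the note above only says that two bits were added), the index value 300 is made up, and the function names are hypothetical.

```python
def max_index(bits):
    """Largest value that fits into an unsigned bit field of the given width."""
    return (1 << bits) - 1


def fits(value, bits):
    """Return True if value can be stored in an unsigned field of that width."""
    return 0 <= value <= max_index(bits)


OLD_BITS = 8             # assumed width before the format change
NEW_BITS = OLD_BITS + 2  # "add two more bits"

index = 300  # made-up index that no longer fits the old field
print(fits(index, OLD_BITS), fits(index, NEW_BITS))
print(max_index(OLD_BITS), max_index(NEW_BITS))
```

Each added bit doubles the number of storable values, so two bits quadruple the capacity of the field.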
1006
1007* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1008  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1009- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1010- Unicode 6.0..13.0: U+2260, U+226E, U+226F
1011- nothing new in this Unicode version, no test file to update
1012
1013* run & fix ICU4C tests
1014- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
1015- Andy helps with RBBI & spoof check test failures
1016
1017* collation: CLDR collation root, UCA DUCET
1018
1019- UCA DUCET goes into Mark's Unicode tools, see
1020    https://sites.google.com/site/unicodetools/home#TOC-UCA
1021  diff the main mapping file, look for bad changes
1022  (for example, more bytes per weight for common characters)
1023    ~/svn.unitools/trunk$ sed -r -f ~/cldr/uni/src/tools/scripts/uca/blankweights.sed ../Generated/UCA/13.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-13.0.txt
1024    ~/svn.unitools/trunk$ meld ../frac-12.1.txt ../frac-13.0.txt
1025
1026- CLDR root data files are checked into $CLDR_SRC/common/uca/
1027    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1028
1029- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1030    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1031- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1032    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1033    (note that the copy below removes the underscore before "Rules" in the file name)
1034    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1035- restore TODO diffs in UCARules.txt
1036    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1037- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1038  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1039  from the CLDR root files (..._CLDR_..._SHORT.txt)
1040    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1041    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1042    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1043- if CLDR common/uca/unihan-index.txt changes, then update
1044  CLDR common/collation/root.xml <collation type="private-unihan">
1045  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1046
1047- run genuca
1048  $ICU_ROOT/dbg/tools/unicode/c$
1049    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
1050    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1051- rebuild ICU4C
1052
1053* Unihan collators
1054    https://sites.google.com/site/unicodetools/unihan
1055- run Unicode Tools
1056    org.unicode.draft.GenerateUnihanCollators
1057  with VM arguments
1058    -ea
1059    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1060    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1061    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1062    -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
1063    -DUVERSION=13.0.0
1064- run Unicode Tools
1065    org.unicode.draft.GenerateUnihanCollatorFiles
1066  with the same arguments
1067- check CLDR diffs
1068    cd $CLDR_SRC
1069    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1070    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1071- copy to CLDR
1072    cd $CLDR_SRC
1073    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1074    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1075- run CLDR unit tests, commit to CLDR
1076- generate ICU zh collation data: run CLDR
1077    org.unicode.cldr.icu.NewLdml2IcuConverter
1078  with program arguments
1079    -t collation
1080    -s /usr/local/google/home/mscherer/cldr/uni/src/common/collation
1081    -m /usr/local/google/home/mscherer/cldr/uni/src/common/supplemental
1082    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
1083    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
1084    zh
1085  and VM arguments
1086    -ea
1087    -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
1088- rebuild ICU4C
1089
1090* run & fix ICU4C tests, now with new CLDR collation root data
1091- run all tests with the collation test data *_SHORT.txt or the full files
1092  (the full ones have comments, useful for debugging)
1093- note on intltest: if collate/UCAConformanceTest fails, then
1094  utility/MultithreadTest/TestCollators will fail as well;
1095  fix the conformance test before looking into the multi-thread test
1096
1097* update Java data files
1098- refresh just the UCD/UCA-related/derived files, just to be safe
1099- see (ICU4C)/source/data/icu4j-readme.txt
1100- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1101- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1102  output:
1103    ...
1104    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
1105    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt66b
1106    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b
1107    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt66l.dat ./out/icu4j/icudt66b.dat -s ./out/build/icudt66l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt66b
1108    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b"
1109    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt66b/
1110    mkdir -p /tmp/icu4j/main/shared/data
1111    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1112    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt66b/
1113    mkdir -p /tmp/icu4j/main/shared/data
1114    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1115    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
1116- copy the big-endian Unicode data files to another location,
1117  separate from the other data files,
1118  and then refresh ICU4J
1119    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1120    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1121    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1122    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1123    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1124    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1125    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1126    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1127    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1128    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1129
1130* When refreshing all of ICU4J data from ICU4C
1131- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1132- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1133or
1134- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1135
1136* update CollationFCD.java
1137  + copy & paste the initializers of lcccIndex[] etc. from
1138    ICU4C/source/i18n/collationfcd.cpp to
1139    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1140
1141* refresh Java test .txt files
1142- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1143    cd $ICU_SRC/icu4c/source/data/unidata
1144    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1145    cd ../../test/testdata
1146    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1147    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1148
1149* run & fix ICU4J tests
1150
1151*** API additions
1152- send notice to icu-design about new born-@stable API (enum constants etc.)
1153
1154*** CLDR numbering systems
1155- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
1156  for example, look for
1157    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
1158    in new blocks (Blocks.txt)
1159  Unicode 13:
1160    diak 11950..11959 Dives_Akuru

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools are checked in:
  http://www.unicode.org/utility/trac/log/trunk/unicodetools

---------------------------------------------------------------------------- ***

Unicode 12.1 update for ICU 64.2

** This is an abbreviated update with one new character for the new
** Japanese era expected to start on 2019-May-01: U+32FF SQUARE ERA NAME REIWA
https://en.wikipedia.org/wiki/Reiwa_period

http://www.unicode.org/versions/Unicode12.1.0/

ICU-20497 Unicode 12.1

cldrbug 11978: Unicode 12.1

* Command-line environment setup

UNICODE_DATA=~/unidata/uni121/20190403
CLDR_SRC=~/svn.cldr/uni
ICU_ROOT=~/icu/uni
ICU_SRC=$ICU_ROOT/src
ICUDT=icudt64b
ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
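
To show how the variables above depend on each other, here is a small Python sketch that derives the same set from a home directory. The function name update_env and its parameters are hypothetical; the relative paths are the ones listed above.

```python
import posixpath

def update_env(home, unicode_data="unidata/uni121/20190403", icudt="icudt64b"):
    """Build the variable set from the setup above, relative to a home directory."""
    root = posixpath.join(home, "icu/uni")
    src = posixpath.join(root, "src")
    return {
        "UNICODE_DATA": posixpath.join(home, unicode_data),
        "CLDR_SRC": posixpath.join(home, "svn.cldr/uni"),
        "ICU_ROOT": root,
        "ICU_SRC": src,
        "ICUDT": icudt,
        # The next two are derived from ICU_SRC, so only ICU_ROOT needs adjusting per machine.
        "ICU4C_DATA_IN": posixpath.join(src, "icu4c/source/data/in"),
        "ICU4C_UNIDATA": posixpath.join(src, "icu4c/source/data/unidata"),
        "LD_LIBRARY_PATH": posixpath.join(root, "dbg/icu4c/lib"),
    }
```

For a new Unicode version, only the UNICODE_DATA folder and ICUDT would normally change.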

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
    so that the makefiles see the new version number.
  cd $ICU_ROOT/dbg/icu4c
  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh

*** data files & enums & parser code

* download files
- mkdir -p $UNICODE_DATA
- download Unicode files into $UNICODE_DATA
  + subfolders: emoji, idna, security, ucd, uca
  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip

* for manual diffs and for Unicode Tools input data updates:
  remove version suffixes from the file names
    ~$ unidata/desuffixucd.py $UNICODE_DATA
  (see https://sites.google.com/site/unicodetools/inputdata)
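
A minimal sketch of what "remove version suffixes" means for a single file name. This is not the code of desuffixucd.py; it assumes that suffixes look like "-12.1.0" or "-12.1.0d3" directly before the file extension, and the function name desuffix is hypothetical.

```python
import re

# Assumption: a version suffix is "-<digits>(.<digits>)*" with an optional draft
# marker such as "d3", directly followed by the file extension.
_SUFFIX = re.compile(r"-\d+(?:\.\d+)*(?:d\d+)?(?=\.[A-Za-z]+$)")

def desuffix(name):
    """Return the file name without its version suffix, or unchanged if it has none."""
    return _SUFFIX.sub("", name)
```

For example, under this assumption "UnicodeData-12.1.0.txt" becomes "UnicodeData.txt", while "ReadMe.txt" is left alone.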

* process and/or copy files
- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
  + For debugging, and tweaking how ppucd.txt is written,
    the tool has an --only_ppucd option:
    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile

- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date

* update spoof checker UnicodeSet initializers:
    inclusionPat & recommendedPat in uspoof.cpp
    INCLUSION & RECOMMENDED in SpoofChecker.java
- make sure that the Unicode Tools tree contains the latest security data files
- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
- update the hardcoded version number there in the DIRECTORY path
- run the tool (no special environment variables needed)
- copy & paste from the Console output into the .cpp & .java files

* generate normalization data files
  cd $ICU_ROOT/dbg/icu4c
  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
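
The four .nrm commands above follow one pattern: each output is built from nfc.txt plus the files layered on top of it. A Python sketch of that table follows; the names NORM2_BUILDS and gennorm2_commands are hypothetical, and the separate --csource command for norm2_nfc_data.h is left out.

```python
# Output file -> input files, in the order used by the gennorm2 commands above.
NORM2_BUILDS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}

def gennorm2_commands(data_in, unidata):
    """Return one argument list per .nrm file, in a form usable with subprocess.run()."""
    return [
        ["bin/gennorm2", "-o", f"{data_in}/{out}", "-s", f"{unidata}/norm2", *inputs]
        for out, inputs in NORM2_BUILDS.items()
    ]
```

The function only builds the argument lists; running them still requires the working directory $ICU_ROOT/dbg/icu4c as above.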

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date

* build Unicode tools using CMake+make

$ICU_SRC/tools/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
# Location of the ICU4C source tree.
set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)

  $ICU_ROOT/dbg$
    mkdir -p tools/unicode/c
    cd tools/unicode/c

  $ICU_ROOT/dbg/tools/unicode/c$
    cmake ../../../../src/tools/unicode/c
    make

* generate core properties data files
  $ICU_ROOT/dbg/tools/unicode/c$
    genprops/genprops $ICU_SRC/icu4c
    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..12.1: U+2260, U+226E, U+226F
- nothing new in this Unicode version, no test file to update
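
The grep for "disallowed_STD3_valid" on non-ASCII characters can be sketched in Python as follows. The sample lines only imitate the "code point(s) ; status # comment" layout of IdnaMappingTable.txt and are not copied from the real file; the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return code points above U+007F whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]          # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")  # single code point or start..end
        first = int(start, 16)
        last = int(end, 16) if end else first
        found.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return found

# Illustrative lines in the assumed layout.
sample = [
    "0000..002C    ; disallowed_STD3_valid   # <control-0000>..COMMA",
    "0041          ; mapped ; 0061           # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]
```

If the result contains anything beyond U+2260, U+226E and U+226F, the UTS #46 tests need new cases.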

* run & fix ICU4C tests
- Andy handles RBBI & spoof check test failures

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools, see
    https://sites.google.com/site/unicodetools/home#TOC-UCA
  diff the main mapping file, look for bad changes
  (for example, more bytes per weight for common characters)
    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.1.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.1.txt
    ~/svn.unitools/trunk$ meld ../frac-12.txt ../frac-12.1.txt

- CLDR root data files are checked into $CLDR_SRC/common/uca/
    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/

- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
    (note removing the underscore before "Rules")
    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt

- run genuca, see command line above
- rebuild ICU4C
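
The cp commands in this collation step rename the CLDR root files on the way into ICU. A small Python sketch of that mapping, with hypothetical names RENAMES and icu_name:

```python
# CLDR common/uca file name -> ICU file name, for the two files whose names
# change in more than the "_CLDR_" part.
RENAMES = {
    "FractionalUCA_SHORT.txt": "FractionalUCA.txt",
    "UCA_Rules_SHORT.txt": "UCARules.txt",  # note: no underscore before "Rules"
}

def icu_name(cldr_name):
    """Return the ICU file name for a CLDR root collation file name."""
    if cldr_name in RENAMES:
        return RENAMES[cldr_name]
    # The conformance test files only drop the "_CLDR" part.
    return cldr_name.replace("_CLDR_", "_", 1)
```

Names that match neither rule are returned unchanged.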

* Unihan collators
    https://sites.google.com/site/unicodetools/unihan
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollators
  with VM arguments
    -ea
    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
    -DUVERSION=12.1.0
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollatorFiles
  with the same arguments
- check CLDR diffs
    cd $CLDR_SRC
    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
- copy to CLDR
    cd $CLDR_SRC
    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
- run CLDR unit tests, commit to CLDR
- generate ICU zh collation data: run CLDR
    org.unicode.cldr.icu.NewLdml2IcuConverter
  with program arguments
    -t collation
    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
    zh
  and VM arguments
    -ea
    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
  (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt64b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt64l.dat ./out/icu4j/icudt64b.dat -s ./out/build/icudt64l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt64b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt64b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt64b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
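
The cp/rm sequence above amounts to a filter on the files under com/ibm/icu/impl/data/$ICUDT. A Python sketch of that filter, assuming paths relative to that folder with '/' separators; the function name is hypothetical.

```python
from fnmatch import fnmatchcase

def is_unicode_data_file(rel_path):
    """Return True if the steps above would copy this file into the partial refresh."""
    if rel_path == "cnvalias.icu":
        return False  # matched by *.icu but removed again right afterwards
    parts = rel_path.split("/")
    if len(parts) == 2:
        # "cp coll/*" and "cp brkitr/*" copy the direct children of these folders.
        return parts[0] in ("coll", "brkitr")
    if len(parts) > 2:
        return False
    return (rel_path == "confusables.cfu"
            or fnmatchcase(rel_path, "*.icu")
            or fnmatchcase(rel_path, "*.nrm"))
```

Everything else, such as locale .res files at the top level, stays as it is in the existing icudata.jar.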

* When refreshing all of ICU4J data from ICU4C
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
or
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC/icu4c/source/data/unidata
    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
  for example, look for
    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
    in new blocks (Blocks.txt)
  Unicode 12: using Unicode 12 CLDR ticket #11478
    hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
    wcho 1E2F0..1E2F9 Wancho
  Unicode 11: using Unicode 11 CLDR ticket #10978
    rohg 10D30..10D39 Hanifi_Rohingya
    gong 11DA0..11DA9 Gunjala_Gondi
  Earlier: CLDR tickets specific to adding new numbering systems.
  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
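
For each entry listed above, the CLDR numbering system needs the ten digit characters of the range. A Python sketch that derives them from the first code point; NEW_SYSTEMS repeats the ranges from the list, and the function name digit_string is hypothetical.

```python
def digit_string(first_cp):
    """Return the ten consecutive digits of a decimal system whose digit zero is first_cp."""
    return "".join(chr(first_cp + i) for i in range(10))

# Numbering system id -> code point of digit zero, from the list above.
NEW_SYSTEMS = {
    "hmnp": 0x1E140,
    "wcho": 0x1E2F0,
    "rohg": 0x10D30,
    "gong": 0x11DA0,
}

# id -> string with the ten digits, in order zero through nine
DIGITS = {system: digit_string(cp) for system, cp in NEW_SYSTEMS.items()}
```

This relies on decimal digits (gc=Nd) being encoded as ten consecutive code points.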

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools are checked in:
  http://www.unicode.org/utility/trac/log/trunk/unicodetools

---------------------------------------------------------------------------- ***

Unicode 12.0 update for ICU 64

http://www.unicode.org/versions/Unicode12.0.0/
http://unicode.org/versions/beta-12.0.0.html
https://www.unicode.org/review/pri389/
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/reports/tr44/tr44-23.html

ICU-20203 Unicode 12

ICU-20111 move text layout properties data into a data file

cldrbug 11478: Unicode 12
Accidentally used ^/trunk instead of ^/branches/markus/uni12

* Command-line environment setup

UNICODE_DATA=~/unidata/uni12/20190309
CLDR_SRC=~/svn.cldr/uni
ICU_ROOT=~/icu/uni
ICU_SRC=$ICU_ROOT/src
ICUDT=icudt63b
ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* download files
- mkdir -p $UNICODE_DATA
- download Unicode files into $UNICODE_DATA
  + subfolders: emoji, idna, security, ucd, uca
  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip

* for manual diffs and for Unicode Tools input data updates:
  remove version suffixes from the file names
    ~$ unidata/desuffixucd.py $UNICODE_DATA
  (see https://sites.google.com/site/unicodetools/inputdata)

* process and/or copy files
- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
  + For debugging, and tweaking how ppucd.txt is written,
    the tool has an --only_ppucd option:
    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile

- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date

* new constants for new property values
- preparseucd.py error:
    ValueError: missing uchar.h enum constants for some property values:
    [(u'blk', set([u'Symbols_And_Pictographs_Ext_A', u'Elymaic',
        u'Ottoman_Siyaq_Numbers', u'Nandinagari', u'Nyiakeng_Puachue_Hmong',
        u'Small_Kana_Ext', u'Egyptian_Hieroglyph_Format_Controls', u'Wancho', u'Tamil_Sup'])),
    (u'sc', set([u'Nand', u'Wcho', u'Elym', u'Hmnp']))]
  = PropertyValueAliases.txt new property values (diff old & new .txt files)
    blk; Egyptian_Hieroglyph_Format_Controls; Egyptian_Hieroglyph_Format_Controls
    blk; Elymaic                          ; Elymaic
    blk; Nandinagari                      ; Nandinagari
    blk; Nyiakeng_Puachue_Hmong           ; Nyiakeng_Puachue_Hmong
    blk; Ottoman_Siyaq_Numbers            ; Ottoman_Siyaq_Numbers
    blk; Small_Kana_Ext                   ; Small_Kana_Extension
    blk; Symbols_And_Pictographs_Ext_A    ; Symbols_And_Pictographs_Extended_A
    blk; Tamil_Sup                        ; Tamil_Supplement
    blk; Wancho                           ; Wancho
  -> add to uchar.h
    use long property names for enum constants,
    for the trailing comment get the block start code point: diff old & new Blocks.txt
  -> add to UCharacter.UnicodeBlock IDs
    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
            replace  public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2

    sc ; Elym                             ; Elymaic
    sc ; Hmnp                             ; Nyiakeng_Puachue_Hmong
    sc ; Nand                             ; Nandinagari
    sc ; Wcho                             ; Wancho
  -> uscript.h & com.ibm.icu.lang.UScript
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
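
The two Eclipse find/replace patterns for the UnicodeBlock constants can be tried out with Python's re.sub, which accepts the same \1-style group references. The sample enum line is illustrative (in particular its numeric value); in the second pattern the number is not captured, so the trailing comment is group 2 there.

```python
import re

# Illustrative uchar.h-style enum line; the numeric value is a placeholder.
line = "UBLOCK_ELYMAIC = 293, /*[10FE0]*/"

# UCharacter.UnicodeBlock IDs: three groups (name, number, trailing comment).
id_decl = re.sub(
    r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
    r"public static final int \1_ID = \2; \3",
    line,
)

# UCharacter.UnicodeBlock objects: two groups (name, trailing comment).
obj_decl = re.sub(
    r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
    r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
    line,
)
```

Running both substitutions over the new uchar.h lines gives the Java declarations to paste into UCharacter.java.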

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt

* update spoof checker UnicodeSet initializers:
    inclusionPat & recommendedPat in uspoof.cpp
    INCLUSION & RECOMMENDED in SpoofChecker.java
- make sure that the Unicode Tools tree contains the latest security data files
- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
- update the hardcoded version number there in the DIRECTORY path
- run the tool (no special environment variables needed)
- copy & paste from the Console output into the .cpp & .java files

* generate normalization data files
  cd $ICU_ROOT/dbg/icu4c
  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date

* build Unicode tools using CMake+make

$ICU_SRC/tools/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
# Location of the ICU4C source tree.
set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)

  $ICU_ROOT/dbg$
    mkdir -p tools/unicode/c
    cd tools/unicode/c

  $ICU_ROOT/dbg/tools/unicode/c$
    cmake ../../../../src/tools/unicode/c
    make

* generate core properties data files
  $ICU_ROOT/dbg/tools/unicode/c$
    genprops/genprops $ICU_SRC/icu4c
    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..12.0: U+2260, U+226E, U+226F
- nothing new in this Unicode version, no test file to update

* run & fix ICU4C tests
- update test of default bidi classes:
  Bidi range \U0001ED00-\U0001ED4F changes default from R to AL,
  see diffs in DerivedBidiClass.txt
  + /tsutil/cucdtst/TestUnicodeData enumDefaultsRange() defaultBidi[]
  + UCharacterTest.java TestIteration() defaultBidi[]
- Andy handles RBBI & spoof check test failures

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools, see
    https://sites.google.com/site/unicodetools/home#TOC-UCA
  diff the main mapping file, look for bad changes
  (for example, more bytes per weight for common characters)
    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.txt
    ~/svn.unitools/trunk$ meld ../frac-11.txt ../frac-12.txt

- CLDR root data files are checked into $CLDR_SRC/common/uca/
    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/

- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
    (note removing the underscore before "Rules")
    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt

- run genuca, see command line above;
  deal with
    Error: Unknown script for first-primary sample character U+119CE on line 29233 of /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
    FDD1 119CE;	[71 CD 02, 05, 05]	# Nandinagari first primary (compressible)
        (add the character to genuca.cpp sampleCharsToScripts[])
  + This time, I added code to genuca.cpp to use uscript_getSampleUnicodeString(script)
    and cache its values.
    Works as long as the script metadata is updated before the collation data.
- rebuild ICU4C
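
When checking FractionalUCA.txt for "more bytes per weight", it can help to count the primary-weight bytes per line. A Python sketch, assuming mapping lines carry a bracketed "[primary, secondary, tertiary]" element like the Nandinagari line quoted above; the real file has further line types that this sketch simply skips. The function name is hypothetical.

```python
import re

# First bracketed collation element: [primary bytes, secondary bytes, tertiary bytes]
_WEIGHTS = re.compile(r"\[([0-9A-F ]*),([0-9A-F ]*),([0-9A-F ]*)\]")

def primary_byte_count(line):
    """Return the number of primary-weight bytes in the first element, or None."""
    match = _WEIGHTS.search(line)
    if match is None:
        return None  # not a mapping line in the assumed format
    return len(match.group(1).split())

# The line from the genuca error message above.
sample = "FDD1 119CE;\t[71 CD 02, 05, 05]\t# Nandinagari first primary (compressible)"
```

Comparing these counts between the old and new file for common characters is one way to spot the "bad changes" mentioned in the diff step.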

* Unihan collators
    https://sites.google.com/site/unicodetools/unihan
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollators
  with VM arguments
    -ea
    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
    -DUVERSION=12.0.0
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollatorFiles
  with the same arguments
- check CLDR diffs
    cd $CLDR_SRC
    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
- copy to CLDR
    cd $CLDR_SRC
    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
- run CLDR unit tests, commit to CLDR
- generate ICU zh collation data: run CLDR
    org.unicode.cldr.icu.NewLdml2IcuConverter
  with program arguments
    -t collation
    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
    zh
  and VM arguments
    -ea
    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
  (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt63l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt63b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b
    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt63l.dat ./out/icu4j/icudt63b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt63l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt63b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt63b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt63b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* When refreshing all of ICU4J data from ICU4C
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
or
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC/icu4c/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
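
The three cp commands above copy from three source folders into one ICU4J folder. A Python sketch that lists the resulting (source, destination) pairs without copying anything; the function name copy_plan is hypothetical, and the file and folder names are the ones from the commands.

```python
import posixpath

UNIDATA_FILES = [
    "confusables.txt", "confusablesWholeScript.txt", "NormalizationCorrections.txt",
    "NormalizationTest.txt", "SpecialCasing.txt", "UnicodeData.txt",
]
TESTDATA_FILES = ["BidiCharacterTest.txt", "BidiTest.txt", "IdnaTestV2.txt"]
ICU4J_TEST_DATA = "icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode"

def copy_plan(icu_src, unicode_data):
    """Return (source, destination) path pairs for the Java test .txt refresh."""
    dest = posixpath.join(icu_src, ICU4J_TEST_DATA)
    sources = [posixpath.join(icu_src, "icu4c/source/data/unidata", n) for n in UNIDATA_FILES]
    sources += [posixpath.join(icu_src, "icu4c/source/test/testdata", n) for n in TESTDATA_FILES]
    # CompositionExclusions.txt comes straight from the downloaded UCD files.
    sources.append(posixpath.join(unicode_data, "ucd/CompositionExclusions.txt"))
    return [(s, posixpath.join(dest, posixpath.basename(s))) for s in sources]
```

Each pair could then be handed to shutil.copy2(source, destination) to perform the actual copy.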

* run & fix ICU4J tests

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** CLDR numbering systems
- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
  for example, look for
    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
    in new blocks (Blocks.txt)
  Unicode 12: using Unicode 12 CLDR ticket #11478
    hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
    wcho 1E2F0..1E2F9 Wancho
  Unicode 11: using Unicode 11 CLDR ticket #10978
    rohg 10D30..10D39 Hanifi_Rohingya
    gong 11DA0..11DA9 Gunjala_Gondi
  Earlier: CLDR tickets specific to adding new numbering systems.
  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
  Unicode 9: http://unicode.org/cldr/trac/ticket/9692

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools are checked in:
  http://www.unicode.org/utility/trac/log/trunk/unicodetools

---------------------------------------------------------------------------- ***

ICU 63: add ICU support for the text layout properties InPC, InSC, vo
1771
1772* Command-line environment setup
1773
1774UNICODE_DATA=~/unidata/uni11/20180609
1775CLDR_SRC=~/svn.cldr/uni
1776ICU_ROOT=~/icu/mine
1777ICU_SRC=$ICU_ROOT/src
1778ICUDT=icudt62b
1779ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1780ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1781export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1782
1783*** Links
1784
1785https://unicode-org.atlassian.net/browse/ICU-8966 InPC & InSC
1786https://unicode-org.atlassian.net/browse/ICU-12850 vo
1787
1788*** data files & enums & parser code
1789
1790* API additions
1791- for each of the three new enumerated properties
1792  + uchar.h: add the enum UProperty constant UCHAR_<long prop name>
1793  + uchar.h: update UCHAR_INT_LIMIT
1794  + uchar.h: add the enum U<long prop name>
1795    with constants U_<short prop name>_<long value name>
1796  + UProperty.java: add the constant <long prop name>
1797  + UProperty.java: update INT_LIMIT
1798  + UCharacter.java: add the interface <long prop name>
1799    with constants <long value name>
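
The naming patterns above can be sketched in Python as follows. This assumes that the Unicode long names map to identifiers simply by uppercasing or by removing underscores; the helper names are hypothetical, and the results should be checked against the actual headers.

```python
def c_names(long_prop, short_prop, long_value):
    """Derive the C identifiers for one property and one of its values."""
    return {
        "uproperty_constant": "UCHAR_" + long_prop.upper(),
        "enum_type": "U" + long_prop.replace("_", ""),
        "value_constant": "U_" + short_prop.upper() + "_" + long_value.upper(),
    }

def java_names(long_prop, long_value):
    """Derive the Java identifiers for the same property and value."""
    return {
        "uproperty_constant": "UProperty." + long_prop.upper(),
        "interface": "UCharacter." + long_prop.replace("_", ""),
        "value_constant": long_value.upper(),
    }

print(c_names("Vertical_Orientation", "vo", "Rotated"))
print(java_names("Vertical_Orientation", "Rotated"))
```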
1800
1801* process and/or copy files
1802- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1803  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1804  + It also writes tools/unicode/c/genprops/pnames_data.h with property and value
1805    names and aliases.
1806  + For debugging, and tweaking how ppucd.txt is written,
1807    the tool has an --only_ppucd option:
1808    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1809
1810* preparseucd.py changes
1811- add new property short names (uppercase) to _prop_and_value_re
1812  so that ParseUCharHeader() parses the new enum constants
1813
1814* build ICU (make install)
1815  so that the tools build can pick up the new definitions from the installed header files.
1816
1817  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1818
1819* build Unicode tools using CMake+make
1820
1821$ICU_SRC/tools/unicode/c/icudefs.txt:
1822
1823# Location (--prefix) of where ICU was installed.
1824set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
1825# Location of the ICU4C source tree.
1826set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/mine/src/icu4c)
1827
1828  $ICU_ROOT/dbg$
1829    mkdir -p tools/unicode/c
1830    cd tools/unicode/c
1831
1832  $ICU_ROOT/dbg/tools/unicode/c$
1833    cmake ../../../../src/tools/unicode/c
1834    make
1835
1836* generate core properties data files
1837  $ICU_ROOT/dbg/tools/unicode/c$
1838    genprops/genprops $ICU_SRC/icu4c
1839- rebuild ICU (make install) & tools
1840
1841* write data for runtime, hardcoded for now
1842- add genprops/layoutpropsbuilder.cpp with pieces from sibling files
1843- generate new icu4c/source/common/ulayout_props_data.h
1844- for each of the three new enumerated properties
1845  + int property max value
1846  + small, 8-bit UCPTrie
1847    (A small 16-bit trie with bit fields for these three properties
1848    is very nearly the same size as the sum of the three.)
1849
1850* wire into C++
1851- uprops.cpp: #include ulayout_props_data.h
1852- uprops.cpp: add getInPC() etc. functions
1853- uprops.cpp: add lines to intProps[], include max values
1854- uprops.h: add UPropertySource constants
1855- uprops.cpp: add uprops_addPropertyStarts(src)
1856- uniset_props.cpp: add to UnicodeSet_initInclusion()
1857- intltest/ucdtest.cpp: write unit tests
1858
1859* update Java data files
1860- refresh just the pnames.icu file with the new property [value] names, just to be safe
1861- see $ICU_SRC/icu4c/source/data/icu4j-readme.txt
1862- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1863- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1864- copy the big-endian Unicode data files to another location,
1865  separate from the other data files,
1866  and then refresh ICU4J
1867    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1868    cp com/ibm/icu/impl/data/$ICUDT/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1869    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1870
1871* wire into Java
1872- UCharacterProperty.java: add new SRC_INPC etc. constants as in C++
1873- UCharacterProperty.java: for each new property
1874  + create a nested class to hold its CodePointTrie
1875  + initialize it from a string literal
1876  + paste in the initializer printed by genprops
1877  + add a new IntProperty object to the intProps[] array
1878  + use the correct max int value for each property, also printed by genprops
1879- UCharacterProperty.java: add ulayout_addPropertyStarts(src, set)
1880- UnicodeSet.java: add to getInclusions()
1881- UCharacterTest.java: write unit tests
1882
1883---------------------------------------------------------------------------- ***
1884
1885Unicode 11.0 update for ICU 62
1886
1887http://www.unicode.org/versions/Unicode11.0.0/
1888http://unicode.org/versions/beta-11.0.0.html
1889https://www.unicode.org/review/pri372/
1890http://www.unicode.org/reports/uax-proposed-updates.html
1891http://www.unicode.org/reports/tr44/tr44-21.html
1892
1893* Command-line environment setup
1894
1895UNICODE_DATA=~/unidata/uni11/20180521
1896CLDR_SRC=~/svn.cldr/uni
1897ICU_ROOT=~/svn.icu/uni
1898ICU_SRC=$ICU_ROOT/src
1899ICUDT=icudt61b
1900ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1901ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1902export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1903
1904*** ICU Trac
1905
1906- ticket:13630: Unicode 11
1907- ^/branches/markus/uni11
1908
1909*** CLDR Trac
1910
1911- cldrbug 10978: Unicode 11
1912- ^/branches/markus/uni11
1913
1914*** Unicode version numbers
1915- makedata.mak
1916- uchar.h
1917- com.ibm.icu.util.VersionInfo
1918- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1919
1920- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1921  so that the makefiles see the new version number.
1922
1923*** data files & enums & parser code
1924
1925* download files
1926- mkdir -p $UNICODE_DATA
1927- download Unicode files into $UNICODE_DATA
1928  + subfolders: emoji, idna, security, ucd, uca
1929  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1930
1931* for manual diffs and for Unicode Tools input data updates:
1932  remove version suffixes from the file names
1933    ~$ unidata/desuffixucd.py $UNICODE_DATA
1934  (see https://sites.google.com/site/unicodetools/inputdata)
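
For illustration only, removing a version suffix from a file name could look like the Python sketch below. This is not the code of desuffixucd.py; the suffix pattern (-x.y.z with an optional draft marker such as d5) and the function name are assumptions.

```python
import re

# Assumed suffix shape: "-11.0.0" or "-11.0.0d5" right before the extension.
VERSION_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")

def desuffix(file_name):
    """Return the file name without its version suffix, if it has one."""
    return VERSION_SUFFIX.sub("", file_name)

for name in ["UnicodeData-11.0.0d5.txt", "DerivedAge-11.0.0.txt", "confusables.txt"]:
    print(name, "->", desuffix(name))
```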
1935
1936* process and/or copy files
1937- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1938  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1939  + For debugging, and tweaking how ppucd.txt is written,
1940    the tool has an --only_ppucd option:
1941    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1942
1943- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1944
1945* build ICU (make install)
1946  so that the tools build can pick up the new definitions from the installed header files.
1947
1948  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1949
1950* preparseucd.py changes
1951- fix other errors
1952    NameError: unknown property Extended_Pictographic
1953  -> add Extended_Pictographic binary property
1954  -> add new short names for all Emoji properties
1955
1956* new constants for new property values
1957- preparseucd.py error:
1958    ValueError: missing uchar.h enum constants for some property values:
1959    [(u'blk', set([u'Georgian_Ext', u'Hanifi_Rohingya', u'Medefaidrin', u'Sogdian', u'Makasar',
1960                   u'Old_Sogdian', u'Dogra', u'Gunjala_Gondi', u'Chess_Symbols', u'Mayan_Numerals',
1961                   u'Indic_Siyaq_Numbers'])),
1962     (u'jg', set([u'Hanifi_Rohingya_Kinna_Ya', u'Hanifi_Rohingya_Pa'])),
1963     (u'sc', set([u'Medf', u'Sogd', u'Dogr', u'Rohg', u'Maka', u'Sogo', u'Gong'])),
1964     (u'GCB', set([u'LinkC', u'Virama'])),
1965     (u'WB', set([u'WSegSpace']))]
1966  = PropertyValueAliases.txt new property values (diff old & new .txt files)
1967    blk; Chess_Symbols                    ; Chess_Symbols
1968    blk; Dogra                            ; Dogra
1969    blk; Georgian_Ext                     ; Georgian_Extended
1970    blk; Gunjala_Gondi                    ; Gunjala_Gondi
1971    blk; Hanifi_Rohingya                  ; Hanifi_Rohingya
1972    blk; Indic_Siyaq_Numbers              ; Indic_Siyaq_Numbers
1973    blk; Makasar                          ; Makasar
1974    blk; Mayan_Numerals                   ; Mayan_Numerals
1975    blk; Medefaidrin                      ; Medefaidrin
1976    blk; Old_Sogdian                      ; Old_Sogdian
1977    blk; Sogdian                          ; Sogdian
1978  -> add to uchar.h
1979    use long property names for enum constants,
1980    for the trailing comment get the block start code point: diff old & new Blocks.txt
1981  -> add to UCharacter.UnicodeBlock IDs
1982    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1983            replace  public static final int \1_ID = \2; \3
1984  -> add to UCharacter.UnicodeBlock objects
1985    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1986            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
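
Outside Eclipse, the same two find/replace patterns can be applied with Python's re module. A minimal sketch; the numeric block ID in the input line is illustrative only.

```python
import re

# The two Eclipse find/replace pairs from above, unchanged.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"
OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'

# One uchar.h-style enum line; the number 280 is a placeholder.
c_line = "UBLOCK_DOGRA = 280, /*[11800]*/"

id_line = ID_FIND.sub(ID_REPLACE, c_line)
obj_line = OBJ_FIND.sub(OBJ_REPLACE, c_line)
print(id_line)
print(obj_line)
```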
1987
1988    GCB; LinkC                            ; LinkingConsonant
1989    GCB; Virama                           ; Virama
1990  -> uchar.h & UCharacter.GraphemeClusterBreak
1991  -> these two later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
1992
1993    InSC; Consonant_Initial_Postfixed     ; Consonant_Initial_Postfixed
1994  -> ignore: ICU does not yet support this property
1995
1996    jg ; Hanifi_Rohingya_Kinna_Ya         ; Hanifi_Rohingya_Kinna_Ya
1997    jg ; Hanifi_Rohingya_Pa               ; Hanifi_Rohingya_Pa
1998  -> uchar.h & UCharacter.JoiningGroup
1999
2000    sc ; Dogr                             ; Dogra
2001    sc ; Gong                             ; Gunjala_Gondi
2002    sc ; Maka                             ; Makasar
2003    sc ; Medf                             ; Medefaidrin
2004    sc ; Rohg                             ; Hanifi_Rohingya
2005    sc ; Sogd                             ; Sogdian
2006    sc ; Sogo                             ; Old_Sogdian
2007  -> uscript.h & com.ibm.icu.lang.UScript
2008  -> Nushu had been added already
2009  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2010      and in com.ibm.icu.dev.test.lang.TestUScript.java
2011
2012    WB ; WSegSpace                        ; WSegSpace
2013  -> uchar.h & UCharacter.WordBreak
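
The "missing enum constants" check above boils down to a set difference. A minimal Python sketch, assuming a hypothetical dictionary `known` that stands in for the value names already parsed from uchar.h and uscript.h; this is not the preparseucd.py implementation.

```python
def parse_alias_line(line):
    """Split 'prop ; short ; long' and drop any trailing comment."""
    fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
    return tuple(fields[:3])

def missing_values(alias_lines, known):
    """Group short value names that are absent from `known`, by property."""
    missing = {}
    for line in alias_lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        prop, short, _long = parse_alias_line(line)
        if short not in known.get(prop, set()):
            missing.setdefault(prop, set()).add(short)
    return missing

sample = [
    "sc ; Dogr                             ; Dogra",
    "sc ; Gong                             ; Gunjala_Gondi",
    "sc ; Latn                             ; Latin",
    "WB ; WSegSpace                        ; WSegSpace",
]
known = {"sc": {"Latn"}, "WB": set()}  # hypothetical stand-in for the headers
print(missing_values(sample, known))
```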
2014
2015* New short names for emoji properties
2016- see UTS #51
2017- short names set in preparseucd.py
2018
2019* New properties
2020- boolean emoji property Extended_Pictographic
2021  -> added in preparseucd.py
2022  -> uchar.h & UProperty.java
2023- misc. property Equivalent_Unified_Ideograph (EqUIdeo)
2024  as shown in PropertyValueAliases.txt
2025  -> ignore for now
2026
2027* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2028    (not strictly necessary for NOT_ENCODED scripts)
2029  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
2030
2031* update spoof checker UnicodeSet initializers:
2032    inclusionPat & recommendedPat in uspoof.cpp
2033    INCLUSION & RECOMMENDED in SpoofChecker.java
2034- make sure that the Unicode Tools tree contains the latest security data files
2035- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
2036- update the hardcoded version number there in the DIRECTORY path
2037- run the tool (no special environment variables needed)
2038- copy & paste from the Console output into the .cpp & .java files
2039
2040* generate normalization data files
2041  cd $ICU_ROOT/dbg/icu4c
2042  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
2043  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
2044  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
2045  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2046  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
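
The gennorm2 command lines above layer their inputs: each .nrm file is built from nfc.txt plus the files specific to it. A small Python sketch that prints the four data-file commands from one table (the table and helper names are hypothetical; the --csource variant is left out):

```python
# Output file -> input files, in the order used on the command lines above.
NORM_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}

def gennorm2_command(output, inputs,
                     out_dir="$ICU4C_DATA_IN", src_dir="$ICU4C_UNIDATA/norm2"):
    """Build one gennorm2 argument list; variables are left for the shell."""
    return ["bin/gennorm2", "-o", f"{out_dir}/{output}", "-s", src_dir, *inputs]

for output, inputs in NORM_INPUTS.items():
    print(" ".join(gennorm2_command(output, inputs)))
```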
2047
2048* build ICU (make install)
2049  so that the tools build can pick up the new definitions from the installed header files.
2050
2051  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
2052
2053* build Unicode tools using CMake+make
2054
2055$ICU_SRC/tools/unicode/c/icudefs.txt:
2056
2057# Location (--prefix) of where ICU was installed.
2058set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
2059# Location of the ICU4C source tree.
2060set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c)
2061
2062  $ICU_ROOT/dbg$
2063    mkdir -p tools/unicode/c
2064    cd tools/unicode/c
2065
2066  $ICU_ROOT/dbg/tools/unicode/c$
2067    cmake ../../../../src/tools/unicode/c
2068    make
2069
2070* generate core properties data files
2071  $ICU_ROOT/dbg/tools/unicode/c$
2072    genprops/genprops $ICU_SRC/icu4c
2073    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
2074    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
2075- rebuild ICU (make install) & tools
2076
2077* Fix case props
2078    genprops error: casepropsbuilder: too many exceptions words
2079    genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
2080- With the addition of Georgian Mtavruli capital letters,
2081  there are now too many simple case mappings with big mapping deltas
2082  that yield uncompressible exceptions.
2083- Fix: change the data structure (now formatVersion 4):
2084  add one bit for no-simple-case-folding (for Cherokee), and
2085  one optional slot for a big delta (for most faraway mappings),
2086  together with another bit for whether that delta is negative.
2087  This makes most Cherokee, Georgian, etc. case mappings compressible,
2088  reducing the number of exceptions words.
2089- Further changes gain one more bit for the exceptions index,
2090  for future growth. For details, see casepropsbuilder.cpp.
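
To illustrate the big-delta idea only: a Python sketch that stores a mapping delta inline when it is small, and otherwise as a magnitude plus an is-negative flag. The inline range and the dictionary layout are assumptions for this sketch, not the actual ucase data format; see casepropsbuilder.cpp for that.

```python
# Assumed inline range (a 9-bit signed field); the real limits may differ.
SMALL_DELTA_MIN, SMALL_DELTA_MAX = -256, 255

def encode_delta(delta):
    """Keep small deltas inline; store big ones as magnitude plus sign flag."""
    if SMALL_DELTA_MIN <= delta <= SMALL_DELTA_MAX:
        return {"inline_delta": delta}
    return {"big_delta": abs(delta), "delta_is_negative": delta < 0}

def decode_delta(entry):
    """Recover the signed delta from either representation."""
    if "inline_delta" in entry:
        return entry["inline_delta"]
    magnitude = entry["big_delta"]
    return -magnitude if entry["delta_is_negative"] else magnitude

# Georgian: U+10D0 (Mkhedruli an) uppercases to U+1C90 (Mtavruli an).
to_upper = 0x1C90 - 0x10D0
print(to_upper, encode_delta(to_upper), encode_delta(-to_upper))
```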
2091
2092* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2093  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2094- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2095- Unicode 6.0..11.0: U+2260, U+226E, U+226F
2096- nothing new in this Unicode version, no test file to update
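
The grep above can also be done with a short Python filter that keeps only "disallowed_STD3_valid" entries beyond ASCII. A sketch; the sample lines imitate the layout of IdnaMappingTable.txt and are illustrative, and the function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) ranges with status disallowed_STD3_valid above U+007F."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]            # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first, last = int(start, 16), int(end or start, 16)
        if last > 0x7F:
            found.append((first, last))
    return found

sample = [
    "0000..002C    ; disallowed_STD3_valid   # ASCII range, ignored",
    "00C0          ; mapped ; 00E0           # other status, ignored",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
]
for first, last in non_ascii_std3_valid(sample):
    print(f"U+{first:04X}..U+{last:04X}")
```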
2097
2098* run & fix ICU4C tests
2099- Andy handles RBBI & spoof check test failures
2100
2101- Errors in char.txt, word.txt, word_POSIX.txt like
2102    createRuleBasedBreakIterator: ICU Error "U_BRK_RULE_EMPTY_SET"  at line 46, column 16
2103  because \p{Grapheme_Cluster_Break = EBG} and \p{Word_Break = EBG} are empty.
2104  -> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
2105     not empty, just to get ICU building.
2106  -> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
2107     and properties together with the rules that used them (GB 10, WB 14).
2108  -> Andy adjusts the rule sets further to sync with
2109     Unicode 11 grapheme, word, and line break spec changes.
2110
2111* collation: CLDR collation root, UCA DUCET
2112
2113- UCA DUCET goes into Mark's Unicode tools, see
2114    https://sites.google.com/site/unicodetools/home#TOC-UCA
2115  diff the main mapping file, look for bad changes
2116  (for example, more bytes per weight for common characters)
2117    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-11.txt
2118    ~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
2119
2120- CLDR root data files are checked into $CLDR_SRC/common/uca/
2121    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
2122
2123- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2124    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
2125- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2126    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
2127    (note removing the underscore before "Rules")
2128    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
2129- restore TODO diffs in UCARules.txt
2130    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
2131- update (ICU4C)/source/test/testdata/CollationTest_*.txt
2132  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2133  from the CLDR root files (..._CLDR_..._SHORT.txt)
2134    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2135    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2136    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
2137- if CLDR common/uca/unihan-index.txt changes, then update
2138  CLDR common/collation/root.xml <collation type="private-unihan">
2139  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
2140
2141- run genuca, see command line above;
2142  deal with
2143    Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
2144    FDD1 1180B;	[71 CC 02, 05, 05]	# Dogra first primary (compressible)
2145        (add the character to genuca.cpp sampleCharsToScripts[])
2146  + look up the USCRIPT_ code for the new sample characters
2147    (should be obvious from the comment in the error output)
2148  + *add* mappings to sampleCharsToScripts[], do not replace them
2149    (in case the script sample characters flip-flop)
2150  + insert new scripts in DUCET script order, see the top_byte table
2151    at the beginning of FractionalUCA.txt
2152- rebuild ICU4C
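
When several new scripts trigger the genuca error above, the sample characters and script names can be pulled out of FractionalUCA.txt in one pass. A Python sketch, assuming that every first-primary line looks like the one quoted in the error message; the function name is hypothetical. The matching USCRIPT_ constant still has to be looked up by hand.

```python
import re

# Assumed line shape: "FDD1 <sample>;<tab>[weights]<tab># <script> first primary ..."
FIRST_PRIMARY = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*#\s*(.+?) first primary")

def first_primary_sample(line):
    """Return (sample code point, script name), or None for other lines."""
    match = FIRST_PRIMARY.match(line)
    if match is None:
        return None
    return int(match.group(1), 16), match.group(2)

line = "FDD1 1180B;\t[71 CC 02, 05, 05]\t# Dogra first primary (compressible)"
print(first_primary_sample(line))
```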
2153
2154* Unihan collators
2155    https://sites.google.com/site/unicodetools/unihan
2156- run Unicode Tools
2157    org.unicode.draft.GenerateUnihanCollators
2158  with VM arguments
2159    -ea
2160    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
2161    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
2162    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
2163    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
2164    -DUVERSION=11.0.0
2165- run Unicode Tools
2166    org.unicode.draft.GenerateUnihanCollatorFiles
2167  with the same arguments
2168- check CLDR diffs
2169    cd $CLDR_SRC
2170    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
2171    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
2172- copy to CLDR
2173    cd $CLDR_SRC
2174    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
2175    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
2176- run CLDR unit tests, commit to CLDR
2177- generate ICU zh collation data: run CLDR
2178    org.unicode.cldr.icu.NewLdml2IcuConverter
2179  with program arguments
2180    -t collation
2181    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
2182    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
2183    -d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
2184    -p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
2185    zh
2186  and VM arguments
2187    -ea
2188    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
2189- rebuild ICU4C
2190
2191* run & fix ICU4C tests, now with new CLDR collation root data
2192- run all tests with the collation test data *_SHORT.txt or the full files
2193  (the full ones have comments, useful for debugging)
2194- note on intltest: if collate/UCAConformanceTest fails, then
2195  utility/MultithreadTest/TestCollators will fail as well;
2196  fix the conformance test before looking into the multi-thread test
2197
2198* update Java data files
2199- refresh just the UCD/UCA-related/derived files, just to be safe
2200- see (ICU4C)/source/data/icu4j-readme.txt
2201- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2202- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2203  output:
2204    ...
2205    Unicode .icu files built to ./out/build/icudt61l
2206    echo timestamp > uni-core-data
2207    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
2208    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
2209    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2210    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt61l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt61b
2211    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b"
2212    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
2213    mkdir -p /tmp/icu4j/main/shared/data
2214    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2215    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
2216    mkdir -p /tmp/icu4j/main/shared/data
2217    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2218    make[1]: Leaving directory '/usr/local/google/home/mscherer/svn.icu/uni/dbg/icu4c/data'
2219- copy the big-endian Unicode data files to another location,
2220  separate from the other data files,
2221  and then refresh ICU4J
2222    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
2223    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2224    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2225    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2226    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2227    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2228    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2229    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2230    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2231    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
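
The cp/rm sequence above selects a subset of the data files. The same selection, written as a Python predicate over paths relative to the $ICUDT folder (a sketch; the function name is hypothetical and the sample paths are illustrative):

```python
def should_copy(rel_path):
    """Mirror the cp/rm commands above for one path relative to $ICUDT."""
    folder, _, name = rel_path.rpartition("/")
    if folder in ("coll", "brkitr"):
        return True                  # cp coll/* and cp brkitr/*
    if folder:
        return False                 # no other subfolders are copied
    if name == "cnvalias.icu":
        return False                 # copied by *.icu, then removed again
    return name == "confusables.cfu" or name.endswith((".icu", ".nrm"))

for path in ["uprops.icu", "cnvalias.icu", "nfc.nrm", "coll/root.res", "zone/en.res"]:
    print(path, should_copy(path))
```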
2232
2233* When refreshing all of ICU4J data from ICU4C
2234- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2235- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
2236or
2237- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
2238
2239* update CollationFCD.java
2240  + copy & paste the initializers of lcccIndex[] etc. from
2241    ICU4C/source/i18n/collationfcd.cpp to
2242    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2243
2244* refresh Java test .txt files
2245- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2246    cd $ICU_SRC/icu4c/source/data/unidata
2247    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2248    cd ../../test/testdata
2249    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2250    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2251
2252* run & fix ICU4J tests
2253
2254*** API additions
2255- send notice to icu-design about new born-@stable API (enum constants etc.)
2256
2257*** CLDR numbering systems
2258- look for new sets of decimal digits (gc=Nd & nv=4) and add to CLDR
2259  Unicode 11: using Unicode 11 CLDR ticket #10978
2260    rohg 10D30..10D39 Hanifi_Rohingya
2261    gong 11DA0..11DA9 Gunjala_Gondi
2262  Earlier: CLDR tickets specific to adding new numbering systems.
2263  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
2264  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
2265
2266*** merge the Unicode update branches back onto the trunk
2267- do not merge the icudata.jar and testdata.jar,
2268  instead rebuild them from merged & tested ICU4C
2269- make sure that changes to Unicode tools are checked in:
2270  http://www.unicode.org/utility/trac/log/trunk/unicodetools
2271
2272---------------------------------------------------------------------------- ***
2273
2274Unicode 10.0 update for ICU 60
2275
2276http://www.unicode.org/versions/Unicode10.0.0/
2277http://www.unicode.org/versions/beta-10.0.0.html
2278http://blog.unicode.org/2017/03/unicode-100-beta-review.html
2279http://www.unicode.org/review/pri350/
2280http://www.unicode.org/reports/uax-proposed-updates.html
2281http://www.unicode.org/reports/tr44/tr44-19.html
2282
2283* Command-line environment setup
2284
2285UNICODE_DATA=~/unidata/uni10/20170605
2286CLDR_SRC=~/svn.cldr/uni10
2287ICU_ROOT=~/svn.icu/uni10
2288ICU_SRC=$ICU_ROOT/src
2289ICUDT=icudt60b
2290ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
2291ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
2292export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
2293
2294*** ICU Trac
2295
2296- ticket:12985: Unicode 10
2297- ticket:13061: undo hacks from emoji 5.0 update
2298- ticket:13062: add Emoji_Component property
2299- ^/branches/markus/uni10
2300
2301*** CLDR Trac
2302
2303- cldrbug 10055: Unicode 10
2304- cldrbug 9882: Unicode 10 script metadata
2305- cldrbug 10219: numbering systems for Unicode 10
2306
2307*** Unicode version numbers
2308- makedata.mak
2309- uchar.h
2310- com.ibm.icu.util.VersionInfo
2311- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2312
2313- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2314  so that the makefiles see the new version number.
2315
2316*** data files & enums & parser code
2317
2318* download files
2319- mkdir -p $UNICODE_DATA
2320- download Unicode 10.0 files into $UNICODE_DATA
2321  + subfolders: ucd, uca, idna, security
2322  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2323- download emoji 5.0 files into $UNICODE_DATA/emoji
2324
2325* for manual diffs: remove version suffixes from the file names
2326  ~$ unidata/desuffixucd.py $UNICODE_DATA
2327  (see https://sites.google.com/site/unicodetools/inputdata)
2328
2329* process and/or copy files
2330- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
2331  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2332  + For debugging, and tweaking how ppucd.txt is written,
2333    the tool has an --only_ppucd option:
2334    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
2335
2336- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
2337
2338* build ICU (make install)
2339  so that the tools build can pick up the new definitions from the installed header files.
2340
2341  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
2342
2343* preparseucd.py changes
2344- remove or add new Unicode scripts from/to the
2345  only-in-ISO-15924 list according to the error messages:
2346    ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
2347  -> adjust _scripts_only_in_iso15924 as indicated
2348- fix other errors
2349    Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
2350  -> add vo=Vertical_Orientation to _ignored_properties
2351  -> later removed again: the file is now parsed, even though we do not yet store its data for runtime use
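
The remove/add adjustment of the only-in-ISO-15924 list above is plain set arithmetic. A Python sketch with hypothetical inputs; it is not the preparseucd.py code, and the three sets below are made up for the example.

```python
def iso_only_list_fixes(only_in_iso, unicode_scripts, icu_scripts):
    """Return (codes to remove from, codes to add to) the only-in-ISO list."""
    # A code listed as ISO-only that Unicode now has must be removed.
    to_remove = sorted(only_in_iso & unicode_scripts)
    # A code that ICU knows, but that is in neither set, must be added.
    to_add = sorted(icu_scripts - unicode_scripts - only_in_iso)
    return to_remove, to_add

only_in_iso = {"Nshu", "Blis"}            # hypothetical current list
unicode_scripts = {"Latn", "Nshu"}        # hypothetical: scripts in the new UCD
icu_scripts = {"Latn", "Nshu", "Blis", "Cirt"}
print(iso_only_list_fixes(only_in_iso, unicode_scripts, icu_scripts))
```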
2352
2353* new constants for new property values
2354- preparseucd.py error:
2355    ValueError: missing uchar.h enum constants for some property values:
2356    [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
2357                   u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
2358     (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
2359                  u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
2360                  u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
2361     (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
2362  = PropertyValueAliases.txt new property values (diff old & new .txt files)
2363    blk; CJK_Ext_F                        ; CJK_Unified_Ideographs_Extension_F
2364    blk; Kana_Ext_A                       ; Kana_Extended_A
2365    blk; Masaram_Gondi                    ; Masaram_Gondi
2366    blk; Nushu                            ; Nushu
2367    blk; Soyombo                          ; Soyombo
2368    blk; Syriac_Sup                       ; Syriac_Supplement
2369    blk; Zanabazar_Square                 ; Zanabazar_Square
2370  -> add to uchar.h
2371    use long property names for enum constants,
2372    for the trailing comment get the block start code point: diff old & new Blocks.txt
2373  -> add to UCharacter.UnicodeBlock IDs
2374    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2375            replace  public static final int \1_ID = \2; \3
2376  -> add to UCharacter.UnicodeBlock objects
2377    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
2378            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2379
2380    jg ; Malayalam_Bha                    ; Malayalam_Bha
2381    jg ; Malayalam_Ja                     ; Malayalam_Ja
2382    jg ; Malayalam_Lla                    ; Malayalam_Lla
2383    jg ; Malayalam_Llla                   ; Malayalam_Llla
2384    jg ; Malayalam_Nga                    ; Malayalam_Nga
2385    jg ; Malayalam_Nna                    ; Malayalam_Nna
2386    jg ; Malayalam_Nnna                   ; Malayalam_Nnna
2387    jg ; Malayalam_Nya                    ; Malayalam_Nya
2388    jg ; Malayalam_Ra                     ; Malayalam_Ra
2389    jg ; Malayalam_Ssa                    ; Malayalam_Ssa
2390    jg ; Malayalam_Tta                    ; Malayalam_Tta
2391  -> uchar.h & UCharacter.JoiningGroup
2392
2393    sc ; Gonm                             ; Masaram_Gondi
2394    sc ; Nshu                             ; Nushu
2395    sc ; Soyo                             ; Soyombo
2396    sc ; Zanb                             ; Zanabazar_Square
2397  -> uscript.h & com.ibm.icu.lang.UScript
2398  -> Nushu had been added already
2399  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2400      and in com.ibm.icu.dev.test.lang.TestUScript.java
2401
2402* New properties as shown in PropertyValueAliases.txt changes
2403- boolean Emoji_Component from emoji 5
2404  -> uchar.h & UProperty.java
2405- boolean
2406    # Regional_Indicator (RI)
2407
2408    RI ; N                                ; No                               ; F                                ; False
2409    RI ; Y                                ; Yes                              ; T                                ; True
2410  -> uchar.h & UProperty.java
2411  -> single immutable range, to be hardcoded
2412- boolean
2413    # Prepended_Concatenation_Mark (PCM)
2414
2415    PCM; N                                ; No                               ; F                                ; False
2416    PCM; Y                                ; Yes                              ; T                                ; True
2417  -> was new in Unicode 9
2418  -> uchar.h & UProperty.java
2419- enumerated
2420    # Vertical_Orientation (vo)
2421
2422    vo ; R                                ; Rotated
2423    vo ; Tr                               ; Transformed_Rotated
2424    vo ; Tu                               ; Transformed_Upright
2425    vo ; U                                ; Upright
2426  -> only pre-parsed for now, but not yet stored for runtime use
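
For Regional_Indicator, "single immutable range, to be hardcoded" means that no data lookup is needed. A Python sketch of such a range check (the ICU code itself is C++ and Java; the names here are hypothetical):

```python
REGIONAL_INDICATOR_START = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
REGIONAL_INDICATOR_END = 0x1F1FF    # REGIONAL INDICATOR SYMBOL LETTER Z

def is_regional_indicator(code_point):
    """True if the code point is in the fixed Regional_Indicator range."""
    return REGIONAL_INDICATOR_START <= code_point <= REGIONAL_INDICATOR_END

print(is_regional_indicator(0x1F1E6), is_regional_indicator(0x41))
```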
2427
2428* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2429    (not strictly necessary for NOT_ENCODED scripts)
2430  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
2431
2432* generate normalization data files
2433  cd $ICU_ROOT/dbg/icu4c
2434  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
2435  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
2436  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
2437  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2438  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
2439
2440* build ICU (make install)
2441  so that the tools build can pick up the new definitions from the installed header files.
2442
2443  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
2444
2445* build Unicode tools using CMake+make
2446
2447$ICU_SRC/tools/unicode/c/icudefs.txt:
2448
2449# Location (--prefix) of where ICU was installed.
2450set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
2451# Location of the ICU4C source tree.
2452set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
2453
2454  $ICU_ROOT/dbg/tools/unicode/c$
2455    cmake ../../../../src/tools/unicode/c
2456    make
2457
2458* generate core properties data files
2459  $ICU_ROOT/dbg/tools/unicode/c$
2460    genprops/genprops $ICU_SRC/icu4c
2461    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
2462    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
2463- rebuild ICU (make install) & tools
2464
2465* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2466  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2467- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2468- Unicode 6.0..10.0: U+2260, U+226E, U+226F
2469- nothing new in this Unicode version, no test file to update
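
The grep step above can also be scripted. A minimal sketch, assuming the usual
IdnaMappingTable.txt layout (code point or range ; status ; ... # comment); the
function name and the SAMPLE lines are illustrative, not copied from a specific
file version:

```python
def std3_valid_non_ascii(lines):
    """Yield (start, end) ranges whose status is disallowed_STD3_valid
    and whose first code point is above ASCII."""
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if start > 0x7F:
            yield start, end


# Illustrative lines in the IdnaMappingTable.txt format.
SAMPLE = [
    "0000..002C    ; disallowed_STD3_valid                  # 1.1  <control-0000>..COMMA",
    "0061..007A    ; valid                                  # 1.1  LATIN SMALL LETTER A..LATIN SMALL LETTER Z",
    "2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # 1.1  NOT LESS-THAN..NOT GREATER-THAN",
]

found = list(std3_valid_non_ascii(SAMPLE))
```

Anything reported beyond U+2260, U+226E, U+226F would be a candidate for new
test cases in uts46test.cpp and UTS46Test.java.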
2470
2471* run & fix ICU4C tests
2472- Andy handles RBBI & spoof check test failures
2473
2474* collation: CLDR collation root, UCA DUCET
2475
2476- UCA DUCET goes into Mark's Unicode tools, see
2477  https://sites.google.com/site/unicodetools/home#TOC-UCA
2478- CLDR root data files are checked into $CLDR_SRC/common/uca/
2479    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
2480
2481- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2482    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
2483- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2484    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
2485    (note removing the underscore before "Rules")
2486    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
2487- restore TODO diffs in UCARules.txt
2488    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
2489- update (ICU4C)/source/test/testdata/CollationTest_*.txt
2490  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2491  from the CLDR root files (..._CLDR_..._SHORT.txt)
2492    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2493    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2494    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
2495- if CLDR common/uca/unihan-index.txt changes, then update
2496  CLDR common/collation/root.xml <collation type="private-unihan">
2497  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
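
The copy steps above rename the CLDR files on the way into ICU. The following
sketch only restates that name mapping; icu_target() and its parameters are
hypothetical helpers, not part of any ICU or CLDR tool:

```python
import posixpath


def icu_target(cldr_uca_file, unidata, testdata):
    """Return the ICU4C destination path for a file from CLDR common/uca/,
    or None if the file is not one of those copied in the steps above."""
    name = posixpath.basename(cldr_uca_file)
    if name == "FractionalUCA_SHORT.txt":
        return posixpath.join(unidata, "FractionalUCA.txt")
    if name == "UCA_Rules_SHORT.txt":
        # Note the underscore before "Rules" is dropped.
        return posixpath.join(unidata, "UCARules.txt")
    if name.startswith("CollationTest_CLDR_") and name.endswith("_SHORT.txt"):
        # CollationTest_CLDR_X_SHORT.txt -> CollationTest_X_SHORT.txt
        return posixpath.join(testdata, name.replace("_CLDR_", "_", 1))
    return None
```

UCARules.txt still needs the manual meld step afterwards to restore the TODO
diffs; a plain copy is not sufficient there.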
2498
2499- run genuca, see command line above;
2500  deal with
2501    Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
2502    FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)
2503        (add the character to genuca.cpp sampleCharsToScripts[])
2504  + look up the USCRIPT_ code for the new sample characters
2505    (should be obvious from the comment in the error output)
2506  + *add* mappings to sampleCharsToScripts[], do not replace them
2507    (in case the script sample characters flip-flop)
2508  + insert new scripts in DUCET script order, see the top_byte table
2509    at the beginning of FractionalUCA.txt
2510- rebuild ICU4C
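
When genuca reports an unknown first-primary sample character, the code point
and the script name can be read off the FractionalUCA.txt line quoted in the
error. A small sketch of that lookup follows; the regular expression and the
helper name are assumptions made for illustration, and the upper-cased
USCRIPT_ name is only a guess that has to be checked against uscript.h:

```python
import re

# Matches lines like:
#   FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)
_FIRST_PRIMARY = re.compile(r"^FDD1 ([0-9A-F]{4,6});.*#\s*(\S+) first primary")


def sample_char_entry(fractional_uca_line):
    """Return (sample code point, guessed USCRIPT_ constant name) or None."""
    m = _FIRST_PRIMARY.match(fractional_uca_line.strip())
    if not m:
        return None
    code_point = int(m.group(1), 16)
    # Guess only: upper-case the script's long name from the comment.
    script_guess = "USCRIPT_" + m.group(2).upper()
    return code_point, script_guess


LINE = "FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)"
entry = sample_char_entry(LINE)
```

The resulting pair is what gets added (not substituted) to
sampleCharsToScripts[] in genuca.cpp.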
2511
2512* Unihan collators
2513    https://sites.google.com/site/unicodetools/unihan
2514- run Unicode Tools
2515    org.unicode.draft.GenerateUnihanCollators
2516  with VM arguments
2517    -ea
2518    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
2519    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
2520    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
2521    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
2522    -DUVERSION=10.0.0
2523- run Unicode Tools
2524    org.unicode.draft.GenerateUnihanCollatorFiles
2525  with the same arguments
2526- check CLDR diffs
2527    cd $CLDR_SRC
2528    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
2529    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
2530- copy to CLDR
2531    cd $CLDR_SRC
2532    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
2533    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
2534- run CLDR unit tests, commit to CLDR
2535- generate ICU zh collation data: run CLDR
2536    org.unicode.cldr.icu.NewLdml2IcuConverter
2537  with program arguments
2538    -t collation
2539    -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
2540    -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
2541    -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
2542    -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
2543    zh
2544  and VM arguments
2545    -ea
2546    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
2547- rebuild ICU4C
2548
2549* run & fix ICU4C tests, now with new CLDR collation root data
2550- run all tests with the collation test data *_SHORT.txt or the full files
2551  (the full ones have comments, useful for debugging)
2552- note on intltest: if collate/UCAConformanceTest fails, then
2553  utility/MultithreadTest/TestCollators will fail as well;
2554  fix the conformance test before looking into the multi-thread test
2555
2556* update Java data files
2557- refresh just the UCD/UCA-related/derived files, just to be safe
2558- see (ICU4C)/source/data/icu4j-readme.txt
2559- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2560- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2561  output:
2562    ...
2563    Unicode .icu files built to ./out/build/icudt60l
2564    echo timestamp > uni-core-data
2565    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
2566    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
2567    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2568    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
2569    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
2570    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
2571    mkdir -p /tmp/icu4j/main/shared/data
2572    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2573    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
2574    mkdir -p /tmp/icu4j/main/shared/data
2575    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2576    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
2577- copy the big-endian Unicode data files to another location,
2578  separate from the other data files,
2579  and then refresh ICU4J
2580    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
2581    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2582    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2583    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2584    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2585    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2586    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2587    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2588    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2589    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
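
The cp/rm sequence above amounts to a filter over the files below
com/ibm/icu/impl/data/$ICUDT. As a sketch of that selection rule only (the
function name and the sample paths are made up for illustration):

```python
from fnmatch import fnmatchcase


def is_unicode_data_file(rel_path):
    """rel_path is relative to com/ibm/icu/impl/data/$ICUDT."""
    folder, sep, name = rel_path.rpartition("/")
    if sep:
        # Only the coll/ and brkitr/ subfolders are refreshed.
        return folder in ("coll", "brkitr")
    if name == "cnvalias.icu":
        # Picked up by *.icu in the cp step, then removed again.
        return False
    return (name == "confusables.cfu"
            or fnmatchcase(name, "*.icu")
            or fnmatchcase(name, "*.nrm"))


# Hypothetical sample of relative paths.
FILES = [
    "uprops.icu", "cnvalias.icu", "nfc.nrm", "confusables.cfu",
    "coll/root.res", "brkitr/line.brk", "zone/en.res", "root.res",
]

selected = [p for p in FILES if is_unicode_data_file(p)]
```

Everything not selected stays as it is in icudata.jar, which is the point of
copying to a separate /tmp/icu4j tree before running jar uvf.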
2590
2591* When refreshing all of ICU4J data from ICU4C
2592- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2593- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
2594or
2595- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
2596
2597* update CollationFCD.java
2598  + copy & paste the initializers of lcccIndex[] etc. from
2599    ICU4C/source/i18n/collationfcd.cpp to
2600    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2601
2602* refresh Java test .txt files
2603- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2604    cd $ICU_SRC/icu4c/source/data/unidata
2605    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2606    cd ../../test/testdata
2607    cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2608    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2609
2610* run & fix ICU4J tests
2611
2612*** API additions
2613- send notice to icu-design about new born-@stable API (enum constants etc.)
2614
2615*** CLDR numbering systems
2616- look for new sets of decimal digits (gc=Nd & nv=4) and submit a CLDR ticket
2617  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
2618  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
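
Searching for gc=Nd & nv=4 finds one character per set of decimal digits. A
minimal sketch of the same query using Python's unicodedata module follows.
Note the assumptions: unicodedata reflects the Unicode version bundled with the
Python interpreter, not necessarily the version being integrated, and
digit_zero() relies on the ten digits of a set being encoded contiguously.

```python
import unicodedata


def decimal_digit_fours():
    """Return all code points with General_Category=Nd and decimal value 4."""
    result = []
    for cp in range(0x110000):
        ch = chr(cp)
        if unicodedata.category(ch) == "Nd" and unicodedata.decimal(ch, None) == 4:
            result.append(cp)
    return result


def digit_zero(four):
    """Zero digit of the set that contains the given digit four."""
    return four - 4


fours = decimal_digit_fours()
zeros = [digit_zero(cp) for cp in fours]
```

Comparing the zeros list against the digit sets already known to CLDR
numbering systems shows which ones are new and need a ticket.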
2619
2620*** merge the Unicode update branches back onto the trunk
2621- do not merge the icudata.jar and testdata.jar,
2622  instead rebuild them from merged & tested ICU4C
2623- make sure that changes to Unicode tools are checked in:
2624  http://www.unicode.org/utility/trac/log/trunk/unicodetools
2625
2626---------------------------------------------------------------------------- ***
2627
2628Emoji 5.0 update for ICU 59
2629- ICU 59 mostly remains on Unicode 9.0
2630- except that bidi and segmentation data are updated to Unicode 10 beta
2631
2632First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
2633
2634* Command-line environment setup
2635
2636ICU_ROOT=~/svn.icu/trunk
2637ICU_SRC_DIR=$ICU_ROOT/src
2638ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
2639ICUDT=icudt59b
2640export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2641SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
2642UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
2643
2644*** ICU Trac
2645
2646- ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
2647- changes directly on trunk
2648
2649*** data files & enums & parser code
2650
2651* download files
2652
2653- download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
2654- download emoji 5.0 beta files into the same uni90e50 folder
2655- download Unicode 10.0 beta files: ucd
2656  + copy Unicode 10 bidi files to the uni90e50/ucd folder:
2657    BidiBrackets.txt
2658    BidiCharacterTest.txt
2659    BidiMirroring.txt
2660    BidiTest.txt
2661    extracted/DerivedBidiClass.txt
2662  + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
2663    LineBreak.txt
2664    auxiliary/*
2665
2666* preparseucd.py changes
2667- adjust for combined trunks
2668- write new copyright lines
2669- ignore new Emoji_Component property for now
2670
2671* process and/or copy files
2672- ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
2673  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2674
2675- cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
2676
2677* build ICU (make install)
2678  so that the tools build can pick up the new definitions from the installed header files.
2679
2680  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
2681
2682* build Unicode tools using CMake+make
2683
2684~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
2685
2686# Location (--prefix) of where ICU was installed.
2687set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
2688# Location of the ICU4C source tree.
2689set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)
2690
2691  ~/svn.icu/trunk/dbg/tools/unicode/c$
2692    cmake ../../../../src/tools/unicode/c
2693    make
2694
2695* generate core properties data files
2696  ~/svn.icu/trunk/dbg/tools/unicode/c$
2697    genprops/genprops $ICU4C_SRC_DIR
2698- rebuild ICU (make install) & tools
2699
2700* run & fix ICU4C tests
2701- Andy handles RBBI & spoof check test failures
2702
2703* update Java data files
2704- refresh just the UCD/UCA-related/derived files, just to be safe
2705- see (ICU4C)/source/data/icu4j-readme.txt
2706- mkdir /tmp/icu4j
2707- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2708  output:
2709    ...
2710    Unicode .icu files built to ./out/build/icudt59l
2711    echo timestamp > uni-core-data
2712    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
2713    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
2714    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2715    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
2716    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
2717    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
2718    mkdir -p /tmp/icu4j/main/shared/data
2719    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2720    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
2721    mkdir -p /tmp/icu4j/main/shared/data
2722    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2723    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
2724- copy the big-endian Unicode data files to another location,
2725  separate from the other data files,
2726  and then refresh ICU4J
2727    cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
2728    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2729    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2730    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2731    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2732    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2733    jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2734
2735* When refreshing all of ICU4J data from ICU4C
2736- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2737- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
2738or
2739- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
2740
2741* refresh Java test .txt files
2742- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2743    cd $ICU4C_SRC_DIR/source/data/unidata
2744    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2745    cd ../../test/testdata
2746    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2747    cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
2748
2749* run & fix ICU4J tests
2750
2751---------------------------------------------------------------------------- ***
2752
2753Unicode 9.0 update for ICU 58
2754
2755* Command-line environment setup
2756
2757ICU_ROOT=~/svn.icu/trunk
2758ICU_SRC_DIR=$ICU_ROOT/src
2759ICUDT=icudt58b
2760export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2761SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2762UNIDATA=$ICU_SRC_DIR/source/data/unidata
2763
2764http://www.unicode.org/review/pri323/  -- beta review
2765http://www.unicode.org/reports/uax-proposed-updates.html
2766http://www.unicode.org/versions/beta-9.0.0.html
2767http://www.unicode.org/versions/Unicode9.0.0/
2768http://www.unicode.org/reports/tr44/tr44-17.html
2769
2770*** ICU Trac
2771
2772- ticket:12526: integrate Unicode 9
2773- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
2774- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
2775
2776*** CLDR Trac
2777
2778- cldrbug 9414: UCA 9
2779- ^/branches/markus/uni90 at r11518 from trunk at r11517
2780
2781- cldrbug 8745: Unicode 9.0 script metadata
2782
2783*** Unicode version numbers
2784- makedata.mak
2785- uchar.h
2786- com.ibm.icu.util.VersionInfo
2787- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2788
2789- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2790  so that the makefiles see the new version number.
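
Before re-running configure it can help to confirm that uchar.h really carries
the new version. A small sketch, assuming the version is defined as a quoted
string via #define U_UNICODE_VERSION; the helper name and the sample header
text are illustrative:

```python
import re


def unicode_version_from_uchar_h(header_text):
    """Return the U_UNICODE_VERSION string from uchar.h text, or None."""
    m = re.search(r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([0-9.]+)"',
                  header_text, re.MULTILINE)
    return m.group(1) if m else None


# Illustrative excerpt, not the full header.
SAMPLE_HEADER = '''
/* Unicode version number, default for the current ICU version. */
#define U_UNICODE_VERSION "9.0"
'''

version = unicode_version_from_uchar_h(SAMPLE_HEADER)
```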
2791
2792*** data files & enums & parser code
2793
2794* file preparation
2795
2796- download UCD & IDNA files
2797- make sure that the Unicode data folder passed into preparseucd.py
2798  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2799- only for manual diffs: remove version suffixes from the file names
2800  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2801  (see https://sites.google.com/site/unicodetools/inputdata)
2802- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2803- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2804- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2805
2806- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
2807  and copy to $UNIDATA
2808    cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
2809
2810* preparseucd.py changes
2811- remove or add new Unicode scripts from/to the
2812  only-in-ISO-15924 list according to the error messages:
2813    ValueError: remove ['Tang'] from _scripts_only_in_iso15924
2814    ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
2815    ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
2816    ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
2817  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2818      and in com.ibm.icu.dev.test.lang.TestUScript.java
2819- DerivedNumericValues.txt new numeric values
2820    0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
2821    0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH
2822    0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS
2823    0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH
2824    0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS
2825  -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
2826     uchar.c, UCharacterProperty.java
2827     to support a new series of values
2828- adjust preparseucd.py for Tangut algorithmic names
2829  in ppucd.txt:
2830    algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
2831  ->
2832    algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
2833- avoid block-compressing most String/Miscellaneous property values,
2834  triggered by genprops not coping with a multi-code point Case_Folding on
2835    block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
2836  keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
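
For the new Malayalam fractions listed above, the decimal and rational fields
of DerivedNumericValues.txt can be cross-checked with exact arithmetic. This is
only a sanity check on the quoted lines, not the encoding used in
corepropsbuilder.cpp; parse_numeric_value() is a hypothetical helper:

```python
from fractions import Fraction

LINES = [
    "0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH",
    "0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH",
    "0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS",
    "0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH",
    "0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS",
]


def parse_numeric_value(line):
    """Return (code point, decimal field, rational field) as exact values."""
    data = line.partition("#")[0]
    fields = [f.strip() for f in data.split(";")]
    code_point = int(fields[0], 16)
    decimal_value = Fraction(fields[1])  # e.g. "0.00625"
    rational = Fraction(fields[3])       # e.g. "1/160"
    return code_point, decimal_value, rational


parsed = [parse_numeric_value(line) for line in LINES]
denominators = sorted({r.denominator for _, _, r in parsed})
```

For these five characters the denominators come out as 20, 40, 80 and 160,
that is 20 times a power of two, none of which fit a plain integer or
small-denominator representation.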
2837
2838* PropertyAliases.txt changes
2839- 1 new property PCM=Prepended_Concatenation_Mark
2840  Ignore: Only useful for layout engines.
2841  Ok to list in ppucd.txt.
2842
2843* PropertyValueAliases.txt new property values
2844    blk; Adlam                            ; Adlam
2845    blk; Bhaiksuki                        ; Bhaiksuki
2846    blk; Cyrillic_Ext_C                   ; Cyrillic_Extended_C
2847    blk; Glagolitic_Sup                   ; Glagolitic_Supplement
2848    blk; Ideographic_Symbols              ; Ideographic_Symbols_And_Punctuation
2849    blk; Marchen                          ; Marchen
2850    blk; Mongolian_Sup                    ; Mongolian_Supplement
2851    blk; Newa                             ; Newa
2852    blk; Osage                            ; Osage
2853    blk; Tangut                           ; Tangut
2854    blk; Tangut_Components                ; Tangut_Components
2855  -> add to uchar.h
2856    use long property names for enum constants
2857  -> add to UCharacter.UnicodeBlock IDs
2858    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2859            replace  public static final int \1_ID = \2; \3
2860  -> add to UCharacter.UnicodeBlock objects
2861    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
2862            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
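
The two find/replace pairs above can be tried outside Eclipse as well. The
sketch below runs the same patterns through Python's re.sub, with \g<n> as the
Python spelling of the group references; the sample uchar.h line, including its
numeric value, is illustrative only:

```python
import re

# Patterns as given above; replacements rewritten with \g<n> group references.
ID_FIND = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
ID_REPLACE = r"public static final int \g<1>_ID = \g<2>; \g<3>"

OBJ_FIND = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
OBJ_REPLACE = (r'public static final UnicodeBlock \g<1> = '
               r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>')

# Illustrative enum line in the style of uchar.h.
SAMPLE = "    UBLOCK_ADLAM = 263, /*[1E900]*/"

id_line = re.sub(ID_FIND, ID_REPLACE, SAMPLE)
obj_line = re.sub(OBJ_FIND, OBJ_REPLACE, SAMPLE)
```

The first substitution yields the _ID constant, the second the UnicodeBlock
object declaration; leading indentation is left untouched because the patterns
are not anchored.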
2863
2864    GCB; EB                               ; E_Base
2865    GCB; EBG                              ; E_Base_GAZ
2866    GCB; EM                               ; E_Modifier
2867    GCB; GAZ                              ; Glue_After_Zwj
2868    GCB; ZWJ                              ; ZWJ
2869  -> uchar.h & UCharacter.GraphemeClusterBreak
2870
2871    jg ; African_Feh                      ; African_Feh
2872    jg ; African_Noon                     ; African_Noon
2873    jg ; African_Qaf                      ; African_Qaf
2874  -> uchar.h & UCharacter.JoiningGroup
2875
2876    lb ; EB                               ; E_Base
2877    lb ; EM                               ; E_Modifier
2878    lb ; ZWJ                              ; ZWJ
2879  -> uchar.h & UCharacter.LineBreak
2880
2881    sc ; Adlm                             ; Adlam
2882    sc ; Bhks                             ; Bhaiksuki
2883    sc ; Marc                             ; Marchen
2884    sc ; Newa                             ; Newa
2885    sc ; Osge                             ; Osage
2886    sc ; Tang                             ; Tangut
2887  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
2888
2889    WB ; EB                               ; E_Base
2890    WB ; EBG                              ; E_Base_GAZ
2891    WB ; EM                               ; E_Modifier
2892    WB ; GAZ                              ; Glue_After_Zwj
2893    WB ; ZWJ                              ; ZWJ
2894  -> uchar.h & UCharacter.WordBreak
2895
2896* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2897    (not strictly necessary for NOT_ENCODED scripts)
2898  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2899
2900* generate normalization data files
2901  cd $ICU_ROOT/dbg
2902  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2903  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
2904  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
2905  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2906  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
2907
2908* build ICU (make install)
2909  so that the tools build can pick up the new definitions from the installed header files.
2910
2911  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
2912
2913* build Unicode tools using CMake+make
2914
2915~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2916
2917  # Location (--prefix) of where ICU was installed.
2918  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
2919  # Location of the ICU source tree.
2920  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
2921
2922  ~/svn.icutools/trunk/dbg/unicode/c$
2923    cmake ../../../src/unicode/c
2924    make
2925
2926* generate core properties data files
2927  ~/svn.icutools/trunk/dbg/unicode/c$
2928    genprops/genprops $ICU_SRC_DIR
2929    genuca/genuca --hanOrder implicit $ICU_SRC_DIR
2930    genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
2931- rebuild ICU (make install) & tools
2932
2933* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2934  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2935- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2936- Unicode 6.0..9.0: U+2260, U+226E, U+226F
2937- nothing new in 9.0, no test file to update
2938
2939* run & fix ICU4C tests
2940- Andy handles RBBI & spoof check test failures
2941
2942* collation: CLDR collation root, UCA DUCET
2943
2944- UCA DUCET goes into Mark's Unicode tools, see
2945  https://sites.google.com/site/unicodetools/home#TOC-UCA
2946- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
2947    cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
2948
2949- cd (CLDR UCA branch)/common/uca/
2950- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2951    cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
2952- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2953    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
2954    (note removing the underscore before "Rules")
2955    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2956- restore TODO diffs in UCARules.txt
2957    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2958- update (ICU4C)/source/test/testdata/CollationTest_*.txt
2959  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2960  from the CLDR root files (..._CLDR_..._SHORT.txt)
2961    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2962    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2963    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
2964- if CLDR common/uca/unihan-index.txt changes, then update
2965  CLDR common/collation/root.xml <collation type="private-unihan">
2966  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
2967
2968- run genuca, see command line above;
2969  deal with
2970    Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
2971    FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)
2972        (add the character to genuca.cpp sampleCharsToScripts[])
2973  + look up the USCRIPT_ code for the new sample characters
2974    (should be obvious from the comment in the error output)
2975  + *add* mappings to sampleCharsToScripts[], do not replace them
2976    (in case the script sample characters flip-flop)
2977  + insert new scripts in DUCET script order, see the top_byte table
2978    at the beginning of FractionalUCA.txt
2979- rebuild ICU4C
2980
2981* Unihan collators
2982- run Unicode Tools
2983    org.unicode.draft.GenerateUnihanCollators
2984  with VM arguments
2985    -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
2986    -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
2987    -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
2988    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
2989    -DUVERSION=9.0.0
2990    -ea
2991- run Unicode Tools
2992    org.unicode.draft.GenerateUnihanCollatorFiles
2993  with the same arguments
2994- check CLDR diffs
2995    cd ~/svn.cldr/trunk
2996    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
2997    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
2998- copy to CLDR
2999    cd ~/svn.cldr/trunk
3000    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
3001    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
3002- commit to CLDR
3003- generate ICU zh collation data: run CLDR
3004    org.unicode.cldr.icu.NewLdml2IcuConverter
3005  with program arguments
3006    -t collation
3007    -s /home/mscherer/svn.cldr/trunk/common/collation
3008    -m /home/mscherer/svn.cldr/trunk/common/supplemental
3009    -d /home/mscherer/svn.icu/trunk/src/source/data/coll
3010    -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
3011    zh
3012  and VM arguments
3013    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
3014- rebuild ICU4C
3015
3016* run & fix ICU4C tests, now with new CLDR collation root data
3017- run all tests with the collation test data *_SHORT.txt or the full files
3018  (the full ones have comments, useful for debugging)
3019- note on intltest: if collate/UCAConformanceTest fails, then
3020  utility/MultithreadTest/TestCollators will fail as well;
3021  fix the conformance test before looking into the multi-thread test
3022
3023* update Java data files
3024- refresh just the UCD/UCA-related/derived files, just to be safe
3025- see (ICU4C)/source/data/icu4j-readme.txt
3026- mkdir /tmp/icu4j
3027- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3028  output:
3029    ...
3030    Unicode .icu files built to ./out/build/icudt58l
3031    echo timestamp > uni-core-data
3032    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
3033    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
3034    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
3035    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
3036    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
3037    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
3038    mkdir -p /tmp/icu4j/main/shared/data
3039    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3040    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
3041    mkdir -p /tmp/icu4j/main/shared/data
3042    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3043    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
3044- copy the big-endian Unicode data files to another location,
3045  separate from the other data files,
3046  and then refresh ICU4J
3047    cd ~/svn.icu/trunk/dbg/data/out/icu4j
3048    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3049    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
3050    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3051    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3052    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
3053    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3054    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3055    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
3056    jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
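
The package names in the make output above (icudt58l, icudt58b) follow a fixed pattern.
A minimal Python sketch of that pattern, assuming the usual ICU type letters
(l = little-endian ASCII, b = big-endian ASCII, e = big-endian EBCDIC).
icu_data_name() is a hypothetical helper, not part of any ICU tool.

```python
def icu_data_name(major_version: int, type_letter: str = "b") -> str:
    """Return an ICU data package name such as 'icudt58b'.

    type_letter is assumed to be one of:
      'l' little-endian ASCII, 'b' big-endian ASCII, 'e' big-endian EBCDIC.
    """
    if type_letter not in ("l", "b", "e"):
        raise ValueError(f"unknown ICU data type letter: {type_letter!r}")
    return f"icudt{major_version}{type_letter}"


# The C++ build on Linux produces the 'l' package; the files copied
# into ICU4J come from the big-endian 'b' package.
print(icu_data_name(58, "l"))
print(icu_data_name(58))
```

This is why $ICUDT in the copy commands above ends in "b".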
3057
3058* When refreshing all of ICU4J data from ICU4C
3059- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3060- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3061or
3062- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3063
3064* update CollationFCD.java
3065  + copy & paste the initializers of lcccIndex[] etc. from
3066    ICU4C/source/i18n/collationfcd.cpp to
3067    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
3068
3069* refresh Java test .txt files
3070- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3071    cd $ICU_SRC_DIR/source/data/unidata
3072    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3073    cd ../../test/testdata
3074    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3075    cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3076
3077* run & fix ICU4J tests
3078
3079*** LayoutEngine script information
3080
3081* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3082  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3083  in the working directory.
3084
3085  (It also generates ScriptRunData.cpp, which is no longer needed.)
3086
3087  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
3088  (a plain text file)
3089  which maps ICU versions to the numbers of script/language constants
3090  that were added then.
3091  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
3092
3093  The generated files have a current copyright date and "@deprecated" statement.
3094
3095* Review changes, fix Java tool if necessary, and copy to ICU4C
3096  cd ~/svn.icu4j/trunk/src
3097  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3098  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
3099  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
3100
3101*** API additions
3102- send notice to icu-design about new born-@stable API (enum constants etc.)
3103
3104*** merge the Unicode update branches back onto the trunk
3105- do not merge icudata.jar and testdata.jar;
3106  instead, rebuild them from the merged & tested ICU4C
3107- make sure that changes to Unicode tools & ICU tools are checked in
3108  http://www.unicode.org/utility/trac/log/trunk/unicodetools
3109  http://bugs.icu-project.org/trac/log/tools/trunk
3110
3111---------------------------------------------------------------------------- ***
3112
3113New script codes early in ICU 58: https://unicode-org.atlassian.net/browse/ICU-11764
3114
3115Adding
3116- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
3117- new combination/alias codes: Hanb, Jamo
3118  - used in CLDR 29 and in spoof checker
3119- new Z* code: Zsye
3120
3121Add new codes to uscript.h & UScript.java, see Unicode update logs.
3122  -> com.ibm.icu.lang.UScript
3123    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3124    replace  public static final int \1 = \2; \3
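
The find/replace pair above can be tried outside an IDE. A minimal Python sketch,
assuming the same regular expression applied with re.sub(). The input line is
made up for illustration (USCRIPT_EXAMPLE_SCRIPT and 999 are not real constants),
and c_enum_to_java() is a hypothetical helper name.

```python
import re

# Same pattern and replacement as in the log entry above.
FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
REPLACE = r"public static final int \1 = \2; \3"


def c_enum_to_java(line: str) -> str:
    """Turn one uscript.h enum line into a UScript.java constant line."""
    return FIND.sub(REPLACE, line)


# Hypothetical input line in the style of uscript.h.
sample = "USCRIPT_EXAMPLE_SCRIPT        = 999,/* Xmpl */"
print(c_enum_to_java(sample))
```

Lines that do not match the pattern pass through unchanged, so the whole
enum body can be fed through the function line by line.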
3125
3126Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
3127add new script codes.
3128"Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
3129
3130Note: If we have to run preparseucd.py again before the Unicode 9 update,
3131then we need to manually keep/restore the new script codes.
3132
3133ICU_ROOT=~/svn.icu/trunk
3134ICU_SRC_DIR=$ICU_ROOT/src
3135ICUDT=icudt57b
3136export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
3137SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
3138UNIDATA=$ICU_SRC_DIR/source/data/unidata
3139
3140Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
3141see https://unicode-org.atlassian.net/browse/ICU-12141
3142
3143make install, then icutools cmake & make, then
3144~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
3145
3146Generate Java data as usual, only update pnames.icu & uprops.icu.
3147
3148*** LayoutEngine script information
3149
3150* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3151  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3152  in the working directory.
3153
3154  (It also generates ScriptRunData.cpp, which is no longer needed.)
3155
3156  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
3157  (a plain text file)
3158  which maps ICU versions to the numbers of script/language constants
3159  that were added then.
3160  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
3161
3162  The generated files have a current copyright date and "@deprecated" statement.
3163
3164* Review changes, fix Java tool if necessary, and copy to ICU4C
3165  cd ~/svn.icu4j/trunk/src
3166  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3167  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
3168  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
3169
3170---------------------------------------------------------------------------- ***
3171
3172Emoji properties added in ICU 57: https://unicode-org.atlassian.net/browse/ICU-11802
3173
3174Edit preparseucd.py to add & parse new properties.
3175They share the UCD property namespace but are not listed in PropertyAliases.txt.
3176
3177Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
3178Initial data from emoji/2.0/
3179
3180ICU_ROOT=~/svn.icu/trunk
3181ICU_SRC_DIR=$ICU_ROOT/src
3182ICUDT=icudt56b
3183export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
3184SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
3185UNIDATA=$ICU_SRC_DIR/source/data/unidata
3186
3187Add binary-property constants to uchar.h enum UProperty & UProperty.java.
3188
3189~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
3190(Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
3191
3192Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
3193
3194make install, then icutools cmake & make, then
3195~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
3196
3197Generate Java data as usual, only update pnames.icu & uprops.icu.
3198
3199---------------------------------------------------------------------------- ***
3200
3201Unicode 8.0 update for ICU 56
3202
3203* Command-line environment setup
3204
3205ICU_ROOT=~/svn.icu/trunk
3206ICU_SRC_DIR=$ICU_ROOT/src
3207ICUDT=icudt56b
3208export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
3209SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
3210UNIDATA=$ICU_SRC_DIR/source/data/unidata
3211
3212http://www.unicode.org/review/pri297/  -- beta review
3213http://www.unicode.org/reports/uax-proposed-updates.html
3214http://unicode.org/versions/beta-8.0.0.html
3215http://www.unicode.org/versions/Unicode8.0.0/
3216http://www.unicode.org/reports/tr44/tr44-15.html
3217
3218*** ICU Trac
3219
3220- ticket:11574: Unicode 8
3221- C++ branches/markus/uni80 at r37351 from trunk at r37343
3222- Java branches/markus/uni80 at r37352 from trunk at r37338
3223
3224*** CLDR Trac
3225
3226- cldrbug 8311: UCA 8
3227- branches/markus/uni80 at r11518 from trunk at r11517
3228
3229- cldrbug 8109: Unicode 8.0 script metadata
3230- cldrbug 8418: Updated segmentation for Unicode 8.0
3231
3232*** Unicode version numbers
3233- makedata.mak
3234- uchar.h
3235- com.ibm.icu.util.VersionInfo
3236- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3237
3238- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
3239  so that the makefiles see the new version number.
3240
3241*** data files & enums & parser code
3242
3243* file preparation
3244
3245- download UCD & IDNA files
3246- make sure that the Unicode data folder passed into preparseucd.py
3247  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3248- only for manual diffs: remove version suffixes from the file names
3249  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
3250  (see https://sites.google.com/site/unicodetools/inputdata)
3251- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
3252- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
3253- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3254
3255- also: from http://unicode.org/Public/security/8.0.0/ download new
3256  confusables.txt & confusablesWholeScript.txt
3257  and copy to $UNIDATA
3258    ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
3259    ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
3260
3261* initial preparseucd.py changes
3262- remove new Unicode scripts from the
3263  only-in-ISO-15924 list according to the error message:
3264    ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
3265    from _scripts_only_in_iso15924
3266  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3267      and in com.ibm.icu.dev.test.lang.TestUScript.java
3268- property and file name change:
3269    IndicMatraCategory -> IndicPositionalCategory
3270- UnicodeData.txt unusual numeric values (fractions not in lowest terms)
3271    109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
3272    109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
3273    109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
3274    109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
3275    109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
3276    109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
3277    109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
3278    109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
3279    109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
3280    109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
3281  -> change preparseucd.py to map them to reduced fractions (e.g., 2/12 -> 1/6),
3282     which are the values listed in DerivedNumericValues.txt;
3283     this keeps storage in the data file simple
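
The Meroitic fraction values listed above (2/12, 3/12, ...) can be reduced to
lowest terms with Python's fractions module. This is a minimal sketch of the idea,
not the actual preparseucd.py code; reduce_numeric_value() is a hypothetical name.

```python
from fractions import Fraction


def reduce_numeric_value(nv: str) -> str:
    """Reduce a UnicodeData.txt numeric value like '2/12' to lowest terms."""
    if "/" not in nv:
        return nv  # plain integers and empty fields are left alone
    numerator, denominator = nv.split("/")
    # Fraction normalizes to lowest terms; str() gives 'n/d' (or 'n' if d == 1).
    return str(Fraction(int(numerator), int(denominator)))


for value in ["1/12", "2/12", "6/12", "10/12"]:
    print(value, "->", reduce_numeric_value(value))
```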
3284
3285* PropertyValueAliases.txt changes
3286- 10 new Block (blk) values:
3287    blk; Ahom                             ; Ahom
3288    blk; Anatolian_Hieroglyphs            ; Anatolian_Hieroglyphs
3289    blk; Cherokee_Sup                     ; Cherokee_Supplement
3290    blk; CJK_Ext_E                        ; CJK_Unified_Ideographs_Extension_E
3291    blk; Early_Dynastic_Cuneiform         ; Early_Dynastic_Cuneiform
3292    blk; Hatran                           ; Hatran
3293    blk; Multani                          ; Multani
3294    blk; Old_Hungarian                    ; Old_Hungarian
3295    blk; Sup_Symbols_And_Pictographs      ; Supplemental_Symbols_And_Pictographs
3296    blk; Sutton_SignWriting               ; Sutton_SignWriting
3297  -> add to uchar.h
3298    use long property names for enum constants
3299  -> add to UCharacter.UnicodeBlock IDs
3300    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
3301            replace  public static final int \1_ID = \2; \3
3302  -> add to UCharacter.UnicodeBlock objects
3303    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
3304            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3305- 6 new Script (sc) values:
3306    sc ; Ahom                             ; Ahom
3307    sc ; Hatr                             ; Hatran
3308    sc ; Hluw                             ; Anatolian_Hieroglyphs
3309    sc ; Hung                             ; Old_Hungarian
3310    sc ; Mult                             ; Multani
3311    sc ; Sgnw                             ; SignWriting
3312  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
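
The two Eclipse find/replace pairs for UCharacter.UnicodeBlock above can be
sketched in Python as follows. The input line is made up for illustration
(UBLOCK_EXAMPLE and 999 are not real constants), and the function names are
hypothetical. Python's \g<1> is used where the Eclipse replacement has \1
directly followed by more text.

```python
import re

# Patterns copied from the log entry above.
FIND_ID = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
FIND_OBJ = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")


def block_id_line(line: str) -> str:
    """uchar.h enum line -> UnicodeBlock *_ID constant."""
    return FIND_ID.sub(r"public static final int \g<1>_ID = \2; \3", line)


def block_object_line(line: str) -> str:
    """uchar.h enum line -> UnicodeBlock object declaration."""
    return FIND_OBJ.sub(
        r'public static final UnicodeBlock \g<1> = new UnicodeBlock("\g<1>", \g<1>_ID); \2',
        line,
    )


# Hypothetical input line in the style of uchar.h.
sample = "UBLOCK_EXAMPLE = 999, /*[E000]*/"
print(block_id_line(sample))
print(block_object_line(sample))
```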
3313
3314* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
3315    (not strictly necessary for NOT_ENCODED scripts)
3316  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
3317
3318* generate normalization data files
3319  cd $ICU_ROOT/dbg
3320  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
3321  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3322  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3323  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3324  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
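
The four .nrm commands above differ only in the output name and in the list of
input files, each of which starts with nfc.txt. A small Python sketch that prints
those command lines from a table. It only builds strings and does not run
gennorm2; NRM_INPUTS and gennorm2_command() are hypothetical names, and the
--csource variant is left out.

```python
# Output file -> input files, copied from the command lines above.
NRM_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_command(output, inputs, src_data_in="$SRC_DATA_IN", unidata="$UNIDATA"):
    """Build one gennorm2 command line as text (the shell expands the variables)."""
    args = ["bin/gennorm2", "-o", f"{src_data_in}/{output}", "-s", f"{unidata}/norm2"]
    return " ".join(args + list(inputs))


for output, inputs in NRM_INPUTS.items():
    print(gennorm2_command(output, inputs))
```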
3325
3326* build ICU (make install)
3327  so that the tools build can pick up the new definitions from the installed header files.
3328
3329  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
3330
3331* build Unicode tools using CMake+make
3332
3333~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
3334
3335  # Location (--prefix) of where ICU was installed.
3336  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
3337  # Location of the ICU source tree.
3338  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
3339
3340  ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
3341  ~/svn.icutools/trunk/dbg/unicode/c$ make
3342
3343* generate core properties data files
3344- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
3345- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
3346- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
3347- rebuild ICU (make install) & tools
3348- run genuca again (see step above) so that it picks up the new nfc.nrm
3349- rebuild ICU (make install) & tools
3350
3351* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3352  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3353- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3354- Unicode 6.0..8.0: U+2260, U+226E, U+226F
3355- nothing new in 8.0, no test file to update
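
The grep step above can also be done with a short script. A minimal Python
sketch, assuming the usual IdnaMappingTable.txt layout of
"code point(s) ; status ; ... # comment". The sample lines are illustrative,
and non_ascii_std3_valid() is a hypothetical helper name.

```python
def non_ascii_std3_valid(lines):
    """Return non-ASCII code points whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]            # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start_hex, _, end_hex = fields[0].partition("..")
        start = int(start_hex, 16)
        end = int(end_hex, 16) if end_hex else start
        # Skip the ASCII part of any range.
        found.extend(range(max(start, 0x80), end + 1))
    return found


# Illustrative lines in the style of IdnaMappingTable.txt.
SAMPLE = [
    "0000..002C    ; disallowed_STD3_valid                  # 1.1  <control-0000>..COMMA",
    "0061..007A    ; valid                                  # 1.1  LATIN SMALL LETTER A..LATIN SMALL LETTER Z",
    "2260          ; disallowed_STD3_valid                  # 1.1  NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # 1.1  NOT LESS-THAN..NOT GREATER-THAN",
]
print([f"U+{cp:04X}" for cp in non_ascii_std3_valid(SAMPLE)])
```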
3356
3357* run & fix ICU4C tests
3358- bad Cherokee case folding due to difference in fallbacks:
3359  UCD case folding falls back to no mapping,
3360  ICU runtime case folding falls back to lowercasing;
3361  fixed casepropsbuilder.cpp to generate scf mappings to self
3362  when there is an slc mapping but no scf
3363- Andy handles RBBI & spoof check test failures
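
The Cherokee fallback mismatch described above can be modeled in a few lines.
This is a Python sketch of the logic only, not the casepropsbuilder.cpp code;
the dictionaries and function names are hypothetical. It uses one Cherokee pair
(U+13A0 lowercases to U+AB70, U+AB70 folds to U+13A0).

```python
def runtime_fold(cp, slc, scf):
    """Model of the ICU runtime: use scf if present, else fall back to slc."""
    if cp in scf:
        return scf[cp]
    return slc.get(cp, cp)


def add_self_foldings(slc, scf):
    """Add an explicit scf mapping to self where slc exists but scf does not."""
    fixed = dict(scf)
    for cp in slc:
        fixed.setdefault(cp, cp)
    return fixed


slc = {0x13A0: 0xAB70}   # simple lowercase mapping
scf = {0xAB70: 0x13A0}   # simple case folding from the UCD

# Without the fix the uppercase letter folds to lowercase via the fallback.
print(hex(runtime_fold(0x13A0, slc, scf)))
# With the explicit self mapping it folds to itself, as the UCD intends.
print(hex(runtime_fold(0x13A0, slc, add_self_foldings(slc, scf))))
```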
3364
3365* collation: CLDR collation root, UCA DUCET
3366
3367- UCA DUCET goes into Mark's Unicode tools, see
3368  https://sites.google.com/site/unicodetools/home#TOC-UCA
3369- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
3370- cd (CLDR UCA branch)/common/uca/
3371- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3372  cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
3373- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3374    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
3375    (note: the next command removes the underscore before "Rules" in the file name)
3376    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
3377- restore TODO diffs in UCARules.txt
3378    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
3379- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3380  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3381  from the CLDR root files (..._CLDR_..._SHORT.txt)
3382    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
3383    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
3384    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
3385- if CLDR common/uca/unihan-index.txt changes, then update
3386  CLDR common/collation/root.xml <collation type="private-unihan">
3387  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
3388- run genuca, see command line above;
3389  deal with
3390    Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
3391        (add the character to genuca.cpp sampleCharsToScripts[])
3392  + look up the script for the new sample characters
3393    (e.g., in FractionalUCA.txt)
3394  + *add* mappings to sampleCharsToScripts[], do not replace them
3395    (in case the script sample characters flip-flop)
3396  + insert new scripts in DUCET script order, see the top_byte table
3397    at the beginning of FractionalUCA.txt
3398- rebuild ICU4C
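
The "add, do not replace" rule for sampleCharsToScripts[] above can be pictured
with a small model. This is a Python sketch, not the C++ table in genuca.cpp.
The earlier entry U+07CA and the script code "Nkoo" for both characters are
assumptions made for illustration.

```python
def add_sample_char(table, cp, script):
    """Append a (sample character, script) pair; existing pairs are kept."""
    if (cp, script) not in table:
        table.append((cp, script))


def script_for_sample(table, cp):
    """Look up the script for a first-primary sample character."""
    for sample, script in table:
        if sample == cp:
            return script
    raise KeyError(f"Unknown script for first-primary sample character U+{cp:04x}")


# Assumed earlier entry; a new UCA version then reports U+07d8 as unknown.
table = [(0x07CA, "Nkoo")]
add_sample_char(table, 0x07D8, "Nkoo")

# Both the old and the new sample character resolve, so data that flips
# back to the old character later still works.
print(script_for_sample(table, 0x07CA), script_for_sample(table, 0x07D8))
```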
3399
3400* run & fix ICU4C tests, now with new CLDR collation root data
3401- run all tests with the collation test data *_SHORT.txt or the full files
3402  (the full ones have comments, useful for debugging)
3403- note on intltest: if collate/UCAConformanceTest fails, then
3404  utility/MultithreadTest/TestCollators will fail as well;
3405  fix the conformance test before looking into the multi-thread test
3406- fixed bug in CollationWeights::getWeightRanges()
3407  exposed by new data and CollationTest::TestRootElements
3408
3409* update Java data files
3410- to be safe, refresh only the UCD/UCA-related and derived files
3411- see (ICU4C)/source/data/icu4j-readme.txt
3412- mkdir /tmp/icu4j
3413- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3414  output:
3415    ...
3416    Unicode .icu files built to ./out/build/icudt56l
3417    echo timestamp > uni-core-data
3418    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
3419    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
3420    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
3421    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
3422    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
3423    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
3424    mkdir -p /tmp/icu4j/main/shared/data
3425    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3426    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
3427    mkdir -p /tmp/icu4j/main/shared/data
3428    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3429    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
3430- copy the big-endian Unicode data files to another location,
3431  separate from the other data files,
3432  and then refresh ICU4J
3433    cd ~/svn.icu/trunk/dbg/data/out/icu4j
3434    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3435    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
3436    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3437    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3438    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
3439    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3440    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3441    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
3442    jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
3443
3444* When refreshing all of ICU4J data from ICU4C
3445- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3446- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3447or
3448- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3449
3450* update CollationFCD.java
3451  + copy & paste the initializers of lcccIndex[] etc. from
3452    ICU4C/source/i18n/collationfcd.cpp to
3453    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
3454
3455* refresh Java test .txt files
3456- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3457    cd $ICU_SRC_DIR/source/data/unidata
3458    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3459    cd ../../test/testdata
3460    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3461    cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3462
3463* run & fix ICU4J tests
3464
3465*** LayoutEngine script information
3466
3467* ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
3468  because the layout engine was deprecated in ICU 54.
3469  Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
3470  to write lines that we used to add manually.
3471
3472* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3473  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3474  in the working directory.
3475
3476  (It also generates ScriptRunData.cpp, which is no longer needed.)
3477
3478  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
3479  (a plain text file)
3480  which maps ICU versions to the numbers of script/language constants
3481  that were added then.
3482  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
3483
3484  The generated files have a current copyright date and "@deprecated" statement.
3485
3486* Review changes, fix Java tool if necessary, and copy to ICU4C
3487  cd ~/svn.icu4j/trunk/src
3488  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3489  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
3490  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
3491
3492*** API additions
3493- send notice to icu-design about new born-@stable API (enum constants etc.)
3494
3495*** merge the Unicode update branches back onto the trunk
3496- do not merge icudata.jar and testdata.jar;
3497  instead, rebuild them from the merged & tested ICU4C
3498- make sure that changes to Unicode tools & ICU tools are checked in
3499  http://www.unicode.org/utility/trac/log/trunk/unicodetools
3500  http://bugs.icu-project.org/trac/log/tools/trunk
3501
3502---------------------------------------------------------------------------- ***
3503
3504Unicode 7.0 update for ICU 54
3505
3506http://www.unicode.org/review/pri271/  -- beta review
3507http://www.unicode.org/reports/uax-proposed-updates.html
3508http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
3509http://www.unicode.org/reports/tr44/tr44-13.html
3510
3511*** ICU Trac
3512
3513- ticket 10821: Unicode 7.0, UCA 7.0
3514- C++ branches/markus/uni70 at r35584 from trunk at r35580
3515- Java branches/markus/uni70 at r35587 from trunk at r35545
3516
3517*** CLDR Trac
3518
3519- ticket 7195: UCA 7.0 CLDR root collation
3520- branches/markus/uni70 at r10062 from trunk at r10061
3521
3522- ticket 6762: script metadata for Unicode 7.0 new scripts
3523
3524*** Unicode version numbers
3525- makedata.mak
3526- uchar.h
3527- com.ibm.icu.util.VersionInfo
3528- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3529
3530- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
3531  so that the makefiles see the new version number.
3532
3533*** data files & enums & parser code
3534
3535* file preparation
3536
3537- download UCD & IDNA files
3538- make sure that the Unicode data folder passed into preparseucd.py
3539  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3540- only for manual diffs: remove version suffixes from the file names
3541  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
3542  (see https://sites.google.com/site/unicodetools/inputdata)
3543- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
3544- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
3545- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3546- Restore TODO diffs in source/data/unidata/UCARules.txt
3547    cd $ICU_SRC_DIR
3548    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
3549- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
3550
3551- also: from http://unicode.org/Public/security/7.0.0/ download new
3552  confusables.txt & confusablesWholeScript.txt
3553  and copy to $ICU_ROOT/src/source/data/unidata/
3554
3555* initial preparseucd.py changes
3556- remove new Unicode scripts from the
3557  only-in-ISO-15924 list according to the error message:
3558    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
3559                        'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
3560                        'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
3561    from _scripts_only_in_iso15924
3562  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3563      and in com.ibm.icu.dev.test.lang.TestUScript.java
3564- NamesList.txt now has a heading with a non-ASCII character
3565  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
3566  + escape non-ASCII characters in heading comments
3567- preparseucd.py gets the Unicode copyright line from PropertyAliases.txt, which is currently still at 2013
3568  + get the copyright from the first file whose copyright line contains the current year
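
The copyright rule above (take the line from the first file whose copyright
line contains the current year) can be sketched as follows. This is a minimal
Python illustration, not the preparseucd.py code; the function name and the
sample file contents are hypothetical.

```python
import re


def pick_copyright_line(files, current_year):
    """files: iterable of (file name, list of lines), in parse order.

    Returns the first copyright line that mentions current_year, or None.
    """
    year = re.compile(rf"\b{current_year}\b")
    for _name, lines in files:
        for line in lines:
            if "Copyright" in line and year.search(line):
                return line.strip()
    return None


# Illustrative headers: the first file still carries the previous year.
files = [
    ("PropertyAliases.txt", ["# Copyright (c) 1991-2013 Unicode, Inc."]),
    ("Blocks.txt", ["# Copyright (c) 1991-2014 Unicode, Inc."]),
]
print(pick_copyright_line(files, 2014))
```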
3569
3570* PropertyValueAliases.txt changes
3571- 32 new Block (blk) values:
3572    blk; Bassa_Vah                        ; Bassa_Vah
3573    blk; Caucasian_Albanian               ; Caucasian_Albanian
3574    blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
3575    blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
3576    blk; Duployan                         ; Duployan
3577    blk; Elbasan                          ; Elbasan
3578    blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
3579    blk; Grantha                          ; Grantha
3580    blk; Khojki                           ; Khojki
3581    blk; Khudawadi                        ; Khudawadi
3582    blk; Latin_Ext_E                      ; Latin_Extended_E
3583    blk; Linear_A                         ; Linear_A
3584    blk; Mahajani                         ; Mahajani
3585    blk; Manichaean                       ; Manichaean
3586    blk; Mende_Kikakui                    ; Mende_Kikakui
3587    blk; Modi                             ; Modi
3588    blk; Mro                              ; Mro
3589    blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
3590    blk; Nabataean                        ; Nabataean
3591    blk; Old_North_Arabian                ; Old_North_Arabian
3592    blk; Old_Permic                       ; Old_Permic
3593    blk; Ornamental_Dingbats              ; Ornamental_Dingbats
3594    blk; Pahawh_Hmong                     ; Pahawh_Hmong
3595    blk; Palmyrene                        ; Palmyrene
3596    blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
3597    blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
3598    blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
3599    blk; Siddham                          ; Siddham
3600    blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
3601    blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
3602    blk; Tirhuta                          ; Tirhuta
3603    blk; Warang_Citi                      ; Warang_Citi
3604  -> add to uchar.h
3605    use long property names for enum constants
3606  -> add to UCharacter.UnicodeBlock IDs
3607    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
3608            replace  public static final int \1_ID = \2; \3
3609  -> add to UCharacter.UnicodeBlock objects
3610    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
3611            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3612- 28 new Joining_Group (jg) values:
3613    jg ; Manichaean_Aleph                 ; Manichaean_Aleph
3614    jg ; Manichaean_Ayin                  ; Manichaean_Ayin
3615    jg ; Manichaean_Beth                  ; Manichaean_Beth
3616    jg ; Manichaean_Daleth                ; Manichaean_Daleth
3617    jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
3618    jg ; Manichaean_Five                  ; Manichaean_Five
3619    jg ; Manichaean_Gimel                 ; Manichaean_Gimel
3620    jg ; Manichaean_Heth                  ; Manichaean_Heth
3621    jg ; Manichaean_Hundred               ; Manichaean_Hundred
3622    jg ; Manichaean_Kaph                  ; Manichaean_Kaph
3623    jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
3624    jg ; Manichaean_Mem                   ; Manichaean_Mem
3625    jg ; Manichaean_Nun                   ; Manichaean_Nun
3626    jg ; Manichaean_One                   ; Manichaean_One
3627    jg ; Manichaean_Pe                    ; Manichaean_Pe
3628    jg ; Manichaean_Qoph                  ; Manichaean_Qoph
3629    jg ; Manichaean_Resh                  ; Manichaean_Resh
3630    jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
3631    jg ; Manichaean_Samekh                ; Manichaean_Samekh
3632    jg ; Manichaean_Taw                   ; Manichaean_Taw
3633    jg ; Manichaean_Ten                   ; Manichaean_Ten
3634    jg ; Manichaean_Teth                  ; Manichaean_Teth
3635    jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
3636    jg ; Manichaean_Twenty                ; Manichaean_Twenty
3637    jg ; Manichaean_Waw                   ; Manichaean_Waw
3638    jg ; Manichaean_Yodh                  ; Manichaean_Yodh
3639    jg ; Manichaean_Zayin                 ; Manichaean_Zayin
3640    jg ; Straight_Waw                     ; Straight_Waw
3641  -> uchar.h & UCharacter.JoiningGroup
3642- 23 new Script (sc) values:
3643    sc ; Aghb                             ; Caucasian_Albanian
3644    sc ; Bass                             ; Bassa_Vah
3645    sc ; Dupl                             ; Duployan
3646    sc ; Elba                             ; Elbasan
3647    sc ; Gran                             ; Grantha
3648    sc ; Hmng                             ; Pahawh_Hmong
3649    sc ; Khoj                             ; Khojki
3650    sc ; Lina                             ; Linear_A
3651    sc ; Mahj                             ; Mahajani
3652    sc ; Mani                             ; Manichaean
3653    sc ; Mend                             ; Mende_Kikakui
3654    sc ; Modi                             ; Modi
3655    sc ; Mroo                             ; Mro
3656    sc ; Narb                             ; Old_North_Arabian
3657    sc ; Nbat                             ; Nabataean
3658    sc ; Palm                             ; Palmyrene
3659    sc ; Pauc                             ; Pau_Cin_Hau
3660    sc ; Perm                             ; Old_Permic
3661    sc ; Phlp                             ; Psalter_Pahlavi
3662    sc ; Sidd                             ; Siddham
3663    sc ; Sind                             ; Khudawadi
3664    sc ; Tirh                             ; Tirhuta
3665    sc ; Wara                             ; Warang_Citi
3666  -> uscript.h (many were added before)
3667    comment "Mende Kikakui" for USCRIPT_MENDE
3668    add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
3669  -> com.ibm.icu.lang.UScript
3670    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3671    replace  public static final int \1 = \2; \3
3672- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3673  (added 2012-11-01)
3674    Ahom        338     Ahom
3675    Hatr        127     Hatran
3676    Mult        323     Multani
3677  (added 2013-10-12)
3678    Modi        324     Modi
3679    Pauc        263     Pau Cin Hau
3680    Sidd        302     Siddham
3681  -> uscript.h (some overlap with additions from Unicode)
3682  -> com.ibm.icu.lang.UScript
3683    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3684    replace  public static final int \1 = \2; \3
3685  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
3686  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3687      and in com.ibm.icu.dev.test.lang.TestUScript.java
3688
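The Eclipse find/replace patterns for UScript above can also be applied outside the IDE.
A minimal Python sketch, assuming the enum lines in uscript.h have the shape of the sample below
(the sample line and the helper name c_enum_to_java are illustrative, not copied from the header):

```python
import re

# Same pattern and replacement as the "find"/"replace" lines for UScript above.
USCRIPT_FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
USCRIPT_REPLACE = r"public static final int \1 = \2; \3"


def c_enum_to_java(line: str) -> str:
    """Rewrite one C enum constant line into a Java int constant line."""
    return USCRIPT_FIND.sub(USCRIPT_REPLACE, line)


# Hypothetical input line in the style of uscript.h.
sample = "USCRIPT_AHOM = 161,/* Ahom */"
print(c_enum_to_java(sample))
```

Lines that do not match the pattern are returned unchanged, so the function can be mapped over a whole copied enum block.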
3689* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
3690    (not strictly necessary for NOT_ENCODED scripts)
3691  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
3692
3693* generate normalization data files
3694- cd $ICU_ROOT/dbg
3695- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
3696- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
3697- UNIDATA=$ICU_SRC_DIR/source/data/unidata
3698- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
3699- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3700- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3701- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3702- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
3703
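The four binary .nrm files above each take a cumulative list of norm2 input files.
A minimal Python sketch that prints those command lines from one table; the names NORM2_INPUTS and
gennorm2_commands are illustrative, and the --csource variant for norm2_nfc_data.h is left out:

```python
# Output file -> input files, exactly as in the gennorm2 command lines above.
NORM2_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in: str, unidata: str) -> list:
    """Return one gennorm2 command line per binary .nrm output file."""
    commands = []
    for out_name, inputs in NORM2_INPUTS.items():
        input_list = " ".join(inputs)
        commands.append(
            f"bin/gennorm2 -o {src_data_in}/{out_name} -s {unidata}/norm2 {input_list}"
        )
    return commands


# The shell variables are passed through unexpanded, for pasting into a console.
for command in gennorm2_commands("$SRC_DATA_IN", "$UNIDATA"):
    print(command)
```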
3704* build ICU (make install)
3705  so that the tools build can pick up the new definitions from the installed header files.
3706
3707~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
3708
3709* build Unicode tools using CMake+make
3710
3711~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
3712
3713# Location (--prefix) of where ICU was installed.
3714set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
3715# Location of the ICU source tree.
3716set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
3717
3718~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
3719~/svn.icutools/trunk/dbg/unicode/c$ make
3720
3721* genprops work
3722- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
3723  + add second array of Joining_Group values for at most 10800..10FFF
3724    icutools: unicode/c/genprops/bidipropsbuilder.cpp
3725    icu: source/common/ubidi_props.h/.c/_data.h
3726    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
3727
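The idea of a second Joining_Group array can be sketched as a lookup that tries two separate
code point ranges. This is only an illustration in Python, not the code in ubidi_props:
the function names, the start of the first range and the small integer values are placeholders.

```python
NO_JOINING_GROUP = 0  # placeholder for the default value


def make_jg_lookup(start, values, start2, values2):
    """Build a lookup over two dense arrays covering [start, limit) and [start2, limit2)."""
    limit = start + len(values)
    limit2 = start2 + len(values2)

    def joining_group(c):
        if start <= c < limit:
            return values[c - start]
        if start2 <= c < limit2:
            return values2[c - start2]
        return NO_JOINING_GROUP  # outside both ranges

    return joining_group


# Placeholder data: a first range starting at an assumed 0x0620,
# and a second range at the start of the Manichaean block (10AC0).
lookup = make_jg_lookup(0x0620, [1, 2, 3], 0x10AC0, [4, 5])
print(lookup(0x0621), lookup(0x10AC1), lookup(0x0041))
```

Two short arrays avoid one long, mostly empty array spanning everything between the two ranges.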
3728* generate core properties data files
3729- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
3730- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
3731- rebuild ICU (make install) & tools
3732- run genuca again (see step above) so that it picks up the new nfc.nrm
3733- rebuild ICU (make install) & tools
3734
3735* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3736  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3737- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3738- Unicode 6.0..7.0: U+2260, U+226E, U+226F
3739- nothing new in 7.0, no test file to update
3740
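The grep step above can be approximated with a small filter. A minimal Python sketch, assuming the
semicolon-separated "code point range ; status" layout of IdnaMappingTable.txt; the sample lines are
illustrative stand-ins in that layout, and the function name is hypothetical:

```python
def non_ascii_std3_valid(lines):
    """Yield (first, last) code point ranges whose status is disallowed_STD3_valid
    and which start above the ASCII range."""
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        if start > 0x7F:
            yield start, end


# Illustrative lines only, not a copy of the data file.
sample = [
    "0000..002C    ; disallowed_STD3_valid    # sample ASCII range",
    "00E0..00F6    ; valid                    # sample",
    "2260          ; disallowed_STD3_valid    # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid    # NOT LESS-THAN..NOT GREATER-THAN",
]
for start, end in non_ascii_std3_valid(sample):
    print(f"U+{start:04X}..U+{end:04X}")
```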
3741* run & fix ICU4C tests
3742
3743* update Java data files
3744- refresh just the UCD-related files, just to be safe
3745- see (ICU4C)/source/data/icu4j-readme.txt
3746- mkdir /tmp/icu4j
3747- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3748  output:
3749    ...
3750    Unicode .icu files built to ./out/build/icudt53l
3751    echo timestamp > uni-core-data
3752    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
3753    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
3754    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3755    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
3756    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
3757    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
3758    mkdir -p /tmp/icu4j/main/shared/data
3759    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3760    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
3761    mkdir -p /tmp/icu4j/main/shared/data
3762    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3763    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
3764- copy the big-endian Unicode data files to another location,
3765  separate from the other data files
3766    ICUDT=icudt54b
3767    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3768    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
3769    cd ~/svn.icu/uni70/dbg/data/out/icu4j
3770    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3771    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3772    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
3773    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
3774    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3775    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
3776- refresh ICU4J
3777    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
3778
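The cp/rm sequence above amounts to a selection rule over the files below com/ibm/icu/impl/data/$ICUDT.
A minimal Python sketch of that rule, assuming relative paths with '/' separators;
the function name and the sample file list are illustrative:

```python
from fnmatch import fnmatchcase


def is_unicode_data_file(rel_path: str) -> bool:
    """True if the copy steps above would leave this file in the /tmp/icu4j tree."""
    if rel_path == "cnvalias.icu":
        return False  # matched by *.icu, but removed again afterwards
    if "/" not in rel_path:
        return (
            rel_path == "confusables.cfu"
            or fnmatchcase(rel_path, "*.icu")
            or fnmatchcase(rel_path, "*.nrm")
        )
    folder, name = rel_path.split("/", 1)
    if "/" in name:
        return False  # the cp commands are not recursive
    if folder == "coll":
        return fnmatchcase(name, "*.icu")
    return folder == "brkitr"


# Hypothetical file list for illustration.
sample = [
    "uprops.icu", "cnvalias.icu", "nfc.nrm", "confusables.cfu",
    "coll/ucadata.icu", "coll/root.res", "brkitr/word.brk", "zoneinfo64.res",
]
print([path for path in sample if is_unicode_data_file(path)])
```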
3779* update CollationFCD.java
3780  + copy & paste the initializers of lcccIndex[] etc. from
3781    ICU4C/source/i18n/collationfcd.cpp to
3782    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
3783
3784* refresh Java test .txt files
3785- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3786    cd $ICU_SRC_DIR/source/data/unidata
3787    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3788    cd ../../test/testdata
3789    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3790    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
3791
3792* UCA
3793
3794- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
3795- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
3796- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
3797- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
3798- output files are in ~/svn.unitools/Generated/uca/7.0.0/
3799- review data; compare files, use blankweights.sed or similar
3800  ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
3801- cd ~/svn.unitools/Generated/uca/7.0.0/
3802- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3803  cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
3804- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3805    (note removing the underscore before "Rules")
3806    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
3807- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3808  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3809  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3810    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
3811    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
3812    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
3813- run genuca, see command line above
3814- rebuild ICU4C
3815- refresh ICU4J collation data:
3816  (subset of instructions above for properties data refresh, except copies all coll/*)
3817    ICUDT=icudt54b
3818    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3819    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3820    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3821    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
3822- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3823- note on intltest: if collate/UCAConformanceTest fails, then
3824  utility/MultithreadTest/TestCollators will fail as well;
3825  fix the conformance test before looking into the multi-thread test
3826- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
3827- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
3828  ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
3829
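The cp commands above rename several tool output files on the way into the ICU4C tree.
A minimal Python sketch that keeps the renames in one table; UCA_COPY_PLAN and copy_uca_files are
illustrative names, and creating missing target folders is a convenience added here, not part of the steps above:

```python
import shutil
from pathlib import Path

# File name in CollationAuxiliary -> path below $ICU_SRC_DIR, as in the cp commands above.
UCA_COPY_PLAN = {
    "FractionalUCA_SHORT.txt": "source/data/unidata/FractionalUCA.txt",
    "UCA_Rules_SHORT.txt": "source/data/unidata/UCARules.txt",
    "CollationTest_CLDR_NON_IGNORABLE_SHORT.txt":
        "source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt",
    "CollationTest_CLDR_SHIFTED_SHORT.txt":
        "source/test/testdata/CollationTest_SHIFTED_SHORT.txt",
}


def copy_uca_files(aux_dir: Path, icu_src_dir: Path) -> list:
    """Copy each tool output file to its renamed location; return the written paths."""
    written = []
    for src_name, dst_rel in UCA_COPY_PLAN.items():
        dst = icu_src_dir / dst_rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(aux_dir / src_name, dst)
        written.append(dst)
    return written
```

A call would look like copy_uca_files(Path("CollationAuxiliary"), Path(icu_src_dir)), run from the Generated/uca/7.0.0 folder.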
3830* When refreshing all of ICU4J data from ICU4C
3831- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3832- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3833or
3834- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3835
3836* run & fix ICU4J tests
3837
3838*** LayoutEngine script information
3839
3840(For details see the Unicode 5.2 change log below.)
3841
3842* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3843  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3844  in the working directory.
3845  (It also generates ScriptRunData.cpp, which is no longer needed.)
3846
3847  The generated files have a current copyright date and "@stable" statement.
3848  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
3849  for "born stable" Unicode API constants, and to stop parsing ICU version numbers
3850  which may not contain dots any more.
3851
3852- diff current <icu>/source/layout files vs. generated ones
3853    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3854  review and manually merge desired changes;
3855  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
3856  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3857- if you just copy the above files, then
3858  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
3859  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3860
3861*** API additions
3862- send notice to icu-design about new born-@stable API (enum constants etc.)
3863
3864*** merge the Unicode update branches back onto the trunk
3865- do not merge the icudata.jar and testdata.jar,
3866  instead rebuild them from merged & tested ICU4C
3867
3868---------------------------------------------------------------------------- ***
3869
3870Unicode 6.3 update
3871
3872http://www.unicode.org/review/pri249/  -- beta review
3873http://www.unicode.org/reports/uax-proposed-updates.html
3874http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
3875http://www.unicode.org/reports/tr44/tr44-11.html
3876
3877*** ICU Trac
3878
3879- ticket 10128: update ICU to Unicode 6.3 beta
3880- ticket 10168: update ICU to Unicode 6.3 final
3881- C++ branches/markus/uni63 at r33552 from trunk at r33551
3882- Java branches/markus/uni63 at r33550 from trunk at r33553
3883
3884- ticket 10142: implement Unicode 6.3 bidi algorithm additions
3885
3886*** Unicode version numbers
3887- makedata.mak
3888- uchar.h
3889  (configure.in & configure: have been modified to extract the version from uchar.h)
3890- com.ibm.icu.util.VersionInfo
3891- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3892
3893- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
3894  so that the makefiles see the new version number.
3895
3896*** data files & enums & parser code
3897
3898* file preparation
3899
3900- download UCD, UCA & IDNA files
3901- make sure that the Unicode data folder passed into preparseucd.py
3902  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3903- modify preparseucd.py:
3904  parse new file BidiBrackets.txt
3905  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
3906- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
3907- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3908- Check test file diffs for previously commented-out, known-failing data lines;
3909  probably need to keep those commented out.
3910
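Parsing BidiBrackets.txt for the two new properties can be sketched as follows. This is not the
preparseucd.py code; it is a minimal Python illustration, assuming three semicolon-separated fields
(code point; bpb code point; bpt short value). The function name and sample lines are illustrative.

```python
# Short -> long Bidi_Paired_Bracket_Type names, as in PropertyValueAliases.txt.
BPT_NAMES = {"o": "Open", "c": "Close", "n": "None"}


def parse_bidi_brackets(lines):
    """Return {code point: (bpb code point, bpt long name)}."""
    props = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        cp, bpb, bpt = (field.strip() for field in data.split(";"))
        props[int(cp, 16)] = (int(bpb, 16), BPT_NAMES[bpt])
    return props


# Illustrative lines in the assumed field layout.
sample = [
    "# sample data",
    "0028; 0029; o # LEFT PARENTHESIS",
    "0029; 0028; c # RIGHT PARENTHESIS",
]
for cp, (bpb, bpt) in parse_bidi_brackets(sample).items():
    print(f"cp;{cp:04X};bpb={bpb:04X};bpt={bpt}")
```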
3911* PropertyAliases.txt changes
3912- 1 new Enumerated Property
3913  bpt                      ; Bidi_Paired_Bracket_Type
3914  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
3915  -> ubidi_props.h & .c & UBiDiProps.java
3916  -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
3917  -> uprops.cpp
3918  -> change ubidi.icu format version from 2.0 to 2.1
3919- 1 new Miscellaneous Property
3920  bpb                      ; Bidi_Paired_Bracket
3921  -> uchar.h & UProperty.java
3922  -> ppucd.h & .cpp
3923
3924* PropertyValueAliases.txt changes
3925- 3 Bidi_Paired_Bracket_Type (bpt) values:
3926  bpt; c                                ; Close
3927  bpt; n                                ; None
3928  bpt; o                                ; Open
3929  -> uchar.h & UCharacter.BidiPairedBracketType
3930  -> ubidi_props.h & .c & UBiDiProps.java
3931  -> change ubidi.icu format version from 2.0 to 2.1
3932- 4 new Bidi_Class (bc) values:
3933  bc ; FSI                              ; First_Strong_Isolate
3934  bc ; LRI                              ; Left_To_Right_Isolate
3935  bc ; RLI                              ; Right_To_Left_Isolate
3936  bc ; PDI                              ; Pop_Directional_Isolate
3937  -> uchar.h & UCharacterEnums.ECharacterDirection
3938  -> until the bidi code gets updated,
3939     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
3940- 3 new Word_Break (WB) values:
3941  WB ; HL                               ; Hebrew_Letter
3942  WB ; SQ                               ; Single_Quote
3943  WB ; DQ                               ; Double_Quote
3944  -> uchar.h & UCharacter.WordBreak
3945  -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
3946- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3947  (added 2012-10-16)
3948  Aghb  239     Caucasian Albanian
3949  Mahj  314     Mahajani
3950  -> uscript.h
3951  -> com.ibm.icu.lang.UScript
3952    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3953    replace  public static final int \1 = \2;\3
3954  -> preparseucd.py _scripts_only_in_iso15924
3955  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3956      and in com.ibm.icu.dev.test.lang.TestUScript.java
3957  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
3958     (not strictly necessary for NOT_ENCODED scripts)
3959
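The note about Word_Break exceeding 4 bits follows from simple arithmetic: 4 bits hold 16 distinct
values, so 17 values need a fifth bit. A small Python check (the function name is illustrative):

```python
def bits_needed(num_values: int) -> int:
    """Number of bits needed to store the constants 0..num_values-1."""
    return max(1, (num_values - 1).bit_length())


# 14 values before the three additions, 17 values after.
print(bits_needed(14), bits_needed(17))
```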
3960* generate normalization data files
3961- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
3962- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
3963- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
3964- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3965- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3966- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3967- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
3968
3969* build ICU (make install)
3970  so that the tools build can pick up the new definitions from the installed header files.
3971
3972~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
3973
3974* build Unicode tools using CMake+make
3975
3976~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
3977
3978# Location (--prefix) of where ICU was installed.
3979set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
3980# Location of the ICU source tree.
3981set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
3982
3983~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
3984~/svn.icutools/trunk/dbg/unicode/c$ make
3985
3986* generate core properties data files
3987- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
3988- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
3989- rebuild ICU (make install) & tools
3990- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
3991- rebuild ICU (make install) & tools
3992
3993* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3994  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3995- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3996- Unicode 6.0..6.3: U+2260, U+226E, U+226F
3997- nothing new in 6.3, no test file to update
3998
3999* update Java data files
4000- refresh just the UCD-related files, just to be safe
4001- see (ICU4C)/source/data/icu4j-readme.txt
4002- mkdir /tmp/icu4j
4003- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4004  output:
4005    ...
4006    Unicode .icu files built to ./out/build/icudt52l
4007    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
4008    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
4009    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
4010    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
4011    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
4012    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
4013    mkdir -p /tmp/icu4j/main/shared/data
4014    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
4015    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
4016    mkdir -p /tmp/icu4j/main/shared/data
4017    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
4018    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
4019- copy the big-endian Unicode data files to another location,
4020  separate from the other data files
4021    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
4022    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
4023    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
4024    ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
4025    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
4026    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
4027    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
4028- refresh ICU4J
4029    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
4030
4031* refresh Java test .txt files
4032- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4033
4034* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
4035
4036- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
4037- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
4038- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4039- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4040  (note removing the underscore before "Rules")
4041- update (ICU4C)/source/test/testdata/CollationTest_*.txt
4042  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4043  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
4044- check test file diffs for previously commented-out, known-failing data lines;
4045  probably need to keep those commented out
4046- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
4047- run genuca, see command line above
4048- rebuild ICU4C
4049- refresh ICU4J collation data:
4050  (subset of instructions above for properties data refresh, except copies all coll/*)
4051    ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4052    ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
4053    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
4054    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
4055- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
4056- note on intltest: if collate/UCAConformanceTest fails, then
4057  utility/MultithreadTest/TestCollators will fail as well;
4058  fix the conformance test before looking into the multi-thread test
4059
4060* test ICU, fix test code where necessary
4061
4062* When refreshing all of ICU4J data from ICU4C
4063- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4064- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4065or
4066- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4067
4068*** LayoutEngine script information
4069- skipped for Unicode 6.3: no new scripts
4070
4071*** merge the Unicode update branches back onto the trunk
4072- do not merge the icudata.jar and testdata.jar,
4073  instead rebuild them from merged & tested ICU4C
4074
4075---------------------------------------------------------------------------- ***
4076
4077Unicode 6.2 update
4078
4079http://www.unicode.org/review/pri230/
4080http://www.unicode.org/versions/beta-6.2.0.html
4081http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
4082http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
4083http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
4084http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
4085http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
4086http://unicode.org/Public/idna/6.2.0/
4087
4088*** ICU Trac
4089
4090- ticket 9515: Unicode 6.2: final ICU update
4091
4092- ticket 9514: UCA 6.2: fix UCARules.txt
4093
4094- ticket 9437: update ICU to Unicode 6.2
4095- C++ branches/markus/uni62 at r32050 from trunk at r32041
4096- Java branches/markus/uni62 at r32068 from trunk at r32066
4097
4098*** Unicode version numbers
4099- makedata.mak
4100- uchar.h
4101  (configure.in & configure: have been modified to extract the version from uchar.h)
4102- com.ibm.icu.util.VersionInfo
4103- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
4104
4105*** data files & enums & parser code
4106
4107* file preparation
4108
4109- download UCD, UCA & IDNA files
4110- make sure that the Unicode data folder passed into preparseucd.py
4111  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
4112- modify preparseucd.py: NamesList.txt is now in UTF-8
4113- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
4114- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
4115- Check test file diffs for previously commented-out, known-failing data lines;
4116  probably need to keep those commented out.
4117
4118* PropertyValueAliases.txt changes
4119- 1 new Line_Break (lb) value:
4120  lb ; RI                               ; Regional_Indicator
4121  -> uchar.h & UCharacter.LineBreak
4122- 1 new Word_Break (WB) value:
4123  WB ; RI                               ; Regional_Indicator
4124  -> uchar.h & UCharacter.WordBreak
4125- 1 new Grapheme_Cluster_Break (GCB) value:
4126  GCB; RI                               ; Regional_Indicator
4127  -> uchar.h & UCharacter.GraphemeClusterBreak
4128
4129* 3 new numeric values
4130  The new value -1, which was really supposed to be NaN but that would have required
4131  new UnicodeData.txt syntax, can already be represented as a "fraction" of -1/1,
4132  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
4133    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
4134    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
4135  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
4136    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
4137    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
4138  -> uprops.h, uchar.c & UCharacterProperty.java
4139  -> cucdtst.c & UCharacterTest.java
4140
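Both kinds of new values can be illustrated in Python. -1 is exactly the fraction -1/1, and
216000 and 432000 are small multiples of a power of 60. The decomposition below is only a sketch:
the mantissa range 1..9 and the maximum exponent 4 are assumptions made for this example, and the
function name is hypothetical; the actual bit layout is defined in uprops.h.

```python
from fractions import Fraction


def base60_parts(value: int, max_exp: int = 4):
    """Return (mantissa, exp) with value == mantissa * 60**exp and mantissa in 1..9,
    or None if the value has no such form."""
    for exp in range(max_exp, 0, -1):
        mantissa, remainder = divmod(value, 60 ** exp)
        if remainder == 0 and 1 <= mantissa <= 9:
            return mantissa, exp
    return None


print(Fraction(-1, 1) == -1)
print(base60_parts(216000), base60_parts(432000))
```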
4141* generate normalization data files
4142- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
4143- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
4144- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
4145- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
4146- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
4147- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
4148- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
4149
4150* build ICU (make install)
4151  so that the tools build can pick up the new definitions from the installed header files.
4152* build Unicode tools using CMake+make
4153
4154* generate core properties data files
4155- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
4156- in initial bootstrapping, change the UCA version
4157  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
4158- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
4159- rebuild ICU (make install) & tools
4160  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
4161    check if the UCA version in FractionalUCA.txt matches the new Unicode version
4162    (see step above)
4163- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
4164- rebuild ICU (make install) & tools
4165
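The version check mentioned for the genrb failure can be done before building. A minimal Python
sketch, assuming FractionalUCA.txt carries a line of the form "[UCA version = x.y.z]"
(that syntax and the function names are assumptions for this example):

```python
import re

# Assumed syntax of the version line in FractionalUCA.txt.
UCA_VERSION = re.compile(r"\[UCA version\s*=\s*([0-9.]+)\]")


def uca_version(lines):
    """Return the first UCA version string found, or None."""
    for line in lines:
        match = UCA_VERSION.search(line)
        if match:
            return match.group(1)
    return None


def versions_match(lines, unicode_version: str) -> bool:
    """True if the UCA version in the file equals the new Unicode version."""
    return uca_version(lines) == unicode_version


# Illustrative file header.
sample = ["# Fractional UCA Table", "[UCA version = 6.2.0]"]
print(versions_match(sample, "6.2.0"))
```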
4166* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
4167  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
4168- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
4169- Unicode 6.0..6.2: U+2260, U+226E, U+226F
4170- nothing new in 6.2, no test file to update
4171
4172* update Java data files
4173- refresh just the UCD-related files, just to be safe
4174- see (ICU4C)/source/data/icu4j-readme.txt
4175- mkdir /tmp/icu4j
4176- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4177  output:
4178    ...
4179    Unicode .icu files built to ./out/build/icudt50l
4180    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
4181    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
4182    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
4183    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
4184    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
4185    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
4186    mkdir -p /tmp/icu4j/main/shared/data
4187    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
4188    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
4189    mkdir -p /tmp/icu4j/main/shared/data
4190    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
4191    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
4192- copy the big-endian Unicode data files to another location,
4193  separate from the other data files
4194    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
4195    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
4196    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
4197    ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
4198    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
4199    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
4200    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
4201- refresh ICU4J
4202    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
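
The selective copy above (all *.icu except cnvalias.icu, all *.nrm, coll/*.icu, brkitr/*) could be scripted roughly as follows.
This is only a sketch: copy_unicode_data and its parameters are made-up names,
and src and dst stand for the two .../icudt50b folders used in the commands above.

```python
import shutil
from pathlib import Path


def copy_unicode_data(src, dst):
    """Copy the Unicode-related data files from src to dst; return the relative paths copied."""
    src, dst = Path(src), Path(dst)
    (dst / "coll").mkdir(parents=True, exist_ok=True)
    (dst / "brkitr").mkdir(parents=True, exist_ok=True)
    copied = []
    # (subfolder, glob pattern) pairs mirroring the cp commands in the log.
    patterns = [("", "*.icu"), ("", "*.nrm"), ("coll", "*.icu"), ("brkitr", "*")]
    for subdir, pattern in patterns:
        for f in sorted((src / subdir).glob(pattern)):
            if not f.is_file() or f.name == "cnvalias.icu":
                continue  # the log copies cnvalias.icu and then removes it again
            shutil.copy2(f, dst / subdir / f.name)
            copied.append((Path(subdir) / f.name).as_posix())
    return copied
```

The jar update step is left out on purpose; it stays a manual command as shown above.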
4203
4204* refresh Java test .txt files
4205- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4206
4207* UCA
4208
4209- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
4210- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
4211- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4212- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4213  (note removing the underscore before "Rules")
4214- update (ICU4C)/source/test/testdata/CollationTest_*.txt
4215  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4216  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
4217- check test file diffs for previously commented-out, known-failing data lines;
4218  probably need to keep those commented out
4219- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
4220- run genuca, see command line above
4221- rebuild ICU4C
4222- refresh ICU4J collation data:
4223  (subset of instructions above for properties data refresh, except copies all coll/*)
4224    ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4225    ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
4226    ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
4227    ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
4228- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
4229- note on intltest: if collate/UCAConformanceTest fails, then
4230  utility/MultithreadTest/TestCollators will fail as well;
4231  fix the conformance test before looking into the multi-thread test
4232
4233* test ICU, fix test code where necessary
4234
4235* When refreshing all of ICU4J data from ICU4C
4236- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4237- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4238or
4239- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4240
4241*** LayoutEngine script information
4242- skipped for Unicode 6.2: no new scripts
4243
4244*** merge the Unicode update branches back onto the trunk
4245- do not merge the icudata.jar and testdata.jar,
4246  instead rebuild them from merged & tested ICU4C
4247
4248---------------------------------------------------------------------------- ***
4249
4250Future Unicode update
4251
4252Tools simplified since the Unicode 6.1 update. See
4253- https://icu.unicode.org/design/props/ppucd
4254- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
4255
4256* Unicode version numbers
4257- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
4258
4259* file preparation
4260- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
4261- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
4262- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
4263- Check test file diffs for previously commented-out, known-failing data lines;
4264  probably need to keep those commented out.
4265
4266* PropertyValueAliases.txt changes
4267- Script codes that are in ISO 15924 but not in Unicode are now listed in
4268  preparseucd.py, in the _scripts_only_in_iso15924 variable.
4269  If there are new ISO codes, then add them.
4270  If Unicode adds some of them, then remove them from the .py variable.
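
To spot ISO-only codes that Unicode has since added, one could compare the list against the
"sc" lines of PropertyValueAliases.txt. This is a hedged sketch: the exact form of the
_scripts_only_in_iso15924 variable is not shown in this log, so iso_only below is a hypothetical
stand-in, and both function names are invented for illustration.

```python
def unicode_script_codes(pva_lines):
    """Collect short script codes from 'sc ; Code ; Name' lines."""
    codes = set()
    for line in pva_lines:
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes


def stale_iso_only(iso_only, pva_lines):
    """Return the codes listed as ISO-only that Unicode now defines (candidates for removal)."""
    return sorted(set(iso_only) & unicode_script_codes(pva_lines))


# Hypothetical ISO-only list and a few property value alias lines.
iso_only = ["Khoj", "Tirh", "Takr", "Sora"]
pva_sample = [
    "sc ; Takr      ; Takri",
    "sc ; Sora      ; Sora_Sompeng",
    "jg ; Rohingya_Yeh ; Rohingya_Yeh",
]
print(stale_iso_only(iso_only, pva_sample))
```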
4271
4272* UnicodeData.txt changes
4273- No more manual changes for CJK ranges for algorithmic names;
4274  those are now written to ppucd.txt and genprops reads them from there.
4275
4276* generate core properties data files (makeprops.sh was deleted)
4277- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
4278
4279* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
4280- it is now generated by preparseucd.py
4281
4282* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
4283- it is now generated by preparseucd.py
4284- make sure that the Unicode data folder passed into preparseucd.py
4285  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
4286  (can be in some subfolder)
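
A quick way to confirm that the data folder contains the file somewhere below its root,
before running preparseucd.py. This is a sketch with an invented helper name; it does not
claim to mirror how preparseucd.py itself locates the file.

```python
from pathlib import Path


def find_idna_table(ucd_root):
    """Return all paths named IdnaMappingTable.txt anywhere below ucd_root."""
    return sorted(Path(ucd_root).rglob("IdnaMappingTable.txt"))
```

An empty result means the file still needs to be downloaded into the folder.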
4287
4288* generate normalization data files
4289- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
4290- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
4291- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
4292- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
4293- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
4294- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
4295- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
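
The four gennorm2 calls differ only in the output file and the list of input files.
The following sketch builds the same command lines from a table, using only the -o and -s
options shown above. It prints the commands and does not run them; the table and function
names are illustrative.

```python
# Output file -> input .txt files, as listed in the commands above.
NORM_FILES = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in, unidata):
    """Build one gennorm2 argument list per output file; nothing is executed."""
    return [
        ["bin/gennorm2", "-o", f"{src_data_in}/{out}", "-s", f"{unidata}/norm2", *inputs]
        for out, inputs in NORM_FILES.items()
    ]


for cmd in gennorm2_commands("$SRC_DATA_IN", "$UNIDATA"):
    print(" ".join(cmd))
```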
4296
4297* build ICU (make install)
4298* build Unicode tools using CMake+make
4299
4300* new way to call genuca (makeuca.sh was deleted)
4301- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
4302
4303---------------------------------------------------------------------------- ***
4304
4305Unicode 6.1 update
4306
4307*** ICU Trac
4308
4309- ticket 8995 final update to Unicode 6.1
4310- ticket 8994 regenerate source/layout/CanonData.cpp
4311
4312- ticket 8961 support Unicode "Age" value *names*
4313- ticket 8963 support multiple character name aliases & types
4314
4315- ticket 8827 "update ICU to Unicode 6.1"
4316- C++ branches/markus/uni61 at r30864 from trunk at r30843
4317- Java branches/markus/uni61 at r30865 from trunk at r30863
4318
4319*** Unicode version numbers
4320- makedata.mak
4321- uchar.h
4322  (configure.in & configure: have been modified to extract the version from uchar.h)
4323- com.ibm.icu.util.VersionInfo
4324- icutools/unicode/makedefs.sh
4325  + also review & update other definitions in that file,
4326    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
4327
4328*** data files & enums & parser code
4329
4330* file preparation
4331
4332~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
4333- This prepares both unidata and testdata files in respective output subfolders.
4334- Check test file diffs for previously commented-out, known-failing data lines;
4335  probably need to keep those commented out.
4336
4337* PropertyValueAliases.txt changes
4338- 11 new block names:
4339  Arabic_Extended_A
4340  Arabic_Mathematical_Alphabetic_Symbols
4341  Chakma
4342  Meetei_Mayek_Extensions
4343  Meroitic_Cursive
4344  Meroitic_Hieroglyphs
4345  Miao
4346  Sharada
4347  Sora_Sompeng
4348  Sundanese_Supplement
4349  Takri
4350  -> add to uchar.h
4351  -> add to UCharacter.UnicodeBlock IDs
4352    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
4353            replace  public static final int \1_ID = \2; \3
4354  -> add to UCharacter.UnicodeBlock objects
4355    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
4356            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
4357- 1 new Joining_Group (jg) value:
4358  Rohingya_Yeh
4359  -> uchar.h & UCharacter.JoiningGroup
4360- 2 new Line_Break (lb) values:
4361  CJ=Conditional_Japanese_Starter
4362  HL=Hebrew_Letter
4363  -> uchar.h & UCharacter.LineBreak
4364- 7 new scripts:
4365  sc ; Cakm      ; Chakma
4366  sc ; Merc      ; Meroitic_Cursive
4367  sc ; Mero      ; Meroitic_Hieroglyphs
4368  sc ; Plrd      ; Miao
4369  sc ; Shrd      ; Sharada
4370  sc ; Sora      ; Sora_Sompeng
4371  sc ; Takr      ; Takri
4372  -> remove these from SyntheticPropertyValueAliases.txt
4373  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
4374      and in com.ibm.icu.dev.test.lang.TestUScript.java
4375- 3 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
4376  (2 added 2011-06-21)
4377  Khoj        322     Khojki
4378  Tirh        326     Tirhuta
4379  (1 more added 2011-12-09)
4380  Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
4381  -> uscript.h
4382  -> com.ibm.icu.lang.UScript
4383    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
4384    replace  public static final int \1 = \2;\3
4385  -> SyntheticPropertyValueAliases.txt
4386  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
4387      and in com.ibm.icu.dev.test.lang.TestUScript.java
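
The find/replace patterns above are plain regular expressions, so they can be tried outside
Eclipse as well. This sketch applies them with Python's re.sub, which accepts the same \1
backreference syntax. The input lines are illustrative samples in the style of uchar.h and
uscript.h enum constants, and the numeric values in them are not authoritative.

```python
import re

# (find, replace) pairs copied from the steps above.
BLOCK_ID = (r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
            r"public static final int \1_ID = \2; \3")
BLOCK_OBJECT = (r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
                r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
SCRIPT_CODE = (r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
               r"public static final int \1 = \2;\3")


def convert(line, rule):
    """Apply one (find, replace) rule to a single line of C enum source."""
    find, replace = rule
    return re.sub(find, replace, line)


# Illustrative input lines (values are samples only).
block_line = "UBLOCK_CHAKMA = 212, /*[11100]*/"
script_line = "USCRIPT_KHOJKI = 157,/* Khoj */"
print(convert(block_line, BLOCK_ID))
print(convert(block_line, BLOCK_OBJECT))
print(convert(script_line, SCRIPT_CODE))
```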
4388
4389* UnicodeData.txt changes
4390- the last Unihan code point changes from U+9FCB to U+9FCC
4391  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
4392  + do change gennames.c
4393  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
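
The case-insensitive search for 9FC[BC] can be sketched as below, for use on any source text
that needs review. The helper name and the sample text are invented for illustration and are
not taken from gennames.c or ucol.cpp.

```python
import re

CJK_END = re.compile(r"9FC[BC]", re.IGNORECASE)


def lines_to_review(text):
    """Return (line number, line) pairs that mention 9FCB or 9FCC in any letter case."""
    return [(n, line) for n, line in enumerate(text.splitlines(), 1)
            if CJK_END.search(line)]


# Invented sample text; only lines 1 and 3 should be reported.
sample_source = "limit = 0x9fcc\nstart = 0x4E00\nend = 0x9FCB\n"
for number, line in lines_to_review(sample_source):
    print(number, line)
```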
4394
4395* DerivedBidiClass.txt changes
4396- 2 new default-AL blocks:
4397#     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
4398#     Arabic Mathematical Alphabetic Symbols:
4399#                       U+1EE00  - U+1EEFF  (was default-R)
4400- 2 new default-R blocks:
4401#     Meroitic Hieroglyphs:
4402#                        U+10980 - U+1099F
4403#     Meroitic Cursive:  U+109A0 - U+109FF
4404  -> should be picked up by the explicit data in the file
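
For a spot check of the four new default ranges listed above, a small lookup table is enough.
This sketch covers only those four blocks and returns None for everything else; it is not a
full default-Bidi_Class implementation, and the names are illustrative.

```python
# (first, last, default Bidi_Class) for the blocks named above.
NEW_DEFAULT_BIDI_RANGES = [
    (0x08A0, 0x08FF, "AL"),    # Arabic Extended-A
    (0x1EE00, 0x1EEFF, "AL"),  # Arabic Mathematical Alphabetic Symbols
    (0x10980, 0x1099F, "R"),   # Meroitic Hieroglyphs
    (0x109A0, 0x109FF, "R"),   # Meroitic Cursive
]


def new_default_bidi_class(code_point):
    """Return 'AL' or 'R' for code points in the four new blocks, else None."""
    for first, last, bidi_class in NEW_DEFAULT_BIDI_RANGES:
        if first <= code_point <= last:
            return bidi_class
    return None
```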
4405
4406* NameAliases.txt changes
4407- from
4408    # Each line has two fields
4409    # First field: Code point
4410    # Second field: Alias
4411- to
4412    # Each line has three fields, as described here:
4413    #
4414    # First field:  Code point
4415    # Second field: Alias
4416    # Third field:  Type
4417- Also, the file format previously allowed multiple aliases per code point, but only now
4418  does the file actually contain them, including several of the same type. For example,
4419    FEFF;BYTE ORDER MARK;alternate
4420    FEFF;BOM;abbreviation
4421    FEFF;ZWNBSP;abbreviation
4422- This breaks our gennames parser, unames.icu data structure, and API.
4423  Fix gennames to only pick up "correction" aliases.
4424  New ticket #8963 for further changes.
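
The filtering rule ("only pick up correction aliases") can be illustrated on the three-field
format. gennames itself is a C tool; the Python below is only a sketch of the rule, with an
invented function name. The FEFF lines are the ones quoted above, and the 01A2 line is an
added sample of a correction alias.

```python
def correction_aliases(lines):
    """Map code point -> list of aliases whose type field is 'correction'."""
    result = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # skip comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) != 3:
            raise ValueError(f"expected three fields: {line!r}")
        code, alias, alias_type = fields
        if alias_type == "correction":
            result.setdefault(int(code, 16), []).append(alias)
    return result


name_alias_sample = [
    "# First field:  Code point",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]
print(correction_aliases(name_alias_sample))
```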
4425
4426* run genpname/preparse.pl (on Linux)
4427  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
4428  + make sure that data.h is writable
4429  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
4430  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
4431
4432* build ICU (make install)
4433  so that the tools build can pick up the new definitions from the installed header files.
4434* build Unicode tools (at least genpname) using CMake+make
4435
4436* run genpname
4437  (builds both pnames.icu and propname_data.h)
4438- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
4439- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
4440
4441* build ICU (make install)
4442* build Unicode tools using CMake+make
4443
4444* update source/data/unidata/norm2/nfkc_cf.txt
4445- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
4446
4447* update source/data/unidata/norm2/uts46.txt
4448- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
4449  to ~/svn.icu/tools/trunk/src/unicode/py
4450- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
4451- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
4452- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
4453
4454* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
4455  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
4456- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
4457- Unicode 6.0..6.1: U+2260, U+226E, U+226F
4458- nothing new in 6.1, no test file to update
4459
4460* generate core properties data files
4461- in initial bootstrapping, change the UCA version
4462  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
4463- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4464- rebuild ICU & tools
4465  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
4466    check if the UCA version in FractionalUCA.txt matches the new Unicode version
4467    (see step above)
4468- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
4469  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4470- rebuild ICU & tools
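
The version mismatch behind the genrb failure can be checked before building. This sketch
rests on two assumptions that should be verified against the actual files: that
FractionalUCA.txt carries a line of the form [UCA version = x.y.z], and that uchar.h defines
U_UNICODE_VERSION as a quoted "x.y" string. All function names are illustrative.

```python
import re


def uca_version(fractional_uca_text):
    """Return (major, minor) from an assumed '[UCA version = x.y.z]' line, or None."""
    m = re.search(r"\[UCA version\s*=\s*(\d+)\.(\d+)", fractional_uca_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def unicode_version(uchar_h_text):
    """Return (major, minor) from an assumed '#define U_UNICODE_VERSION "x.y"' line, or None."""
    m = re.search(r'#define\s+U_UNICODE_VERSION\s+"(\d+)\.(\d+)', uchar_h_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def versions_match(fractional_uca_text, uchar_h_text):
    """True only if both versions were found and agree in major and minor number."""
    uca = uca_version(fractional_uca_text)
    return uca is not None and uca == unicode_version(uchar_h_text)
```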
4471
4472* update Java data files
4473- refresh just the UCD-related files, just to be safe
4474- see (ICU4C)/source/data/icu4j-readme.txt
4475- mkdir /tmp/icu4j
4476- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4477  output:
4478    ...
4479    Unicode .icu files built to ./out/build/icudt49l
4480    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
4481    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
4482    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
4483    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
4484    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
4485    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
4486    mkdir -p /tmp/icu4j/main/shared/data
4487    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
4488    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
4489    mkdir -p /tmp/icu4j/main/shared/data
4490    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
4491    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
4492- copy the big-endian Unicode data files to another location,
4493  separate from the other data files
4494    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4495    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
4496    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
4497    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
4498    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
4499    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4500    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
4501- refresh ICU4J
4502    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
4503
4504* refresh Java test .txt files
4505- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4506
4507* test ICU so far, fix test code where necessary
4508- temporarily ignore collation issues that look like UCA/UCD mismatches,
4509  until UCA data is updated
4510
4511* UCA
4512
4513- get output from Mark's tools; look in
4514    http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
4515- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4516- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4517  (note removing the underscore before "Rules")
4518- update (ICU)/source/test/testdata/CollationTest_*.txt
4519  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4520  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
4521- check test file diffs for previously commented-out, known-failing data lines;
4522  probably need to keep those commented out
4523- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
4524- run makeuca.sh:
4525  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4526- rebuild ICU4C
4527- refresh ICU4J collation data:
4528  (subset of instructions above for properties data refresh, except copies all coll/*)
4529    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4530    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4531    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
4532    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
4533- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
4534- note on intltest: if collate/UCAConformanceTest fails, then
4535  utility/MultithreadTest/TestCollators will fail as well;
4536  fix the conformance test before looking into the multi-thread test
4537
4538* When refreshing all of ICU4J data from ICU4C
4539- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4540- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4541or
4542- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4543
4544*** LayoutEngine script information
4545
4546(For details see the Unicode 5.2 change log below.)
4547
4548* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
4549  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
4550  in the working directory.
4551  (It also generates ScriptRunData.cpp, which is no longer needed.)
4552
4553  The generated files have a current copyright date and "@draft" statement.
4554
4555- diff current <icu>/source/layout files vs. generated ones
4556    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
4557  review and manually merge desired changes;
4558  fix gratuitous changes, incorrect @draft and missing aliases;
4559  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
4560- if you just copy the above files, then
4561  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
4562  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4563
4564*** merge the Unicode update branches back onto the trunk
4565- do not merge the icudata.jar and testdata.jar,
4566  instead rebuild them from merged & tested ICU4C
4567
4568---------------------------------------------------------------------------- ***
4569
4570ICU 4.8 (no Unicode update, just new script codes)
4571
4572* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
4573  (added 2010-12-21)
4574    Afak    439     Afaka
4575    Jurc    510     Jurchen
4576    Mroo    199     Mro, Mru
4577    Nshu    499     Nüshu
4578    Shrd    319     Sharada, Śāradā
4579    Sora    398     Sora Sompeng
4580    Takr    321     Takri, Ṭākrī, Ṭāṅkrī
4581    Tang    520     Tangut
4582    Wole    480     Woleai
4583  -> uscript.h
4584  -> com.ibm.icu.lang.UScript
4585    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
4586    replace  public static final int \1 = \2;\3
4587  -> genpname/SyntheticPropertyValueAliases.txt
4588  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
4589      and in com.ibm.icu.dev.test.lang.TestUScript.java
4590
4591* run genpname/preparse.pl (on Linux)
4592  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
4593  + make sure that data.h is writable
4594  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
4595  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
4596
4597* rebuild Unicode tools (at least genpname) using make
4598- You might first need to "make install" ICU so that the tools build can pick
4599  up the new definitions from the installed header files.
4600
4601* run genpname
4602  (builds both pnames.icu and propname_data.h)
4603- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
4604- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
4605- rebuild ICU & tools
4606
4607* run genprops
4608- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
4609- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
4610- rebuild ICU & tools
4611
4612* update Java data files
4613- refresh just the UCD-related files, just to be safe
4614- see (ICU4C)/source/data/icu4j-readme.txt
4615- mkdir /tmp/icu4j
4616- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4617- copy the big-endian Unicode data files to another location,
4618  separate from the other data files
4619    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
4620    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
4621    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
4622- refresh ICU4J
4623    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
4624
4625* should have updated the layout engine script codes but forgot
4626
4627---------------------------------------------------------------------------- ***
4628
4629Unicode 6.0 update
4630
4631*** related ICU Trac tickets
4632
4633- ticket 7264 Unicode 6.0 Update
4634
4635*** Unicode version numbers
4636- makedata.mak
4637- uchar.h
4638  (configure.in & configure: have been modified to extract the version from uchar.h)
4639- com.ibm.icu.util.VersionInfo
4640
4641*** data files & enums & parser code
4642
4643* file preparation
4644
4645~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
4646- This now prepares both unidata and testdata files in respective output subfolders.
4647
4648* PropertyAliases.txt changes
4649- new Script_Extensions property defined in the new ScriptExtensions.txt file
4650  but not listed in PropertyAliases.txt; reported to unicode.org;
4651  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
4652    scx; Script_Extensions
4653  -> uchar.h with new UProperty section
4654  -> com.ibm.icu.lang.UProperty, parallel with uchar.h
4655
4656* PropertyValueAliases.txt changes
4657- 12 new block names:
4658  Alchemical_Symbols
4659  Bamum_Supplement
4660  Batak
4661  Brahmi
4662  CJK_Unified_Ideographs_Extension_D
4663  Emoticons
4664  Ethiopic_Extended_A
4665  Kana_Supplement
4666  Mandaic
4667  Miscellaneous_Symbols_And_Pictographs
4668  Playing_Cards
4669  Transport_And_Map_Symbols
4670  -> add to uchar.h
4671  -> add to UCharacter.UnicodeBlock
4672    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
4673            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
4674- Joining_Group (jg) values:
4675  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
4676  -> uchar.h & UCharacter.JoiningGroup
4677- 3 new scripts:
4678  sc ; Batk      ; Batak
4679  sc ; Brah      ; Brahmi
4680  sc ; Mand      ; Mandaic
4681  -> remove these from SyntheticPropertyValueAliases.txt
4682  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
4683  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
4684      and in com.ibm.icu.dev.test.lang.TestUScript.java
4685- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
4686  (added 2009-11-11..2010-07-18)
4687  Bass        259     Bassa Vah
4688  Dupl        755     Duployan shorthand
4689  Elba        226     Elbasan
4690  Gran        343     Grantha
4691  Kpel        436     Kpelle
4692  Loma        437     Loma
4693  Mend        438     Mende
4694  Merc        101     Meroitic Cursive
4695  Narb        106     Old North Arabian
4696  Nbat        159     Nabataean
4697  Palm        126     Palmyrene
4698  Sind        318     Sindhi
4699  Wara        262     Warang Citi
4700  -> uscript.h
4701  -> com.ibm.icu.lang.UScript
4702    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
4703    replace  public static final int \1 = \2;\3
4704  -> SyntheticPropertyValueAliases.txt
4705  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
4706      and in com.ibm.icu.dev.test.lang.TestUScript.java
4707- ISO 15924 name change
4708  Mero        100     Meroitic Hieroglyphs (was Meroitic)
4709  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
4710- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
4711
4712* UnicodeData.txt changes
4713- new CJK block:
4714  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
4715  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
4716  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
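
New ranges such as the one above appear in UnicodeData.txt as a pair of "First" and "Last"
lines. A small sketch for listing such pairs, for example to compare two versions of the file
and spot ranges that need adding to gennames.c. The function name is invented; the input is
the semicolon-separated UnicodeData.txt format quoted above.

```python
def first_last_ranges(lines):
    """Collect (name, start, end) for each '<Name, First>' / '<Name, Last>' pair."""
    ranges = []
    pending = None  # (name, start) of a First line that has not been closed yet
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 2:
            continue
        code, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            pending = (name[1:-len(", First>")], code)
        elif name.endswith(", Last>"):
            if pending is None or pending[0] != name[1:-len(", Last>")]:
                raise ValueError(f"unpaired Last line: {line!r}")
            ranges.append((pending[0], pending[1], code))
            pending = None
    return ranges


ucd_sample = [
    "2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;",
    "2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;",
]
for name, start, end in first_last_ranges(ucd_sample):
    print(f"{name}: U+{start:04X}..U+{end:04X}")
```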
4717
4718* build Unicode tools using CMake+make
4719
4720* run genpname/preparse.pl (on Linux)
4721  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
4722  + make sure that data.h is writable
4723  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
4724  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
4725
4726* rebuild Unicode tools (at least genpname) using make
4727- You might first need to "make install" ICU so that the tools build can pick
4728  up the new definitions from the installed header files.
4729
4730* run genpname
4731- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
4732- rebuild ICU & tools
4733
4734* update source/data/unidata/norm2/nfkc_cf.txt
4735- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
4736
4737* update source/data/unidata/norm2/uts46.txt
4738- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
4739  to ~/svn.icu/tools/trunk/src/unicode/py
4740- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
4741- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
4742- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
4743
4744* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
4745  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
4746- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
4747- Unicode 6.0: U+2260, U+226E, U+226F
4748
4749* generate core properties data files
4750- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4751- rebuild ICU & tools
4752- run makeuca.sh so that genuca picks up the new nfc.nrm:
4753  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4754- rebuild ICU & tools
4755
4756* implement new Script_Extensions property (provisional)
4757- parser & generator: genprops & uprops.icu
4758- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
4759- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
4760
4761* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
4762- (one-time change)
4763- genbidi/gencase/genprops tools changes
4764- re-run makeprops.sh (see above)
4765- UCharacterProperty.java, UCharacterTypeIterator.java,
4766  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
4767  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
4768
4769* update Java data files
4770- refresh just the UCD-related files, just to be safe
4771- see (ICU4C)/source/data/icu4j-readme.txt
4772- mkdir /tmp/icu4j
4773- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4774  output:
4775    ...
4776    Unicode .icu files built to ./out/build/icudt45l
4777    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
4778    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
4779    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
4780    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
4781    mkdir -p /tmp/icu4j/main/shared/data
4782    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
4783- copy the big-endian Unicode data files to another location,
4784  separate from the other data files
4785    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4786    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
4787    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
4788    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
4789    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
4790    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4791    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
4792- refresh ICU4J
4793    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
4794
4795* refresh Java test .txt files
4796- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4797
4798* un-hardcode normalization skippable (NF*_Inert) test data
4799- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
4800
4801* copy updated break iterator test files
4802- now handled by early ucdcopy.py and
4803  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
4804  (old instructions:
4805   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
4806   to ~/svn.icu/trunk/src/source/test/testdata)
4807- they are not used in ICU4J
4808
4809* UCA
4810
4811- get output from Mark's tools; look in
4812    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
4813    http://www.macchiato.com/unicode/utc/additional-uca-files
4814    http://www.unicode.org/Public/UCA/6.0.0/
4815    http://www.unicode.org/~mdavis/uca/
4816- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4817- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4818- update Han-implicit ranges for new CJK extensions:
4819  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
4820- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
4821  do not add it into invuca so that tailoring primary-after an ignorable works
4822- genuca: permit space between [variable top] bytes
4823- ucol.cpp: treat noncharacters like unassigned rather than ignorable
4824- run makeuca.sh:
4825  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4826- rebuild ICU4C
4827- refresh ICU4J collation data:
4828  (subset of instructions above for properties data refresh, except copies all coll/*)
4829    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4830    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4831    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4832    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
4833- update (ICU)/source/test/testdata/CollationTest_*.txt
4834  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4835  with output from Mark's Unicode tools
4836- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
4837- note on intltest: if collate/UCAConformanceTest fails, then
4838  utility/MultithreadTest/TestCollators will fail as well;
4839  fix the conformance test before looking into the multi-thread test
4840
4841* When refreshing all of ICU4J data from ICU4C
4842- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4843- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4844or
4845- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4846
4847*** LayoutEngine script information
4848
4849(For details see the Unicode 5.2 change log below.)
4850
4851* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4852ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4853ScriptRunData.cpp, which is no longer needed.)
4854
4855The generated files have a current copyright date and "@draft" statement.
4856
4857* copy the above files into <icu>/source/layout, replacing the old files.
4858* fix mixed line endings
4859* review the diffs and fix incorrect @draft and missing aliases;
4860  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
4861* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4862
4863---------------------------------------------------------------------------- ***
4864
4865Unicode 5.2 update
4866
4867*** related ICU Trac tickets
4868
48697084 Unicode 5.2
4870
48717167 verify collation bytes
48727235 Java test NAME_ALIAS
48737236 Java DerivedCoreProperties.txt test
48747237 Java BidiTest.txt
48757238 UTrie2 in core unidata
48767239 test for tailoring gaps
48777240 Java fix CollationMiscTest
48787243 update layout engine for Unicode 5.2
4879
4880*** Unicode version numbers
4881- makedata.mak
4882- uchar.h
4883- configure.in & configure
4884- update ucdVersion in gennames.c if an algorithmic range changes
4885
4886*** data files & enums & parser code
4887
4888* file preparation
4889
4890python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
4891- includes finding files regardless of version numbers,
4892  copying them, and performing the equivalent processing of the
4893  ucdstrip and ucdmerge tools on the desired set of files
4894
4895* notes on changes
4896- PropertyAliases.txt
4897  moved from numeric to enumerated:
4898    ccc       ; Canonical_Combining_Class
4899  new string properties:
4900    NFKC_CF   ; NFKC_Casefold
4901    Name_Alias; Name_Alias
4902  new binary properties:
4903    Cased     ; Cased
4904    CI        ; Case_Ignorable
4905    CWCF      ; Changes_When_Casefolded
4906    CWCM      ; Changes_When_Casemapped
4907    CWKCF     ; Changes_When_NFKC_Casefolded
4908    CWL       ; Changes_When_Lowercased
4909    CWT       ; Changes_When_Titlecased
4910    CWU       ; Changes_When_Uppercased
4911  new CJK Unihan properties (not supported by ICU)
4912- PropertyValueAliases.txt
4913  new block names
4914  new scripts
4915  one script code change:
4916    sc ; Qaai      ; Inherited
4917    ->
4918    sc ; Zinh      ; Inherited                        ; Qaai
4919  new Line_Break (lb) value:
4920    lb ; CP        ; Close_Parenthesis
4921  new Joining_Group (jg) values: Farsi_Yeh, Nya
4922  other new values:
4923    ccc; 214; ATA  ; Attached_Above
4924- DerivedBidiClass.txt
4925  new default-R range: U+1E800 - U+1EFFF
4926- UnicodeData.txt
4927  all of the ISO comments are gone
4928  new CJK block end:
4929    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
4930  new CJK block:
4931    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
4932    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
4933
4934* genpname
4935- run preparse.pl
4936  + cd \svn\icuproj\icu\trunk\source\tools\genpname
4937  + make sure that data.h is writable
4938  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
4939  + preparse.pl complains with errors like the following:
4940      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
4941    This is because ICU 4.0 had scripts from ISO 15924 which are now
4942    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
4943    and PropertyValueAliases.txt.
4944    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4945       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
4946  + preparse.pl complains with errors about block names missing from uchar.h; add them
4947
4948* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4949- new block & script values
4950  + 26 new blocks
4951    copy new blocks from Blocks.txt
4952    MS VC++ 2008 regular expression:
4953      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
4954      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
4955  + several new script values already added in ICU 4.0 for ISO 15924 coverage
4956    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
4957  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
4958  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
4959    (added to SyntheticPropertyValueAliases.txt)
4960- new Joining Group (JG) values: Farsi_Yeh, Nya
4961- new Line_Break (lb) value:
4962    lb ; CP        ; Close_Parenthesis
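
For anyone without Visual Studio, the block-enum regex above can be approximated in Python.
This is only a sketch: the function name blocks_to_enum, the first_value parameter, and the
rule for deriving the constant name (uppercase, non-alphanumerics to underscores) are
assumptions for illustration. Review the generated names and numbers against uchar.h by hand.

```python
import re

# Matches Blocks.txt data lines such as "A960..A97F; Hangul Jamo Extended-A".
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def blocks_to_enum(lines, first_value):
    """Turn Blocks.txt lines into UBLOCK_ enum lines, numbering from first_value."""
    out = []
    value = first_value
    for line in lines:
        m = BLOCK_LINE.match(line.strip())
        if m is None:
            continue  # skip comments, blank lines, and anything that is not a block range
        start, _end, name = m.groups()
        # Assumed naming rule: uppercase, runs of non-alphanumerics become "_".
        const = re.sub(r"[^A-Z0-9]+", "_", name.upper()).strip("_")
        out.append("    UBLOCK_%s = %d, /*[%s]*/" % (const, value, start))
        value += 1
    return out
```

Unlike the editor regex, which writes the same number on every line, this sketch increments
the value per block, so pass in the first unused enum value.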
4963
4964* hardcoded Unihan range end/limit
4965- Unihan range end moves from 9FC3 to 9FCB
4966  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
4967  + do change gennames.c
4968
4969* Compare definitions of new binary properties with what we used to use
4970  in algorithms, to see if the definitions changed.
4971- Verified that definitions for Cased and Case_Ignorable are unchanged.
4972  The gencase tool now parses the newly public Case_Ignorable values
4973  in case the definition changes in the future.
4974
4975* uchar.c & uprops.h & uprops.c & genprops
4976- new numeric values that didn't exist in Unicode data before:
4977    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
4978  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
4979  therefore redesign the encoding of numeric types and values for formatVersion 6;
4980  design for simple numbers up to at least 144 ("one gross"),
4981  large values up to at least 10^20,
4982  and fractions with numerators -1..17 and denominators 1..16
4983  to cover current and expected future values
4984  (e.g., more Han numeric values, Meroitic twelfths)
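
A quick way to sanity-check the fraction limits stated above, as a sketch only. It does not
model the real uprops.icu encoding; it just tests the new values against the numerator and
denominator ranges named in these notes. The function name is made up for illustration.

```python
from fractions import Fraction

# New numeric values listed in the notes above.
NEW_VALUES = [Fraction(1, 7), Fraction(1, 9), Fraction(1, 10),
              Fraction(3, 10), Fraction(1, 16), Fraction(3, 16)]


def fits_v6_fraction_range(value):
    """True if value fits numerators -1..17 and denominators 1..16 (formatVersion 6 design goal).

    Only the fraction part of the design is checked here; simple integers and
    large values are covered by other parts of the design.
    """
    return -1 <= value.numerator <= 17 and 1 <= value.denominator <= 16


# Values with denominators >9 are the ones formatVersion 5 cannot represent.
needs_v6 = [v for v in NEW_VALUES if v.denominator > 9]
all_fit_v6 = all(fits_v6_fraction_range(v) for v in NEW_VALUES)
```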
4985
4986* reimplement Hangul_Syllable_Type for new Jamo characters
4987- the old code assumed that all Jamo characters are in the 11xx block
4988- Unicode 5.2 fills holes there and adds new Jamo characters in
4989    A960..A97F; Hangul Jamo Extended-A
4990  and in
4991    D7B0..D7FF; Hangul Jamo Extended-B
4992- Hangul_Syllable_Type can be trivially derived from a subset of
4993  Grapheme_Cluster_Break values
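
The derivation mentioned above can be sketched as follows. This is an illustration in
Python, not the ICU C code; it works on the short value names from PropertyValueAliases.txt,
and the function name is hypothetical.

```python
# Grapheme_Cluster_Break values that carry Hangul syllable information.
_HANGUL_GCB = {"L", "V", "T", "LV", "LVT"}


def hangul_syllable_type(gcb):
    """Derive the Hangul_Syllable_Type short name from a Grapheme_Cluster_Break short name."""
    # The five Hangul-related GCB values map to the hst value of the same name;
    # every other GCB value means Hangul_Syllable_Type=NA (Not_Applicable).
    return gcb if gcb in _HANGUL_GCB else "NA"
```

Because the lookup no longer depends on code point ranges, new Jamo blocks are picked up
from the Grapheme_Cluster_Break data without code changes.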
4994
4995* build Unicode data source code for hardcoding core data
4996C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
4997
4998ICU data make path is \svn\icuproj\icu\trunk\source\data\
4999ICU root path is \svn\icuproj\icu\trunk
5000Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
5001Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
5002Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
5003Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
5004Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
5005Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
5006Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
5007Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
5008Creating data file for Unicode Property Names
5009Creating data file for Unicode Character Properties
5010Creating data file for Unicode Case Mapping Properties
5011Creating data file for Unicode BiDi/Shaping Properties
5012Creating data file for Unicode Normalization
5013Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
5014Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
5015
5016- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
5017  and rebuild the common library
5018
5019*** UCA
5020
5021- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
5022- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
5023- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
5024[ Begin obsolete instructions:
5025  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
5026    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
5027      on Windows:
5028        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
5029        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
5030  End obsolete instructions]
5031- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
5032  not just the *_STUB.txt files
5033- note on intltest: if collate/UCAConformanceTest fails, then
5034  utility/MultithreadTest/TestCollators will fail as well;
5035  fix the conformance test before looking into the multi-thread test
5036
5037*** Implement Cased & Case_Ignorable properties
5038- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
5039- Problem: These properties should be disjoint, but aren't
5040- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
5041- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
5042
5043*** Implement Changes_When_Xyz properties
5044- without stored data
5045
5046*** Implement Name_Alias property
5047- add it as another name field in unames.icu
5048- make it available via u_charName() and UCharNameChoice
5049- consider it in u_charFromName()
5050
5051*** Break iterators
5052
5053* Update break iterator rules to new UAX versions and new property values
5054* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
5055
5056*** new BidiTest file
5057- review format and data
5058- copy BidiTest.txt to source/test/testdata
5059- write test code using this data
5060- fix ICU code where it fails the conformance test
5061
5062*** Java
5063- generally, find and update code corresponding to C/C++
5064- UCharacter.UnicodeBlock constants:
5065  a) add an _ID integer per new block, update COUNT
5066  b) add a class instance per new block
5067     Visual Studio regex:
5068        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
5069        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
5070- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
5071
5072- port test changes to Java
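
The Visual Studio find/replace for UnicodeBlock instances above has a direct Python
equivalent. A sketch, assuming input lines in the uchar.h enum layout shown in the C notes;
the function name and the sample line in the usage note are illustrative.

```python
import re

# Same pattern as the Visual Studio regex, with {..} groups written as (..).
ENUM_LINE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")


def enum_to_java(line):
    """Rewrite one uchar.h UBLOCK_ enum line as a Java UnicodeBlock constant."""
    return ENUM_LINE.sub(
        r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2',
        line)
```

For example, a line like "    UBLOCK_SAMARITAN = 172, /*[0800]*/" keeps its indentation and
trailing comment; lines that do not match are returned unchanged.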
5073
5074*** LayoutEngine script information
5075
5076(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
5077
5078* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
5079ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
5080ScriptRunData.cpp, which is no longer needed.)
5081
5082The generated files have a current copyright date and "@draft" statement.
5083
5084-> Eric Mader wrote in email on 20090930:
5085    "I think the tool has been modified to update @draft to @stable for
5086     older scripts and to add @draft for new scripts.
5087     (I worked with an intern on this last year.)
5088     You should check the output after you run it."
5089
5090* copy the above files into <icu>/source/layout, replacing the old files.
5091* fix mixed line endings
5092* review the diffs and fix incorrect @draft and missing aliases
5093* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
5094
5095Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
5096and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
5097
5098-> Eric Mader wrote in email on 20090930:
5099    "This is just a matter of making sure that all the per-script tables have
5100     entries for any new scripts that were added.
5101     If any new Indic characters were added, then the class tables in
5102     IndicClassTables.cpp should be updated to reflect this.
5103     John Emmons should know how to do this if it's required."
5104
5105* rebuild the layout and layoutex libraries.
5106
5107*** Documentation
5108- Update User Guide
5109  + Jamo_Short_Name, sfc->scf, binary property value aliases
5110
5111---------------------------------------------------------------------------- ***
5112
5113Unicode 5.1 update
5114
5115*** related ICU Trac tickets
5116
51175696 Update to Unicode 5.1
5118
5119*** Unicode version numbers
5120- makedata.mak
5121- uchar.h
5122- configure.in & configure
5123- update ucdVersion in gennames.c if an algorithmic range changes
5124
5125*** data files & enums & parser code
5126
5127* file preparation
5128- ucdstrip:
5129    DerivedCoreProperties.txt
5130    DerivedNormalizationProps.txt
5131    NormalizationTest.txt
5132    PropList.txt
5133    Scripts.txt
5134    GraphemeBreakProperty.txt
5135    SentenceBreakProperty.txt
5136    WordBreakProperty.txt
5137- ucdstrip and ucdmerge:
5138    EastAsianWidth.txt
5139    LineBreak.txt
5140
5141* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
5142copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
5143copy 5.1.0\ucd\Blocks.txt ..\unidata\
5144copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
5145copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
5146copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
5147copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
5148copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
5149copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
5150copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
5151copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
5152copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
5153copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
5154copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
5155
5156ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
5157ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
5158ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
5159ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
5160ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
5161ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
5162ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
5163ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
5164ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
5165ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
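
For orientation, a rough Python sketch of what the two filters do to a data file. The exact
behavior is an assumption here: ucdstrip is taken to drop the trailing '#' comment on data
lines, and ucdmerge to join adjacent code point ranges whose remaining fields are identical.
The real tools may differ in details (for example in how they treat comment-only lines),
and the function names are made up.

```python
def strip_comment(line):
    """Drop a trailing '#' comment from a data line; keep comment-only lines as they are."""
    if line.lstrip().startswith("#") or "#" not in line:
        return line.rstrip()
    return line.split("#", 1)[0].rstrip()


def _parse(line):
    """Split 'XXXX;fields' or 'XXXX..YYYY;fields' into (start, end, fields)."""
    code, _, rest = line.partition(";")
    first, _, last = code.strip().partition("..")
    return int(first, 16), int(last or first, 16), rest.strip()


def merge_ranges(lines):
    """Merge adjacent lines whose code points are contiguous and whose fields match."""
    merged = []  # list of [start, end, fields]
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # this sketch simply skips blank and comment-only lines
        start, end, fields = _parse(line)
        if merged and merged[-1][2] == fields and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, fields])
    return ["%04X..%04X;%s" % (s, e, f) if e > s else "%04X;%s" % (s, f)
            for s, e, f in merged]
```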
5166
5167* genpname
5168- run preparse.pl
5169  + cd \svn\icuproj\icu\uni51\source\tools\genpname
5170  + make sure that data.h is writable
5171  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
5172  + preparse.pl complains with errors like the following:
5173      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
5174    This is because ICU 3.8 had scripts from ISO 15924 which are now
5175    added to Unicode 5.1, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
5176    and PropertyValueAliases.txt.
5177    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
5178       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
5179  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
5180      N/Y, No/Yes, F/T, False/True
5181    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
5182       It will use further values from the file if present.
5183
5184* uchar.h & uscript.h & uprops.h & uprops.c & genprops
5185- new block & script values
5186  + 17 new blocks
5187  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
5188    (removed from SyntheticPropertyValueAliases.txt)
5189  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
5190    (added to SyntheticPropertyValueAliases.txt)
5191- uprops.icu (uprops.h) only provides 7 bits for script codes.
5192  In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
5193  No script code above 127 is used yet for an assigned Unicode character,
5194  so ICU 4.0 uprops.icu does not store any
5195  script code values greater than 127.
5196  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
5197  in a parallel bit field, and that overflows now.
5198  Also, future values >=128 would be incompatible anyway.
5199  uprops.h is modified to move around several of the bit fields
5200  in the properties vector words, and now uses 8 bits for the script code.
5201  Two other bit fields also grow to accommodate future growth:
5202  Block (current count: 172) grows from 8 to 9 bits,
5203  and Word_Break grows from 4 to 5 bits.
5204- renamed property Simple_Case_Folding (sfc->scf)
5205  + nothing to be done: handled as normal alias
5206- new property JSN Jamo_Short_Name
5207  + no new API: only contributes to the Name property
5208- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
5209- new Joining Group (JG) value: Burushashki_Yeh_Barree
5210- new Sentence_Break (SB) values:
5211    SB ; CR        ; CR
5212    SB ; EX        ; Extend
5213    SB ; LF        ; LF
5214    SB ; SC        ; SContinue
5215- new Word_Break (WB) values:
5216    WB ; CR        ; CR
5217    WB ; Extend    ; Extend
5218    WB ; LF        ; LF
5219    WB ; MB        ; MidNumLet
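
The bit-field overflow described above for script codes is plain arithmetic. A small sketch
(helper names are illustrative, nothing here reads uprops.h):

```python
def bits_needed(max_value):
    """Number of bits needed to store unsigned integers 0..max_value."""
    return max(1, max_value.bit_length())


def fits(max_value, width):
    """True if values 0..max_value fit into a bit field of the given width."""
    return bits_needed(max_value) <= width


USCRIPT_CODE_LIMIT = 130                  # from the notes above
MAX_SCRIPT_VALUE = USCRIPT_CODE_LIMIT - 1  # 129, stored in the parallel bit field
```

fits(127, 7) holds, fits(MAX_SCRIPT_VALUE, 7) does not, and fits(MAX_SCRIPT_VALUE, 8) does,
which is why the script field moves to 8 bits. The Block field still fits its 172 values
in 8 bits; the extra bit there is headroom.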
5220
5221* Further changes in the 2008-02-29 update:
5222- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
5223  because they should not normally be invisible.
5224- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
5225- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
5226- new Word_Break (WB) value: NL=Newline
5227
5228* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
5229- Unihan range end moves from 9FBB to 9FC3
5230  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
5231  + do change gennames.c
5232
5233* build Unicode data source code for hardcoding core data
5234C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
5235
5236ICU data make path is \svn\icuproj\icu\uni51\source\data\
5237ICU root path is \svn\icuproj\icu\uni51
5238Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
5239Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
5240Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
5241Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
5242Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
5243Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
5244Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
5245Creating data file for Unicode Character Properties
5246Creating data file for Unicode Case Mapping Properties
5247Creating data file for Unicode BiDi/Shaping Properties
5248Creating data file for Unicode Normalization
5249Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
5250Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
5251
5252- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
5253  and rebuild the common library
5254
5255*** Break iterators
5256
5257* Update break iterator rules to new UAX versions and new property values
5258
5259*** UCA
5260
5261* update FractionalUCA.txt and UCARules.txt with new canonical closure
5262
5263*** Test suites
5264- Test that APIs using Unicode property value aliases (like UnicodeSet)
5265  support all of the boolean values N/Y, No/Yes, F/T, False/True
5266  -> TestBinaryValues() tests in both cintltst and intltest
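
The acceptance rule that such a test exercises can be sketched as below. This is not ICU
code; the function name is hypothetical, and the plain case-insensitive comparison is a
simplification of Unicode loose matching for property value aliases.

```python
_TRUE_ALIASES = {"y", "yes", "t", "true"}
_FALSE_ALIASES = {"n", "no", "f", "false"}


def parse_binary_value(alias):
    """Map a binary property value alias to a bool; raise ValueError for anything else."""
    key = alias.strip().lower()
    if key in _TRUE_ALIASES:
        return True
    if key in _FALSE_ALIASES:
        return False
    raise ValueError("not a binary property value alias: %r" % alias)
```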
5267
5268*** LayoutEngine script information
5269* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
5270ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
5271ScriptRunData.cpp, which is no longer needed.)
5272
5273The generated files have a current copyright date and "@draft" statement.
5274
5275* copy the above files into <icu>/source/layout, replacing the old files.
5276
5277Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
5278and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
5279
5280* rebuild the layout and layoutex libraries.
5281
5282*** Documentation
5283- Update User Guide
5284  + Jamo_Short_Name, sfc->scf, binary property value aliases
5285
5286---------------------------------------------------------------------------- ***
5287
5288Unicode 5.0 update
5289
5290*** related Jitterbugs
5291
52925084 RFE: Update to Unicode 5.0
5293
5294*** data files & enums & parser code
5295
5296* file preparation
5297- ucdstrip:
5298    DerivedCoreProperties.txt
5299    DerivedNormalizationProps.txt
5300    NormalizationTest.txt
5301    PropList.txt
5302    Scripts.txt
5303    GraphemeBreakProperty.txt
5304    SentenceBreakProperty.txt
5305    WordBreakProperty.txt
5306- ucdstrip and ucdmerge:
5307    EastAsianWidth.txt
5308    LineBreak.txt
5309
5310* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
5311copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
5312copy 5.0.0\ucd\Blocks.txt ..\unidata\
5313copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
5314copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
5315copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
5316copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
5317copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
5318copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
5319copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
5320copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
5321copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
5322copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
5323copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
5324
5325ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
5326ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
5327ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
5328ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
5329ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
5330ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
5331ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
5332ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
5333ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
5334ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
5335
5336* update FractionalUCA.txt and UCARules.txt with new canonical closure
5337
5338* genpname
5339- run preparse.pl
5340  + make sure that data.h is writable
5341  + perl preparse.pl \cvs\oss\icu > out.txt
5342
5343* uchar.h & uscript.h & uprops.h & uprops.c & genprops
5344- new block & script values
5345  + script values already added in ICU 3.6 because all of ISO 15924 is now covered
5346
5347* build Unicode data source code for hardcoding core data
5348C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
5349
5350ICU data make path is \cvs\oss\icu\source\data\
5351ICU root path is \cvs\oss\icu
5352Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
5353[etc.]
5354Creating data file for Unicode Character Properties
5355Creating data file for Unicode Case Mapping Properties
5356Creating data file for Unicode BiDi/Shaping Properties
5357Creating data file for Unicode Normalization
5358Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
5359Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
5360
5361- copy the .c source files to C:\cvs\oss\icu\source\common
5362  and rebuild the common library
5363
5364*** Unicode version numbers
5365- makedata.mak
5366- uchar.h
5367- configure.in
5368
5369*** LayoutEngine script information
5370* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
5371ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
5372ScriptRunData.cpp, which is no longer needed.)
5373
5374The generated files have a current copyright date and "@draft" statement.
5375
5376* copy the above files into <icu>/source/layout, replacing the old files.
5377
5378Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
5379and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
5380
5381* rebuild the layout and layoutex libraries.
5382
5383---------------------------------------------------------------------------- ***
5384
5385Unicode 4.1 update
5386
5387*** related Jitterbugs
5388
53894332 RFE: Update to Unicode 4.1
53904157 RBBI, TR29 4.1 updates
5391
5392*** data files & enums & parser code
5393
5394* file preparation
5395- ucdstrip:
5396    DerivedCoreProperties.txt
5397    DerivedNormalizationProps.txt
5398    NormalizationTest.txt
5399    GraphemeBreakProperty.txt
5400    SentenceBreakProperty.txt
5401    WordBreakProperty.txt
5402- ucdstrip and ucdmerge:
5403    EastAsianWidth.txt
5404    LineBreak.txt
5405
5406* add new files to the repository
5407    GraphemeBreakProperty.txt
5408    SentenceBreakProperty.txt
5409    WordBreakProperty.txt
5410
5411* update FractionalUCA.txt and UCARules.txt with new canonical closure
5412
5413* genpname
5414- handle new enumerated properties in sub read_uchar
5415- run preparse.pl
5416
5417* uchar.h & uscript.h & uprops.h & uprops.c & genprops
5418- new binary properties
5419  + Pattern_Syntax
5420  + Pattern_White_Space
5421- new enumerated properties
5422  + Grapheme_Cluster_Break
5423  + Sentence_Break
5424  + Word_Break
5425- new block & script & line break values
5426
5427* gencase
5428- case-ignorable changes
5429  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
5430  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
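
The D47a rule above, written out as a sketch. Python's unicodedata module supplies the
General_Category (for whatever Unicode version that Python ships, not 4.1), but it has no
Word_Break data, so the MidLetter set is passed in by the caller. The function name is
illustrative, and any sample set should be replaced with the real one from
WordBreakProperty.txt.

```python
import unicodedata

# General_Category values named in the D47a definition above.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}


def is_case_ignorable(ch, mid_letter):
    """D47a check: Word_Break=MidLetter, or General_Category in Mn, Me, Cf, Lm, Sk.

    mid_letter is a set of code points with Word_Break=MidLetter,
    supplied by the caller (e.g. parsed from WordBreakProperty.txt).
    """
    return (ord(ch) in mid_letter
            or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES)
```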
5431
5432*** Unicode version numbers
5433- makedata.mak
5434- uchar.h
5435- configure.in
5436
5437*** tests
5438- verify that u_charMirror() round-trips
5439- test all new properties and some new values of old properties
5440
5441*** other code
5442
5443* hardcoded Unihan range end/limit
5444- Unihan range end moves from 9FA5 to 9FBB
5445  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
5446  + do not modify BOCU/BOCSU code because that would change the encoding
5447    and break binary compatibility!
5448  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
5449    NamePrepProfile.txt
5450  + ignore trietest.c: test data is arbitrary
5451  + ignore tstnorm.cpp: test optimization, not important
5452  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
5453  + do change line_th.txt and word_th.txt
5454    by replacing hardcoded ranges with the new property values
5455  + do change gennames.c
5456
5457source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
5458source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
5459source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
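
The search for the old end and limit values can also be scripted. A minimal sketch, with a
hypothetical helper that scans already-loaded lines; it builds the same pattern as the
regex above (old end, or old end plus one, case-insensitive).

```python
import re


def find_hardcoded_range_ends(lines, end_hex):
    """Return (line_number, text) for lines mentioning the old range end or limit."""
    end = int(end_hex, 16)
    # For end_hex="9FA5" this is equivalent to the regex 9FA[56], case-insensitive.
    pattern = re.compile("%04X|%04X" % (end, end + 1), re.IGNORECASE)
    return [(n, text) for n, text in enumerate(lines, 1) if pattern.search(text)]
```

Each hit still needs the manual review listed above, since some occurrences (BOCU, GB 18030,
NamePrepProfile.txt) must not change.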
5460
5461* case mappings
5462- compare new special casing context conditions with previous ones
5463  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
5464
5465* genpname
5466- consider storing only the short name if it is the same as the long name
5467
5468*** other reviews
5469- UAX #29 changes (grapheme/word/sentence breaks)
5470- UAX #14 changes (line breaks)
5471- Pattern_Syntax & Pattern_White_Space
5472
5473---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
  + put short name comment only on line with new constant
    for genpname perl script parser
- new binary properties
  + STerm
  + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
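
A minimal sketch of a parser that accepts both the old and the new DerivedNormalizationProps.txt spellings and reports them in the new form. This is not the gennorm.c code. The function name and sample lines are assumptions, and "etc." is read here as covering the NO and MAYBE variants of NFC/NFD/NFKC/NFKD.

```python
import re

# Old single-field quick-check names such as NFD_NO or NFKC_MAYBE.
OLD_QUICK_CHECK = re.compile(r"^(NFK?[CD])_(NO|MAYBE)$")


def parse_line(line):
    """Parse a data line into (code_range, property, value); None for blanks/comments."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    if len(fields) < 2:
        raise ValueError("expected at least 2 fields: %r" % line)
    code_range, prop = fields[0], fields[1]
    value = fields[2] if len(fields) > 2 else None
    if prop == "FNC":
        prop = "FC_NFKC"  # renamed property, value field unchanged
    match = OLD_QUICK_CHECK.match(prop)
    if match and value is None:
        # Old single field "NFD_NO" becomes two fields "NFD_QC; N".
        prop = match.group(1) + "_QC"
        value = "N" if match.group(2) == "NO" else "M"
    return code_range, prop, value


print(parse_line("00C0..00C5 ; NFD_NO # old style"))
print(parse_line("00C0..00C5 ; NFD_QC; N # new style"))
```

Mapping both spellings to one internal form keeps the rest of a tool independent of the file version.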

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
  + assume that the type is always numeric for Han characters,
    and that only those are added in addition to what UnicodeData.txt lists
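
The two-column assumption can be sketched as below. This is a Python illustration, not the props2.c code. The function name, the sample lines, and the "code ; value" column order are assumptions. The numeric type is filled in by default because the file no longer carries it.

```python
from fractions import Fraction


def parse_numeric_value_line(line, default_type="numeric"):
    """Parse "code[..code] ; value" into (first, last, value, type); None for blanks."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    if len(fields) != 2:
        raise ValueError("expected 2 columns: %r" % line)
    code_range, value = fields
    start, _, end = code_range.partition("..")
    first = int(start, 16)
    last = int(end, 16) if end else first
    # The type column is gone, so every entry gets the assumed default type.
    return first, last, Fraction(value), default_type


print(parse_numeric_value_line("4E00 ; 1 # hypothetical sample line"))
```

Rejecting lines with any other column count makes a future format change fail loudly instead of being misread.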

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
