• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1* Copyright (C) 2016 and later: Unicode, Inc. and others.
2* License & terms of use: http://www.unicode.org/copyright.html
3* Copyright (C) 2004-2016, International Business Machines
4* Corporation and others.  All Rights Reserved.
5*
6*   file name:  changes.txt
7*   encoding:   US-ASCII
8*   tab size:   8 (not used)
9*   indentation:4
10*
11*   created on: 2004may06
12*   created by: Markus W. Scherer
13*
14* change log for Unicode updates
15*
16* For each new Unicode version, during the beta period,
17* I copy the change log for the previous version to the top of this file.
18* I adjust the versions, tickets, URLs, and paths.
19* I work my way through the steps listed in the log, top to bottom,
20* adjusting the log as necessary.
21* I report problems to the UTC and/or CLDR and/or ICU.
22* Before the data is final, I "turn the crank" several more times,
23* using appropriate subsets of the steps.
24
25---------------------------------------------------------------------------- ***
26
27* New ISO 15924 script codes
28
29Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
30until they are encoded in Unicode,
31or can be assumed to be encoded in the next Unicode version.
32Script enum constant names want to follow the Unicode script property value aliases,
33which are assigned only when the scripts are encoded.
34When we encode scripts early and guess wrong, then we have confusing enum constants
35and have sometimes added aliases.
36
37Variant script codes like Latf and Aran that are not subject to separate encoding
38can be added at any time.
39(For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)
40
41We add script codes used in CLDR or in the spoof checker.
42This includes combination/alias codes like Hanb and Jamo.
43See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
44and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html
45
46We add special Z* script codes like Zsye.
47
48For new script codes see http://www.unicode.org/iso15924/codechanges.html
49
50---------------------------------------------------------------------------- ***
51
52Unicode 13.0 update for ICU 66
53
54https://www.unicode.org/versions/Unicode13.0.0/
55https://www.unicode.org/versions/beta-13.0.0.html
56https://www.unicode.org/Public/13.0.0/ucd/
57https://www.unicode.org/reports/uax-proposed-updates.html
58https://www.unicode.org/reports/tr44/tr44-25.html
59
60https://unicode-org.atlassian.net/browse/CLDR-13387
61https://unicode-org.atlassian.net/browse/ICU-20893
62
63* Command-line environment setup
64
65UNICODE_DATA=~/unidata/uni13/20200212
66CLDR_SRC=~/cldr/uni/src
67ICU_ROOT=~/icu/uni
68ICU_SRC=$ICU_ROOT/src
69ICUDT=icudt66b
70ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
71ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
72export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
73
74*** Unicode version numbers
75- makedata.mak
76- uchar.h
77- com.ibm.icu.util.VersionInfo
78- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
79
80- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
81    so that the makefiles see the new version number.
82  cd $ICU_ROOT/dbg/icu4c
83  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
84
85*** data files & enums & parser code
86
87* download files
88- mkdir -p $UNICODE_DATA
89- download Unicode files into $UNICODE_DATA
90  + subfolders: emoji, idna, security, ucd, uca
91  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
92  + split Unihan into single-property files
93    ~/unitools/trunk/src$ py/splitunihan.py $UNICODE_DATA/ucd/Unihan
94  + get GraphemeBreakTest-cldr.txt from $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt
95    or from the ucd/cldr/ output folder of the Unicode Tools:
96    Since Unicode 12/CLDR 35/ICU 64 CLDR uses modified break rules.
97  cp $CLDR_SRC/common/properties/segments/GraphemeBreakTest.txt icu4c/source/test/testdata
98
99* for manual diffs and for Unicode Tools input data updates:
100  remove version suffixes from the file names
101    ~$ unidata/desuffixucd.py $UNICODE_DATA
102  (see https://sites.google.com/site/unicodetools/inputdata)
103
104* process and/or copy files
105- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
106  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
107  + For debugging, and tweaking how ppucd.txt is written,
108    the tool has an --only_ppucd option:
109    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
110
111- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
112
113* new constants for new property values
114- preparseucd.py error:
115    ValueError: missing uchar.h enum constants for some property values:
116    [(u'blk', set([u'Symbols_For_Legacy_Computing', u'Dives_Akuru', u'Yezidi',
117        u'Tangut_Sup', u'CJK_Ext_G', u'Khitan_Small_Script', u'Chorasmian', u'Lisu_Sup'])),
118    (u'sc', set([u'Chrs', u'Diak', u'Kits', u'Yezi'])),
119    (u'InPC', set([u'Top_And_Bottom_And_Left']))]
120  = PropertyValueAliases.txt new property values (diff old & new .txt files)
121    blk; Chorasmian                       ; Chorasmian
122    blk; CJK_Ext_G                        ; CJK_Unified_Ideographs_Extension_G
123    blk; Dives_Akuru                      ; Dives_Akuru
124    blk; Khitan_Small_Script              ; Khitan_Small_Script
125    blk; Lisu_Sup                         ; Lisu_Supplement
126    blk; Symbols_For_Legacy_Computing     ; Symbols_For_Legacy_Computing
127    blk; Tangut_Sup                       ; Tangut_Supplement
128    blk; Yezidi                           ; Yezidi
129  -> add to uchar.h before UBLOCK_COUNT
130    use long property names for enum constants,
131    for the trailing comment get the block start code point: diff old & new Blocks.txt
132  -> add to UCharacter.UnicodeBlock IDs
133    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
134            replace  public static final int \1_ID = \2; \3
135  -> add to UCharacter.UnicodeBlock objects
136    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
137            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
138
139    sc ; Chrs                             ; Chorasmian
140    sc ; Diak                             ; Dives_Akuru
141    sc ; Kits                             ; Khitan_Small_Script
142    sc ; Yezi                             ; Yezidi
143  -> uscript.h & com.ibm.icu.lang.UScript
144  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
145      and in com.ibm.icu.dev.test.lang.TestUScript.java
146
147    InPC; Top_And_Bottom_And_Left         ; Top_And_Bottom_And_Left
148  -> uchar.h enum UIndicPositionalCategory & UCharacter.java IndicPositionalCategory
149
150* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
151    (not strictly necessary for NOT_ENCODED scripts)
152  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
153
154* build ICU (make install)
155  to make sure that there are no syntax errors, and
156  so that the tools build can pick up the new definitions from the installed header files.
157
158  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
159
160* update spoof checker UnicodeSet initializers:
161    inclusionPat & recommendedPat in i18n/uspoof.cpp
162    INCLUSION & RECOMMENDED in SpoofChecker.java
163- make sure that the Unicode Tools tree contains the latest security data files
164- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
165- update the hardcoded version number there in the DIRECTORY path
166- run the tool (no special environment variables needed)
167- copy & paste from the Console output into the .cpp & .java files
168
169* generate normalization data files
170  cd $ICU_ROOT/dbg/icu4c
171  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
172  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
173  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
174  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
175  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
176
177* build ICU (make install)
178  so that the tools build can pick up the new definitions from the installed header files.
179
180  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
181
182* build Unicode tools using CMake+make
183
184$ICU_SRC/tools/unicode/c/icudefs.txt:
185
186# Location (--prefix) of where ICU was installed.
187set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
188# Location of the ICU4C source tree.
189set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
190
191  $ICU_ROOT/dbg$
192    mkdir -p tools/unicode/c
193    cd tools/unicode/c
194
195  $ICU_ROOT/dbg/tools/unicode/c$
196    cmake ../../../../src/tools/unicode/c
197    make
198
199* generate core properties data files
200  $ICU_ROOT/dbg/tools/unicode/c$
201    genprops/genprops $ICU_SRC/icu4c
202- tool failure:
203    genprops: Script_Extensions indexes overflow bit field
204    genprops: error parsing or setting values from ppucd.txt line 32696 - U_BUFFER_OVERFLOW_ERROR
205  -> uprops.icu data file format :
206     add two more bits to store a script code or Script_Extensions index
207  -> generator code, C++ & Java runtime, uprops.icu format version 7.7
208- rebuild ICU (make install) & tools
209
210* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
211  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
212- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
213- Unicode 6.0..13.0: U+2260, U+226E, U+226F
214- nothing new in this Unicode version, no test file to update
215
216* run & fix ICU4C tests
217- fix Unicode Tools class Segmenter to generate correct *BreakTest.txt files
218- Andy helps with RBBI & spoof check test failures
219
220* collation: CLDR collation root, UCA DUCET
221
222- UCA DUCET goes into Mark's Unicode tools, see
223    https://sites.google.com/site/unicodetools/home#TOC-UCA
224  diff the main mapping file, look for bad changes
225  (for example, more bytes per weight for common characters)
226    ~/svn.unitools/trunk$ sed -r -f ~/cldr/uni/src/tools/scripts/uca/blankweights.sed ../Generated/UCA/13.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-13.0.txt
227    ~/svn.unitools/trunk$ meld ../frac-12.1.txt ../frac-13.0.txt
228
229- CLDR root data files are checked into $CLDR_SRC/common/uca/
230    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
231
232- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
233    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
234- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
235    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
236    (note removing the underscore before "Rules")
237    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
238- restore TODO diffs in UCARules.txt
239    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
240- update (ICU4C)/source/test/testdata/CollationTest_*.txt
241  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
242  from the CLDR root files (..._CLDR_..._SHORT.txt)
243    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
244    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
245    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
246- if CLDR common/uca/unihan-index.txt changes, then update
247  CLDR common/collation/root.xml <collation type="private-unihan">
248  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
249
250- run genuca
251  $ICU_ROOT/dbg/tools/unicode/c$
252    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
253    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
254- rebuild ICU4C
255
256* Unihan collators
257    https://sites.google.com/site/unicodetools/unihan
258- run Unicode Tools
259    org.unicode.draft.GenerateUnihanCollators
260  with VM arguments
261    -ea
262    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
263    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
264    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
265    -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
266    -DUVERSION=13.0.0
267- run Unicode Tools
268    org.unicode.draft.GenerateUnihanCollatorFiles
269  with the same arguments
270- check CLDR diffs
271    cd $CLDR_SRC
272    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
273    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
274- copy to CLDR
275    cd $CLDR_SRC
276    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
277    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
278- run CLDR unit tests, commit to CLDR
279- generate ICU zh collation data: run CLDR
280    org.unicode.cldr.icu.NewLdml2IcuConverter
281  with program arguments
282    -t collation
283    -s /usr/local/google/home/mscherer/cldr/uni/src/common/collation
284    -m /usr/local/google/home/mscherer/cldr/uni/src/common/supplemental
285    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
286    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
287    zh
288  and VM arguments
289    -ea
290    -DCLDR_DIR=/usr/local/google/home/mscherer/cldr/uni/src
291- rebuild ICU4C
292
293* run & fix ICU4C tests, now with new CLDR collation root data
294- run all tests with the collation test data *_SHORT.txt or the full files
295  (the full ones have comments, useful for debugging)
296- note on intltest: if collate/UCAConformanceTest fails, then
297  utility/MultithreadTest/TestCollators will fail as well;
298  fix the conformance test before looking into the multi-thread test
299
300* update Java data files
301- refresh just the UCD/UCA-related/derived files, just to be safe
302- see (ICU4C)/source/data/icu4j-readme.txt
303- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
304- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
305  output:
306    ...
307    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
308    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt66b
309    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b
310    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt66l.dat ./out/icu4j/icudt66b.dat -s ./out/build/icudt66l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt66b
311    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt66b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt66b"
312    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt66b/
313    mkdir -p /tmp/icu4j/main/shared/data
314    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
315    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt66b/
316    mkdir -p /tmp/icu4j/main/shared/data
317    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
318    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
319- copy the big-endian Unicode data files to another location,
320  separate from the other data files,
321  and then refresh ICU4J
322    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
323    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
324    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
325    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
326    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
327    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
328    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
329    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
330    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
331    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
332
333* When refreshing all of ICU4J data from ICU4C
334- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
335- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
336or
337- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
338
339* update CollationFCD.java
340  + copy & paste the initializers of lcccIndex[] etc. from
341    ICU4C/source/i18n/collationfcd.cpp to
342    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
343
344* refresh Java test .txt files
345- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
346    cd $ICU_SRC/icu4c/source/data/unidata
347    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
348    cd ../../test/testdata
349    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
350    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
351
352* run & fix ICU4J tests
353
354*** API additions
355- send notice to icu-design about new born-@stable API (enum constants etc.)
356
357*** CLDR numbering systems
358- look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
359  for example, look for
360    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
361    in new blocks (Blocks.txt)
362  Unicode 13:
363    diak 11950..11959 Dives_Akuru
364
365*** merge the Unicode update branches back onto the trunk
366- do not merge the icudata.jar and testdata.jar,
367  instead rebuild them from merged & tested ICU4C
368- make sure that changes to Unicode tools are checked in:
369  http://www.unicode.org/utility/trac/log/trunk/unicodetools
370
371---------------------------------------------------------------------------- ***
372
373Unicode 12.1 update for ICU 64.2
374
375** This is an abbreviated update with one new character for the new
376** Japanese era expected to start on 2019-May-01: U+32FF SQUARE ERA NAME REIWA
377https://en.wikipedia.org/wiki/Reiwa_period
378
379http://www.unicode.org/versions/Unicode12.1.0/
380
381ICU-20497 Unicode 12.1
382
383cldrbug 11978: Unicode 12.1
384
385* Command-line environment setup
386
387UNICODE_DATA=~/unidata/uni121/20190403
388CLDR_SRC=~/svn.cldr/uni
389ICU_ROOT=~/icu/uni
390ICU_SRC=$ICU_ROOT/src
391ICUDT=icudt64b
392ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
393ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
394export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
395
396*** Unicode version numbers
397- makedata.mak
398- uchar.h
399- com.ibm.icu.util.VersionInfo
400- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
401
402- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
403    so that the makefiles see the new version number.
404  cd $ICU_ROOT/dbg/icu4c
405  ICU_DATA_BUILDTOOL_OPTS=--include_uni_core_data ../../../doconfig-clang-dbg.sh
406
407*** data files & enums & parser code
408
409* download files
410- mkdir -p $UNICODE_DATA
411- download Unicode files into $UNICODE_DATA
412  + subfolders: emoji, idna, security, ucd, uca
413  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
414
415* for manual diffs and for Unicode Tools input data updates:
416  remove version suffixes from the file names
417    ~$ unidata/desuffixucd.py $UNICODE_DATA
418  (see https://sites.google.com/site/unicodetools/inputdata)
419
420* process and/or copy files
421- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
422  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
423  + For debugging, and tweaking how ppucd.txt is written,
424    the tool has an --only_ppucd option:
425    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
426
427- cp -v $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
428
429* build ICU (make install)
430  so that the tools build can pick up the new definitions from the installed header files.
431
432  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
433
434* update spoof checker UnicodeSet initializers:
435    inclusionPat & recommendedPat in uspoof.cpp
436    INCLUSION & RECOMMENDED in SpoofChecker.java
437- make sure that the Unicode Tools tree contains the latest security data files
438- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
439- update the hardcoded version number there in the DIRECTORY path
440- run the tool (no special environment variables needed)
441- copy & paste from the Console output into the .cpp & .java files
442
443* generate normalization data files
444  cd $ICU_ROOT/dbg/icu4c
445  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
446  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
447  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
448  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
449  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
450
451* build ICU (make install)
452  so that the tools build can pick up the new definitions from the installed header files.
453
454  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
455
456* build Unicode tools using CMake+make
457
458$ICU_SRC/tools/unicode/c/icudefs.txt:
459
460# Location (--prefix) of where ICU was installed.
461set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
462# Location of the ICU4C source tree.
463set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
464
465  $ICU_ROOT/dbg$
466    mkdir -p tools/unicode/c
467    cd tools/unicode/c
468
469  $ICU_ROOT/dbg/tools/unicode/c$
470    cmake ../../../../src/tools/unicode/c
471    make
472
473* generate core properties data files
474  $ICU_ROOT/dbg/tools/unicode/c$
475    genprops/genprops $ICU_SRC/icu4c
476    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
477    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
478- rebuild ICU (make install) & tools
479
480* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
481  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
482- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
483- Unicode 6.0..12.1: U+2260, U+226E, U+226F
484- nothing new in this Unicode version, no test file to update
485
486* run & fix ICU4C tests
487- Andy handles RBBI & spoof check test failures
488
489* collation: CLDR collation root, UCA DUCET
490
491- UCA DUCET goes into Mark's Unicode tools, see
492    https://sites.google.com/site/unicodetools/home#TOC-UCA
493  diff the main mapping file, look for bad changes
494  (for example, more bytes per weight for common characters)
495    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.1.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.1.txt
496    ~/svn.unitools/trunk$ meld ../frac-12.txt ../frac-12.1.txt
497
498- CLDR root data files are checked into $CLDR_SRC/common/uca/
499    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
500
501- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
502    cp -v $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
503- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
504    cp -v $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
505    (note removing the underscore before "Rules")
506    cp -v $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
507- restore TODO diffs in UCARules.txt
508    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
509- update (ICU4C)/source/test/testdata/CollationTest_*.txt
510  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
511  from the CLDR root files (..._CLDR_..._SHORT.txt)
512    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
513    cp -v $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
514    cp -v $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
515- if CLDR common/uca/unihan-index.txt changes, then update
516  CLDR common/collation/root.xml <collation type="private-unihan">
517  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
518
519- run genuca, see command line above
520- rebuild ICU4C
521
522* Unihan collators
523    https://sites.google.com/site/unicodetools/unihan
524- run Unicode Tools
525    org.unicode.draft.GenerateUnihanCollators
526  with VM arguments
527    -ea
528    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
529    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
530    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
531    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
532    -DUVERSION=12.1.0
533- run Unicode Tools
534    org.unicode.draft.GenerateUnihanCollatorFiles
535  with the same arguments
536- check CLDR diffs
537    cd $CLDR_SRC
538    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
539    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
540- copy to CLDR
541    cd $CLDR_SRC
542    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
543    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
544- run CLDR unit tests, commit to CLDR
545- generate ICU zh collation data: run CLDR
546    org.unicode.cldr.icu.NewLdml2IcuConverter
547  with program arguments
548    -t collation
549    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
550    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
551    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
552    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
553    zh
554  and VM arguments
555    -ea
556    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
557- rebuild ICU4C
558
559* run & fix ICU4C tests, now with new CLDR collation root data
560- run all tests with the collation test data *_SHORT.txt or the full files
561  (the full ones have comments, useful for debugging)
562- note on intltest: if collate/UCAConformanceTest fails, then
563  utility/MultithreadTest/TestCollators will fail as well;
564  fix the conformance test before looking into the multi-thread test
565
566* update Java data files
567- refresh just the UCD/UCA-related/derived files, just to be safe
568- see (ICU4C)/source/data/icu4j-readme.txt
569- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
570- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
571  output:
572    ...
573    make[1]: Entering directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
574    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt64b
575    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b
576    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt64l.dat ./out/icu4j/icudt64b.dat -s ./out/build/icudt64l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt64b
577    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt64b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt64b"
578    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt64b/
579    mkdir -p /tmp/icu4j/main/shared/data
580    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
581    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt64b/
582    mkdir -p /tmp/icu4j/main/shared/data
583    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
584    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
585- copy the big-endian Unicode data files to another location,
586  separate from the other data files,
587  and then refresh ICU4J
588    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
589    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
590    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
591    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
592    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
593    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
594    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
595    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
596    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
597    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
598
599* When refreshing all of ICU4J data from ICU4C
600- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
601- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
602or
603- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
604
605* update CollationFCD.java
606  + copy & paste the initializers of lcccIndex[] etc. from
607    ICU4C/source/i18n/collationfcd.cpp to
608    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
609
610* refresh Java test .txt files
611- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
612    cd $ICU_SRC/icu4c/source/data/unidata
613    cp -v confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
614    cd ../../test/testdata
615    cp -v BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
616    cp -v $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
617
618* run & fix ICU4J tests
619
620*** API additions
621- send notice to icu-design about new born-@stable API (enum constants etc.)
622
623*** CLDR numbering systems
624- look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
625  for example, look for
626    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
627    in new blocks (Blocks.txt)
628  Unicode 12: using Unicode 12 CLDR ticket #11478
629    hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
630    wcho 1E2F0..1E2F9 Wancho
631  Unicode 11: using Unicode 11 CLDR ticket #10978
632    rohg 10D30..10D39 Hanifi_Rohingya
633    gong 11DA0..11DA9 Gunjala_Gondi
634  Earlier: CLDR tickets specific to adding new numbering systems.
635  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
636  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
637
638*** merge the Unicode update branches back onto the trunk
639- do not merge the icudata.jar and testdata.jar,
640  instead rebuild them from merged & tested ICU4C
641- make sure that changes to Unicode tools are checked in:
642  http://www.unicode.org/utility/trac/log/trunk/unicodetools
643
644---------------------------------------------------------------------------- ***
645
646Unicode 12.0 update for ICU 64
647
648http://www.unicode.org/versions/Unicode12.0.0/
649http://unicode.org/versions/beta-12.0.0.html
650https://www.unicode.org/review/pri389/
651http://www.unicode.org/reports/uax-proposed-updates.html
652http://www.unicode.org/reports/tr44/tr44-23.html
653
654ICU-20203 Unicode 12
655
656ICU-20111 move text layout properties data into a data file
657
658cldrbug 11478: Unicode 12
659Accidentally used ^/trunk instead of ^/branches/markus/uni12
660
661* Command-line environment setup
662
663UNICODE_DATA=~/unidata/uni12/20190309
664CLDR_SRC=~/svn.cldr/uni
665ICU_ROOT=~/icu/uni
666ICU_SRC=$ICU_ROOT/src
667ICUDT=icudt63b
668ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
669ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
670export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
671
672*** Unicode version numbers
673- makedata.mak
674- uchar.h
675- com.ibm.icu.util.VersionInfo
676- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
677
678- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
679  so that the makefiles see the new version number.
680
681*** data files & enums & parser code
682
683* download files
684- mkdir -p $UNICODE_DATA
685- download Unicode files into $UNICODE_DATA
686  + subfolders: emoji, idna, security, ucd, uca
687  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
688
689* for manual diffs and for Unicode Tools input data updates:
690  remove version suffixes from the file names
691    ~$ unidata/desuffixucd.py $UNICODE_DATA
692  (see https://sites.google.com/site/unicodetools/inputdata)
693
694* process and/or copy files
695- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
696  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
697  + For debugging, and tweaking how ppucd.txt is written,
698    the tool has an --only_ppucd option:
699    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
700
701- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
702
703* build ICU (make install)
704  so that the tools build can pick up the new definitions from the installed header files.
705
706  $ICU_ROOT/dbg/icu4c$ echo;echo; date; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
707
708* new constants for new property values
709- preparseucd.py error:
710    ValueError: missing uchar.h enum constants for some property values:
711    [(u'blk', set([u'Symbols_And_Pictographs_Ext_A', u'Elymaic',
712        u'Ottoman_Siyaq_Numbers', u'Nandinagari', u'Nyiakeng_Puachue_Hmong',
713        u'Small_Kana_Ext', u'Egyptian_Hieroglyph_Format_Controls', u'Wancho', u'Tamil_Sup'])),
714    (u'sc', set([u'Nand', u'Wcho', u'Elym', u'Hmnp']))]
715  = PropertyValueAliases.txt new property values (diff old & new .txt files)
716    blk; Egyptian_Hieroglyph_Format_Controls; Egyptian_Hieroglyph_Format_Controls
717    blk; Elymaic                          ; Elymaic
718    blk; Nandinagari                      ; Nandinagari
719    blk; Nyiakeng_Puachue_Hmong           ; Nyiakeng_Puachue_Hmong
720    blk; Ottoman_Siyaq_Numbers            ; Ottoman_Siyaq_Numbers
721    blk; Small_Kana_Ext                   ; Small_Kana_Extension
722    blk; Symbols_And_Pictographs_Ext_A    ; Symbols_And_Pictographs_Extended_A
723    blk; Tamil_Sup                        ; Tamil_Supplement
724    blk; Wancho                           ; Wancho
725  -> add to uchar.h
726    use long property names for enum constants,
727    for the trailing comment get the block start code point: diff old & new Blocks.txt
728  -> add to UCharacter.UnicodeBlock IDs
729    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
730            replace  public static final int \1_ID = \2; \3
731  -> add to UCharacter.UnicodeBlock objects
732    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
733            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \3
734
735    sc ; Elym                             ; Elymaic
736    sc ; Hmnp                             ; Nyiakeng_Puachue_Hmong
737    sc ; Nand                             ; Nandinagari
738    sc ; Wcho                             ; Wancho
739  -> uscript.h & com.ibm.icu.lang.UScript
740  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
741      and in com.ibm.icu.dev.test.lang.TestUScript.java
742
743* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
744    (not strictly necessary for NOT_ENCODED scripts)
745  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
746
747* update spoof checker UnicodeSet initializers:
748    inclusionPat & recommendedPat in uspoof.cpp
749    INCLUSION & RECOMMENDED in SpoofChecker.java
750- make sure that the Unicode Tools tree contains the latest security data files
751- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
752- update the hardcoded version number there in the DIRECTORY path
753- run the tool (no special environment variables needed)
754- copy & paste from the Console output into the .cpp & .java files
755
756* generate normalization data files
757  cd $ICU_ROOT/dbg/icu4c
758  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
759  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
760  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
761  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
762  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
763
764* build ICU (make install)
765  so that the tools build can pick up the new definitions from the installed header files.
766
767  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install &> out.txt ; tail -n 30 out.txt ; date
768
769* build Unicode tools using CMake+make
770
771$ICU_SRC/tools/unicode/c/icudefs.txt:
772
773# Location (--prefix) of where ICU was installed.
774set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
775# Location of the ICU4C source tree.
776set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/uni/src/icu4c)
777
778  $ICU_ROOT/dbg$
779    mkdir -p tools/unicode/c
780    cd tools/unicode/c
781
782  $ICU_ROOT/dbg/tools/unicode/c$
783    cmake ../../../../src/tools/unicode/c
784    make
785
786* generate core properties data files
787  $ICU_ROOT/dbg/tools/unicode/c$
788    genprops/genprops $ICU_SRC/icu4c
789    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c && \
790    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
791- rebuild ICU (make install) & tools
792
793* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
794  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
795- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
796- Unicode 6.0..12.0: U+2260, U+226E, U+226F
797- nothing new in this Unicode version, no test file to update
798
799* run & fix ICU4C tests
800- update test of default bidi classes:
801  Bidi range \U0001ED00-\U0001ED4F changes default from R to AL,
802  see diffs in DerivedBidiClass.txt
803  + /tsutil/cucdtst/TestUnicodeData enumDefaultsRange() defaultBidi[]
804  + UCharacterTest.java TestIteration() defaultBidi[]
805- Andy handles RBBI & spoof check test failures
806
807* collation: CLDR collation root, UCA DUCET
808
809- UCA DUCET goes into Mark's Unicode tools, see
810    https://sites.google.com/site/unicodetools/home#TOC-UCA
811  diff the main mapping file, look for bad changes
812  (for example, more bytes per weight for common characters)
813    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/UCA/12.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-12.txt
814    ~/svn.unitools/trunk$ meld ../frac-11.txt ../frac-12.txt
815
816- CLDR root data files are checked into $CLDR_SRC/common/uca/
817    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
818
819- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
820    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
821- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
822    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
823    (note removing the underscore before "Rules")
824    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
825- restore TODO diffs in UCARules.txt
826    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
827- update (ICU4C)/source/test/testdata/CollationTest_*.txt
828  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
829  from the CLDR root files (..._CLDR_..._SHORT.txt)
830    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
831    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
832    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
833- if CLDR common/uca/unihan-index.txt changes, then update
834  CLDR common/collation/root.xml <collation type="private-unihan">
835  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
836
837- run genuca, see command line above;
838  deal with
839    Error: Unknown script for first-primary sample character U+119CE on line 29233 of /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
840    FDD1 119CE;	[71 CD 02, 05, 05]	# Nandinagari first primary (compressible)
841        (add the character to genuca.cpp sampleCharsToScripts[])
842  + This time, I added code to genuca.cpp to use uscript_getSampleUnicodeString(script)
843    and cache its values.
844    Works as long as the script metadata is updated before the collation data.
845- rebuild ICU4C
846
847* Unihan collators
848    https://sites.google.com/site/unicodetools/unihan
849- run Unicode Tools
850    org.unicode.draft.GenerateUnihanCollators
851  with VM arguments
852    -ea
853    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
854    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
855    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
856    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
857    -DUVERSION=12.0.0
858- run Unicode Tools
859    org.unicode.draft.GenerateUnihanCollatorFiles
860  with the same arguments
861- check CLDR diffs
862    cd $CLDR_SRC
863    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
864    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
865- copy to CLDR
866    cd $CLDR_SRC
867    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
868    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
869- run CLDR unit tests, commit to CLDR
870- generate ICU zh collation data: run CLDR
871    org.unicode.cldr.icu.NewLdml2IcuConverter
872  with program arguments
873    -t collation
874    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
875    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
876    -d /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/coll
877    -p /usr/local/google/home/mscherer/icu/uni/src/icu4c/source/data/xml/collation
878    zh
879  and VM arguments
880    -ea
881    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
882- rebuild ICU4C
883
884* run & fix ICU4C tests, now with new CLDR collation root data
885- run all tests with the collation test data *_SHORT.txt or the full files
886  (the full ones have comments, useful for debugging)
887- note on intltest: if collate/UCAConformanceTest fails, then
888  utility/MultithreadTest/TestCollators will fail as well;
889  fix the conformance test before looking into the multi-thread test
890
891* update Java data files
892- refresh just the UCD/UCA-related/derived files, just to be safe
893- see (ICU4C)/source/data/icu4j-readme.txt
894- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
895- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
896  output:
897    ...
898    Unicode .icu files built to ./out/build/icudt63l
899    echo timestamp > uni-core-data
900    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt63b
901    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b
902    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
903    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt63l.dat ./out/icu4j/icudt63b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt63l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt63b
904    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt63b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt63b"
905    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt63b/
906    mkdir -p /tmp/icu4j/main/shared/data
907    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
908    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt63b/
909    mkdir -p /tmp/icu4j/main/shared/data
910    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
911    make[1]: Leaving directory '/usr/local/google/home/mscherer/icu/uni/dbg/icu4c/data'
912- copy the big-endian Unicode data files to another location,
913  separate from the other data files,
914  and then refresh ICU4J
915    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
916    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
917    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
918    cp -v com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
919    cp -v com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
920    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
921    cp -v com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
922    cp -v com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
923    cp -v com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
924    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
925
926* When refreshing all of ICU4J data from ICU4C
927- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
928- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
929or
930- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
931
932* update CollationFCD.java
933  + copy & paste the initializers of lcccIndex[] etc. from
934    ICU4C/source/i18n/collationfcd.cpp to
935    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
936
937* refresh Java test .txt files
938- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
939    cd $ICU_SRC/icu4c/source/data/unidata
940    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
941    cd ../../test/testdata
942    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
943    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
944
945* run & fix ICU4J tests
946
947*** API additions
948- send notice to icu-design about new born-@stable API (enum constants etc.)
949
950*** CLDR numbering systems
951- look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
952  for example, look for
953    ~/icu/uni/src$ egrep ';gc=Nd.+;nv=4' icu4c/source/data/unidata/ppucd.txt
954    in new blocks (Blocks.txt)
955  Unicode 12: using Unicode 12 CLDR ticket #11478
956    hmnp 1E140..1E149 Nyiakeng_Puachue_Hmong
957    wcho 1E2F0..1E2F9 Wancho
958  Unicode 11: using Unicode 11 CLDR ticket #10978
959    rohg 10D30..10D39 Hanifi_Rohingya
960    gong 11DA0..11DA9 Gunjala_Gondi
961  Earlier: CLDR tickets specific to adding new numbering systems.
962  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
963  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
964
965*** merge the Unicode update branches back onto the trunk
966- do not merge the icudata.jar and testdata.jar,
967  instead rebuild them from merged & tested ICU4C
968- make sure that changes to Unicode tools are checked in:
969  http://www.unicode.org/utility/trac/log/trunk/unicodetools
970
971---------------------------------------------------------------------------- ***
972
973ICU 63 addition of ICU support of text layout properties InPC, InSC, vo
974
975* Command-line environment setup
976
977UNICODE_DATA=~/unidata/uni11/20180609
978CLDR_SRC=~/svn.cldr/uni
979ICU_ROOT=~/icu/mine
980ICU_SRC=$ICU_ROOT/src
981ICUDT=icudt62b
982ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
983ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
984export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
985
986*** Links
987
988https://unicode-org.atlassian.net/browse/ICU-8966 InPC & InSC
989https://unicode-org.atlassian.net/browse/ICU-12850 vo
990
991*** data files & enums & parser code
992
993* API additions
994- for each of the three new enumerated properties
995  + uchar.h: add the enum UProperty constant UCHAR_<long prop name>
996  + uchar.h: update UCHAR_INT_LIMIT
997  + uchar.h: add the enum U<long prop name>
998    with constants U_<short prop name>_<long value name>
999  + UProperty.java: add the constant <long prop name>
1000  + UProperty.java: update INT_LIMIT
1001  + UCharacter.java: add the interface <long prop name>
1002    with constants <long value name>
1003
1004* process and/or copy files
1005- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1006  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1007  + It also writes tools/unicode/c/genprops/pnames_data.h with property and value
1008    names and aliases.
1009  + For debugging, and tweaking how ppucd.txt is written,
1010    the tool has an --only_ppucd option:
1011    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1012
1013* preparseucd.py changes
1014- add new property short names (uppercase) to _prop_and_value_re
1015  so that ParseUCharHeader() parses the new enum constants
1016
1017* build ICU (make install)
1018  so that the tools build can pick up the new definitions from the installed header files.
1019
1020  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1021
1022* build Unicode tools using CMake+make
1023
1024$ICU_SRC/tools/unicode/c/icudefs.txt:
1025
1026# Location (--prefix) of where ICU was installed.
1027set(ICU_INST_DIR /usr/local/google/home/mscherer/icu/mine/inst/icu4c)
1028# Location of the ICU4C source tree.
1029set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/icu/mine/src/icu4c)
1030
1031  $ICU_ROOT/dbg$
1032    mkdir -p tools/unicode/c
1033    cd tools/unicode/c
1034
1035  $ICU_ROOT/dbg/tools/unicode/c$
1036    cmake ../../../../../src/tools/unicode/c
1037    make
1038
1039* generate core properties data files
1040  $ICU_ROOT/dbg/tools/unicode/c$
1041    genprops/genprops $ICU_SRC/icu4c
1042- rebuild ICU (make install) & tools
1043
1044* write data for runtime, hardcoded for now
1045- add genprops/layoutpropsbuilder.cpp with pieces from sibling files
1046- generate new icu4c/source/common/ulayout_props_data.h
1047- for each of the three new enumerated properties
1048  + int property max value
1049  + small, 8-bit UCPTrie
1050    (A small 16-bit trie with bit fields for these three properties
1051    is very nearly the same size as the sum of the three.)
1052
1053* wire into C++
1054- uprops.cpp: #include ulayout_props_data.h
1055- uprops.cpp: add getInPC() etc. functions
1056- uprops.cpp: add lines to intProps[], include max values
1057- uprops.h: add UPropertySource constants
1058- uprops.cpp: add uprops_addPropertyStarts(src)
1059- uniset_props.cpp: add to UnicodeSet_initInclusion()
1060- intltest/ucdtest.cpp: write unit tests
1061
1062* update Java data files
1063- refresh just the pnames.icu file with the new property [value] names, just to be safe
1064- see $ICU_SRC/icu4c/source/data/icu4j-readme.txt
1065- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1066- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1067- copy the big-endian Unicode data files to another location,
1068  separate from the other data files,
1069  and then refresh ICU4J
1070    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1071    cp com/ibm/icu/impl/data/$ICUDT/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1072    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1073
1074* wire into Java
1075- UCharacterProperty.java: add new SRC_INPC etc. constants as in C++
1076- UCharacterProperty.java: for each new property
1077  + create a nested class to hold its CodePointTrie
1078  + initialize it from a string literal
1079  + paste in the initializer printed by genprops
1080  + add a new IntProperty object to the intProps[] array
1081  + use the correct max int value for each property, also printed by genprops
1082- UCharacterProperty.java: add ulayout_addPropertyStarts(src, set)
1083- UnicodeSet.java: add to getInclusions()
1084- UCharacterTest.java: write unit tests
1085
1086---------------------------------------------------------------------------- ***
1087
1088Unicode 11.0 update for ICU 62
1089
1090http://www.unicode.org/versions/Unicode11.0.0/
1091http://unicode.org/versions/beta-11.0.0.html
1092https://www.unicode.org/review/pri372/
1093http://www.unicode.org/reports/uax-proposed-updates.html
1094http://www.unicode.org/reports/tr44/tr44-21.html
1095
1096* Command-line environment setup
1097
1098UNICODE_DATA=~/unidata/uni11/20180521
1099CLDR_SRC=~/svn.cldr/uni
1100ICU_ROOT=~/svn.icu/uni
1101ICU_SRC=$ICU_ROOT/src
1102ICUDT=icudt61b
1103ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1104ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1105export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1106
1107*** ICU Trac
1108
1109- ticket:13630: Unicode 11
1110- ^/branches/markus/uni11
1111
1112*** CLDR Trac
1113
1114- cldrbug 10978: Unicode 11
1115- ^/branches/markus/uni11
1116
1117*** Unicode version numbers
1118- makedata.mak
1119- uchar.h
1120- com.ibm.icu.util.VersionInfo
1121- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1122
1123- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1124  so that the makefiles see the new version number.
1125
1126*** data files & enums & parser code
1127
1128* download files
1129- mkdir -p $UNICODE_DATA
1130- download Unicode files into $UNICODE_DATA
1131  + subfolders: emoji, idna, security, ucd, uca
1132  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1133
1134* for manual diffs and for Unicode Tools input data updates:
1135  remove version suffixes from the file names
1136    ~$ unidata/desuffixucd.py $UNICODE_DATA
1137  (see https://sites.google.com/site/unicodetools/inputdata)
1138
1139* process and/or copy files
1140- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1141  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1142  + For debugging, and tweaking how ppucd.txt is written,
1143    the tool has an --only_ppucd option:
1144    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1145
1146- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1147
1148* build ICU (make install)
1149  so that the tools build can pick up the new definitions from the installed header files.
1150
1151  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1152
1153* preparseucd.py changes
1154- fix other errors
1155    NameError: unknown property Extended_Pictographic
1156  -> add Extended_Pictographic binary property
1157  -> add new short names for all Emoji properties
1158
1159* new constants for new property values
1160- preparseucd.py error:
1161    ValueError: missing uchar.h enum constants for some property values:
1162    [(u'blk', set([u'Georgian_Ext', u'Hanifi_Rohingya', u'Medefaidrin', u'Sogdian', u'Makasar',
1163                   u'Old_Sogdian', u'Dogra', u'Gunjala_Gondi', u'Chess_Symbols', u'Mayan_Numerals',
1164                   u'Indic_Siyaq_Numbers'])),
1165     (u'jg', set([u'Hanifi_Rohingya_Kinna_Ya', u'Hanifi_Rohingya_Pa'])),
1166     (u'sc', set([u'Medf', u'Sogd', u'Dogr', u'Rohg', u'Maka', u'Sogo', u'Gong'])),
1167     (u'GCB', set([u'LinkC', u'Virama'])),
1168     (u'WB', set([u'WSegSpace']))]
1169  = PropertyValueAliases.txt new property values (diff old & new .txt files)
1170    blk; Chess_Symbols                    ; Chess_Symbols
1171    blk; Dogra                            ; Dogra
1172    blk; Georgian_Ext                     ; Georgian_Extended
1173    blk; Gunjala_Gondi                    ; Gunjala_Gondi
1174    blk; Hanifi_Rohingya                  ; Hanifi_Rohingya
1175    blk; Indic_Siyaq_Numbers              ; Indic_Siyaq_Numbers
1176    blk; Makasar                          ; Makasar
1177    blk; Mayan_Numerals                   ; Mayan_Numerals
1178    blk; Medefaidrin                      ; Medefaidrin
1179    blk; Old_Sogdian                      ; Old_Sogdian
1180    blk; Sogdian                          ; Sogdian
1181  -> add to uchar.h
1182    use long property names for enum constants,
1183    for the trailing comment get the block start code point: diff old & new Blocks.txt
1184  -> add to UCharacter.UnicodeBlock IDs
1185    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1186            replace  public static final int \1_ID = \2; \3
1187  -> add to UCharacter.UnicodeBlock objects
1188    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1189            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1190
1191    GCB; LinkC                            ; LinkingConsonant
1192    GCB; Virama                           ; Virama
1193  -> uchar.h & UCharacter.GraphemeClusterBreak
1194  -> these two later removed again: http://www.unicode.org/L2/L2018/18115.htm#155-A76
1195
1196    InSC; Consonant_Initial_Postfixed     ; Consonant_Initial_Postfixed
1197  -> ignore: ICU does not yet support this property
1198
1199    jg ; Hanifi_Rohingya_Kinna_Ya         ; Hanifi_Rohingya_Kinna_Ya
1200    jg ; Hanifi_Rohingya_Pa               ; Hanifi_Rohingya_Pa
1201  -> uchar.h & UCharacter.JoiningGroup
1202
1203    sc ; Dogr                             ; Dogra
1204    sc ; Gong                             ; Gunjala_Gondi
1205    sc ; Maka                             ; Makasar
1206    sc ; Medf                             ; Medefaidrin
1207    sc ; Rohg                             ; Hanifi_Rohingya
1208    sc ; Sogd                             ; Sogdian
1209    sc ; Sogo                             ; Old_Sogdian
1210  -> uscript.h & com.ibm.icu.lang.UScript
1211  -> Nushu had been added already
1212  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1213      and in com.ibm.icu.dev.test.lang.TestUScript.java
1214
1215    WB ; WSegSpace                        ; WSegSpace
1216  -> uchar.h & UCharacter.WordBreak
1217
1218* New short names for emoji properties
1219- see UTS #51
1220- short names set in preparseucd.py
1221
1222* New properties
1223- boolean emoji property Extended_Pictographic
1224  -> added in preparseucd.py
1225  -> uchar.h & UProperty.java
1226- misc. property Equivalent_Unified_Ideograph (EqUIdeo)
1227  as shown in PropertyValueAliases.txt
1228  -> ignore for now
1229
1230* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1231    (not strictly necessary for NOT_ENCODED scripts)
1232  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1233
1234* update spoof checker UnicodeSet initializers:
1235    inclusionPat & recommendedPat in uspoof.cpp
1236    INCLUSION & RECOMMENDED in SpoofChecker.java
1237- make sure that the Unicode Tools tree contains the latest security data files
1238- go to Unicode Tools org.unicode.text.tools.RecommendedSetGenerator
1239- update the hardcoded version number there in the DIRECTORY path
1240- run the tool (no special environment variables needed)
1241- copy & paste from the Console output into the .cpp & .java files
1242
1243* generate normalization data files
1244  cd $ICU_ROOT/dbg/icu4c
1245  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1246  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
1247  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1248  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1249  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1250
1251* build ICU (make install)
1252  so that the tools build can pick up the new definitions from the installed header files.
1253
1254  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1255
1256* build Unicode tools using CMake+make
1257
1258$ICU_SRC/tools/unicode/c/icudefs.txt:
1259
1260# Location (--prefix) of where ICU was installed.
1261set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1262# Location of the ICU4C source tree.
1263set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c)
1264
1265  $ICU_ROOT/dbg$
1266    mkdir -p tools/unicode/c
1267    cd tools/unicode/c
1268
1269  $ICU_ROOT/dbg/tools/unicode/c$
1270    cmake ../../../../src/tools/unicode/c
1271    make
1272
1273* generate core properties data files
1274  $ICU_ROOT/dbg/tools/unicode/c$
1275    genprops/genprops $ICU_SRC/icu4c
1276    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
1277    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1278- rebuild ICU (make install) & tools
1279
1280* Fix case props
1281    genprops error: casepropsbuilder: too many exceptions words
1282    genprops error: failure finalizing the data - U_BUFFER_OVERFLOW_ERROR
1283- With the addition of Georgian Mtavruli capital letters,
1284  there are now too many simple case mappings with big mapping deltas
1285  that yield uncompressible exceptions.
1286- Changing the data structure (now formatVersion 4),
1287  adding one bit for no-simple-case-folding (for Cherokee), and
1288  one optional slot for a big delta (for most faraway mappings),
1289  together with another bit for whether that is negative.
1290  This makes most Cherokee & Georgian etc. case mappings compressible,
1291  reducing the number of exceptions words.
1292- Further changes to gain one more bit for the exceptions index,
1293  for future growth. Details see casepropsbuilder.cpp.
1294
1295* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1296  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1297- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1298- Unicode 6.0..11.0: U+2260, U+226E, U+226F
1299- nothing new in this Unicode version, no test file to update
1300
1301* run & fix ICU4C tests
1302- Andy handles RBBI & spoof check test failures
1303
1304- Errors in char.txt, word.txt, word_POSIX.txt like
1305    createRuleBasedBreakIterator: ICU Error "U_BRK_RULE_EMPTY_SET"  at line 46, column 16
1306  because \p{Grapheme_Cluster_Break = EBG} and \p{Word_Break = EBG} are empty.
1307  -> Temporary(!) workaround: Add an arbitrary code point to these sets to make them
1308     not empty, just to get ICU building.
1309  -> Intermediate workaround: Remove $E_Base_GAZ and other now-unused variables
1310     and properties together with the rules that used them (GB 10, WB 14).
1311  -> Andy adjusts the rule sets further to sync with
1312     Unicode 11 grapheme, word, and line break spec changes.
1313
1314* collation: CLDR collation root, UCA DUCET
1315
1316- UCA DUCET goes into Mark's Unicode tools, see
1317    https://sites.google.com/site/unicodetools/home#TOC-UCA
1318  diff the main mapping file, look for bad changes
1319  (for example, more bytes per weight for common characters)
1320    ~/svn.unitools/trunk$ sed -r -f ~/svn.cldr/uni/tools/scripts/uca/blankweights.sed ../Generated/uca/11.0.0/CollationAuxiliary/FractionalUCA.txt > ../frac-11.txt
1321    ~/svn.unitools/trunk$ meld ../frac-10.txt ../frac-11.txt
1322
1323- CLDR root data files are checked into $CLDR_SRC/common/uca/
1324    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1325
1326- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1327    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1328- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1329    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1330    (note removing the underscore before "Rules")
1331    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1332- restore TODO diffs in UCARules.txt
1333    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1334- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1335  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1336  from the CLDR root files (..._CLDR_..._SHORT.txt)
1337    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1338    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1339    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1340- if CLDR common/uca/unihan-index.txt changes, then update
1341  CLDR common/collation/root.xml <collation type="private-unihan">
1342  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1343
1344- run genuca, see command line above;
1345  deal with
1346    Error: Unknown script for first-primary sample character U+1180B on line 28649 of /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/unidata/FractionalUCA.txt:
1347    FDD1 1180B;	[71 CC 02, 05, 05]	# Dogra first primary (compressible)
1348        (add the character to genuca.cpp sampleCharsToScripts[])
1349  + look up the USCRIPT_ code for the new sample characters
1350    (should be obvious from the comment in the error output)
1351  + *add* mappings to sampleCharsToScripts[], do not replace them
1352    (in case the script sample characters flip-flop)
1353  + insert new scripts in DUCET script order, see the top_byte table
1354    at the beginning of FractionalUCA.txt
1355- rebuild ICU4C
1356
1357* Unihan collators
1358    https://sites.google.com/site/unicodetools/unihan
1359- run Unicode Tools
1360    org.unicode.draft.GenerateUnihanCollators
1361  with VM arguments
1362    -ea
1363    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1364    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1365    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1366    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1367    -DUVERSION=11.0.0
1368- run Unicode Tools
1369    org.unicode.draft.GenerateUnihanCollatorFiles
1370  with the same arguments
1371- check CLDR diffs
1372    cd $CLDR_SRC
1373    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1374    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1375- copy to CLDR
1376    cd $CLDR_SRC
1377    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1378    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1379- run CLDR unit tests, commit to CLDR
1380- generate ICU zh collation data: run CLDR
1381    org.unicode.cldr.icu.NewLdml2IcuConverter
1382  with program arguments
1383    -t collation
1384    -s /usr/local/google/home/mscherer/svn.cldr/uni/common/collation
1385    -m /usr/local/google/home/mscherer/svn.cldr/uni/common/supplemental
1386    -d /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/coll
1387    -p /usr/local/google/home/mscherer/svn.icu/uni/src/icu4c/source/data/xml/collation
1388    zh
1389  and VM arguments
1390    -ea
1391    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni
1392- rebuild ICU4C
1393
1394* run & fix ICU4C tests, now with new CLDR collation root data
1395- run all tests with the collation test data *_SHORT.txt or the full files
1396  (the full ones have comments, useful for debugging)
1397- note on intltest: if collate/UCAConformanceTest fails, then
1398  utility/MultithreadTest/TestCollators will fail as well;
1399  fix the conformance test before looking into the multi-thread test
1400
1401* update Java data files
1402- refresh just the UCD/UCA-related/derived files, just to be safe
1403- see (ICU4C)/source/data/icu4j-readme.txt
1404- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1405- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1406  output:
1407    ...
1408    Unicode .icu files built to ./out/build/icudt61l
1409    echo timestamp > uni-core-data
1410    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt61b
1411    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b
1412    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1413    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt61l.dat ./out/icu4j/icudt61b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt61l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt61b
1414    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt61b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt61b"
1415    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt61b/
1416    mkdir -p /tmp/icu4j/main/shared/data
1417    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1418    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt61b/
1419    mkdir -p /tmp/icu4j/main/shared/data
1420    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1421    make[1]: Leaving directory '/usr/local/google/home/mscherer/svn.icu/uni/dbg/icu4c/data'
1422- copy the big-endian Unicode data files to another location,
1423  separate from the other data files,
1424  and then refresh ICU4J
1425    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1426    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1427    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1428    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1429    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1430    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1431    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1432    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1433    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1434    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1435
1436* When refreshing all of ICU4J data from ICU4C
1437- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1438- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1439or
1440- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1441
1442* update CollationFCD.java
1443  + copy & paste the initializers of lcccIndex[] etc. from
1444    ICU4C/source/i18n/collationfcd.cpp to
1445    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1446
1447* refresh Java test .txt files
1448- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1449    cd $ICU_SRC/icu4c/source/data/unidata
1450    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1451    cd ../../test/testdata
1452    cp BidiCharacterTest.txt BidiTest.txt IdnaTestV2.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1453    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1454
1455* run & fix ICU4J tests
1456
1457*** API additions
1458- send notice to icu-design about new born-@stable API (enum constants etc.)
1459
1460*** CLDR numbering systems
1461- look for new sets of decimal digits (gc=ND & nv=4) and add to CLDR
1462  Unicode 11: using Unicode 11 CLDR ticket #10978
1463    rohg 10D30..10D39 Hanifi_Rohingya
1464    gong 11DA0..11DA9 Gunjala_Gondi
1465  Earlier: CLDR tickets specific to adding new numbering systems.
1466  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1467  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
1468
1469*** merge the Unicode update branches back onto the trunk
1470- do not merge the icudata.jar and testdata.jar,
1471  instead rebuild them from merged & tested ICU4C
1472- make sure that changes to Unicode tools are checked in:
1473  http://www.unicode.org/utility/trac/log/trunk/unicodetools
1474
1475---------------------------------------------------------------------------- ***
1476
1477Unicode 10.0 update for ICU 60
1478
1479http://www.unicode.org/versions/Unicode10.0.0/
1480http://www.unicode.org/versions/beta-10.0.0.html
1481http://blog.unicode.org/2017/03/unicode-100-beta-review.html
1482http://www.unicode.org/review/pri350/
1483http://www.unicode.org/reports/uax-proposed-updates.html
1484http://www.unicode.org/reports/tr44/tr44-19.html
1485
1486* Command-line environment setup
1487
1488UNICODE_DATA=~/unidata/uni10/20170605
1489CLDR_SRC=~/svn.cldr/uni10
1490ICU_ROOT=~/svn.icu/uni10
1491ICU_SRC=$ICU_ROOT/src
1492ICUDT=icudt60b
1493ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
1494ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
1495export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib
1496
1497*** ICU Trac
1498
1499- ticket:12985: Unicode 10
1500- ticket:13061: undo hacks from emoji 5.0 update
1501- ticket:13062: add Emoji_Component property
1502- ^/branches/markus/uni10
1503
1504*** CLDR Trac
1505
1506- cldrbug 10055: Unicode 10
1507- cldrbug 9882: Unicode 10 script metadata
1508- cldrbug 10219: numbering systems for Unicode 10
1509
1510*** Unicode version numbers
1511- makedata.mak
1512- uchar.h
1513- com.ibm.icu.util.VersionInfo
1514- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1515
1516- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1517  so that the makefiles see the new version number.
1518
1519*** data files & enums & parser code
1520
1521* download files
1522- mkdir -p $UNICODE_DATA
1523- download Unicode 10.0 files into $UNICODE_DATA
1524  + subfolders: ucd, uca, idna, security
1525  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
1526- download emoji 5.0 files into $UNICODE_DATA/emoji
1527
1528* for manual diffs: remove version suffixes from the file names
1529  ~$ unidata/desuffixucd.py $UNICODE_DATA
1530  (see https://sites.google.com/site/unicodetools/inputdata)
1531
1532* process and/or copy files
1533- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
1534  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1535  + For debugging, and tweaking how ppucd.txt is written,
1536    the tool has an --only_ppucd option:
1537    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile
1538
1539- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA
1540
1541* build ICU (make install)
1542  so that the tools build can pick up the new definitions from the installed header files.
1543
1544  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1545
1546* preparseucd.py changes
1547- remove or add new Unicode scripts from/to the
1548  only-in-ISO-15924 list according to the error messages:
1549    ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
1550  -> adjust _scripts_only_in_iso15924 as indicated
1551- fix other errors
1552    Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
1553  -> add vo=Vertical_Orientation to _ignored_properties
1554  -> later removed again, parsing the file, even though we do not yet store data for runtime use
1555
1556* new constants for new property values
1557- preparseucd.py error:
1558    ValueError: missing uchar.h enum constants for some property values:
1559    [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
1560                   u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
1561     (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
1562                  u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
1563                  u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
1564     (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
1565  = PropertyValueAliases.txt new property values (diff old & new .txt files)
1566    blk; CJK_Ext_F                        ; CJK_Unified_Ideographs_Extension_F
1567    blk; Kana_Ext_A                       ; Kana_Extended_A
1568    blk; Masaram_Gondi                    ; Masaram_Gondi
1569    blk; Nushu                            ; Nushu
1570    blk; Soyombo                          ; Soyombo
1571    blk; Syriac_Sup                       ; Syriac_Supplement
1572    blk; Zanabazar_Square                 ; Zanabazar_Square
1573  -> add to uchar.h
1574    use long property names for enum constants,
1575    for the trailing comment get the block start code point: diff old & new Blocks.txt
1576  -> add to UCharacter.UnicodeBlock IDs
1577    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1578            replace  public static final int \1_ID = \2; \3
1579  -> add to UCharacter.UnicodeBlock objects
1580    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1581            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1582
1583    jg ; Malayalam_Bha                    ; Malayalam_Bha
1584    jg ; Malayalam_Ja                     ; Malayalam_Ja
1585    jg ; Malayalam_Lla                    ; Malayalam_Lla
1586    jg ; Malayalam_Llla                   ; Malayalam_Llla
1587    jg ; Malayalam_Nga                    ; Malayalam_Nga
1588    jg ; Malayalam_Nna                    ; Malayalam_Nna
1589    jg ; Malayalam_Nnna                   ; Malayalam_Nnna
1590    jg ; Malayalam_Nya                    ; Malayalam_Nya
1591    jg ; Malayalam_Ra                     ; Malayalam_Ra
1592    jg ; Malayalam_Ssa                    ; Malayalam_Ssa
1593    jg ; Malayalam_Tta                    ; Malayalam_Tta
1594  -> uchar.h & UCharacter.JoiningGroup
1595
1596    sc ; Gonm                             ; Masaram_Gondi
1597    sc ; Nshu                             ; Nushu
1598    sc ; Soyo                             ; Soyombo
1599    sc ; Zanb                             ; Zanabazar_Square
1600  -> uscript.h & com.ibm.icu.lang.UScript
1601  -> Nushu had been added already
1602  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1603      and in com.ibm.icu.dev.test.lang.TestUScript.java
1604
1605* New properties as shown in PropertyValueAliases.txt changes
1606- boolean Emoji_Component from emoji 5
1607  -> uchar.h & UProperty.java
1608- boolean
1609    # Regional_Indicator (RI)
1610
1611    RI ; N                                ; No                               ; F                                ; False
1612    RI ; Y                                ; Yes                              ; T                                ; True
1613  -> uchar.h & UProperty.java
1614  -> single immutable range, to be hardcoded
1615- boolean
1616    # Prepended_Concatenation_Mark (PCM)
1617
1618    PCM; N                                ; No                               ; F                                ; False
1619    PCM; Y                                ; Yes                              ; T                                ; True
1620  -> was new in Unicode 9
1621  -> uchar.h & UProperty.java
1622- enumerated
1623    # Vertical_Orientation (vo)
1624
1625    vo ; R                                ; Rotated
1626    vo ; Tr                               ; Transformed_Rotated
1627    vo ; Tu                               ; Transformed_Upright
1628    vo ; U                                ; Upright
1629  -> only pre-parsed for now, but not yet stored for runtime use
1630
1631* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1632    (not strictly necessary for NOT_ENCODED scripts)
1633  $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt
1634
1635* generate normalization data files
1636  cd $ICU_ROOT/dbg/icu4c
1637  bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
1638  bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm     -s $ICU4C_UNIDATA/norm2 nfc.txt
1639  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm    -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
1640  bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1641  bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm   -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt
1642
1643* build ICU (make install)
1644  so that the tools build can pick up the new definitions from the installed header files.
1645
1646  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1647
1648* build Unicode tools using CMake+make
1649
1650$ICU_SRC/tools/unicode/c/icudefs.txt:
1651
1652# Location (--prefix) of where ICU was installed.
1653set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1654# Location of the ICU4C source tree.
1655set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)
1656
1657  $ICU_ROOT/dbg/tools/unicode/c$
1658    cmake ../../../../src/tools/unicode/c
1659    make
1660
1661* generate core properties data files
1662  $ICU_ROOT/dbg/tools/unicode/c$
1663    genprops/genprops $ICU_SRC/icu4c
1664    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
1665    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
1666- rebuild ICU (make install) & tools
1667
1668* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1669  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1670- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1671- Unicode 6.0..10.0: U+2260, U+226E, U+226F
1672- nothing new in this Unicode version, no test file to update
1673
1674* run & fix ICU4C tests
1675- Andy handles RBBI & spoof check test failures
1676
1677* collation: CLDR collation root, UCA DUCET
1678
1679- UCA DUCET goes into Mark's Unicode tools, see
1680  https://sites.google.com/site/unicodetools/home#TOC-UCA
1681- CLDR root data files are checked into $CLDR_SRC/common/uca/
1682    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/
1683
1684- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1685    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
1686- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1687    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
1688    (note removing the underscore before "Rules")
1689    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
1690- restore TODO diffs in UCARules.txt
1691    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
1692- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1693  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1694  from the CLDR root files (..._CLDR_..._SHORT.txt)
1695    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
1696    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
1697    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
1698- if CLDR common/uca/unihan-index.txt changes, then update
1699  CLDR common/collation/root.xml <collation type="private-unihan">
1700  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt
1701
1702- run genuca, see command line above;
1703  deal with
1704    Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
1705    FDD1 11D10;     [70 D5 02, 05, 05]      # Masaram_Gondi first primary (compressible)
1706        (add the character to genuca.cpp sampleCharsToScripts[])
1707  + look up the USCRIPT_ code for the new sample characters
1708    (should be obvious from the comment in the error output)
1709  + *add* mappings to sampleCharsToScripts[], do not replace them
1710    (in case the script sample characters flip-flop)
1711  + insert new scripts in DUCET script order, see the top_byte table
1712    at the beginning of FractionalUCA.txt
1713- rebuild ICU4C
1714
1715* Unihan collators
1716    https://sites.google.com/site/unicodetools/unihan
1717- run Unicode Tools
1718    org.unicode.draft.GenerateUnihanCollators
1719  with VM arguments
1720    -ea
1721    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
1722    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
1723    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
1724    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
1725    -DUVERSION=10.0.0
1726- run Unicode Tools
1727    org.unicode.draft.GenerateUnihanCollatorFiles
1728  with the same arguments
1729- check CLDR diffs
1730    cd $CLDR_SRC
1731    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
1732    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
1733- copy to CLDR
1734    cd $CLDR_SRC
1735    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
1736    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
1737- run CLDR unit tests, commit to CLDR
1738- generate ICU zh collation data: run CLDR
1739    org.unicode.cldr.icu.NewLdml2IcuConverter
1740  with program arguments
1741    -t collation
1742    -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
1743    -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
1744    -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
1745    -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
1746    zh
1747  and VM arguments
1748    -ea
1749    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
1750- rebuild ICU4C
1751
1752* run & fix ICU4C tests, now with new CLDR collation root data
1753- run all tests with the collation test data *_SHORT.txt or the full files
1754  (the full ones have comments, useful for debugging)
1755- note on intltest: if collate/UCAConformanceTest fails, then
1756  utility/MultithreadTest/TestCollators will fail as well;
1757  fix the conformance test before looking into the multi-thread test
1758
1759* update Java data files
1760- refresh just the UCD/UCA-related/derived files, just to be safe
1761- see (ICU4C)/source/data/icu4j-readme.txt
1762- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1763- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1764  output:
1765    ...
1766    Unicode .icu files built to ./out/build/icudt60l
1767    echo timestamp > uni-core-data
1768    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
1769    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
1770    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1771    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
1772    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
1773    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
1774    mkdir -p /tmp/icu4j/main/shared/data
1775    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1776    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
1777    mkdir -p /tmp/icu4j/main/shared/data
1778    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1779    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
1780- copy the big-endian Unicode data files to another location,
1781  separate from the other data files,
1782  and then refresh ICU4J
1783    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
1784    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1785    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1786    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1787    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1788    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1789    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1790    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
1791    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1792    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1793
1794* When refreshing all of ICU4J data from ICU4C
1795- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1796- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
1797or
1798- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install
1799
1800* update CollationFCD.java
1801  + copy & paste the initializers of lcccIndex[] etc. from
1802    ICU4C/source/i18n/collationfcd.cpp to
1803    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
1804
1805* refresh Java test .txt files
1806- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1807    cd $ICU_SRC/icu4c/source/data/unidata
1808    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1809    cd ../../test/testdata
1810    cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1811    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1812
1813* run & fix ICU4J tests
1814
1815*** API additions
1816- send notice to icu-design about new born-@stable API (enum constants etc.)
1817
1818*** CLDR numbering systems
1819- look for new sets of decimal digits (gc=ND & nv=4) and submit a CLDR ticket
1820  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
1821  Unicode 9: http://unicode.org/cldr/trac/ticket/9692
1822
1823*** merge the Unicode update branches back onto the trunk
1824- do not merge the icudata.jar and testdata.jar,
1825  instead rebuild them from merged & tested ICU4C
1826- make sure that changes to Unicode tools are checked in:
1827  http://www.unicode.org/utility/trac/log/trunk/unicodetools
1828
1829---------------------------------------------------------------------------- ***
1830
1831Emoji 5.0 update for ICU 59
1832- ICU 59 mostly remains on Unicode 9.0
1833- except updates bidi and segmentation data to Unicode 10 beta
1834
1835First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.
1836
1837* Command-line environment setup
1838
1839ICU_ROOT=~/svn.icu/trunk
1840ICU_SRC_DIR=$ICU_ROOT/src
1841ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
1842ICUDT=icudt59b
1843export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
1844SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
1845UNIDATA=$ICU4C_SRC_DIR/source/data/unidata
1846
1847*** ICU Trac
1848
1849- ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
1850- changes directly on trunk
1851
1852*** data files & enums & parser code
1853
1854* download files
1855
1856- download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
1857- download emoji 5.0 beta files into the same uni90e50 folder
1858- download Unicode 10.0 beta files: ucd
1859  + copy Unicode 10 bidi files to the uni90e50/ucd folder:
1860    BidiBrackets.txt
1861    BidiCharacterTest.txt
1862    BidiMirroring.txt
1863    BidiTest.txt
1864    extracted/DerivedBidiClass.txt
1865  + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
1866    LineBreak.txt
1867    auxiliary/*
1868
1869* preparseucd.py changes
1870- adjust for combined trunks
1871- write new copyright lines
1872- ignore new Emoji_Component property for now
1873
1874* process and/or copy files
1875- ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
1876  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1877
1878- cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA
1879
1880* build ICU (make install)
1881  so that the tools build can pick up the new definitions from the installed header files.
1882
1883  $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date
1884
1885* build Unicode tools using CMake+make
1886
1887~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:
1888
1889# Location (--prefix) of where ICU was installed.
1890set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
1891# Location of the ICU4C source tree.
1892set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)
1893
1894  ~/svn.icu/trunk/dbg/tools/unicode/c$
1895    cmake ../../../../src/tools/unicode/c
1896    make
1897
1898* generate core properties data files
1899  ~/svn.icu/trunk/dbg/tools/unicode/c$
1900    genprops/genprops $ICU4C_SRC_DIR
1901- rebuild ICU (make install) & tools
1902
1903* run & fix ICU4C tests
1904- Andy handles RBBI & spoof check test failures
1905
1906* update Java data files
1907- refresh just the UCD/UCA-related/derived files, just to be safe
1908- see (ICU4C)/source/data/icu4j-readme.txt
1909- mkdir /tmp/icu4j
1910- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1911  output:
1912    ...
1913    Unicode .icu files built to ./out/build/icudt59l
1914    echo timestamp > uni-core-data
1915    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
1916    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
1917    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
1918    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
1919    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
1920    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
1921    mkdir -p /tmp/icu4j/main/shared/data
1922    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1923    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
1924    mkdir -p /tmp/icu4j/main/shared/data
1925    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1926    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
1927- copy the big-endian Unicode data files to another location,
1928  separate from the other data files,
1929  and then refresh ICU4J
1930    cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
1931    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1932    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1933    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
1934    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
1935    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
1936    jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
1937
1938* When refreshing all of ICU4J data from ICU4C
1939- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1940- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
1941or
1942- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install
1943
1944* refresh Java test .txt files
1945- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1946    cd $ICU4C_SRC_DIR/source/data/unidata
1947    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1948    cd ../../test/testdata
1949    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1950    cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
1951
1952* run & fix ICU4J tests
1953
1954---------------------------------------------------------------------------- ***
1955
1956Unicode 9.0 update for ICU 58
1957
1958* Command-line environment setup
1959
1960ICU_ROOT=~/svn.icu/trunk
1961ICU_SRC_DIR=$ICU_ROOT/src
1962ICUDT=icudt58b
1963export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
1964SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
1965UNIDATA=$ICU_SRC_DIR/source/data/unidata
1966
1967http://www.unicode.org/review/pri323/  -- beta review
1968http://www.unicode.org/reports/uax-proposed-updates.html
1969http://www.unicode.org/versions/beta-9.0.0.html
1970http://www.unicode.org/versions/Unicode9.0.0/
1971http://www.unicode.org/reports/tr44/tr44-17.html
1972
1973*** ICU Trac
1974
1975- ticket:12526: integrate Unicode 9
1976- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
1977- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b
1978
1979*** CLDR Trac
1980
1981- cldrbug 9414: UCA 9
1982- ^/branches/markus/uni90 at r11518 from trunk at r11517
1983
1984- cldrbug 8745: Unicode 9.0 script metadata
1985
1986*** Unicode version numbers
1987- makedata.mak
1988- uchar.h
1989- com.ibm.icu.util.VersionInfo
1990- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1991
1992- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1993  so that the makefiles see the new version number.
1994
1995*** data files & enums & parser code
1996
1997* file preparation
1998
1999- download UCD & IDNA files
2000- make sure that the Unicode data folder passed into preparseucd.py
2001  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2002- only for manual diffs: remove version suffixes from the file names
2003  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2004  (see https://sites.google.com/site/unicodetools/inputdata)
2005- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2006- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2007- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2008
2009- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
2010  and copy to $UNIDATA
2011    cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
2012
2013* preparseucd.py changes
2014- remove or add new Unicode scripts from/to the
2015  only-in-ISO-15924 list according to the error messages:
2016    ValueError: remove ['Tang'] from _scripts_only_in_iso15924
2017    ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
2018    ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
2019    ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
2020  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2021      and in com.ibm.icu.dev.test.lang.TestUScript.java
2022- DerivedNumericValues.txt new numeric values
2023    0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
2024    0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH
2025    0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS
2026    0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH
2027    0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS
2028  -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
2029     uchar.c, UCharacterProperty.java
2030     to support a new series of values
2031- adjust preparseucd.py for Tangut algorithmic names
2032  in ppucd.txt:
2033    algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
2034  ->
2035    algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
2036- avoid block-compressing most String/Miscellaneous property values,
2037  triggered by genprops not coping with a multi-code point Case_Folding on
2038    block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
2039  keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
2040
2041* PropertyAliases.txt changes
2042- 1 new property PCM=Prepended_Concatenation_Mark
2043  Ignore: Only useful for layout engines.
2044  Ok to list in ppucd.txt.
2045
2046* PropertyValueAliases.txt new property values
2047    blk; Adlam                            ; Adlam
2048    blk; Bhaiksuki                        ; Bhaiksuki
2049    blk; Cyrillic_Ext_C                   ; Cyrillic_Extended_C
2050    blk; Glagolitic_Sup                   ; Glagolitic_Supplement
2051    blk; Ideographic_Symbols              ; Ideographic_Symbols_And_Punctuation
2052    blk; Marchen                          ; Marchen
2053    blk; Mongolian_Sup                    ; Mongolian_Supplement
2054    blk; Newa                             ; Newa
2055    blk; Osage                            ; Osage
2056    blk; Tangut                           ; Tangut
2057    blk; Tangut_Components                ; Tangut_Components
2058  -> add to uchar.h
2059    use long property names for enum constants
2060  -> add to UCharacter.UnicodeBlock IDs
2061    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2062            replace  public static final int \1_ID = \2; \3
2063  -> add to UCharacter.UnicodeBlock objects
2064    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
2065            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2066
2067    GCB; EB                               ; E_Base
2068    GCB; EBG                              ; E_Base_GAZ
2069    GCB; EM                               ; E_Modifier
2070    GCB; GAZ                              ; Glue_After_Zwj
2071    GCB; ZWJ                              ; ZWJ
2072  -> uchar.h & UCharacter.GraphemeClusterBreak
2073
2074    jg ; African_Feh                      ; African_Feh
2075    jg ; African_Noon                     ; African_Noon
2076    jg ; African_Qaf                      ; African_Qaf
2077  -> uchar.h & UCharacter.JoiningGroup
2078
2079    lb ; EB                               ; E_Base
2080    lb ; EM                               ; E_Modifier
2081    lb ; ZWJ                              ; ZWJ
2082  -> uchar.h & UCharacter.LineBreak
2083
2084    sc ; Adlm                             ; Adlam
2085    sc ; Bhks                             ; Bhaiksuki
2086    sc ; Marc                             ; Marchen
2087    sc ; Newa                             ; Newa
2088    sc ; Osge                             ; Osage
2089    sc ; Tang                             ; Tangut
2090  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
2091
2092    WB ; EB                               ; E_Base
2093    WB ; EBG                              ; E_Base_GAZ
2094    WB ; EM                               ; E_Modifier
2095    WB ; GAZ                              ; Glue_After_Zwj
2096    WB ; ZWJ                              ; ZWJ
2097  -> uchar.h & UCharacter.WordBreak
2098
2099* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2100    (not strictly necessary for NOT_ENCODED scripts)
2101  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2102
2103* generate normalization data files
2104  cd $ICU_ROOT/dbg
2105  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2106  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
2107  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
2108  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2109  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
2110
2111* build ICU (make install)
2112  so that the tools build can pick up the new definitions from the installed header files.
2113
2114  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt
2115
2116* build Unicode tools using CMake+make
2117
2118~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2119
2120  # Location (--prefix) of where ICU was installed.
2121  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
2122  # Location of the ICU source tree.
2123  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
2124
2125  ~/svn.icutools/trunk/dbg/unicode/c$
2126    cmake ../../../src/unicode/c
2127    make
2128
2129* generate core properties data files
2130  ~/svn.icutools/trunk/dbg/unicode/c$
2131    genprops/genprops $ICU_SRC_DIR
2132    genuca/genuca --hanOrder implicit $ICU_SRC_DIR
2133    genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
2134- rebuild ICU (make install) & tools
2135
2136* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2137  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2138- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2139- Unicode 6.0..9.0: U+2260, U+226E, U+226F
2140- nothing new in 9.0, no test file to update
2141
2142* run & fix ICU4C tests
2143- Andy handles RBBI & spoof check test failures
2144
2145* collation: CLDR collation root, UCA DUCET
2146
2147- UCA DUCET goes into Mark's Unicode tools, see
2148  https://sites.google.com/site/unicodetools/home#TOC-UCA
2149- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
2150    cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
2151
2152- cd (CLDR UCA branch)/common/uca/
2153- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2154    cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
2155- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2156    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
2157    (note removing the underscore before "Rules")
2158    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2159- restore TODO diffs in UCARules.txt
2160    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2161- update (ICU4C)/source/test/testdata/CollationTest_*.txt
2162  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2163  from the CLDR root files (..._CLDR_..._SHORT.txt)
2164    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2165    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2166    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
2167- if CLDR common/uca/unihan-index.txt changes, then update
2168  CLDR common/collation/root.xml <collation type="private-unihan">
2169  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
2170
2171- run genuca, see command line above;
2172  deal with
2173    Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
2174    FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)
2175        (add the character to genuca.cpp sampleCharsToScripts[])
2176  + look up the USCRIPT_ code for the new sample characters
2177    (should be obvious from the comment in the error output)
2178  + *add* mappings to sampleCharsToScripts[], do not replace them
2179    (in case the script sample characters flip-flop)
2180  + insert new scripts in DUCET script order, see the top_byte table
2181    at the beginning of FractionalUCA.txt
2182- rebuild ICU4C
2183
2184* Unihan collators
2185- run Unicode Tools
2186    org.unicode.draft.GenerateUnihanCollators
2187  with VM arguments
2188    -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
2189    -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
2190    -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
2191    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
2192    -DUVERSION=9.0.0
2193    -ea
2194- run Unicode Tools
2195    org.unicode.draft.GenerateUnihanCollatorFiles
2196  with the same arguments
2197- check CLDR diffs
2198    cd ~/svn.cldr/trunk
2199    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
2200    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
2201- copy to CLDR
2202    cd ~/svn.cldr/trunk
2203    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
2204    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
2205- commit to CLDR
2206- generate ICU zh collation data: run CLDR
2207    org.unicode.cldr.icu.NewLdml2IcuConverter
2208  with program arguments
2209    -t collation
2210    -s /home/mscherer/svn.cldr/trunk/common/collation
2211    -m /home/mscherer/svn.cldr/trunk/common/supplemental
2212    -d /home/mscherer/svn.icu/trunk/src/source/data/coll
2213    -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
2214    zh
2215  and VM arguments
2216    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
2217- rebuild ICU4C
2218
2219* run & fix ICU4C tests, now with new CLDR collation root data
2220- run all tests with the collation test data *_SHORT.txt or the full files
2221  (the full ones have comments, useful for debugging)
2222- note on intltest: if collate/UCAConformanceTest fails, then
2223  utility/MultithreadTest/TestCollators will fail as well;
2224  fix the conformance test before looking into the multi-thread test
2225
2226* update Java data files
2227- refresh just the UCD/UCA-related/derived files, just to be safe
2228- see (ICU4C)/source/data/icu4j-readme.txt
2229- mkdir /tmp/icu4j
2230- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2231  output:
2232    ...
2233    Unicode .icu files built to ./out/build/icudt58l
2234    echo timestamp > uni-core-data
2235    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
2236    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
2237    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2238    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
2239    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
2240    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
2241    mkdir -p /tmp/icu4j/main/shared/data
2242    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2243    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
2244    mkdir -p /tmp/icu4j/main/shared/data
2245    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2246    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
2247- copy the big-endian Unicode data files to another location,
2248  separate from the other data files,
2249  and then refresh ICU4J
2250    cd ~/svn.icu/trunk/dbg/data/out/icu4j
2251    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2252    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2253    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2254    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2255    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2256    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2257    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2258    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2259    jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2260
2261* When refreshing all of ICU4J data from ICU4C
2262- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2263- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2264or
2265- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2266
2267* update CollationFCD.java
2268  + copy & paste the initializers of lcccIndex[] etc. from
2269    ICU4C/source/i18n/collationfcd.cpp to
2270    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2271
2272* refresh Java test .txt files
2273- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2274    cd $ICU_SRC_DIR/source/data/unidata
2275    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2276    cd ../../test/testdata
2277    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2278    cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2279
2280* run & fix ICU4J tests
2281
2282*** LayoutEngine script information
2283
2284* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2285  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2286  in the working directory.
2287
2288  (It also generates ScriptRunData.cpp, which is no longer needed.)
2289
2290  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2291  (a plain text file)
2292  which maps ICU versions to the numbers of script/language constants
2293  that were added then.
2294  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2295
2296  The generated files have a current copyright date and "@deprecated" statement.
2297
2298* Review changes, fix Java tool if necessary, and copy to ICU4C
2299  cd ~/svn.icu4j/trunk/src
2300  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2301  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2302  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2303
2304*** API additions
2305- send notice to icu-design about new born-@stable API (enum constants etc.)
2306
2307*** merge the Unicode update branches back onto the trunk
2308- do not merge the icudata.jar and testdata.jar,
2309  instead rebuild them from merged & tested ICU4C
2310- make sure that changes to Unicode tools & ICU tools are checked in
2311  http://www.unicode.org/utility/trac/log/trunk/unicodetools
2312  http://bugs.icu-project.org/trac/log/tools/trunk
2313
2314---------------------------------------------------------------------------- ***
2315
2316New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764
2317
2318Adding
2319- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
2320- new combination/alias codes: Hanb, Jamo
2321  - used in CLDR 29 and in spoof checker
2322- new Z* code: Zsye
2323
2324Add new codes to uscript.h & UScript.java, see Unicode update logs.
2325  -> com.ibm.icu.lang.UScript
2326    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2327    replace  public static final int \1 = \2; \3
2328
2329Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
2330add new script codes.
2331"Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
2332
2333Note: If we have to run preparseucd.py again before the Unicode 9 update,
2334then we need to manually keep/restore the new script codes.
2335
2336ICU_ROOT=~/svn.icu/trunk
2337ICU_SRC_DIR=$ICU_ROOT/src
2338ICUDT=icudt57b
2339export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2340SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2341UNIDATA=$ICU_SRC_DIR/source/data/unidata
2342
2343Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
2344see http://bugs.icu-project.org/trac/ticket/12141
2345
2346make install, then icutools cmake & make, then
2347~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
2348
2349Generate Java data as usual, only update pnames.icu & uprops.icu.
2350
2351*** LayoutEngine script information
2352
2353* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2354  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2355  in the working directory.
2356
2357  (It also generates ScriptRunData.cpp, which is no longer needed.)
2358
2359  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2360  (a plain text file)
2361  which maps ICU versions to the numbers of script/language constants
2362  that were added then.
2363  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2364
2365  The generated files have a current copyright date and "@deprecated" statement.
2366
2367* Review changes, fix Java tool if necessary, and copy to ICU4C
2368  cd ~/svn.icu4j/trunk/src
2369  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2370  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2371  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2372
2373---------------------------------------------------------------------------- ***
2374
2375Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802
2376
2377Edit preparseucd.py to add & parse new properties.
2378They share the UCD property namespace but are not listed in PropertyAliases.txt.
2379
2380Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
2381Initial data from emoji/2.0/
2382
2383ICU_ROOT=~/svn.icu/trunk
2384ICU_SRC_DIR=$ICU_ROOT/src
2385ICUDT=icudt56b
2386export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2387SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2388UNIDATA=$ICU_SRC_DIR/source/data/unidata
2389
2390Add binary-property constants to uchar.h enum UProperty & UProperty.java.
2391
2392~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2393(Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)
2394
2395Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java
2396
2397make install, then icutools cmake & make, then
2398~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR
2399
2400Generate Java data as usual, only update pnames.icu & uprops.icu.
2401
2402---------------------------------------------------------------------------- ***
2403
2404Unicode 8.0 update for ICU 56
2405
2406* Command-line environment setup
2407
2408ICU_ROOT=~/svn.icu/trunk
2409ICU_SRC_DIR=$ICU_ROOT/src
2410ICUDT=icudt56b
2411export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2412SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2413UNIDATA=$ICU_SRC_DIR/source/data/unidata
2414
2415http://www.unicode.org/review/pri297/  -- beta review
2416http://www.unicode.org/reports/uax-proposed-updates.html
2417http://unicode.org/versions/beta-8.0.0.html
2418http://www.unicode.org/versions/Unicode8.0.0/
2419http://www.unicode.org/reports/tr44/tr44-15.html
2420
2421*** ICU Trac
2422
2423- ticket:11574: Unicode 8
2424- C++ branches/markus/uni80 at r37351 from trunk at r37343
2425- Java branches/markus/uni80 at r37352 from trunk at r37338
2426
2427*** CLDR Trac
2428
2429- cldrbug 8311: UCA 8
2430- branches/markus/uni80 at r11518 from trunk at r11517
2431
2432- cldrbug 8109: Unicode 8.0 script metadata
2433- cldrbug 8418: Updated segmentation for Unicode 8.0
2434
2435*** Unicode version numbers
2436- makedata.mak
2437- uchar.h
2438- com.ibm.icu.util.VersionInfo
2439- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2440
2441- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2442  so that the makefiles see the new version number.
2443
2444*** data files & enums & parser code
2445
2446* file preparation
2447
2448- download UCD & IDNA files
2449- make sure that the Unicode data folder passed into preparseucd.py
2450  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2451- only for manual diffs: remove version suffixes from the file names
2452  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2453  (see https://sites.google.com/site/unicodetools/inputdata)
2454- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2455- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2456- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2457
2458- also: from http://unicode.org/Public/security/8.0.0/ download new
2459  confusables.txt & confusablesWholeScript.txt
2460  and copy to $UNIDATA
2461    ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
2462    ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA
2463
2464* initial preparseucd.py changes
2465- remove new Unicode scripts from the
2466  only-in-ISO-15924 list according to the error message:
2467    ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
2468    from _scripts_only_in_iso15924
2469  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2470      and in com.ibm.icu.dev.test.lang.TestUScript.java
2471- property and file name change:
2472    IndicMatraCategory -> IndicPositionalCategory
2473- UnicodeData.txt unusual numeric values (improper fractions)
2474    109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
2475    109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
2476    109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
2477    109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
2478    109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
2479    109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
2480    109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
2481    109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
2482    109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
2483    109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
2484  -> change preparseucd.py to map them to proper fractions (e.g., 1/6)
2485     which are listed in DerivedNumericValues.txt;
2486     keeps storage in data file simple
2487
2488* PropertyValueAliases.txt changes
2489- 10 new Block (blk) values:
2490    blk; Ahom                             ; Ahom
2491    blk; Anatolian_Hieroglyphs            ; Anatolian_Hieroglyphs
2492    blk; Cherokee_Sup                     ; Cherokee_Supplement
2493    blk; CJK_Ext_E                        ; CJK_Unified_Ideographs_Extension_E
2494    blk; Early_Dynastic_Cuneiform         ; Early_Dynastic_Cuneiform
2495    blk; Hatran                           ; Hatran
2496    blk; Multani                          ; Multani
2497    blk; Old_Hungarian                    ; Old_Hungarian
2498    blk; Sup_Symbols_And_Pictographs      ; Supplemental_Symbols_And_Pictographs
2499    blk; Sutton_SignWriting               ; Sutton_SignWriting
2500  -> add to uchar.h
2501    use long property names for enum constants
2502  -> add to UCharacter.UnicodeBlock IDs
2503    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2504            replace  public static final int \1_ID = \2; \3
2505  -> add to UCharacter.UnicodeBlock objects
2506    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
2507            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2508- 6 new Script (sc) values:
2509    sc ; Ahom                             ; Ahom
2510    sc ; Hatr                             ; Hatran
2511    sc ; Hluw                             ; Anatolian_Hieroglyphs
2512    sc ; Hung                             ; Old_Hungarian
2513    sc ; Mult                             ; Multani
2514    sc ; Sgnw                             ; SignWriting
2515  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript
2516
2517* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2518    (not strictly necessary for NOT_ENCODED scripts)
2519  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2520
2521* generate normalization data files
2522  cd $ICU_ROOT/dbg
2523  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2524  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
2525  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
2526  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2527  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
2528
2529* build ICU (make install)
2530  so that the tools build can pick up the new definitions from the installed header files.
2531
2532  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2533
2534* build Unicode tools using CMake+make
2535
2536~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2537
2538  # Location (--prefix) of where ICU was installed.
2539  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
2540  # Location of the ICU source tree.
2541  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)
2542
2543  ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
2544  ~/svn.icutools/trunk/dbg/unicode/c$ make
2545
2546* generate core properties data files
2547- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
2548- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
2549- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
2550- rebuild ICU (make install) & tools
2551- run genuca again (see step above) so that it picks up the new nfc.nrm
2552- rebuild ICU (make install) & tools
2553
2554* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2555  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2556- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2557- Unicode 6.0..8.0: U+2260, U+226E, U+226F
2558- nothing new in 8.0, no test file to update
2559
2560* run & fix ICU4C tests
2561- bad Cherokee case folding due to difference in fallbacks:
2562  UCD case folding falls back to no mapping,
2563  ICU runtime case folding falls back to lowercasing;
2564  fixed casepropsbuilder.cpp to generate scf mappings to self
2565  when there is an slc mapping but no scf
2566- Andy handles RBBI & spoof check test failures
2567
2568* collation: CLDR collation root, UCA DUCET
2569
2570- UCA DUCET goes into Mark's Unicode tools, see
2571  https://sites.google.com/site/unicodetools/home#TOC-UCA
2572- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
2573- cd (CLDR UCA branch)/common/uca/
2574- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2575  cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
2576- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2577    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
2578    (note removing the underscore before "Rules")
2579    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2580- restore TODO diffs in UCARules.txt
2581    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
2582- update (ICU4C)/source/test/testdata/CollationTest_*.txt
2583  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2584  from the CLDR root files (..._CLDR_..._SHORT.txt)
2585    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
2586    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
2587    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
2588- if CLDR common/uca/unihan-index.txt changes, then update
2589  CLDR common/collation/root.xml <collation type="private-unihan">
2590  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
2591- run genuca, see command line above;
2592  deal with
2593    Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
2594        (add the character to genuca.cpp sampleCharsToScripts[])
2595  + look up the script for the new sample characters
2596    (e.g., in FractionalUCA.txt)
2597  + *add* mappings to sampleCharsToScripts[], do not replace them
2598    (in case the script sample characters flip-flop)
2599  + insert new scripts in DUCET script order, see the top_byte table
2600    at the beginning of FractionalUCA.txt
2601- rebuild ICU4C
2602
2603* run & fix ICU4C tests, now with new CLDR collation root data
2604- run all tests with the collation test data *_SHORT.txt or the full files
2605  (the full ones have comments, useful for debugging)
2606- note on intltest: if collate/UCAConformanceTest fails, then
2607  utility/MultithreadTest/TestCollators will fail as well;
2608  fix the conformance test before looking into the multi-thread test
2609- fixed bug in CollationWeights::getWeightRanges()
2610  exposed by new data and CollationTest::TestRootElements
2611
2612* update Java data files
2613- refresh just the UCD/UCA-related/derived files, just to be safe
2614- see (ICU4C)/source/data/icu4j-readme.txt
2615- mkdir /tmp/icu4j
2616- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2617  output:
2618    ...
2619    Unicode .icu files built to ./out/build/icudt56l
2620    echo timestamp > uni-core-data
2621    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
2622    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
2623    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
2624    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
2625    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
2626    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
2627    mkdir -p /tmp/icu4j/main/shared/data
2628    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2629    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
2630    mkdir -p /tmp/icu4j/main/shared/data
2631    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2632    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
2633- copy the big-endian Unicode data files to another location,
2634  separate from the other data files,
2635  and then refresh ICU4J
2636    cd ~/svn.icu/trunk/dbg/data/out/icu4j
2637    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2638    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2639    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2640    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2641    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2642    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2643    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2644    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2645    jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2646
2647* When refreshing all of ICU4J data from ICU4C
2648- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2649- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2650or
2651- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2652
2653* update CollationFCD.java
2654  + copy & paste the initializers of lcccIndex[] etc. from
2655    ICU4C/source/i18n/collationfcd.cpp to
2656    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2657
2658* refresh Java test .txt files
2659- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2660    cd $ICU_SRC_DIR/source/data/unidata
2661    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2662    cd ../../test/testdata
2663    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2664    cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2665
2666* run & fix ICU4J tests
2667
2668*** LayoutEngine script information
2669
2670* ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
2671  because the layout engine was deprecated in ICU 54.
2672  Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
2673  to write lines that we used to add manually.
2674
2675* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
2676  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
2677  in the working directory.
2678
2679  (It also generates ScriptRunData.cpp, which is no longer needed.)
2680
2681  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
2682  (a plain text file)
2683  which maps ICU versions to the numbers of script/language constants
2684  that were added then.
2685  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)
2686
2687  The generated files have a current copyright date and "@deprecated" statement.
2688
2689* Review changes, fix Java tool if necessary, and copy to ICU4C
2690  cd ~/svn.icu4j/trunk/src
2691  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
2692  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
2693  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout
2694
2695*** API additions
2696- send notice to icu-design about new born-@stable API (enum constants etc.)
2697
2698*** merge the Unicode update branches back onto the trunk
2699- do not merge the icudata.jar and testdata.jar,
2700  instead rebuild them from merged & tested ICU4C
2701- make sure that changes to Unicode tools & ICU tools are checked in
2702  http://www.unicode.org/utility/trac/log/trunk/unicodetools
2703  http://bugs.icu-project.org/trac/log/tools/trunk
2704
2705---------------------------------------------------------------------------- ***
2706
2707Unicode 7.0 update for ICU 54
2708
2709http://www.unicode.org/review/pri271/  -- beta review
2710http://www.unicode.org/reports/uax-proposed-updates.html
2711http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
2712http://www.unicode.org/reports/tr44/tr44-13.html
2713
2714*** ICU Trac
2715
2716- ticket 10821: Unicode 7.0, UCA 7.0
2717- C++ branches/markus/uni70 at r35584 from trunk at r35580
2718- Java branches/markus/uni70 at r35587 from trunk at r35545
2719
2720*** CLDR Trac
2721
2722- ticket 7195: UCA 7.0 CLDR root collation
2723- branches/markus/uni70 at r10062 from trunk at r10061
2724
2725- ticket 6762: script metadata for Unicode 7.0 new scripts
2726
2727*** Unicode version numbers
2728- makedata.mak
2729- uchar.h
2730- com.ibm.icu.util.VersionInfo
2731- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
2732
2733- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
2734  so that the makefiles see the new version number.
2735
2736*** data files & enums & parser code
2737
2738* file preparation
2739
2740- download UCD & IDNA files
2741- make sure that the Unicode data folder passed into preparseucd.py
2742  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
2743- only for manual diffs: remove version suffixes from the file names
2744  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
2745  (see https://sites.google.com/site/unicodetools/inputdata)
2746- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
2747- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
2748- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
2749- Restore TODO diffs in source/data/unidata/UCARules.txt
2750    cd $ICU_SRC_DIR
2751    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
2752- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt
2753
2754- also: from http://unicode.org/Public/security/7.0.0/ download new
2755  confusables.txt & confusablesWholeScript.txt
2756  and copy to $ICU_ROOT/src/source/data/unidata/
2757
2758* initial preparseucd.py changes
2759- remove new Unicode scripts from the
2760  only-in-ISO-15924 list according to the error message:
2761    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
2762                        'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
2763                        'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
2764    from _scripts_only_in_iso15924
2765  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
2766      and in com.ibm.icu.dev.test.lang.TestUScript.java
2767- NamesList.txt now has a heading with a non-ASCII character
2768  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
2769  + escape non-ASCII characters in heading comments
2770- gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013
2771  + get the copyright from the first file whose copyright line contains the current year
2772
2773* PropertyValueAliases.txt changes
2774- 32 new Block (blk) values:
2775    blk; Bassa_Vah                        ; Bassa_Vah
2776    blk; Caucasian_Albanian               ; Caucasian_Albanian
2777    blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
2778    blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
2779    blk; Duployan                         ; Duployan
2780    blk; Elbasan                          ; Elbasan
2781    blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
2782    blk; Grantha                          ; Grantha
2783    blk; Khojki                           ; Khojki
2784    blk; Khudawadi                        ; Khudawadi
2785    blk; Latin_Ext_E                      ; Latin_Extended_E
2786    blk; Linear_A                         ; Linear_A
2787    blk; Mahajani                         ; Mahajani
2788    blk; Manichaean                       ; Manichaean
2789    blk; Mende_Kikakui                    ; Mende_Kikakui
2790    blk; Modi                             ; Modi
2791    blk; Mro                              ; Mro
2792    blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
2793    blk; Nabataean                        ; Nabataean
2794    blk; Old_North_Arabian                ; Old_North_Arabian
2795    blk; Old_Permic                       ; Old_Permic
2796    blk; Ornamental_Dingbats              ; Ornamental_Dingbats
2797    blk; Pahawh_Hmong                     ; Pahawh_Hmong
2798    blk; Palmyrene                        ; Palmyrene
2799    blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
2800    blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
2801    blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
2802    blk; Siddham                          ; Siddham
2803    blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
2804    blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
2805    blk; Tirhuta                          ; Tirhuta
2806    blk; Warang_Citi                      ; Warang_Citi
2807  -> add to uchar.h
2808    use long property names for enum constants
2809  -> add to UCharacter.UnicodeBlock IDs
2810    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
2811            replace  public static final int \1_ID = \2; \3
2812  -> add to UCharacter.UnicodeBlock objects
2813    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
2814            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2815- 28 new Joining_Group (jg) values:
2816    jg ; Manichaean_Aleph                 ; Manichaean_Aleph
2817    jg ; Manichaean_Ayin                  ; Manichaean_Ayin
2818    jg ; Manichaean_Beth                  ; Manichaean_Beth
2819    jg ; Manichaean_Daleth                ; Manichaean_Daleth
2820    jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
2821    jg ; Manichaean_Five                  ; Manichaean_Five
2822    jg ; Manichaean_Gimel                 ; Manichaean_Gimel
2823    jg ; Manichaean_Heth                  ; Manichaean_Heth
2824    jg ; Manichaean_Hundred               ; Manichaean_Hundred
2825    jg ; Manichaean_Kaph                  ; Manichaean_Kaph
2826    jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
2827    jg ; Manichaean_Mem                   ; Manichaean_Mem
2828    jg ; Manichaean_Nun                   ; Manichaean_Nun
2829    jg ; Manichaean_One                   ; Manichaean_One
2830    jg ; Manichaean_Pe                    ; Manichaean_Pe
2831    jg ; Manichaean_Qoph                  ; Manichaean_Qoph
2832    jg ; Manichaean_Resh                  ; Manichaean_Resh
2833    jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
2834    jg ; Manichaean_Samekh                ; Manichaean_Samekh
2835    jg ; Manichaean_Taw                   ; Manichaean_Taw
2836    jg ; Manichaean_Ten                   ; Manichaean_Ten
2837    jg ; Manichaean_Teth                  ; Manichaean_Teth
2838    jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
2839    jg ; Manichaean_Twenty                ; Manichaean_Twenty
2840    jg ; Manichaean_Waw                   ; Manichaean_Waw
2841    jg ; Manichaean_Yodh                  ; Manichaean_Yodh
2842    jg ; Manichaean_Zayin                 ; Manichaean_Zayin
2843    jg ; Straight_Waw                     ; Straight_Waw
2844  -> uchar.h & UCharacter.JoiningGroup
2845- 23 new Script (sc) values:
2846    sc ; Aghb                             ; Caucasian_Albanian
2847    sc ; Bass                             ; Bassa_Vah
2848    sc ; Dupl                             ; Duployan
2849    sc ; Elba                             ; Elbasan
2850    sc ; Gran                             ; Grantha
2851    sc ; Hmng                             ; Pahawh_Hmong
2852    sc ; Khoj                             ; Khojki
2853    sc ; Lina                             ; Linear_A
2854    sc ; Mahj                             ; Mahajani
2855    sc ; Mani                             ; Manichaean
2856    sc ; Mend                             ; Mende_Kikakui
2857    sc ; Modi                             ; Modi
2858    sc ; Mroo                             ; Mro
2859    sc ; Narb                             ; Old_North_Arabian
2860    sc ; Nbat                             ; Nabataean
2861    sc ; Palm                             ; Palmyrene
2862    sc ; Pauc                             ; Pau_Cin_Hau
2863    sc ; Perm                             ; Old_Permic
2864    sc ; Phlp                             ; Psalter_Pahlavi
2865    sc ; Sidd                             ; Siddham
2866    sc ; Sind                             ; Khudawadi
2867    sc ; Tirh                             ; Tirhuta
2868    sc ; Wara                             ; Warang_Citi
2869  -> uscript.h (many were added before)
2870    comment "Mende Kikakui" for USCRIPT_MENDE
2871    add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
2872  -> com.ibm.icu.lang.UScript
2873    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2874    replace  public static final int \1 = \2; \3
2875- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
2876  (added 2012-11-01)
2877    Ahom        338     Ahom
2878    Hatr        127     Hatran
2879    Mult        323     Multani
2880  (added 2013-10-12)
2881    Modi        324     Modi
2882    Pauc        263     Pau Cin Hau
2883    Sidd        302     Siddham
2884  -> uscript.h (some overlap with additions from Unicode)
2885  -> com.ibm.icu.lang.UScript
2886    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
2887    replace  public static final int \1 = \2; \3
2888  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
2889  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
2890      and in com.ibm.icu.dev.test.lang.TestUScript.java
2891
2892* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
2893    (not strictly necessary for NOT_ENCODED scripts)
2894  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
2895
2896* generate normalization data files
2897- cd $ICU_ROOT/dbg
2898- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
2899- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
2900- UNIDATA=$ICU_SRC_DIR/source/data/unidata
2901- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
2902- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
2903- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
2904- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
2905- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
2906
2907* build ICU (make install)
2908  so that the tools build can pick up the new definitions from the installed header files.
2909
2910~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
2911
2912* build Unicode tools using CMake+make
2913
2914~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
2915
2916# Location (--prefix) of where ICU was installed.
2917set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
2918# Location of the ICU source tree.
2919set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
2920
2921~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
2922~/svn.icutools/trunk/dbg/unicode/c$ make
2923
2924* genprops work
2925- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
2926  + add second array of Joining_Group values for at most 10800..10FFF
2927    icutools: unicode/c/genprops/bidipropsbuilder.cpp
2928    icu: source/common/ubidi_props.h/.c/_data.h
2929    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
2930
2931* generate core properties data files
2932- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
2933- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
2934- rebuild ICU (make install) & tools
2935- run genuca again (see step above) so that it picks up the new nfc.nrm
2936- rebuild ICU (make install) & tools
2937
2938* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2939  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2940- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2941- Unicode 6.0..7.0: U+2260, U+226E, U+226F
2942- nothing new in 7.0, no test file to update
2943
2944* run & fix ICU4C tests
2945
2946* update Java data files
2947- refresh just the UCD-related files, just to be safe
2948- see (ICU4C)/source/data/icu4j-readme.txt
2949- mkdir /tmp/icu4j
2950- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2951  output:
2952    ...
2953    Unicode .icu files built to ./out/build/icudt53l
2954    echo timestamp > uni-core-data
2955    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
2956    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
2957    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2958    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
2959    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
2960    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
2961    mkdir -p /tmp/icu4j/main/shared/data
2962    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2963    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
2964    mkdir -p /tmp/icu4j/main/shared/data
2965    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
2966    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
2967- copy the big-endian Unicode data files to another location,
2968  separate from the other data files
2969    ICUDT=icudt54b
2970    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2971    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2972    cd ~/svn.icu/uni70/dbg/data/out/icu4j
2973    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2974    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2975    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
2976    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
2977    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
2978    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
2979- refresh ICU4J
2980    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
2981
2982* update CollationFCD.java
2983  + copy & paste the initializers of lcccIndex[] etc. from
2984    ICU4C/source/i18n/collationfcd.cpp to
2985    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
2986
2987* refresh Java test .txt files
2988- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2989    cd $ICU_SRC_DIR/source/data/unidata
2990    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2991    cd ../../test/testdata
2992    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2993    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
2994
2995* UCA
2996
2997- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
2998- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
2999- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
3000- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
3001- output files are in ~/svn.unitools/Generated/uca/7.0.0/
3002- review data; compare files, use blankweights.sed or similar
3003  ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
3004- cd ~/svn.unitools/Generated/uca/7.0.0/
3005- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3006  cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
3007- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3008    (note removing the underscore before "Rules")
3009    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
3010- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3011  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3012  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3013    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
3014    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
3015    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
3016- run genuca, see command line above
3017- rebuild ICU4C
3018- refresh ICU4J collation data:
3019  (subset of instructions above for properties data refresh, except copies all coll/*)
3020    ICUDT=icudt54b
3021    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3022    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3023    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
3024    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
3025- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3026- note on intltest: if collate/UCAConformanceTest fails, then
3027  utility/MultithreadTest/TestCollators will fail as well;
3028  fix the conformance test before looking into the multi-thread test
3029- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
3030- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
3031  ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
3032
3033* When refreshing all of ICU4J data from ICU4C
3034- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3035- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3036or
3037- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3038
3039* run & fix ICU4J tests
3040
3041*** LayoutEngine script information
3042
3043(For details see the Unicode 5.2 change log below.)
3044
3045* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3046  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3047  in the working directory.
3048  (It also generates ScriptRunData.cpp, which is no longer needed.)
3049
3050  The generated files have a current copyright date and "@stable" statement.
3051  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
3052  for "born stable" Unicode API constants, and to stop parsing ICU version numbers
3053  which may not contain dots any more.
3054
3055- diff current <icu>/source/layout files vs. generated ones
3056    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3057  review and manually merge desired changes;
3058  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
3059  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3060- if you just copy the above files, then
3061  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
3062  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3063
3064*** API additions
3065- send notice to icu-design about new born-@stable API (enum constants etc.)
3066
3067*** merge the Unicode update branches back onto the trunk
3068- do not merge the icudata.jar and testdata.jar,
3069  instead rebuild them from merged & tested ICU4C
3070
3071---------------------------------------------------------------------------- ***
3072
3073Unicode 6.3 update
3074
3075http://www.unicode.org/review/pri249/  -- beta review
3076http://www.unicode.org/reports/uax-proposed-updates.html
3077http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
3078http://www.unicode.org/reports/tr44/tr44-11.html
3079
3080*** ICU Trac
3081
3082- ticket 10128: update ICU to Unicode 6.3 beta
3083- ticket 10168: update ICU to Unicode 6.3 final
3084- C++ branches/markus/uni63 at r33552 from trunk at r33551
3085- Java branches/markus/uni63 at r33550 from trunk at r33553
3086
3087- ticket 10142: implement Unicode 6.3 bidi algorithm additions
3088
3089*** Unicode version numbers
3090- makedata.mak
3091- uchar.h
3092  (configure.in & configure: have been modified to extract the version from uchar.h)
3093- com.ibm.icu.util.VersionInfo
3094- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3095
3096- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
3097  so that the makefiles see the new version number.
3098
3099*** data files & enums & parser code
3100
3101* file preparation
3102
3103- download UCD, UCA & IDNA files
3104- make sure that the Unicode data folder passed into preparseucd.py
3105  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3106- modify preparseucd.py:
3107  parse new file BidiBrackets.txt
3108  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
3109- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
3110- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3111- Check test file diffs for previously commented-out, known-failing data lines;
3112  probably need to keep those commented out.
3113
3114* PropertyAliases.txt changes
3115- 1 new Enumerated Property
3116  bpt                      ; Bidi_Paired_Bracket_Type
3117  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
3118  -> ubidi_props.h & .c & UBiDiProps.java
3119  -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
3120  -> uprops.cpp
3121  -> change ubidi.icu format version from 2.0 to 2.1
3122- 1 new Miscellaneous Property
3123  bpb                      ; Bidi_Paired_Bracket
3124  -> uchar.h & UProperty.java
3125  -> ppucd.h & .cpp
3126
3127* PropertyValueAliases.txt changes
3128- 3 Bidi_Paired_Bracket_Type (bpt) values:
3129  bpt; c                                ; Close
3130  bpt; n                                ; None
3131  bpt; o                                ; Open
3132  -> uchar.h & UCharacter.BidiPairedBracketType
3133  -> ubidi_props.h & .c & UBiDiProps.java
3134  -> change ubidi.icu format version from 2.0 to 2.1
3135- 4 new Bidi_Class (bc) values:
3136  bc ; FSI                              ; First_Strong_Isolate
3137  bc ; LRI                              ; Left_To_Right_Isolate
3138  bc ; RLI                              ; Right_To_Left_Isolate
3139  bc ; PDI                              ; Pop_Directional_Isolate
3140  -> uchar.h & UCharacterEnums.ECharacterDirection
3141  -> until the bidi code gets updated,
3142     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
3143- 3 new Word_Break (WB) values:
3144  WB ; HL                               ; Hebrew_Letter
3145  WB ; SQ                               ; Single_Quote
3146  WB ; DQ                               ; Double_Quote
3147  -> uchar.h & UCharacter.WordBreak
3148  -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
3149- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3150  (added 2012-10-16)
3151  Aghb  239     Caucasian Albanian
3152  Mahj  314     Mahajani
3153  -> uscript.h
3154  -> com.ibm.icu.lang.UScript
3155    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3156    replace  public static final int \1 = \2;\3
3157  -> preparseucd.py _scripts_only_in_iso15924
3158  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3159      and in com.ibm.icu.dev.test.lang.TestUScript.java
3160  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
3161     (not strictly necessary for NOT_ENCODED scripts)
3162
3163* generate normalization data files
3164- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
3165- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
3166- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
3167- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3168- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3169- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3170- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
3171
3172* build ICU (make install)
3173  so that the tools build can pick up the new definitions from the installed header files.
3174
3175~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
3176
3177* build Unicode tools using CMake+make
3178
3179~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
3180
3181# Location (--prefix) of where ICU was installed.
3182set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
3183# Location of the ICU source tree.
3184set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
3185
3186~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
3187~/svn.icutools/trunk/dbg/unicode/c$ make
3188
3189* generate core properties data files
3190- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
3191- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
3192- rebuild ICU (make install) & tools
3193- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
3194- rebuild ICU (make install) & tools
3195
3196* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3197  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3198- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3199- Unicode 6.0..6.3: U+2260, U+226E, U+226F
3200- nothing new in 6.3, no test file to update
3201
3202* update Java data files
3203- refresh just the UCD-related files, just to be safe
3204- see (ICU4C)/source/data/icu4j-readme.txt
3205- mkdir /tmp/icu4j
3206- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3207  output:
3208    ...
3209    Unicode .icu files built to ./out/build/icudt52l
3210    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
3211    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
3212    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3213    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
3214    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
3215    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
3216    mkdir -p /tmp/icu4j/main/shared/data
3217    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3218    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
3219    mkdir -p /tmp/icu4j/main/shared/data
3220    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3221    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
3222- copy the big-endian Unicode data files to another location,
3223  separate from the other data files
3224    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3225    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
3226    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
3227    ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
3228    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
3229    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3230    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
3231- refresh ICU4J
3232    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
3233
3234* refresh Java test .txt files
3235- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3236
3237* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
3238
3239- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
3240- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
3241- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3242- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3243  (note removing the underscore before "Rules")
3244- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3245  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3246  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3247- check test file diffs for previously commented-out, known-failing data lines;
3248  probably need to keep those commented out
3249- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3250- run genuca, see command line above
3251- rebuild ICU4C
3252- refresh ICU4J collation data:
3253  (subset of instructions above for properties data refresh, except copies all coll/*)
3254    ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3255    ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3256    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
3257    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
3258- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3259- note on intltest: if collate/UCAConformanceTest fails, then
3260  utility/MultithreadTest/TestCollators will fail as well;
3261  fix the conformance test before looking into the multi-thread test
3262
3263* test ICU, fix test code where necessary
3264
3265* When refreshing all of ICU4J data from ICU4C
3266- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3267- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3268or
3269- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3270
3271*** LayoutEngine script information
3272- skipped for Unicode 6.3: no new scripts
3273
3274*** merge the Unicode update branches back onto the trunk
3275- do not merge the icudata.jar and testdata.jar,
3276  instead rebuild them from merged & tested ICU4C
3277
3278---------------------------------------------------------------------------- ***
3279
3280Unicode 6.2 update
3281
3282http://www.unicode.org/review/pri230/
3283http://www.unicode.org/versions/beta-6.2.0.html
3284http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
3285http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
3286http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
3287http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
3288http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
3289http://unicode.org/Public/idna/6.2.0/
3290
3291*** ICU Trac
3292
3293- ticket 9515: Unicode 6.2: final ICU update
3294
3295- ticket 9514: UCA 6.2: fix UCARules.txt
3296
3297- ticket 9437: update ICU to Unicode 6.2
3298- C++ branches/markus/uni62 at r32050 from trunk at r32041
3299- Java branches/markus/uni62 at r32068 from trunk at r32066
3300
3301*** Unicode version numbers
3302- makedata.mak
3303- uchar.h
3304  (configure.in & configure: have been modified to extract the version from uchar.h)
3305- com.ibm.icu.util.VersionInfo
3306- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
3307
3308*** data files & enums & parser code
3309
3310* file preparation
3311
3312- download UCD, UCA & IDNA files
3313- make sure that the Unicode data folder passed into preparseucd.py
3314  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
3315- modify preparseucd.py: NamesList.txt is now in UTF-8
3316- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
3317- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3318- Check test file diffs for previously commented-out, known-failing data lines;
3319  probably need to keep those commented out.
3320
3321* PropertyValueAliases.txt changes
3322- 1 new Line_Break (lb) value:
3323  lb ; RI                               ; Regional_Indicator
3324  -> uchar.h & UCharacter.LineBreak
3325- 1 new Word_Break (WB) value:
3326  WB ; RI                               ; Regional_Indicator
3327  -> uchar.h & UCharacter.WordBreak
3328- 1 new Grapheme_Cluster_Break (GCB) value:
3329  GCB; RI                               ; Regional_Indicator
3330  -> uchar.h & UCharacter.GraphemeClusterBreak
3331
3332* 3 new numeric values
3333  The new value -1, which was really supposed to be NaN but that would have required
3334  new UnicodeData.txt syntax, can already be represented as a "fraction" of -1/1,
3335  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
3336    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
3337    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
3338  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
3339    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
3340    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
3341  -> uprops.h, uchar.c & UCharacterProperty.java
3342  -> cucdtst.c & UCharacterTest.java
3343
3344* generate normalization data files
3345- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
3346- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
3347- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
3348- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3349- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3350- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3351- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
3352
3353* build ICU (make install)
3354  so that the tools build can pick up the new definitions from the installed header files.
3355* build Unicode tools using CMake+make
3356
3357* generate core properties data files
3358- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
3359- in initial bootstrapping, change the UCA version
3360  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
3361- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
3362- rebuild ICU (make install) & tools
3363  + if genrb fails to build coll/root.res with an U_INVALID_FORMAT_ERROR,
3364    check if the UCA version in FractionalUCA.txt matches the new Unicode version
3365    (see step above)
3366- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
3367- rebuild ICU (make install) & tools
3368
3369* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3370  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3371- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3372- Unicode 6.0..6.2: U+2260, U+226E, U+226F
3373- nothing new in 6.2, no test file to update
3374
3375* update Java data files
3376- refresh just the UCD-related files, just to be safe
3377- see (ICU4C)/source/data/icu4j-readme.txt
3378- mkdir /tmp/icu4j
3379- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3380  output:
3381    ...
3382    Unicode .icu files built to ./out/build/icudt50l
3383    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
3384    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
3385    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3386    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
3387    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
3388    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
3389    mkdir -p /tmp/icu4j/main/shared/data
3390    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3391    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
3392    mkdir -p /tmp/icu4j/main/shared/data
3393    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3394    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
3395- copy the big-endian Unicode data files to another location,
3396  separate from the other data files
3397    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3398    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
3399    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
3400    ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
3401    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
3402    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3403    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
3404- refresh ICU4J
3405    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
3406
3407* refresh Java test .txt files
3408- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3409
3410* UCA
3411
3412- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
3413- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
3414- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3415- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3416  (note removing the underscore before "Rules")
3417- update (ICU4C)/source/test/testdata/CollationTest_*.txt
3418  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3419  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3420- check test file diffs for previously commented-out, known-failing data lines;
3421  probably need to keep those commented out
3422- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3423- run genuca, see command line above
3424- rebuild ICU4C
3425- refresh ICU4J collation data:
3426  (subset of instructions above for properties data refresh, except copies all coll/*)
3427    ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3428    ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3429    ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
3430    ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
3431- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3432- note on intltest: if collate/UCAConformanceTest fails, then
3433  utility/MultithreadTest/TestCollators will fail as well;
3434  fix the conformance test before looking into the multi-thread test
3435
3436* test ICU, fix test code where necessary
3437
3438* When refreshing all of ICU4J data from ICU4C
3439- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3440- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3441or
3442- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3443
3444*** LayoutEngine script information
3445- skipped for Unicode 6.2: no new scripts
3446
3447*** merge the Unicode update branches back onto the trunk
3448- do not merge the icudata.jar and testdata.jar,
3449  instead rebuild them from merged & tested ICU4C
3450
3451---------------------------------------------------------------------------- ***
3452
3453Future Unicode update
3454
3455Tools simplified since the Unicode 6.1 update. See
3456- http://site.icu-project.org/design/props/ppucd
3457- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
3458
3459* Unicode version numbers
3460- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
3461
3462* file preparation
3463- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
3464- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
3465- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
3466- Check test file diffs for previously commented-out, known-failing data lines;
3467  probably need to keep those commented out.
3468
3469* PropertyValueAliases.txt changes
3470- Script codes that are in ISO 15924 but not in Unicode are now listed in
3471  preparseucd.py, in the _scripts_only_in_iso15924 variable.
3472  If there are new ISO codes, then add them.
3473  If Unicode adds some of them, then remove them from the .py variable.
3474
3475* UnicodeData.txt changes
3476- No more manual changes for CJK ranges for algorithmic names;
3477  those are now written to ppucd.txt and genprops reads them from there.
3478
3479* generate core properties data files (makeprops.sh was deleted)
3480- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
3481
3482* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
3483- it is now generated by preparseucd.py
3484
3485* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
3486- it is now generated by preparseucd.py
3487- make sure that the Unicode data folder passed into preparseucd.py
3488  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
3489  (can be in some subfolder)
3490
3491* generate normalization data files
3492- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
3493- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
3494- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
3495- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
3496- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
3497- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
3498- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
3499
3500* build ICU (make install)
3501* build Unicode tools using CMake+make
3502
3503* new way to call genuca (makeuca.sh was deleted)
3504- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
3505
3506---------------------------------------------------------------------------- ***
3507
3508Unicode 6.1 update
3509
3510*** ICU Trac
3511
3512- ticket 8995 final update to Unicode 6.1
3513- ticket 8994 regenerate source/layout/CanonData.cpp
3514
3515- ticket 8961 support Unicode "Age" value *names*
3516- ticket 8963 support multiple character name aliases & types
3517
3518- ticket 8827 "update ICU to Unicode 6.1"
3519- C++ branches/markus/uni61 at r30864 from trunk at r30843
3520- Java branches/markus/uni61 at r30865 from trunk at r30863
3521
3522*** Unicode version numbers
3523- makedata.mak
3524- uchar.h
3525  (configure.in & configure: have been modified to extract the version from uchar.h)
3526- com.ibm.icu.util.VersionInfo
3527- icutools/unicode/makedefs.sh
3528  + also review & update other definitions in that file,
3529    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
3530
3531*** data files & enums & parser code
3532
3533* file preparation
3534
3535~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
3536- This prepares both unidata and testdata files in respective output subfolders.
3537- Check test file diffs for previously commented-out, known-failing data lines;
3538  probably need to keep those commented out.
3539
3540* PropertyValueAliases.txt changes
3541- 11 new block names:
3542  Arabic_Extended_A
3543  Arabic_Mathematical_Alphabetic_Symbols
3544  Chakma
3545  Meetei_Mayek_Extensions
3546  Meroitic_Cursive
3547  Meroitic_Hieroglyphs
3548  Miao
3549  Sharada
3550  Sora_Sompeng
3551  Sundanese_Supplement
3552  Takri
3553  -> add to uchar.h
3554  -> add to UCharacter.UnicodeBlock IDs
3555    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
3556            replace  public static final int \1_ID = \2; \3
3557  -> add to UCharacter.UnicodeBlock objects
3558    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
3559            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3560- 1 new Joining_Group (jg) value:
3561  Rohingya_Yeh
3562  -> uchar.h & UCharacter.JoiningGroup
3563- 2 new Line_Break (lb) values:
3564  CJ=Conditional_Japanese_Starter
3565  HL=Hebrew_Letter
3566  -> uchar.h & UCharacter.LineBreak
3567- 7 new scripts:
3568  sc ; Cakm      ; Chakma
3569  sc ; Merc      ; Meroitic_Cursive
3570  sc ; Mero      ; Meroitic_Hieroglyphs
3571  sc ; Plrd      ; Miao
3572  sc ; Shrd      ; Sharada
3573  sc ; Sora      ; Sora_Sompeng
3574  sc ; Takr      ; Takri
3575  -> remove these from SyntheticPropertyValueAliases.txt
3576  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3577      and in com.ibm.icu.dev.test.lang.TestUScript.java
3578- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3579  (added 2011-06-21)
3580  Khoj        322     Khojki
3581  Tirh        326     Tirhuta
3582    and another one added 2011-12-09
3583  Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
3584  -> uscript.h
3585  -> com.ibm.icu.lang.UScript
3586    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3587    replace  public static final int \1 = \2;\3
3588  -> SyntheticPropertyValueAliases.txt
3589  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3590      and in com.ibm.icu.dev.test.lang.TestUScript.java
3591
3592* UnicodeData.txt changes
3593- the last Unihan code point changes from U+9FCB to U+9FCC
3594  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
3595  + do change gennames.c
3596  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
3597
3598* DerivedBidiClass.txt changes
3599- 2 new default-AL blocks:
3600#     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
3601#     Arabic Mathematical Alphabetic Symbols:
3602#                       U+1EE00  - U+1EEFF  (was default-R)
3603- 2 new default-R blocks:
3604#     Meroitic Hieroglyphs:
3605#                        U+10980 - U+1099F
3606#     Meroitic Cursive:  U+109A0 - U+109FF
3607  -> should be picked up by the explicit data in the file
3608
3609* NameAliases.txt changes
3610- from
3611    # Each line has two fields
3612    # First field: Code point
3613    # Second field: Alias
3614- to
3615    # Each line has three fields, as described here:
3616    #
3617    # First field:  Code point
3618    # Second field: Alias
3619    # Third field:  Type
3620- Also, the file previously allowed multiple aliases but only now does it
3621  actually provide multiple, even multiple of the same type. For example,
3622    FEFF;BYTE ORDER MARK;alternate
3623    FEFF;BOM;abbreviation
3624    FEFF;ZWNBSP;abbreviation
3625- This breaks our gennames parser, unames.icu data structure, and API.
3626  Fix gennames to only pick up "correction" aliases.
3627  New ticket #8963 for further changes.
3628
3629* run genpname/preparse.pl (on Linux)
3630  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3631  + make sure that data.h is writable
3632  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
3633  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
3634
3635* build ICU (make install)
3636  so that the tools build can pick up the new definitions from the installed header files.
3637* build Unicode tools (at least genpname) using CMake+make
3638
3639* run genpname
3640  (builds both pnames.icu and propname_data.h)
3641- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3642- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
3643
3644* build ICU (make install)
3645* build Unicode tools using CMake+make
3646
3647* update source/data/unidata/norm2/nfkc_cf.txt
3648- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
3649
3650* update source/data/unidata/norm2/uts46.txt
3651- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
3652  to ~/svn.icu/tools/trunk/src/unicode/py
3653- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
3654- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
3655- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
3656
3657* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3658  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3659- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3660- Unicode 6.0..6.1: U+2260, U+226E, U+226F
3661- nothing new in 6.1, no test file to update
3662
3663* generate core properties data files
3664- in initial bootstrapping, change the UCA version
3665  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
3666- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3667- rebuild ICU & tools
3668  + if genrb fails to build coll/root.res with an U_INVALID_FORMAT_ERROR,
3669    check if the UCA version in FractionalUCA.txt matches the new Unicode version
3670    (see step above)
3671- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
3672  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3673- rebuild ICU & tools
3674
3675* update Java data files
3676- refresh just the UCD-related files, just to be safe
3677- see (ICU4C)/source/data/icu4j-readme.txt
3678- mkdir /tmp/icu4j
3679- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3680  output:
3681    ...
3682    Unicode .icu files built to ./out/build/icudt49l
3683    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
3684    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
3685    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3686    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
3687    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
3688    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
3689    mkdir -p /tmp/icu4j/main/shared/data
3690    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3691    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
3692    mkdir -p /tmp/icu4j/main/shared/data
3693    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
3694    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
3695- copy the big-endian Unicode data files to another location,
3696  separate from the other data files
3697    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3698    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
3699    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
3700    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
3701    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
3702    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3703    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
3704- refresh ICU4J
3705    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
3706
3707* refresh Java test .txt files
3708- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
3709
3710* test ICU so far, fix test code where necessary
3711- temporarily ignore collation issues that look like UCA/UCD mismatches,
3712  until UCA data is updated
3713
3714* UCA
3715
3716- get output from Mark's tools; look in
3717    http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
3718- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
3719- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
3720  (note removing the underscore before "Rules")
3721- update (ICU)/source/test/testdata/CollationTest_*.txt
3722  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
3723  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
3724- check test file diffs for previously commented-out, known-failing data lines;
3725  probably need to keep those commented out
3726- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
3727- run makeuca.sh:
3728  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3729- rebuild ICU4C
3730- refresh ICU4J collation data:
3731  (subset of instructions above for properties data refresh, except copies all coll/*)
3732    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3733    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3734    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
3735    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
3736- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
3737- note on intltest: if collate/UCAConformanceTest fails, then
3738  utility/MultithreadTest/TestCollators will fail as well;
3739  fix the conformance test before looking into the multi-thread test
3740
3741* When refreshing all of ICU4J data from ICU4C
3742- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3743- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
3744or
3745- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
3746
3747*** LayoutEngine script information
3748
3749(For details see the Unicode 5.2 change log below.)
3750
3751* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
3752  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
3753  in the working directory.
3754  (It also generates ScriptRunData.cpp, which is no longer needed.)
3755
3756  The generated files have a current copyright date and "@draft" statement.
3757
3758- diff current <icu>/source/layout files vs. generated ones
3759    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
3760  review and manually merge desired changes;
3761  fix gratuitous changes, incorrect @draft and missing aliases;
3762  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
3763- if you just copy the above files, then
3764  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
3765  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
3766
3767*** merge the Unicode update branches back onto the trunk
3768- do not merge the icudata.jar and testdata.jar,
3769  instead rebuild them from merged & tested ICU4C
3770
3771---------------------------------------------------------------------------- ***
3772
3773ICU 4.8 (no Unicode update, just new script codes)
3774
3775* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3776  (added 2010-12-21)
3777    Afak    439     Afaka
3778    Jurc    510     Jurchen
3779    Mroo    199     Mro, Mru
3780    Nshu    499     Nüshu
3781    Shrd    319     Sharada, Śāradā
3782    Sora    398     Sora Sompeng
3783    Takr    321     Takri, Ṭākrī, Ṭāṅkrī
3784    Tang    520     Tangut
3785    Wole    480     Woleai
3786  -> uscript.h
3787  -> com.ibm.icu.lang.UScript
3788    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3789    replace  public static final int \1 = \2;\3
3790  -> genpname/SyntheticPropertyValueAliases.txt
3791  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3792      and in com.ibm.icu.dev.test.lang.TestUScript.java
3793
3794* run genpname/preparse.pl (on Linux)
3795  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3796  + make sure that data.h is writable
3797  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
3798  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
3799
3800* rebuild Unicode tools (at least genpname) using make
3801- You might first need to "make install" ICU so that the tools build can pick
3802  up the new definitions from the installed header files.
3803
3804* run genpname
3805  (builds both pnames.icu and propname_data.h)
3806- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3807- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
3808- rebuild ICU & tools
3809
3810* run genprops
3811- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
3812- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
3813- rebuild ICU & tools
3814
3815* update Java data files
3816- refresh just the UCD-related files, just to be safe
3817- see (ICU4C)/source/data/icu4j-readme.txt
3818- mkdir /tmp/icu4j
3819- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3820- copy the big-endian Unicode data files to another location,
3821  separate from the other data files
3822    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3823    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3824    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
3825- refresh ICU4J
3826    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
3827
3828* should have updated the layout engine script codes but forgot
3829
3830---------------------------------------------------------------------------- ***
3831
3832Unicode 6.0 update
3833
3834*** related ICU Trac tickets
3835
38367264 Unicode 6.0 Update
3837
3838*** Unicode version numbers
3839- makedata.mak
3840- uchar.h
3841  (configure.in & configure: have been modified to extract the version from uchar.h)
3842- com.ibm.icu.util.VersionInfo
3843
3844*** data files & enums & parser code
3845
3846* file preparation
3847
3848~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
3849- This now prepares both unidata and testdata files in respective output subfolders.
3850
3851* PropertyAliases.txt changes
3852- new Script_Extensions property defined in the new ScriptExtensions.txt file
3853  but not listed in PropertyAliases.txt; reported to unicode.org;
3854  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
3855    scx; Script_Extensions
3856  -> uchar.h with new UProperty section
3857  -> com.ibm.icu.lang.UProperty, parallel with uchar.h
3858
3859* PropertyValueAliases.txt changes
3860- 12 new block names:
3861  Alchemical_Symbols
3862  Bamum_Supplement
3863  Batak
3864  Brahmi
3865  CJK_Unified_Ideographs_Extension_D
3866  Emoticons
3867  Ethiopic_Extended_A
3868  Kana_Supplement
3869  Mandaic
3870  Miscellaneous_Symbols_And_Pictographs
3871  Playing_Cards
3872  Transport_And_Map_Symbols
3873  -> add to uchar.h
3874  -> add to UCharacter.UnicodeBlock
3875    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
3876            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
3877- Joining_Group (jg) values:
3878  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
3879  -> uchar.h & UCharacter.JoiningGroup
3880- 3 new scripts:
3881  sc ; Batk      ; Batak
3882  sc ; Brah      ; Brahmi
3883  sc ; Mand      ; Mandaic
3884  -> remove these from SyntheticPropertyValueAliases.txt
3885  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
3886  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
3887      and in com.ibm.icu.dev.test.lang.TestUScript.java
3888- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
3889  (added 2009-11-11..2010-07-18)
3890  Bass        259     Bassa Vah
3891  Dupl        755     Duployan shortand
3892  Elba        226     Elbasan
3893  Gran        343     Grantha
3894  Kpel        436     Kpelle
3895  Loma        437     Loma
3896  Mend        438     Mende
3897  Merc        101     Meroitic Cursive
3898  Narb        106     Old North Arabian
3899  Nbat        159     Nabataean
3900  Palm        126     Palmyrene
3901  Sind        318     Sindhi
3902  Wara        262     Warang Citi
3903  -> uscript.h
3904  -> com.ibm.icu.lang.UScript
3905    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
3906    replace  public static final int \1 = \2;\3
3907  -> SyntheticPropertyValueAliases.txt
3908  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
3909      and in com.ibm.icu.dev.test.lang.TestUScript.java
3910- ISO 15924 name change
3911  Mero        100     Meroitic Hieroglyphs (was Meroitic)
3912  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
3913- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
3914
3915* UnicodeData.txt changes
3916- new CJK block:
3917  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
3918  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
3919  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
3920
3921* build Unicode tools using CMake+make
3922
3923* run genpname/preparse.pl (on Linux)
3924  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
3925  + make sure that data.h is writable
3926  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
3927  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
3928
3929* rebuild Unicode tools (at least genpname) using make
3930- You might first need to "make install" ICU so that the tools build can pick
3931  up the new definitions from the installed header files.
3932
3933* run genpname
3934- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
3935- rebuild ICU & tools
3936
3937* update source/data/unidata/norm2/nfkc_cf.txt
3938- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
3939
3940* update source/data/unidata/norm2/uts46.txt
3941- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
3942  to ~/svn.icu/tools/trunk/src/unicode/py
3943- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
3944- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
3945- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
3946
3947* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
3948  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
3949- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
3950- Unicode 6.0: U+2260, U+226E, U+226F
3951
3952* generate core properties data files
3953- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3954- rebuild ICU & tools
3955- run makeuca.sh so that genuca picks up the new nfc.nrm:
3956  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
3957- rebuild ICU & tools
3958
3959* implement new Script_Extensions property (provisional)
3960- parser & generator: genprops & uprops.icu
3961- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
3962- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
3963
3964* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
3965- (one-time change)
3966- genbidi/gencase/genprops tools changes
3967- re-run makeprops.sh (see above)
3968- UCharacterProperty.java, UCharacterTypeIterator.java,
3969  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
3970  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
3971
3972* update Java data files
3973- refresh just the UCD-related files, just to be safe
3974- see (ICU4C)/source/data/icu4j-readme.txt
3975- mkdir /tmp/icu4j
3976- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
3977  output:
3978    ...
3979    Unicode .icu files built to ./out/build/icudt45l
3980    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
3981    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
3982    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
3983    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
3984    mkdir -p /tmp/icu4j/main/shared/data
3985    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
3986- copy the big-endian Unicode data files to another location,
3987  separate from the other data files
3988    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3989    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
3990    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
3991    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
3992    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
3993    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
3994    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
3995- refresh ICU4J
3996    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
3997
3998* refresh Java test .txt files
3999- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
4000
4001* un-hardcode normalization skippable (NF*_Inert) test data
4002- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
4003
4004* copy updated break iterator test files
4005- now handled by early ucdcopy.py and
4006  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
4007  (old instructions:
4008   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
4009   to ~/svn.icu/trunk/src/source/test/testdata)
4010- they are not used in ICU4J
4011
4012* UCA
4013
4014- get output from Mark's tools; look in
4015    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
4016    http://www.macchiato.com/unicode/utc/additional-uca-files
4017    http://www.unicode.org/Public/UCA/6.0.0/
4018    http://www.unicode.org/~mdavis/uca/
4019- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
4020- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
4021- update Han-implicit ranges for new CJK extensions:
4022  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
4023- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
4024  do not add it into invuca so that tailoring primary-after an ignorable works
4025- genuca: permit space between [variable top] bytes
4026- ucol.cpp: treat noncharacters like unassigned rather than ignorable
4027- run makeuca.sh:
4028  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
4029- rebuild ICU4C
4030- refresh ICU4J collation data:
4031  (subset of instructions above for properties data refresh, except copies all coll/*)
4032    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4033    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4034    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
4035    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
4036- update (ICU)/source/test/testdata/CollationTest_*.txt
4037  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
4038  with output from Mark's Unicode tools
4039- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
4040- note on intltest: if collate/UCAConformanceTest fails, then
4041  utility/MultithreadTest/TestCollators will fail as well;
4042  fix the conformance test before looking into the multi-thread test
4043
4044* When refreshing all of ICU4J data from ICU4C
4045- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
4046- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
4047or
4048- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
4049
4050*** LayoutEngine script information
4051
4052(For details see the Unicode 5.2 change log below.)
4053
4054* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4055ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4056ScriptRunData.cpp, which is no longer needed.)
4057
4058The generated files have a current copyright date and "@draft" statement.
4059
4060* copy the above files into <icu>/source/layout, replacing the old files.
4061* fix mixed line endings
4062* review the diffs and fix incorrect @draft and missing aliases;
4063  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
4064* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4065
4066---------------------------------------------------------------------------- ***
4067
4068Unicode 5.2 update
4069
4070*** related ICU Trac tickets
4071
40727084 Unicode 5.2
4073
40747167 verify collation bytes
40757235 Java test NAME_ALIAS
40767236 Java DerivedCoreProperties.txt test
40777237 Java BidiTest.txt
40787238 UTrie2 in core unidata
40797239 test for tailoring gaps
40807240 Java fix CollationMiscTest
40817243 update layout engine for Unicode 5.2
4082
4083*** Unicode version numbers
4084- makedata.mak
4085- uchar.h
4086- configure.in & configure
4087- update ucdVersion in gennames.c if an algorithmic range changes
4088
4089*** data files & enums & parser code
4090
4091* file preparation
4092
4093python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
4094- includes finding files regardless of version numbers,
4095  copying them, and performing the equivalent processing of the
4096  ucdstrip and ucdmerge tools on the desired set of files
4097
4098* notes on changes
4099- PropertyAliases.txt
4100  moved from numeric to enumerated:
4101    ccc       ; Canonical_Combining_Class
4102  new string properties:
4103    NFKC_CF   ; NFKC_Casefold
4104    Name_Alias; Name_Alias
4105  new binary properties:
4106    Cased     ; Cased
4107    CI        ; Case_Ignorable
4108    CWCF      ; Changes_When_Casefolded
4109    CWCM      ; Changes_When_Casemapped
4110    CWKCF     ; Changes_When_NFKC_Casefolded
4111    CWL       ; Changes_When_Lowercased
4112    CWT       ; Changes_When_Titlecased
4113    CWU       ; Changes_When_Uppercased
4114  new CJK Unihan properties (not supported by ICU)
4115- PropertyValueAliases.txt
4116  new block names
4117  new scripts
4118  one script code change:
4119    sc ; Qaai      ; Inherited
4120    ->
4121    sc ; Zinh      ; Inherited                        ; Qaai
4122  new Line_Break (lb) value:
4123    lb ; CP        ; Close_Parenthesis
4124  new Joining_Group (jg) values: Farsi_Yeh, Nya
4125  other new values:
4126    ccc; 214; ATA  ; Attached_Above
4127- DerivedBidiClass.txt
4128  new default-R range: U+1E800 - U+1EFFF
4129- UnicodeData.txt
4130  all of the ISO comments are gone
4131  new CJK block end:
4132    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
4133  new CJK block:
4134    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
4135    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
4136
4137* genpname
4138- run preparse.pl
4139  + cd \svn\icuproj\icu\trunk\source\tools\genpname
4140  + make sure that data.h is writable
4141  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
4142  + preparse.pl complains with errors like the following:
4143      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
4144    This is because ICU 4.0 had scripts from ISO 15924 which are now
4145    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
4146    and PropertyValueAliases.txt.
4147    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4148       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
4149  + preparse.pl complains with errors about block names missing from uchar.h; add them
4150
4151* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4152- new block & script values
4153  + 26 new blocks
4154    copy new blocks from Blocks.txt
4155    MS VC++ 2008 regular expression:
4156      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
4157      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
4158  + several new script values already added in ICU 4.0 for ISO 15924 coverage
4159    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
4160  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
4161  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
4162    (added to SyntheticPropertyValueAliases.txt)
4163- new Joining Group (JG) values: Farsi_Yeh, Nya
4164- new Line_Break (lb) value:
4165    lb ; CP        ; Close_Parenthesis
4166
4167* hardcoded Unihan range end/limit
4168- Unihan range end moves from 9FC3 to 9FCB
4169  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
4170  + do change gennames.c
4171
4172* Compare definitions of new binary properties with what we used to use
4173  in algorithms, to see if the definitions changed.
4174- Verified that definitions for Cased and Case_Ignorable are unchanged.
4175  The gencase tool now parses the newly public Case_Ignorable values
4176  in case the definition changes in the future.
4177
4178* uchar.c & uprops.h & uprops.c & genprops
4179- new numeric values that didn't exist in Unicode data before:
4180    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
4181  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
4182  therefore redesign the encoding of numeric types and values for formatVersion 6;
4183  design for simple numbers up to at least 144 ("one gross"),
4184  large values up to at least 10^20,
4185  and fractions with numerators -1..17 and denominators 1..16
4186  to cover current and expected future values
4187  (e.g., more Han numeric values, Meroitic twelfths)
4188
4189* reimplement Hangul_Syllable_Type for new Jamo characters
4190- the old code assumed that all Jamo characters are in the 11xx block
4191- Unicode 5.2 fills holes there and adds new Jamo characters in
4192    A960..A97F; Hangul Jamo Extended-A
4193  and in
4194    D7B0..D7FF; Hangul Jamo Extended-B
4195- Hangul_Syllable_Type can be trivially derived from a subset of
4196  Grapheme_Cluster_Break values
4197
4198* build Unicode data source code for hardcoding core data
4199C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
4200
4201ICU data make path is \svn\icuproj\icu\trunk\source\data\
4202ICU root path is \svn\icuproj\icu\trunk
4203Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4204Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
4205Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
4206Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
4207Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
4208Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
4209Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
4210Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
4211Creating data file for Unicode Property Names
4212Creating data file for Unicode Character Properties
4213Creating data file for Unicode Case Mapping Properties
4214Creating data file for Unicode BiDi/Shaping Properties
4215Creating data file for Unicode Normalization
4216Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
4217Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
4218
4219- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
4220  and rebuild the common library
4221
4222*** UCA
4223
4224- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
4225- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
4226- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
4227[ Begin obsolete instructions:
4228  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
4229    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
4230      on Windows:
4231        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
4232        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
4233  End obsolete instructions]
4234- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
4235  not just the *_STUB.txt files
4236- note on intltest: if collate/UCAConformanceTest fails, then
4237  utility/MultithreadTest/TestCollators will fail as well;
4238  fix the conformance test before looking into the multi-thread test
4239
4240*** Implement Cased & Case_Ignorable properties
4241- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
4242- Problem: These properties should be disjoint, but aren't
4243- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
4244- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
4245
4246*** Implement Changes_When_Xyz properties
4247- without stored data
4248
4249*** Implement Name_Alias property
4250- add it as another name field in unames.icu
4251- make it available via u_charName() and UCharNameChoice and
4252- consider it in u_charFromName()
4253
4254*** Break iterators
4255
4256* Update break iterator rules to new UAX versions and new property values
4257* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
4258
4259*** new BidiTest file
4260- review format and data
4261- copy BidiTest.txt to source/test/testdata
4262- write test code using this data
4263- fix ICU code where it fails the conformance test
4264
4265*** Java
4266- generally, find and update code corresponding to C/C++
4267- UCharacter.UnicodeBlock constants:
4268  a) add an _ID integer per new block, update COUNT
4269  b) add a class instance per new block
4270     Visual Studio regex:
4271        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
4272        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
4273- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
4274
4275- port test changes to Java
4276
4277*** LayoutEngine script information
4278
4279(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
4280
4281* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
4282ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
4283ScriptRunData.cpp, which is no longer needed.)
4284
4285The generated files have a current copyright date and "@draft" statement.
4286
4287-> Eric Mader wrote in email on 20090930:
4288    "I think the tool has been modified to update @draft to @stable for
4289     older scripts and to add @draft for new scripts.
4290     (I worked with an intern on this last year.)
4291     You should check the output after you run it."
4292
4293* copy the above files into <icu>/source/layout, replacing the old files.
4294* fix mixed line endings
4295* review the diffs and fix incorrect @draft and missing aliases
4296* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
4297
4298Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4299and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4300
4301-> Eric Mader wrote in email on 20090930:
4302    "This is just a matter of making sure that all the per-script tables have
4303     entries for any new scripts that were added.
4304     If any new Indic characters were added, then the class tables in
4305     IndicClassTables.cpp should be updated to reflect this.
4306     John Emmons should know how to do this if it's required."
4307
4308* rebuild the layout and layoutex libraries.
4309
4310*** Documentation
4311- Update User Guide
4312  + Jamo_Short_Name, sfc->scf, binary property value aliases
4313
4314---------------------------------------------------------------------------- ***
4315
4316Unicode 5.1 update
4317
4318*** related ICU Trac tickets
4319
43205696 Update to Unicode 5.1
4321
4322*** Unicode version numbers
4323- makedata.mak
4324- uchar.h
4325- configure.in & configure
4326- update ucdVersion in gennames.c if an algorithmic range changes
4327
4328*** data files & enums & parser code
4329
4330* file preparation
4331- ucdstrip:
4332    DerivedCoreProperties.txt
4333    DerivedNormalizationProps.txt
4334    NormalizationTest.txt
4335    PropList.txt
4336    Scripts.txt
4337    GraphemeBreakProperty.txt
4338    SentenceBreakProperty.txt
4339    WordBreakProperty.txt
4340- ucdstrip and ucdmerge:
4341    EastAsianWidth.txt
4342    LineBreak.txt
4343
4344* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
4345copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
4346copy 5.1.0\ucd\Blocks.txt ..\unidata\
4347copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
4348copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
4349copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
4350copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
4351copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
4352copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
4353copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
4354copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
4355copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
4356copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
4357copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
4358
4359ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
4360ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
4361ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
4362ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
4363ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
4364ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
4365ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
4366ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
4367ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
4368ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
4369
4370* genpname
4371- run preparse.pl
4372  + cd \svn\icuproj\icu\uni51\source\tools\genpname
4373  + make sure that data.h is writable
4374  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
4375  + preparse.pl complains with errors like the following:
4376      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
4377    This is because ICU 3.8 had scripts from ISO 15924 which are now
4378    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
4379    and PropertyValueAliases.txt.
4380    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
4381       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
4382  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
4383      N/Y, No/Yes, F/T, False/True
4384    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
4385       It will use further values from the file if present.
4386
4387* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4388- new block & script values
4389  + 17 new blocks
4390  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
4391    (removed from SyntheticPropertyValueAliases.txt)
4392  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
4393    (added to SyntheticPropertyValueAliases.txt)
4394- uprops.icu (uprops.h) only provides 7 bits for script codes.
4395  In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
4396  There is none above 127 yet which is the script code for an
4397  assigned Unicode character, so ICU 4.0 uprops.icu does not store any
4398  script code values greater than 127.
4399  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
4400  in a parallel bit field, and that overflows now.
4401  Also, future values >=128 would be incompatible anyway.
4402  uprops.h is modified to move around several of the bit fields
4403  in the properties vector words, and now uses 8 bits for the script code.
4404  Two other bit fields also grow to accommodate future growth:
4405  Block (current count: 172) grows from 8 to 9 bits,
4406  and Word_Break grows from 4 to 5 bits.
4407- renamed property Simple_Case_Folding (sfc->scf)
4408  + nothing to be done: handled as normal alias
4409- new property JSN Jamo_Short_Name
4410  + no new API: only contributes to the Name property
4411- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
4412- new Joining Group (JG) value: Burushashki_Yeh_Barree
4413- new Sentence_Break (SB) values:
4414    SB ; CR        ; CR
4415    SB ; EX        ; Extend
4416    SB ; LF        ; LF
4417    SB ; SC        ; SContinue
4418- new Word_Break (WB) values:
4419    WB ; CR        ; CR
4420    WB ; Extend    ; Extend
4421    WB ; LF        ; LF
4422    WB ; MB        ; MidNumLet
4423
4424* Further changes in the 2008-02-29 update:
4425- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
4426  because they should not normally be invisible.
4427- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
4428- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
4429- new Word_Break (WB) value: NL=Newline
4430
4431* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
4432- Unihan range end moves from 9FBB to 9FC3
4433  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
4434  + do change gennames.c
4435
4436* build Unicode data source code for hardcoding core data
4437C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
4438
4439ICU data make path is \svn\icuproj\icu\uni51\source\data\
4440ICU root path is \svn\icuproj\icu\uni51
4441Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4442Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
4443Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
4444Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
4445Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
4446Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
4447Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
4448Creating data file for Unicode Character Properties
4449Creating data file for Unicode Case Mapping Properties
4450Creating data file for Unicode BiDi/Shaping Properties
4451Creating data file for Unicode Normalization
4452Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
4453Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
4454
4455- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
4456  and rebuild the common library
4457
4458*** Break iterators
4459
4460* Update break iterator rules to new UAX versions and new property values
4461
4462*** UCA
4463
4464* update FractionalUCA.txt and UCARules.txt with new canonical closure
4465
4466*** Test suites
4467- Test that APIs using Unicode property value aliases (like UnicodeSet)
4468  support all of the boolean values N/Y, No/Yes, F/T, False/True
4469  -> TestBinaryValues() tests in both cintltst and intltest
4470
4471*** LayoutEngine script information
4472* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
4473ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
4474ScriptRunData.cpp, which is no longer needed.)
4475
4476The generated files have a current copyright date and "@draft" statement.
4477
4478* copy the above files into <icu>/source/layout, replacing the old files.
4479
4480Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4481and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4482
4483* rebuild the layout and layoutex libraries.
4484
4485*** Documentation
4486- Update User Guide
4487  + Jamo_Short_Name, sfc->scf, binary property value aliases
4488
4489---------------------------------------------------------------------------- ***
4490
4491Unicode 5.0 update
4492
4493*** related Jitterbugs
4494
44955084 RFE: Update to Unicode 5.0
4496
4497*** data files & enums & parser code
4498
4499* file preparation
4500- ucdstrip:
4501    DerivedCoreProperties.txt
4502    DerivedNormalizationProps.txt
4503    NormalizationTest.txt
4504    PropList.txt
4505    Scripts.txt
4506    GraphemeBreakProperty.txt
4507    SentenceBreakProperty.txt
4508    WordBreakProperty.txt
4509- ucdstrip and ucdmerge:
4510    EastAsianWidth.txt
4511    LineBreak.txt
4512
4513* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
4514copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
4515copy 5.0.0\ucd\Blocks.txt ..\unidata\
4516copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
4517copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
4518copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
4519copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
4520copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
4521copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
4522copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
4523copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
4524copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
4525copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
4526copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
4527
4528ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
4529ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
4530ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
4531ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
4532ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
4533ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
4534ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
4535ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
4536ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
4537ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
4538
4539* update FractionalUCA.txt and UCARules.txt with new canonical closure
4540
4541* genpname
4542- run preparse.pl
4543  + make sure that data.h is writable
4544  + perl preparse.pl \cvs\oss\icu > out.txt
4545
4546* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4547- new block & script values
4548  + script values already added in ICU 3.6 because all of ISO 15924 is now covered
4549
4550* build Unicode data source code for hardcoding core data
4551C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
4552
4553ICU data make path is \cvs\oss\icu\source\data\
4554ICU root path is \cvs\oss\icu
4555Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
4556[etc.]
4557Creating data file for Unicode Character Properties
4558Creating data file for Unicode Case Mapping Properties
4559Creating data file for Unicode BiDi/Shaping Properties
4560Creating data file for Unicode Normalization
4561Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
4562Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
4563
4564- copy the .c source files to C:\cvs\oss\icu\source\common
4565  and rebuild the common library
4566
4567*** Unicode version numbers
4568- makedata.mak
4569- uchar.h
4570- configure.in
4571
4572*** LayoutEngine script information
4573* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
4574ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
4575ScriptRunData.cpp, which is no longer needed.)
4576
4577The generated files have a current copyright date and "@draft" statement.
4578
4579* copy the above files into <icu>/source/layout, replacing the old files.
4580
4581Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
4582and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
4583
4584* rebuild the layout and layoutex libraries.
4585
4586---------------------------------------------------------------------------- ***
4587
4588Unicode 4.1 update
4589
4590*** related Jitterbugs
4591
45924332 RFE: Update to Unicode 4.1
45934157 RBBI, TR29 4.1 updates
4594
4595*** data files & enums & parser code
4596
4597* file preparation
4598- ucdstrip:
4599    DerivedCoreProperties.txt
4600    DerivedNormalizationProps.txt
4601    NormalizationTest.txt
4602    GraphemeBreakProperty.txt
4603    SentenceBreakProperty.txt
4604    WordBreakProperty.txt
4605- ucdstrip and ucdmerge:
4606    EastAsianWidth.txt
4607    LineBreak.txt
4608
4609* add new files to the repository
4610    GraphemeBreakProperty.txt
4611    SentenceBreakProperty.txt
4612    WordBreakProperty.txt
4613
4614* update FractionalUCA.txt and UCARules.txt with new canonical closure
4615
4616* genpname
4617- handle new enumerated properties in sub read_uchar
4618- run preparse.pl
4619
4620* uchar.h & uscript.h & uprops.h & uprops.c & genprops
4621- new binary properties
4622  + Pattern_Syntax
4623  + Pattern_White_Space
4624- new enumerated properties
4625  + Grapheme_Cluster_Break
4626  + Sentence_Break
4627  + Word_Break
4628- new block & script & line break values
4629
4630* gencase
4631- case-ignorable changes
4632  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
4633  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
4634
4635*** Unicode version numbers
4636- makedata.mak
4637- uchar.h
4638- configure.in
4639
4640*** tests
4641- verify that u_charMirror() round-trips
4642- test all new properties and some new values of old properties
4643
4644*** other code
4645
4646* hardcoded Unihan range end/limit
4647- Unihan range end moves from 9FA5 to 9FBB
4648  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
4649  + do not modify BOCU/BOCSU code because that would change the encoding
4650    and break binary compatibility!
4651  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
4652    NamePrepProfile.txt
4653  + ignore trietest.c: test data is arbitrary
4654  + ignore tstnorm.cpp: test optimization, not important
4655  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
4656  + do change line_th.txt and word_th.txt
4657    by replacing hardcoded ranges with the new property values
4658  + do change gennames.c
4659
4660source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
4661source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
4662source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
4663
4664* case mappings
4665- compare new special casing context conditions with previous ones
4666  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
4667
4668* genpname
4669- consider storing only the short name if it is the same as the long name
4670
4671*** other reviews
4672- UAX #29 changes (grapheme/word/sentence breaks)
4673- UAX #14 changes (line breaks)
4674- Pattern_Syntax & Pattern_White_Space
4675
4676---------------------------------------------------------------------------- ***
4677
4678Unicode 4.0.1 update
4679
4680*** related Jitterbugs
4681
46823170 RFE: Update to Unicode 4.0.1
46833171 Add new Unicode 4.0.1 properties
46843520 use Unicode 4.0.1 updates for break iteration
4685
4686*** data files & enums & parser code
4687
4688* file preparation
4689- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
4690- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
4691
4692* file fixes
4693- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
4694  according to PRI #26
4695  http://www.unicode.org/review/resolved-pri.html#pri26
4696- undone again because no corrigendum in sight;
4697  instead modified tests to not check consistency on this for Unicode 4.0.1
4698
4699* ucdterms.txt
4700- update from http://www.unicode.org/copyright.html
4701  formatted for plain text
4702
4703* uchar.h & uprops.h & uprops.c & genprops
4704- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
4705- add U_LB_INSEPARABLE due to a spelling fix
4706  + put short name comment only on line with new constant
4707    for genpname perl script parser
4708- new binary properties
4709  + STerm
4710  + Variation_Selector
4711
4712* genpname
4713- fix genpname perl script so that it doesn't choke on more than 2 names per property value
4714- perl script: correctly calculate the maximum number of fields per row
4715
4716* uscript.h
4717- new script code Hrkt=Katakana_Or_Hiragana
4718
4719* gennorm.c track changes in DerivedNormalizationProps.txt
4720- "FNC" -> "FC_NFKC"
4721- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
4722
4723* genprops/props2.c track changes in DerivedNumericValues.txt
4724- changed from 3 columns to 2, dropping the numeric type
4725  + assume that the type is always numeric for Han characters,
4726    and that only those are added in addition to what UnicodeData.txt lists
4727
4728*** Unicode version numbers
4729- makedata.mak
4730- uchar.h
4731- configure.in
4732
4733*** tests
4734- update test of default bidi classes according to PRI #28
4735  /tsutil/cucdtst/TestUnicodeData
4736  http://www.unicode.org/review/resolved-pri.html#pri28
4737- bidi tests: change exemplar character for ES depending on Unicode version
4738- change hardcoded expected property values where they change
4739
4740*** other code
4741
4742* name matching
4743- read UCD.html
4744
4745* scripts
4746- use new Hrkt=Katakana_Or_Hiragana
4747
4748* ZWJ & ZWNJ
4749- are now part of combining character sequences
4750- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
4751