* Copyright (C) 2004-2015, International Business Machines
* Corporation and others.  All Rights Reserved.
*
*   file name:  changes.txt
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2004may06
*   created by: Markus W. Scherer
*
* change log for Unicode updates
13
14---------------------------------------------------------------------------- ***
15
16* New ISO 15924 script codes
17
Starting with ICU 55, we no longer add UScriptCode constants until their scripts
are encoded in Unicode, or can be assumed to be encoded in the next Unicode version.
Script enum constant names should follow the Unicode script property value aliases,
which are assigned only when the scripts are encoded.
When we add constants early and guess the names wrong, we end up with confusing
enum constants and have sometimes had to add aliases.
24
25Exception: Script codes like Latf and Aran that are not subject to separate encoding
26can be added at any time.
27
28Script codes not yet in ICU: http://www.unicode.org/iso15924/codechanges.html
29
30Added 2014-11-15, see http://bugs.icu-project.org/trac/ticket/11561
31- Adlm  166     Adlam
32- Aran  161     Arabic (Nastaliq variant)
33- Kitl  505     Khitan large script
34- Kits  288     Khitan small script
35- Marc  332     Marchen
36- Osge  219     Osage
37
38Aran can be added as USCRIPT_ARABIC_NASTALIQ at any time.
39
40Adlam, Marchen, and Osage are expected to go into Unicode 9;
41we should assign Unicode script property value aliases for them
42soon after Unicode 8 is released, and add them in ICU 56.
43
44Khitan scripts will be encoded later.
45
46---------------------------------------------------------------------------- ***
47
48Unicode 8.0 update for ICU ??
49
50* UCA issue from 7.0
51
52- U+1DE9 COMBINING LATIN SMALL LETTER BETA
53  sorts with Greek Beta, should sort with Latin B?
54  + Ken says:
55    No, it was deliberate:
56
57    03B2;GREEK SMALL LETTER BETA;Ll;;;;0392;;0392
58    1D5D;MODIFIER LETTER SMALL BETA;Lm;<super> 03B2;;;;;
59    1DE9;COMBINING LATIN SMALL LETTER BETA;Mn;<sort> 03B2;;;;;
60    1D66;GREEK SUBSCRIPT SMALL LETTER BETA;Ll;<sub> 03B2;;;;;
61
62    Note the relationship to U+1D5D.
63
64    When the disunified *Latin* beta base letter shows up in Unicode 8.0:
65
66    U+A7B4 LATIN CAPITAL LETTER BETA
67    U+A7B5 LATIN SMALL LETTER BETA
68
69    we could re-evaluate what U+1DE9 equates to, for collation,
70    but currently there isn’t any Latin beta to serve that function
71    in Unicode 7.0.
72
73- ICU_ROOT=~/svn.icu/trunk
74- ICU_SRC_DIR=$ICU_ROOT/src
75- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
76- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
77
78
79---------------------------------------------------------------------------- ***
80
81Unicode 7.0 update for ICU 54
82
83http://www.unicode.org/review/pri271/  -- beta review
84http://www.unicode.org/reports/uax-proposed-updates.html
85http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
86http://www.unicode.org/reports/tr44/tr44-13.html
87
88*** ICU Trac
89
90- ticket 10821: Unicode 7.0, UCA 7.0
91- C++ branches/markus/uni70 at r35584 from trunk at r35580
92- Java branches/markus/uni70 at r35587 from trunk at r35545
93
94*** CLDR Trac
95
96- ticket 7195: UCA 7.0 CLDR root collation
97- branches/markus/uni70 at r10062 from trunk at r10061
98
99- ticket 6762: script metadata for Unicode 7.0 new scripts
100
101*** Unicode version numbers
102- makedata.mak
103- uchar.h
104- com.ibm.icu.util.VersionInfo
105- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
106
107- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
108  so that the makefiles see the new version number.
109
110*** data files & enums & parser code
111
* file preparation

- download UCD & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- only for manual diffs: remove version suffixes from the file names
  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
  (see https://sites.google.com/site/unicodetools/inputdata)
- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Restore TODO diffs in source/data/unidata/UCARules.txt
    cd $ICU_SRC_DIR
    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt

- also: from http://unicode.org/Public/security/7.0.0/ download new
  confusables.txt & confusablesWholeScript.txt
  and copy to $ICU_ROOT/src/source/data/unidata/
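
The "remove version suffixes" step above can be pictured with a small sketch.
It is not the code of desuffixucd.py. It assumes that beta files carry a
suffix such as "-7.0.0d12" directly before the file extension, and the
pattern, the function name and the sample file names are all hypothetical.

```python
import re

# Assumption: a version suffix looks like "-7.0.0" or "-7.0.0d12" and sits
# right before the file extension. Illustrative pattern only.
_VERSION_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def strip_version_suffix(filename):
    """Returns filename without a version suffix before the extension."""
    return _VERSION_SUFFIX.sub("", filename)


# Hypothetical file names for illustration.
names = ["UnicodeData-7.0.0d12.txt", "Unihan-7.0.0d2.zip", "ReadMe.txt"]
renamed = [strip_version_suffix(n) for n in names]
print(renamed)
```

Names without a suffix pass through unchanged, so a rename pass like this
could be run repeatedly over the same folder.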
131
* initial preparseucd.py changes
- remove new Unicode scripts from the
  only-in-ISO-15924 list according to the error message:
    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
                        'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
                        'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
    from _scripts_only_in_iso15924
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
- NamesList.txt now has a heading with a non-ASCII character
  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
  + escape non-ASCII characters in heading comments
- the tool gets the Unicode copyright line from PropertyAliases.txt, which is currently still at 2013
  + get the copyright from the first file whose copyright line contains the current year
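
The ValueError quoted above suggests a consistency check between the
only-in-ISO-15924 list and the scripts that Unicode now defines. A minimal
sketch of such a check follows. It is an assumption about the idea, not the
preparseucd.py code: the function name and the sample sets are hypothetical,
and only the message wording is taken from the notes.

```python
def check_scripts_only_in_iso15924(only_in_iso15924, unicode_scripts):
    """Raises ValueError if an ISO-only script code is now a Unicode script."""
    stale = sorted(only_in_iso15924 & unicode_scripts)
    if stale:
        raise ValueError(f"remove {stale} from _scripts_only_in_iso15924")


# Illustrative data: two codes have moved into Unicode, one has not.
only_in_iso = {"Aghb", "Mahj", "Ahom"}
in_unicode = {"Latn", "Aghb", "Mahj"}
try:
    check_scripts_only_in_iso15924(only_in_iso, in_unicode)
    message = None
except ValueError as e:
    message = str(e)
print(message)
```

Failing loudly here forces the list to be cleaned up during the update,
rather than leaving stale entries behind.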
146
147* PropertyValueAliases.txt changes
148- 32 new Block (blk) values:
149    blk; Bassa_Vah                        ; Bassa_Vah
150    blk; Caucasian_Albanian               ; Caucasian_Albanian
151    blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
152    blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
153    blk; Duployan                         ; Duployan
154    blk; Elbasan                          ; Elbasan
155    blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
156    blk; Grantha                          ; Grantha
157    blk; Khojki                           ; Khojki
158    blk; Khudawadi                        ; Khudawadi
159    blk; Latin_Ext_E                      ; Latin_Extended_E
160    blk; Linear_A                         ; Linear_A
161    blk; Mahajani                         ; Mahajani
162    blk; Manichaean                       ; Manichaean
163    blk; Mende_Kikakui                    ; Mende_Kikakui
164    blk; Modi                             ; Modi
165    blk; Mro                              ; Mro
166    blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
167    blk; Nabataean                        ; Nabataean
168    blk; Old_North_Arabian                ; Old_North_Arabian
169    blk; Old_Permic                       ; Old_Permic
170    blk; Ornamental_Dingbats              ; Ornamental_Dingbats
171    blk; Pahawh_Hmong                     ; Pahawh_Hmong
172    blk; Palmyrene                        ; Palmyrene
173    blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
174    blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
175    blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
176    blk; Siddham                          ; Siddham
177    blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
178    blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
179    blk; Tirhuta                          ; Tirhuta
180    blk; Warang_Citi                      ; Warang_Citi
181  -> add to uchar.h
182    use long property names for enum constants
183  -> add to UCharacter.UnicodeBlock IDs
184    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
185            replace  public static final int \1_ID = \2; \3
186  -> add to UCharacter.UnicodeBlock objects
187    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
188            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
189- 28 new Joining_Group (jg) values:
190    jg ; Manichaean_Aleph                 ; Manichaean_Aleph
191    jg ; Manichaean_Ayin                  ; Manichaean_Ayin
192    jg ; Manichaean_Beth                  ; Manichaean_Beth
193    jg ; Manichaean_Daleth                ; Manichaean_Daleth
194    jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
195    jg ; Manichaean_Five                  ; Manichaean_Five
196    jg ; Manichaean_Gimel                 ; Manichaean_Gimel
197    jg ; Manichaean_Heth                  ; Manichaean_Heth
198    jg ; Manichaean_Hundred               ; Manichaean_Hundred
199    jg ; Manichaean_Kaph                  ; Manichaean_Kaph
200    jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
201    jg ; Manichaean_Mem                   ; Manichaean_Mem
202    jg ; Manichaean_Nun                   ; Manichaean_Nun
203    jg ; Manichaean_One                   ; Manichaean_One
204    jg ; Manichaean_Pe                    ; Manichaean_Pe
205    jg ; Manichaean_Qoph                  ; Manichaean_Qoph
206    jg ; Manichaean_Resh                  ; Manichaean_Resh
207    jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
208    jg ; Manichaean_Samekh                ; Manichaean_Samekh
209    jg ; Manichaean_Taw                   ; Manichaean_Taw
210    jg ; Manichaean_Ten                   ; Manichaean_Ten
211    jg ; Manichaean_Teth                  ; Manichaean_Teth
212    jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
213    jg ; Manichaean_Twenty                ; Manichaean_Twenty
214    jg ; Manichaean_Waw                   ; Manichaean_Waw
215    jg ; Manichaean_Yodh                  ; Manichaean_Yodh
216    jg ; Manichaean_Zayin                 ; Manichaean_Zayin
217    jg ; Straight_Waw                     ; Straight_Waw
218  -> uchar.h & UCharacter.JoiningGroup
219- 23 new Script (sc) values:
220    sc ; Aghb                             ; Caucasian_Albanian
221    sc ; Bass                             ; Bassa_Vah
222    sc ; Dupl                             ; Duployan
223    sc ; Elba                             ; Elbasan
224    sc ; Gran                             ; Grantha
225    sc ; Hmng                             ; Pahawh_Hmong
226    sc ; Khoj                             ; Khojki
227    sc ; Lina                             ; Linear_A
228    sc ; Mahj                             ; Mahajani
229    sc ; Mani                             ; Manichaean
230    sc ; Mend                             ; Mende_Kikakui
231    sc ; Modi                             ; Modi
232    sc ; Mroo                             ; Mro
233    sc ; Narb                             ; Old_North_Arabian
234    sc ; Nbat                             ; Nabataean
235    sc ; Palm                             ; Palmyrene
236    sc ; Pauc                             ; Pau_Cin_Hau
237    sc ; Perm                             ; Old_Permic
238    sc ; Phlp                             ; Psalter_Pahlavi
239    sc ; Sidd                             ; Siddham
240    sc ; Sind                             ; Khudawadi
241    sc ; Tirh                             ; Tirhuta
242    sc ; Wara                             ; Warang_Citi
243  -> uscript.h (many were added before)
244    comment "Mende Kikakui" for USCRIPT_MENDE
245    add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
246  -> com.ibm.icu.lang.UScript
247    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
248    replace  public static final int \1 = \2; \3
249- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
250  (added 2012-11-01)
251    Ahom        338     Ahom
252    Hatr        127     Hatran
253    Mult        323     Multani
254  (added 2013-10-12)
255    Modi        324     Modi
256    Pauc        263     Pau Cin Hau
257    Sidd        302     Siddham
258  -> uscript.h (some overlap with additions from Unicode)
259  -> com.ibm.icu.lang.UScript
260    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
261    replace  public static final int \1 = \2; \3
262  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
263  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
264      and in com.ibm.icu.dev.test.lang.TestUScript.java
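
The find/replace patterns listed above for turning C enum constants into
Java constants can be tried outside of Eclipse. The sketch below applies the
same patterns with Python's re.sub, which accepts the \1 group syntax
unchanged. The input lines are hypothetical: the constant names, numeric
values and comments are made up for illustration and are not taken from
uchar.h or uscript.h.

```python
import re

# The same find/replace patterns as in the notes above.
BLOCK_FIND = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
BLOCK_ID_REPLACE = r"public static final int \1_ID = \2; \3"
BLOCK_OBJECT_FIND = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
BLOCK_OBJECT_REPLACE = (
    r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
SCRIPT_FIND = r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)"
SCRIPT_REPLACE = r"public static final int \1 = \2; \3"

# Hypothetical header lines; names, values and comments are made up.
block_line = "UBLOCK_EXAMPLE_BLOCK = 999, /*[12345]*/"
script_line = "USCRIPT_EXAMPLE_SCRIPT = 999,/* Exmp */"

block_id = re.sub(BLOCK_FIND, BLOCK_ID_REPLACE, block_line)
block_object = re.sub(BLOCK_OBJECT_FIND, BLOCK_OBJECT_REPLACE, block_line)
script_id = re.sub(SCRIPT_FIND, SCRIPT_REPLACE, script_line)

print(block_id)
print(block_object)
print(script_id)
```

The generated Java lines still need review by hand, for example for
aliases and API tags.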
265
266* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
267    (not strictly necessary for NOT_ENCODED scripts)
268  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt
269
* generate normalization data files
- cd $ICU_ROOT/dbg
- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
- UNIDATA=$ICU_SRC_DIR/source/data/unidata
- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
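
The four .nrm command lines above differ only in the output name and in the
list of input files, where each list starts with nfc.txt. The sketch below
records that relationship in a table and rebuilds the argument lists. It only
builds the lists and does not run gennorm2. The helper and table names are
hypothetical, and the --csource invocation for norm2_nfc_data.h is left out.

```python
# Input files per output file, copied from the command lines above.
NORM2_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_command(output, src_data_in, unidata):
    """Returns the argument list for one gennorm2 call (does not run it)."""
    return (["bin/gennorm2", "-o", f"{src_data_in}/{output}",
             "-s", f"{unidata}/norm2"] + NORM2_INPUTS[output])


commands = [gennorm2_command(name, "$SRC_DATA_IN", "$UNIDATA")
            for name in NORM2_INPUTS]
for cmd in commands:
    print(" ".join(cmd))
```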
280
281* build ICU (make install)
282  so that the tools build can pick up the new definitions from the installed header files.
283
284~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
285
286* build Unicode tools using CMake+make
287
288~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
289
290# Location (--prefix) of where ICU was installed.
291set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
292# Location of the ICU source tree.
293set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)
294
295~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
296~/svn.icutools/trunk/dbg/unicode/c$ make
297
* genprops work
- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
  + add second array of Joining_Group values for at most 10800..10FFF
    icutools: unicode/c/genprops/bidipropsbuilder.cpp
    icu: source/common/ubidi_props.h/.c/_data.h
    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
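
The point of a second array is presumably to avoid one table that spans the
large gap between the existing Joining_Group range and the new Manichaean
range. A lookup with two compact ranges might look like the sketch below.
This is an assumption about the approach, not the ubidi_props code: all
names, range limits and values are hypothetical and shortened for
illustration.

```python
# Hypothetical ranges and data; the real tables are generated by genprops.
JG_START, JG_LIMIT = 0x0620, 0x0624        # first range, limit is exclusive
JG_START2, JG_LIMIT2 = 0x10AC0, 0x10AC4    # second range (Manichaean area)
JG_ARRAY = [0, 1, 2, 3]     # made-up Joining_Group values, one per code point
JG_ARRAY2 = [4, 5, 6, 7]
NO_JOINING_GROUP = 0


def joining_group(c):
    """Returns the Joining_Group value for code point c from two small arrays."""
    if JG_START <= c < JG_LIMIT:
        return JG_ARRAY[c - JG_START]
    if JG_START2 <= c < JG_LIMIT2:
        return JG_ARRAY2[c - JG_START2]
    # Code points outside both ranges have no joining group.
    return NO_JOINING_GROUP


print(joining_group(0x0623), joining_group(0x10AC1), joining_group(0x41))
```

Two range checks cost little at lookup time, and each array only needs one
entry per code point in its own range.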
304
305* generate core properties data files
306- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
307- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
308- rebuild ICU (make install) & tools
309- run genuca again (see step above) so that it picks up the new nfc.nrm
310- rebuild ICU (make install) & tools
311
* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..7.0: U+2260, U+226E, U+226F
- nothing new in 7.0, no test file to update
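
The grep step above can also be done with a short script that filters by
status and code point at the same time. The sketch below assumes the usual
UCD-style layout of IdnaMappingTable.txt: semicolon-separated fields, the
code point or range in the first field, the status in the second, and "#"
starting a comment. The function name is hypothetical, and the sample lines
are abbreviated illustrations, not copied from the real file.

```python
def find_non_ascii_std3_valid(lines):
    """Returns (start, end) ranges with status disallowed_STD3_valid above U+007F."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop comments
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        start = int(start, 16)
        end = int(end, 16) if end else start
        if end > 0x7F:                          # keep non-ASCII entries only
            result.append((start, end))
    return result


# Abbreviated sample lines in the style of the data file (illustrative).
sample = [
    "# comment line",
    "0000..002C    ; disallowed_STD3_valid                  # <control-0000>..COMMA",
    "0041          ; mapped                 ; 0061          # LATIN CAPITAL LETTER A",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
]
found = find_non_ascii_std3_valid(sample)
print([(hex(s), hex(e)) for s, e in found])
```

Comparing the result for the old and the new data file would show whether
any characters were added to the list.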
317
318* run & fix ICU4C tests
319
320* update Java data files
321- refresh just the UCD-related files, just to be safe
322- see (ICU4C)/source/data/icu4j-readme.txt
323- mkdir /tmp/icu4j
324- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
325  output:
326    ...
327    Unicode .icu files built to ./out/build/icudt53l
328    echo timestamp > uni-core-data
329    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
330    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
331    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
332    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
333    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
334    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
335    mkdir -p /tmp/icu4j/main/shared/data
336    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
337    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
338    mkdir -p /tmp/icu4j/main/shared/data
339    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
340    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
341- copy the big-endian Unicode data files to another location,
342  separate from the other data files
343    ICUDT=icudt54b
344    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
345    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
346    cd ~/svn.icu/uni70/dbg/data/out/icu4j
347    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
348    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
349    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
350    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
351    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
352    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
353- refresh ICU4J
354    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
355
356* update CollationFCD.java
357  + copy & paste the initializers of lcccIndex[] etc. from
358    ICU4C/source/i18n/collationfcd.cpp to
359    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java
360
361* refresh Java test .txt files
362- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
363    cd $ICU_SRC_DIR/source/data/unidata
364    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
365    cd ../../test/testdata
366    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
367    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
368
369* UCA
370
371- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
372- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
373- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
374- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
375- output files are in ~/svn.unitools/Generated/uca/7.0.0/
376- review data; compare files, use blankweights.sed or similar
377  ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
378- cd ~/svn.unitools/Generated/uca/7.0.0/
379- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
380  cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
381- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
382    (note removing the underscore before "Rules")
383    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
384- update (ICU4C)/source/test/testdata/CollationTest_*.txt
385  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
386  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
387    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
388    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
389    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
390- run genuca, see command line above
391- rebuild ICU4C
392- refresh ICU4J collation data:
393  (subset of instructions above for properties data refresh, except copies all coll/*)
394    ICUDT=icudt54b
395    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
396    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
397    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
398    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
399- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
400- note on intltest: if collate/UCAConformanceTest fails, then
401  utility/MultithreadTest/TestCollators will fail as well;
402  fix the conformance test before looking into the multi-thread test
403- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
404- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
405  ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/
406
407* When refreshing all of ICU4J data from ICU4C
408- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
409- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
410or
411- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
412
413* run & fix ICU4J tests
414
415*** LayoutEngine script information
416
417(For details see the Unicode 5.2 change log below.)
418
419* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
420  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
421  in the working directory.
422  (It also generates ScriptRunData.cpp, which is no longer needed.)
423
424  The generated files have a current copyright date and "@stable" statement.
425  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
426  for "born stable" Unicode API constants, and to stop parsing ICU version numbers
427  which may not contain dots any more.
428
429- diff current <icu>/source/layout files vs. generated ones
430    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
431  review and manually merge desired changes;
432  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
433  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
434- if you just copy the above files, then
435  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
436  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
437
438*** API additions
439- send notice to icu-design about new born-@stable API (enum constants etc.)
440
441*** merge the Unicode update branches back onto the trunk
442- do not merge the icudata.jar and testdata.jar,
443  instead rebuild them from merged & tested ICU4C
444
445---------------------------------------------------------------------------- ***
446
447Unicode 6.3 update
448
449http://www.unicode.org/review/pri249/  -- beta review
450http://www.unicode.org/reports/uax-proposed-updates.html
451http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
452http://www.unicode.org/reports/tr44/tr44-11.html
453
454*** ICU Trac
455
456- ticket 10128: update ICU to Unicode 6.3 beta
457- ticket 10168: update ICU to Unicode 6.3 final
458- C++ branches/markus/uni63 at r33552 from trunk at r33551
459- Java branches/markus/uni63 at r33550 from trunk at r33553
460
461- ticket 10142: implement Unicode 6.3 bidi algorithm additions
462
463*** Unicode version numbers
464- makedata.mak
465- uchar.h
466  (configure.in & configure: have been modified to extract the version from uchar.h)
467- com.ibm.icu.util.VersionInfo
468- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
469
470- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
471  so that the makefiles see the new version number.
472
473*** data files & enums & parser code
474
475* file preparation
476
477- download UCD, UCA & IDNA files
478- make sure that the Unicode data folder passed into preparseucd.py
479  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
480- modify preparseucd.py:
481  parse new file BidiBrackets.txt
482  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
483- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
484- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
485- Check test file diffs for previously commented-out, known-failing data lines;
486  probably need to keep those commented out.
487
488* PropertyAliases.txt changes
489- 1 new Enumerated Property
490  bpt                      ; Bidi_Paired_Bracket_Type
491  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
492  -> ubidi_props.h & .c & UBiDiProps.java
493  -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
494  -> uprops.cpp
495  -> change ubidi.icu format version from 2.0 to 2.1
496- 1 new Miscellaneous Property
497  bpb                      ; Bidi_Paired_Bracket
498  -> uchar.h & UProperty.java
499  -> ppucd.h & .cpp
500
501* PropertyValueAliases.txt changes
502- 3 Bidi_Paired_Bracket_Type (bpt) values:
503  bpt; c                                ; Close
504  bpt; n                                ; None
505  bpt; o                                ; Open
506  -> uchar.h & UCharacter.BidiPairedBracketType
507  -> ubidi_props.h & .c & UBiDiProps.java
508  -> change ubidi.icu format version from 2.0 to 2.1
509- 4 new Bidi_Class (bc) values:
510  bc ; FSI                              ; First_Strong_Isolate
511  bc ; LRI                              ; Left_To_Right_Isolate
512  bc ; RLI                              ; Right_To_Left_Isolate
513  bc ; PDI                              ; Pop_Directional_Isolate
514  -> uchar.h & UCharacterEnums.ECharacterDirection
515  -> until the bidi code gets updated,
516     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
517- 3 new Word_Break (WB) values:
518  WB ; HL                               ; Hebrew_Letter
519  WB ; SQ                               ; Single_Quote
520  WB ; DQ                               ; Double_Quote
521  -> uchar.h & UCharacter.WordBreak
522  -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
523- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
524  (added 2012-10-16)
525  Aghb  239     Caucasian Albanian
526  Mahj  314     Mahajani
527  -> uscript.h
528  -> com.ibm.icu.lang.UScript
529    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
530    replace  public static final int \1 = \2;\3
531  -> preparseucd.py _scripts_only_in_iso15924
532  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
533      and in com.ibm.icu.dev.test.lang.TestUScript.java
534  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
535     (not strictly necessary for NOT_ENCODED scripts)
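
The Word_Break note above, that the numeric constants now exceed 4 bits,
follows directly from the count: 4 bits can distinguish 2^4 = 16 values, so
17 values need a fifth bit. A small check of that arithmetic, with a
hypothetical helper name:

```python
def bits_needed(num_values):
    """Returns the number of bits needed to store values 0..num_values-1."""
    return max(1, (num_values - 1).bit_length())


print(bits_needed(16), bits_needed(17))
```

Any data structure that packs Word_Break values into a 4-bit field would
need to be widened for this update.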
536
537* generate normalization data files
538- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
539- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
540- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
541- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
542- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
543- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
544- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
545
546* build ICU (make install)
547  so that the tools build can pick up the new definitions from the installed header files.
548
549~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
550
551* build Unicode tools using CMake+make
552
553~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
554
555# Location (--prefix) of where ICU was installed.
556set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
557# Location of the ICU source tree.
558set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
559
560~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
561~/svn.icutools/trunk/dbg/unicode/c$ make
562
563* generate core properties data files
564- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
565- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
566- rebuild ICU (make install) & tools
567- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
568- rebuild ICU (make install) & tools
569
570* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
571  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
572- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
573- Unicode 6.0..6.3: U+2260, U+226E, U+226F
574- nothing new in 6.3, no test file to update
575
576* update Java data files
577- refresh just the UCD-related files, just to be safe
578- see (ICU4C)/source/data/icu4j-readme.txt
579- mkdir /tmp/icu4j
580- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
581  output:
582    ...
583    Unicode .icu files built to ./out/build/icudt52l
584    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
585    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
586    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
587    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
588    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
589    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
590    mkdir -p /tmp/icu4j/main/shared/data
591    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
592    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
593    mkdir -p /tmp/icu4j/main/shared/data
594    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
595    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
596- copy the big-endian Unicode data files to another location,
597  separate from the other data files
598    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
599    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
600    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
601    ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
602    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
603    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
604    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
605- refresh ICU4J
606    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
607
608* refresh Java test .txt files
609- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
610
611* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
612
613- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
614- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
615- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
616- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
617  (note removing the underscore before "Rules")
618- update (ICU4C)/source/test/testdata/CollationTest_*.txt
619  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
620  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
621- check test file diffs for previously commented-out, known-failing data lines;
622  probably need to keep those commented out
623- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
624- run genuca, see command line above
625- rebuild ICU4C
626- refresh ICU4J collation data:
627  (subset of instructions above for properties data refresh, except copies all coll/*)
628    ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
629    ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
630    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
631    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
632- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
633- note on intltest: if collate/UCAConformanceTest fails, then
634  utility/MultithreadTest/TestCollators will fail as well;
635  fix the conformance test before looking into the multi-thread test
636
637* test ICU, fix test code where necessary
638
639* When refreshing all of ICU4J data from ICU4C
640- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
641- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
642or
643- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
644
645*** LayoutEngine script information
646- skipped for Unicode 6.3: no new scripts
647
648*** merge the Unicode update branches back onto the trunk
649- do not merge the icudata.jar and testdata.jar,
650  instead rebuild them from merged & tested ICU4C
651
652---------------------------------------------------------------------------- ***
653
654Unicode 6.2 update
655
656http://www.unicode.org/review/pri230/
657http://www.unicode.org/versions/beta-6.2.0.html
658http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
659http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
660http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
661http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
662http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
663http://unicode.org/Public/idna/6.2.0/
664
665*** ICU Trac
666
667- ticket 9515: Unicode 6.2: final ICU update
668
669- ticket 9514: UCA 6.2: fix UCARules.txt
670
671- ticket 9437: update ICU to Unicode 6.2
672- C++ branches/markus/uni62 at r32050 from trunk at r32041
673- Java branches/markus/uni62 at r32068 from trunk at r32066
674
675*** Unicode version numbers
676- makedata.mak
677- uchar.h
678  (configure.in & configure: have been modified to extract the version from uchar.h)
679- com.ibm.icu.util.VersionInfo
680- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
681
682*** data files & enums & parser code
683
684* file preparation
685
686- download UCD, UCA & IDNA files
687- make sure that the Unicode data folder passed into preparseucd.py
688  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
689- modify preparseucd.py: NamesList.txt is now in UTF-8
690- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
691- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
692- Check test file diffs for previously commented-out, known-failing data lines;
693  probably need to keep those commented out.
694
695* PropertyValueAliases.txt changes
696- 1 new Line_Break (lb) value:
697  lb ; RI                               ; Regional_Indicator
698  -> uchar.h & UCharacter.LineBreak
699- 1 new Word_Break (WB) value:
700  WB ; RI                               ; Regional_Indicator
701  -> uchar.h & UCharacter.WordBreak
702- 1 new Grapheme_Cluster_Break (GCB) value:
703  GCB; RI                               ; Regional_Indicator
704  -> uchar.h & UCharacter.GraphemeClusterBreak
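
A minimal sketch of how the three new alias lines above split into fields, for checking them against the enum additions. It assumes only the semicolon-separated "property ; short ; long" layout shown in this log; parse_value_alias is a hypothetical helper, not part of preparseucd.py.

```python
def parse_value_alias(line):
    """Split one 'prop ; short ; long' line into a tuple of stripped fields."""
    # Drop a trailing '#' comment, if any, before splitting on semicolons.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    return tuple(field.strip() for field in data.split(";"))


# The three lines quoted in the log above.
SAMPLE = """\
lb ; RI                               ; Regional_Indicator
WB ; RI                               ; Regional_Indicator
GCB; RI                               ; Regional_Indicator
"""

aliases = [parse_value_alias(line) for line in SAMPLE.splitlines()]
for prop, short_name, long_name in aliases:
    print(prop, short_name, long_name)
```

Each tuple names the property, so one new value that shows up under lb, WB and GCB is easy to map to the three separate enums.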
705
706* 3 new numeric values
707  The new value -1 was really meant to be NaN, but NaN would have required new
708  UnicodeData.txt syntax. -1 can already be represented as a "fraction" of -1/1,
709  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
710    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
711    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
712  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
713    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
714    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
715  -> uprops.h, uchar.c & UCharacterProperty.java
716  -> cucdtst.c & UCharacterTest.java
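
A rough sketch of reading the nv= field from the cp lines quoted above and treating every value, including -1, as a fraction. It assumes the simplified "cp;XXXX;key=value" layout shown here (no ranges or defaults); parse_ppucd_cp_line and numeric_value are hypothetical names, and this is not how encodeNumericValue() stores the values.

```python
from fractions import Fraction


def parse_ppucd_cp_line(line):
    """Parse a 'cp;XXXX;key=value;...' line into (code_point, props)."""
    fields = line.strip().split(";")
    if fields[0] != "cp":
        raise ValueError("not a cp line: %r" % line)
    code_point = int(fields[1], 16)
    props = {}
    for field in fields[2:]:
        # Fields without '=' end up with an empty string as their value.
        key, _, value = field.partition("=")
        props[key] = value
    return code_point, props


def numeric_value(props):
    """Return the nv property as a Fraction, or None if it is absent."""
    nv = props.get("nv")
    return None if nv is None else Fraction(nv)


SAMPLE = [
    "cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1",
    "cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000",
    "cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000",
]

for line in SAMPLE:
    cp, props = parse_ppucd_cp_line(line)
    value = numeric_value(props)
    print("U+%04X" % cp, value.numerator, "/", value.denominator)
```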
717
718* generate normalization data files
719- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
720- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
721- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
722- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
723- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
724- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
725- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
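
The four gennorm2 invocations above differ only in the output file and the list of input files, so they could be generated from a small table. This is a sketch that only builds the argument lists and does not run anything; gennorm2_commands and the directory arguments are hypothetical, and only the -o and -s options shown above are used. LD_LIBRARY_PATH still has to be set as in the first step.

```python
import posixpath

# Output file -> input files, as listed in the steps above.
NORM_FILES = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(build_dir, src_dir):
    """Return one argument list per .nrm file; nothing is executed."""
    gennorm2 = posixpath.join(build_dir, "bin", "gennorm2")
    data_in = posixpath.join(src_dir, "source", "data", "in")
    norm2 = posixpath.join(src_dir, "source", "data", "unidata", "norm2")
    return [
        [gennorm2, "-o", posixpath.join(data_in, out), "-s", norm2] + inputs
        for out, inputs in NORM_FILES
    ]


# Placeholder directories; substitute the real build and source trees.
for cmd in gennorm2_commands("/path/to/dbg", "/path/to/src"):
    print(" ".join(cmd))
```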
726
727* build ICU (make install)
728  so that the tools build can pick up the new definitions from the installed header files.
729* build Unicode tools using CMake+make
730
731* generate core properties data files
732- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
733- in initial bootstrapping, change the UCA version
734  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
735- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
736- rebuild ICU (make install) & tools
737  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
738    check if the UCA version in FractionalUCA.txt matches the new Unicode version
739    (see step above)
740- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
741- rebuild ICU (make install) & tools
742
743* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
744  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
745- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
746- Unicode 6.0..6.2: U+2260, U+226E, U+226F
747- nothing new in 6.2, no test file to update
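
The grep step above can also be sketched as a small script that expands ranges and drops ASCII. It assumes the "code point(s) ; status ; ... # comment" line layout of IdnaMappingTable.txt; the sample lines are illustrative stand-ins rather than copies from the file, and non_ascii_std3_valid is a hypothetical helper.

```python
def non_ascii_std3_valid(lines):
    """Return non-ASCII code points whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        # Strip the trailing comment and skip blank or comment-only lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        # The first field is a single code point or a 'first..last' range.
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        found.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return found


# Illustrative lines in the assumed format.
SAMPLE = [
    "0000..002C    ; disallowed_STD3_valid   # illustrative ASCII range",
    "2260          ; disallowed_STD3_valid   # illustrative",
    "226E..226F    ; disallowed_STD3_valid   # illustrative",
    "00C0          ; mapped ; 00E0           # illustrative",
]

print(["U+%04X" % cp for cp in non_ascii_std3_valid(SAMPLE)])
```

Comparing the result with the known list (U+2260, U+226E, U+226F) shows whether a new Unicode version needs new test cases.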
748
749* update Java data files
750- refresh just the UCD-related files, just to be safe
751- see (ICU4C)/source/data/icu4j-readme.txt
752- mkdir /tmp/icu4j
753- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
754  output:
755    ...
756    Unicode .icu files built to ./out/build/icudt50l
757    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
758    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
759    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
760    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
761    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
762    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
763    mkdir -p /tmp/icu4j/main/shared/data
764    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
765    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
766    mkdir -p /tmp/icu4j/main/shared/data
767    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
768    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
769- copy the big-endian Unicode data files to another location,
770  separate from the other data files
771    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
772    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
773    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
774    ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
775    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
776    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
777    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
778- refresh ICU4J
779    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
780
781* refresh Java test .txt files
782- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
783
784* UCA
785
786- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
787- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
788- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
789- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
790  (note removing the underscore before "Rules")
791- update (ICU4C)/source/test/testdata/CollationTest_*.txt
792  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
793  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
794- check test file diffs for previously commented-out, known-failing data lines;
795  probably need to keep those commented out
796- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
797- run genuca, see command line above
798- rebuild ICU4C
799- refresh ICU4J collation data:
800  (subset of instructions above for properties data refresh, except copies all coll/*)
801    ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
802    ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
803    ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
804    ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
805- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
806- note on intltest: if collate/UCAConformanceTest fails, then
807  utility/MultithreadTest/TestCollators will fail as well;
808  fix the conformance test before looking into the multi-thread test
809
810* test ICU, fix test code where necessary
811
812* When refreshing all of ICU4J data from ICU4C
813- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
814- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
815or
816- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
817
818*** LayoutEngine script information
819- skipped for Unicode 6.2: no new scripts
820
821*** merge the Unicode update branches back onto the trunk
822- do not merge the icudata.jar and testdata.jar,
823  instead rebuild them from merged & tested ICU4C
824
825---------------------------------------------------------------------------- ***
826
827Future Unicode update
828
829Tools simplified since the Unicode 6.1 update. See
830- http://site.icu-project.org/design/props/ppucd
831- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
832
833* Unicode version numbers
834- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
835
836* file preparation
837- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
838- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
839- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
840- Check test file diffs for previously commented-out, known-failing data lines;
841  probably need to keep those commented out.
842
843* PropertyValueAliases.txt changes
844- Script codes that are in ISO 15924 but not in Unicode are now listed in
845  preparseucd.py, in the _scripts_only_in_iso15924 variable.
846  If there are new ISO codes, then add them.
847  If Unicode adds some of them, then remove them from the .py variable.
848
849* UnicodeData.txt changes
850- No more manual changes for CJK ranges for algorithmic names;
851  those are now written to ppucd.txt and genprops reads them from there.
852
853* generate core properties data files (makeprops.sh was deleted)
854- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
855
856* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
857- it is now generated by preparseucd.py
858
859* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
860- it is now generated by preparseucd.py
861- make sure that the Unicode data folder passed into preparseucd.py
862  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
863  (can be in some subfolder)
864
865* generate normalization data files
866- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
867- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
868- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
869- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
870- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
871- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
872- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
873
874* build ICU (make install)
875* build Unicode tools using CMake+make
876
877* new way to call genuca (makeuca.sh was deleted)
878- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
879
880---------------------------------------------------------------------------- ***
881
882Unicode 6.1 update
883
884*** ICU Trac
885
886- ticket 8995 final update to Unicode 6.1
887- ticket 8994 regenerate source/layout/CanonData.cpp
888
889- ticket 8961 support Unicode "Age" value *names*
890- ticket 8963 support multiple character name aliases & types
891
892- ticket 8827 "update ICU to Unicode 6.1"
893- C++ branches/markus/uni61 at r30864 from trunk at r30843
894- Java branches/markus/uni61 at r30865 from trunk at r30863
895
896*** Unicode version numbers
897- makedata.mak
898- uchar.h
899  (configure.in & configure: have been modified to extract the version from uchar.h)
900- com.ibm.icu.util.VersionInfo
901- icutools/unicode/makedefs.sh
902  + also review & update other definitions in that file,
903    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
904
905*** data files & enums & parser code
906
907* file preparation
908
909~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
910- This prepares both unidata and testdata files in respective output subfolders.
911- Check test file diffs for previously commented-out, known-failing data lines;
912  probably need to keep those commented out.
913
914* PropertyValueAliases.txt changes
915- 11 new block names:
916  Arabic_Extended_A
917  Arabic_Mathematical_Alphabetic_Symbols
918  Chakma
919  Meetei_Mayek_Extensions
920  Meroitic_Cursive
921  Meroitic_Hieroglyphs
922  Miao
923  Sharada
924  Sora_Sompeng
925  Sundanese_Supplement
926  Takri
927  -> add to uchar.h
928  -> add to UCharacter.UnicodeBlock IDs
929    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
930            replace  public static final int \1_ID = \2; \3
931  -> add to UCharacter.UnicodeBlock objects
932    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
933            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
934- 1 new Joining_Group (jg) value:
935  Rohingya_Yeh
936  -> uchar.h & UCharacter.JoiningGroup
937- 2 new Line_Break (lb) values:
938  CJ=Conditional_Japanese_Starter
939  HL=Hebrew_Letter
940  -> uchar.h & UCharacter.LineBreak
941- 7 new scripts:
942  sc ; Cakm      ; Chakma
943  sc ; Merc      ; Meroitic_Cursive
944  sc ; Mero      ; Meroitic_Hieroglyphs
945  sc ; Plrd      ; Miao
946  sc ; Shrd      ; Sharada
947  sc ; Sora      ; Sora_Sompeng
948  sc ; Takr      ; Takri
949  -> remove these from SyntheticPropertyValueAliases.txt
950  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
951      and in com.ibm.icu.dev.test.lang.TestUScript.java
952- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
953  (added 2011-06-21)
954  Khoj        322     Khojki
955  Tirh        326     Tirhuta
956    and another one added 2011-12-09
957  Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
958  -> uscript.h
959  -> com.ibm.icu.lang.UScript
960    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
961    replace  public static final int \1 = \2;\3
962  -> SyntheticPropertyValueAliases.txt
963  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
964      and in com.ibm.icu.dev.test.lang.TestUScript.java
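
The find/replace patterns above can be tried outside Eclipse. A sketch with Python's re module, assuming the same regular expressions; the input lines use made-up names and values, not real constants from uchar.h or uscript.h.

```python
import re

# UBLOCK_X = n, /comment  ->  Java int constant X_ID
BLOCK_ID = (r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
            r"public static final int \g<1>_ID = \2; \3")
# UBLOCK_X = n, /comment  ->  Java UnicodeBlock object X
BLOCK_OBJ = (r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
             r'public static final UnicodeBlock \g<1> = '
             r'new UnicodeBlock("\g<1>", \g<1>_ID); \2')
# USCRIPT_X = n,comment  ->  Java int constant X
SCRIPT = (r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
          r"public static final int \1 = \2;\3")


def convert(rule, c_line):
    """Apply one (pattern, replacement) pair to a line of C enum source."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, c_line)


# Hypothetical enum lines for illustration only.
block_line = "UBLOCK_EXAMPLE_BLOCK = 999, /*[hypothetical]*/"
script_line = "USCRIPT_EXAMPLE = 999, /* Exmp */"

print(convert(BLOCK_ID, block_line))
print(convert(BLOCK_OBJ, block_line))
print(convert(SCRIPT, script_line))
```

\g<1> is used where a group reference is directly followed by "_ID", so that the reference stays unambiguous.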
965
966* UnicodeData.txt changes
967- the last Unihan code point changes from U+9FCB to U+9FCC
968  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
969  + do change gennames.c
970  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
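
A sketch of the case-insensitive 9FC[BC] search described above, for listing candidate lines before editing them by hand. The sample lines are hypothetical, not taken from gennames.c or ucol.cpp, and find_unihan_end_refs is an illustrative name.

```python
import re

# Matches both the old end (9FCB) and the old limit (9FCC), in any case.
UNIHAN_END_RE = re.compile(r"9FC[BC]", re.IGNORECASE)


def find_unihan_end_refs(lines):
    """Return (line_number, text) pairs for lines that mention 9FCB or 9FCC."""
    return [(number, text)
            for number, text in enumerate(lines, 1)
            if UNIHAN_END_RE.search(text)]


# Hypothetical source lines.
SAMPLE = [
    "#define CJK_END 0x9fcb   /* hypothetical */",
    "static const int32_t CJK_LIMIT = 0x9FCC;  /* hypothetical */",
    "static const int32_t OTHER = 0x9FFF;      /* no match */",
]

for number, text in find_unihan_end_refs(SAMPLE):
    print(number, text)
```

The pattern also matches inside longer hex strings, so every hit still needs manual review.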
971
972* DerivedBidiClass.txt changes
973- 2 new default-AL blocks:
974#     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
975#     Arabic Mathematical Alphabetic Symbols:
976#                       U+1EE00  - U+1EEFF  (was default-R)
977- 2 new default-R blocks:
978#     Meroitic Hieroglyphs:
979#                        U+10980 - U+1099F
980#     Meroitic Cursive:  U+109A0 - U+109FF
981  -> should be picked up by the explicit data in the file
982
983* NameAliases.txt changes
984- from
985    # Each line has two fields
986    # First field: Code point
987    # Second field: Alias
988- to
989    # Each line has three fields, as described here:
990    #
991    # First field:  Code point
992    # Second field: Alias
993    # Third field:  Type
994- Also, the file format previously allowed multiple aliases per code point, but only now
995  does the data actually contain them, even several of the same type. For example,
996    FEFF;BYTE ORDER MARK;alternate
997    FEFF;BOM;abbreviation
998    FEFF;ZWNBSP;abbreviation
999- This breaks our gennames parser, unames.icu data structure, and API.
1000  Fix gennames to only pick up "correction" aliases.
1001  New ticket #8963 for further changes.
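
A minimal sketch of the "correction only" filter described above, assuming the three-field "code point;alias;type" layout. It is written in Python for illustration and is not the gennames code; the last sample line is an assumed example of a correction-type entry, and both function names are hypothetical.

```python
def parse_name_aliases(lines):
    """Yield (code_point, alias, alias_type) from 'cp;alias;type' lines."""
    for line in lines:
        # Skip comments and blank lines.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        cp, alias, alias_type = [field.strip() for field in data.split(";")]
        yield int(cp, 16), alias, alias_type


def corrections_only(lines):
    """Keep only the aliases whose type is 'correction'."""
    return [(cp, alias)
            for cp, alias, alias_type in parse_name_aliases(lines)
            if alias_type == "correction"]


SAMPLE = [
    "# Each line has three fields",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
    "01A2;LATIN CAPITAL LETTER GHA;correction",  # assumed sample entry
]

for cp, alias in corrections_only(SAMPLE):
    print("U+%04X" % cp, alias)
```

With this filter a code point like U+FEFF, which has several non-correction aliases, contributes nothing.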
1002
1003* run genpname/preparse.pl (on Linux)
1004  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1005  + make sure that data.h is writable
1006  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1007  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1008
1009* build ICU (make install)
1010  so that the tools build can pick up the new definitions from the installed header files.
1011* build Unicode tools (at least genpname) using CMake+make
1012
1013* run genpname
1014  (builds both pnames.icu and propname_data.h)
1015- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1016- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1017
1018* build ICU (make install)
1019* build Unicode tools using CMake+make
1020
1021* update source/data/unidata/norm2/nfkc_cf.txt
1022- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1023
1024* update source/data/unidata/norm2/uts46.txt
1025- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1026  to ~/svn.icu/tools/trunk/src/unicode/py
1027- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
1028- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1029- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1030
1031* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1032  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1033- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1034- Unicode 6.0..6.1: U+2260, U+226E, U+226F
1035- nothing new in 6.1, no test file to update
1036
1037* generate core properties data files
1038- in initial bootstrapping, change the UCA version
1039  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1040- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1041- rebuild ICU & tools
1042  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1043    check if the UCA version in FractionalUCA.txt matches the new Unicode version
1044    (see step above)
1045- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
1046  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1047- rebuild ICU & tools
1048
1049* update Java data files
1050- refresh just the UCD-related files, just to be safe
1051- see (ICU4C)/source/data/icu4j-readme.txt
1052- mkdir /tmp/icu4j
1053- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1054  output:
1055    ...
1056    Unicode .icu files built to ./out/build/icudt49l
1057    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1058    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
1059    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1060    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1061    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
1062    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
1063    mkdir -p /tmp/icu4j/main/shared/data
1064    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1065    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
1066    mkdir -p /tmp/icu4j/main/shared/data
1067    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1068    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
1069- copy the big-endian Unicode data files to another location,
1070  separate from the other data files
1071    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1072    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1073    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1074    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
1075    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1076    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1077    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1078- refresh ICU4J
1079    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1080
1081* refresh Java test .txt files
1082- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1083
1084* test ICU so far, fix test code where necessary
1085- temporarily ignore collation issues that look like UCA/UCD mismatches,
1086  until UCA data is updated
1087
1088* UCA
1089
1090- get output from Mark's tools; look in
1091    http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
1092- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1093- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1094  (note removing the underscore before "Rules")
1095- update (ICU)/source/test/testdata/CollationTest_*.txt
1096  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1097  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1098- check test file diffs for previously commented-out, known-failing data lines;
1099  probably need to keep those commented out
1100- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1101- run makeuca.sh:
1102  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1103- rebuild ICU4C
1104- refresh ICU4J collation data:
1105  (subset of instructions above for properties data refresh, except copies all coll/*)
1106    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1107    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1108    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1109    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1110- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1111- note on intltest: if collate/UCAConformanceTest fails, then
1112  utility/MultithreadTest/TestCollators will fail as well;
1113  fix the conformance test before looking into the multi-thread test
1114
1115* When refreshing all of ICU4J data from ICU4C
1116- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1117- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1118or
1119- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1120
1121*** LayoutEngine script information
1122
1123(For details see the Unicode 5.2 change log below.)
1124
1125* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1126  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1127  in the working directory.
1128  (It also generates ScriptRunData.cpp, which is no longer needed.)
1129
1130  The generated files have a current copyright date and "@draft" statement.
1131
1132- diff current <icu>/source/layout files vs. generated ones
1133    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1134  review and manually merge desired changes;
1135  fix gratuitous changes, incorrect @draft and missing aliases;
1136  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1137- if you just copy the above files, then
1138  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1139  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1140
1141*** merge the Unicode update branches back onto the trunk
1142- do not merge the icudata.jar and testdata.jar,
1143  instead rebuild them from merged & tested ICU4C
1144
1145---------------------------------------------------------------------------- ***
1146
1147ICU 4.8 (no Unicode update, just new script codes)
1148
1149* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1150  (added 2010-12-21)
1151    Afak    439     Afaka
1152    Jurc    510     Jurchen
1153    Mroo    199     Mro, Mru
1154    Nshu    499     Nüshu
1155    Shrd    319     Sharada, Śāradā
1156    Sora    398     Sora Sompeng
1157    Takr    321     Takri, Ṭākrī, Ṭāṅkrī
1158    Tang    520     Tangut
1159    Wole    480     Woleai
1160  -> uscript.h
1161  -> com.ibm.icu.lang.UScript
1162    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1163    replace  public static final int \1 = \2;\3
1164  -> genpname/SyntheticPropertyValueAliases.txt
1165  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1166      and in com.ibm.icu.dev.test.lang.TestUScript.java
1167
1168* run genpname/preparse.pl (on Linux)
1169  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1170  + make sure that data.h is writable
1171  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1172  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1173
1174* rebuild Unicode tools (at least genpname) using make
1175- You might first need to "make install" ICU so that the tools build can pick
1176  up the new definitions from the installed header files.
1177
1178* run genpname
1179  (builds both pnames.icu and propname_data.h)
1180- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1181- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1182- rebuild ICU & tools
1183
1184* run genprops
1185- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1186- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1187- rebuild ICU & tools
1188
1189* update Java data files
1190- refresh just the UCD-related files, just to be safe
1191- see (ICU4C)/source/data/icu4j-readme.txt
1192- mkdir /tmp/icu4j
1193- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1194- copy the big-endian Unicode data files to another location,
1195  separate from the other data files
1196    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1197    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1198    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1199- refresh ICU4J
1200    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
1201
1202* should have updated the layout engine script codes but forgot
1203
1204---------------------------------------------------------------------------- ***
1205
1206Unicode 6.0 update
1207
1208*** related ICU Trac tickets
1209
1210 7264 Unicode 6.0 Update
1211
1212*** Unicode version numbers
1213- makedata.mak
1214- uchar.h
1215  (configure.in & configure: have been modified to extract the version from uchar.h)
1216- com.ibm.icu.util.VersionInfo
1217
1218*** data files & enums & parser code
1219
1220* file preparation
1221
1222~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
1223- This now prepares both unidata and testdata files in respective output subfolders.
1224
1225* PropertyAliases.txt changes
1226- new Script_Extensions property defined in the new ScriptExtensions.txt file
1227  but not listed in PropertyAliases.txt; reported to unicode.org;
1228  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
1229    scx; Script_Extensions
1230  -> uchar.h with new UProperty section
1231  -> com.ibm.icu.lang.UProperty, parallel with uchar.h
1232
1233* PropertyValueAliases.txt changes
1234- 12 new block names:
1235  Alchemical_Symbols
1236  Bamum_Supplement
1237  Batak
1238  Brahmi
1239  CJK_Unified_Ideographs_Extension_D
1240  Emoticons
1241  Ethiopic_Extended_A
1242  Kana_Supplement
1243  Mandaic
1244  Miscellaneous_Symbols_And_Pictographs
1245  Playing_Cards
1246  Transport_And_Map_Symbols
1247  -> add to uchar.h
1248  -> add to UCharacter.UnicodeBlock
1249    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1250            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1251- Joining_Group (jg) values:
1252  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
1253  -> uchar.h & UCharacter.JoiningGroup
1254- 3 new scripts:
1255  sc ; Batk      ; Batak
1256  sc ; Brah      ; Brahmi
1257  sc ; Mand      ; Mandaic
1258  -> remove these from SyntheticPropertyValueAliases.txt
1259  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
1260  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1261      and in com.ibm.icu.dev.test.lang.TestUScript.java
1262- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1263  (added 2009-11-11..2010-07-18)
1264  Bass        259     Bassa Vah
1265  Dupl        755     Duployan shorthand
1266  Elba        226     Elbasan
1267  Gran        343     Grantha
1268  Kpel        436     Kpelle
1269  Loma        437     Loma
1270  Mend        438     Mende
1271  Merc        101     Meroitic Cursive
1272  Narb        106     Old North Arabian
1273  Nbat        159     Nabataean
1274  Palm        126     Palmyrene
1275  Sind        318     Sindhi
1276  Wara        262     Warang Citi
1277  -> uscript.h
1278  -> com.ibm.icu.lang.UScript
1279    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1280    replace  public static final int \1 = \2;\3
1281  -> SyntheticPropertyValueAliases.txt
1282  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1283      and in com.ibm.icu.dev.test.lang.TestUScript.java
1284- ISO 15924 name change
1285  Mero        100     Meroitic Hieroglyphs (was Meroitic)
1286  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
1287- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
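
The Eclipse find/replace for UCharacter.UnicodeBlock listed above can be sketched in Python as follows. This is only an illustration: the function name and the sample line are hypothetical, and \g<1> is used instead of \1 so that the following "_ID" cannot be misread as part of the group reference.

```python
import re

# Pattern copied from the Eclipse "find" expression above.
UBLOCK_ENUM = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
# Same replacement as above, written with explicit \g<n> group references.
JAVA_BLOCK = (r'public static final UnicodeBlock \g<1> = '
              r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>')

def ublock_to_java(line):
    """Rewrite one uchar.h UBLOCK_ enum line as a UnicodeBlock instance."""
    return UBLOCK_ENUM.sub(JAVA_BLOCK, line)

# Illustrative input only; name, value and comment are made up.
sample = "    UBLOCK_EXAMPLE_BLOCK = 999, /*[FFF0]*/"
print(ublock_to_java(sample))
```

The matching _ID integer constants still have to be added separately, as in the earlier updates.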
1288
1289* UnicodeData.txt changes
1290- new CJK block:
1291  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
1292  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
1293  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
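
A First/Last pair in UnicodeData.txt marks a range whose character names are generated algorithmically rather than listed. As a rough Python sketch of what such a range implies (this is not how gennames.c is structured; the function name is hypothetical), names in the Extension D range follow the standard "CJK UNIFIED IDEOGRAPH-" plus hex code point rule:

```python
# (first, last) code points of the Extension D range quoted above.
CJK_EXT_D = (0x2B740, 0x2B81D)

def cjk_ext_d_name(code_point):
    """Return the algorithmic name for a code point in Extension D, else None."""
    first, last = CJK_EXT_D
    if first <= code_point <= last:
        return "CJK UNIFIED IDEOGRAPH-%04X" % code_point
    return None

print(cjk_ext_d_name(0x2B740))
print(cjk_ext_d_name(0x2B81E))  # one past the end of the range
```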
1294
1295* build Unicode tools using CMake+make
1296
1297* run genpname/preparse.pl (on Linux)
1298  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1299  + make sure that data.h is writable
1300  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1301  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1302
1303* rebuild Unicode tools (at least genpname) using make
1304- You might first need to "make install" ICU so that the tools build can pick
1305  up the new definitions from the installed header files.
1306
1307* run genpname
1308- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1309- rebuild ICU & tools
1310
1311* update source/data/unidata/norm2/nfkc_cf.txt
1312- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1313
1314* update source/data/unidata/norm2/uts46.txt
1315- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
1316  to ~/svn.icu/tools/trunk/src/unicode/py
1317- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
1318- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1319- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1320
1321* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1322  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1323- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1324- Unicode 6.0: U+2260, U+226E, U+226F
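
The grep step above can be sketched in Python. This assumes the usual UCD-style layout for IdnaMappingTable.txt (semicolon-separated fields, "#" comments, code points or "first..last" ranges in the first field); the function name and the sample lines are illustrative and not copied from the real file.

```python
def non_ascii_std3_valid(lines):
    """List code points above U+007F whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        end = end or start                      # single code point
        for cp in range(int(start, 16), int(end, 16) + 1):
            if cp > 0x7F:
                found.append(cp)
    return found

# Illustrative lines in the assumed format.
sample = [
    "0000..002C    ; disallowed_STD3_valid    # ASCII range, ignored",
    "00A0          ; disallowed_STD3_mapped ; 0020   # different status",
    "2260          ; disallowed_STD3_valid    # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid    # NOT LESS-THAN..NOT GREATER-THAN",
]
print(["U+%04X" % cp for cp in non_ascii_std3_valid(sample)])
```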
1325
1326* generate core properties data files
1327- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1328- rebuild ICU & tools
1329- run makeuca.sh so that genuca picks up the new nfc.nrm:
1330  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1331- rebuild ICU & tools
1332
1333* implement new Script_Extensions property (provisional)
1334- parser & generator: genprops & uprops.icu
1335- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
1336- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
1337
1338* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
1339- (one-time change)
1340- genbidi/gencase/genprops tools changes
1341- re-run makeprops.sh (see above)
1342- UCharacterProperty.java, UCharacterTypeIterator.java,
1343  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
1344  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
1345
1346* update Java data files
1347- refresh just the UCD-related files, just to be safe
1348- see (ICU4C)/source/data/icu4j-readme.txt
1349- mkdir /tmp/icu4j
1350- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1351  output:
1352    ...
1353    Unicode .icu files built to ./out/build/icudt45l
1354    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
1355    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1356    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
1357    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
1358    mkdir -p /tmp/icu4j/main/shared/data
1359    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1360- copy the big-endian Unicode data files to another location,
1361  separate from the other data files
1362    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1363    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
1364    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
1365    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
1366    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
1367    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1368    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
1369- refresh ICU4J
1370    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
1371
1372* refresh Java test .txt files
1373- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1374
1375* un-hardcode normalization skippable (NF*_Inert) test data
1376- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
1377
1378* copy updated break iterator test files
1379- now handled by the earlier ucdcopy.py step and by
1380  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
1381  (old instructions:
1382   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
1383   to ~/svn.icu/trunk/src/source/test/testdata)
1384- they are not used in ICU4J
1385
1386* UCA
1387
1388- get output from Mark's tools; look in
1389    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
1390    http://www.macchiato.com/unicode/utc/additional-uca-files
1391    http://www.unicode.org/Public/UCA/6.0.0/
1392    http://www.unicode.org/~mdavis/uca/
1393- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1394- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1395- update Han-implicit ranges for new CJK extensions:
1396  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
1397- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
1398  do not add it into invuca so that tailoring primary-after an ignorable works
1399- genuca: permit space between [variable top] bytes
1400- ucol.cpp: treat noncharacters like unassigned rather than ignorable
1401- run makeuca.sh:
1402  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1403- rebuild ICU4C
1404- refresh ICU4J collation data:
1405  (subset of instructions above for properties data refresh, except copies all coll/*)
1406    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1407    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1408    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
1409    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
1410- update (ICU)/source/test/testdata/CollationTest_*.txt
1411  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1412  with output from Mark's Unicode tools
1413- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
1414- note on intltest: if collate/UCAConformanceTest fails, then
1415  utility/MultithreadTest/TestCollators will fail as well;
1416  fix the conformance test before looking into the multi-thread test
1417
1418* When refreshing all of ICU4J data from ICU4C
1419- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1420- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1421or
1422- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1423
1424*** LayoutEngine script information
1425
1426(For details see the Unicode 5.2 change log below.)
1427
1428* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
1429ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1430ScriptRunData.cpp, which is no longer needed.)
1431
1432The generated files have a current copyright date and "@draft" statement.
1433
1434* copy the above files into <icu>/source/layout, replacing the old files.
1435* fix mixed line endings
1436* review the diffs and fix incorrect @draft and missing aliases;
1437  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1438* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1439
1440---------------------------------------------------------------------------- ***
1441
1442Unicode 5.2 update
1443
1444*** related ICU Trac tickets
1445
1446 7084 Unicode 5.2
1447
1448 7167 verify collation bytes
1449 7235 Java test NAME_ALIAS
1450 7236 Java DerivedCoreProperties.txt test
1451 7237 Java BidiTest.txt
1452 7238 UTrie2 in core unidata
1453 7239 test for tailoring gaps
1454 7240 Java fix CollationMiscTest
1455 7243 update layout engine for Unicode 5.2
1456
1457*** Unicode version numbers
1458- makedata.mak
1459- uchar.h
1460- configure.in & configure
1461- update ucdVersion in gennames.c if an algorithmic range changes
1462
1463*** data files & enums & parser code
1464
1465* file preparation
1466
1467python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
1468- includes finding files regardless of version numbers,
1469  copying them, and performing the equivalent processing of the
1470  ucdstrip and ucdmerge tools on the desired set of files
1471
1472* notes on changes
1473- PropertyAliases.txt
1474  moved from numeric to enumerated:
1475    ccc       ; Canonical_Combining_Class
1476  new string properties:
1477    NFKC_CF   ; NFKC_Casefold
1478    Name_Alias; Name_Alias
1479  new binary properties:
1480    Cased     ; Cased
1481    CI        ; Case_Ignorable
1482    CWCF      ; Changes_When_Casefolded
1483    CWCM      ; Changes_When_Casemapped
1484    CWKCF     ; Changes_When_NFKC_Casefolded
1485    CWL       ; Changes_When_Lowercased
1486    CWT       ; Changes_When_Titlecased
1487    CWU       ; Changes_When_Uppercased
1488  new CJK Unihan properties (not supported by ICU)
1489- PropertyValueAliases.txt
1490  new block names
1491  new scripts
1492  one script code change:
1493    sc ; Qaai      ; Inherited
1494    ->
1495    sc ; Zinh      ; Inherited                        ; Qaai
1496  new Line_Break (lb) value:
1497    lb ; CP        ; Close_Parenthesis
1498  new Joining_Group (jg) values: Farsi_Yeh, Nya
1499  other new values:
1500    ccc; 214; ATA  ; Attached_Above
1501- DerivedBidiClass.txt
1502  new default-R range: U+1E800 - U+1EFFF
1503- UnicodeData.txt
1504  all of the ISO comments are gone
1505  new CJK block end:
1506    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
1507  new CJK block:
1508    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
1509    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
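
The PropertyValueAliases.txt lines quoted above (for example the Zinh/Qaai change and the ccc value) share one shape: a property alias followed by semicolon-separated value fields. A minimal Python sketch of splitting such a line, with a hypothetical function name and no claim to match preparse.pl:

```python
def parse_value_alias(line):
    """Split a PropertyValueAliases.txt-style line.

    Returns (property, [fields...]) or None for blank/comment-only lines.
    For ccc the first field is the numeric value, as in the line above.
    """
    data = line.split("#", 1)[0]              # drop trailing comment
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 2 or not fields[0]:
        return None
    return fields[0], fields[1:]

print(parse_value_alias("sc ; Zinh      ; Inherited                        ; Qaai"))
print(parse_value_alias("ccc; 214; ATA  ; Attached_Above"))
```

In the first line the trailing field shows the old code Qaai surviving as an extra alias of the new short name Zinh.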
1510
1511* genpname
1512- run preparse.pl
1513  + cd \svn\icuproj\icu\trunk\source\tools\genpname
1514  + make sure that data.h is writable
1515  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
1516  + preparse.pl complains with errors like the following:
1517      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
1518    This is because ICU 4.0 had scripts from ISO 15924 which are now
1519    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
1520    and PropertyValueAliases.txt.
1521    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
1522       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
1523  + preparse.pl complains with errors about block names missing from uchar.h; add them
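
The duplicate-script conflict described above could be spotted before running preparse.pl. The sketch below is an assumption-laden illustration: it presumes SyntheticPropertyValueAliases.txt uses the same "sc ; Code ; Name" layout as PropertyValueAliases.txt, and the function names and sample lines are hypothetical.

```python
def script_codes(lines):
    """Collect the short script codes from 'sc ; Code ; Name' lines."""
    codes = set()
    for line in lines:
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes

def duplicate_scripts(synthetic_lines, official_lines):
    """Script codes defined in both files, i.e. candidates for removal
    from the synthetic file."""
    return sorted(script_codes(synthetic_lines) & script_codes(official_lines))

# Illustrative content only.
synthetic = ["sc ; Egyp ; Egyptian_Hieroglyphs", "sc ; Moon ; Moon"]
official = ["sc ; Egyp ; Egyptian_Hieroglyphs", "sc ; Latn ; Latin"]
print(duplicate_scripts(synthetic, official))
```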
1524
1525* uchar.h & uscript.h & uprops.h & uprops.c & genprops
1526- new block & script values
1527  + 26 new blocks
1528    copy new blocks from Blocks.txt
1529    MS VC++ 2008 regular expression:
1530      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
1531      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
1532  + several new script values already added in ICU 4.0 for ISO 15924 coverage
1533    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
1534  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
1535  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
1536    (added to SyntheticPropertyValueAliases.txt)
1537- new Joining Group (JG) values: Farsi_Yeh, Nya
1538- new Line_Break (lb) value:
1539    lb ; CP        ; Close_Parenthesis
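
The MS VC++ regular expression above turns Blocks.txt lines into UBLOCK_ enum lines that still need manual renaming and renumbering. The Python sketch below goes one step further as an assumption of what the manual cleanup looks like: it upper-cases the block name, replaces spaces and hyphens with underscores, and numbers the constants sequentially from a caller-supplied start (172 is just the value shown in the replacement string above). Function and parameter names are hypothetical.

```python
import re

# Python equivalent of the "find" expression above ({...} groups become (...)).
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")

def ublock_lines(block_lines, first_value=172):
    """Generate uchar.h-style UBLOCK_ enum lines from Blocks.txt lines."""
    out = []
    value = first_value
    for line in block_lines:
        m = BLOCK_LINE.match(line.strip())
        if not m:
            continue                      # comments, blank lines
        name = re.sub(r"[^A-Z0-9]+", "_", m.group(3).upper()).strip("_")
        out.append("    UBLOCK_%s = %d, /*[%s]*/" % (name, value, m.group(1)))
        value += 1
    return out

sample = [
    "# Blocks.txt style comment",
    "A960..A97F; Hangul Jamo Extended-A",
    "D7B0..D7FF; Hangul Jamo Extended-B",
]
print("\n".join(ublock_lines(sample)))
```

The generated values should still be checked against the existing enum, since the real constants must continue from the last block already in uchar.h.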
1540
1541* hardcoded Unihan range end/limit
1542- Unihan range end moves from 9FC3 to 9FCB
1543  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
1544  + do change gennames.c
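
The case-insensitive search for the old range end and limit can be sketched in Python as below. The regex is the one given above; the function name and the sample source lines are hypothetical.

```python
import re

# Old Unihan range end (9FC3) and limit (9FC4), matched case-insensitively.
UNIHAN_END = re.compile(r"9FC[34]", re.IGNORECASE)

def find_hardcoded(lines):
    """Return (line number, text) for every line mentioning 9FC3 or 9FC4."""
    return [(n, line) for n, line in enumerate(lines, 1)
            if UNIHAN_END.search(line)]

# Illustrative source lines only.
sample = [
    "if(c<=0x9fc3) {",
    "limit=0x9FC4;",
    "other=0x9FC5;",
]
for n, line in find_hardcoded(sample):
    print(n, line)
```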
1545
1546* Compare definitions of new binary properties with what we used to use
1547  in algorithms, to see if the definitions changed.
1548- Verified that definitions for Cased and Case_Ignorable are unchanged.
1549  The gencase tool now parses the newly public Case_Ignorable values
1550  in case the definition changes in the future.
1551
1552* uchar.c & uprops.h & uprops.c & genprops
1553- new numeric values that didn't exist in Unicode data before:
1554    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
1555  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
1556  therefore redesign the encoding of numeric types and values for formatVersion 6;
1557  design for simple numbers up to at least 144 ("one gross"),
1558  large values up to at least 10^20,
1559  and fractions with numerators -1..17 and denominators 1..16
1560  to cover current and expected future values
1561  (e.g., more Han numeric values, Meroitic twelfths)
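
As a quick arithmetic check of the limits stated above (it does not model the actual uprops.icu encoding, and the function names are hypothetical), the new fractions can be tested against the formatVersion 5 denominator limit and the formatVersion 6 design ranges:

```python
# (numerator, denominator) pairs for the new numeric values listed above.
NEW_FRACTIONS = [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]

def exceeds_format_5(num, den):
    """Denominators above 9 are what the note says formatVersion 5 cannot store."""
    return den > 9

def fits_format_6(num, den):
    """Design range from the note: numerators -1..17, denominators 1..16."""
    return -1 <= num <= 17 and 1 <= den <= 16

print([f for f in NEW_FRACTIONS if exceeds_format_5(*f)])
print(all(fits_format_6(*f) for f in NEW_FRACTIONS))
print(fits_format_6(11, 12))   # a twelfth, as anticipated for Meroitic
```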
1562
1563* reimplement Hangul_Syllable_Type for new Jamo characters
1564- the old code assumed that all Jamo characters are in the 11xx block
1565- Unicode 5.2 fills holes there and adds new Jamo characters in
1566    A960..A97F; Hangul Jamo Extended-A
1567  and in
1568    D7B0..D7FF; Hangul Jamo Extended-B
1569- Hangul_Syllable_Type can be trivially derived from a subset of
1570  Grapheme_Cluster_Break values
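
The derivation mentioned above can be sketched as follows, assuming the short property value names: the Grapheme_Cluster_Break values L, V, T, LV and LVT carry over unchanged as Hangul_Syllable_Type values, and every other GCB value maps to NA. The function name is hypothetical and this is not the ICU implementation.

```python
# GCB values that correspond one-to-one to Hangul_Syllable_Type values.
_HANGUL_GCB = {"L", "V", "T", "LV", "LVT"}

def hangul_syllable_type(gcb):
    """Derive the Hangul_Syllable_Type short name from a GCB short name."""
    return gcb if gcb in _HANGUL_GCB else "NA"

print(hangul_syllable_type("LV"))
print(hangul_syllable_type("Extend"))
```

With this approach the new Jamo in the Extended-A and Extended-B blocks need no special-casing, because their types come from the data rather than from a hardcoded 11xx range.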
1571
1572* build Unicode data source code for hardcoding core data
1573C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
1574
1575ICU data make path is \svn\icuproj\icu\trunk\source\data\
1576ICU root path is \svn\icuproj\icu\trunk
1577Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
1578Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
1579Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
1580Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
1581Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
1582Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
1583Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
1584Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
1585Creating data file for Unicode Property Names
1586Creating data file for Unicode Character Properties
1587Creating data file for Unicode Case Mapping Properties
1588Creating data file for Unicode BiDi/Shaping Properties
1589Creating data file for Unicode Normalization
1590Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
1591Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
1592
1593- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
1594  and rebuild the common library
1595
1596*** UCA
1597
1598- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
1599- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
1600- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
1601[ Begin obsolete instructions:
1602  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
1603    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
1604      on Windows:
1605        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
1606        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
1607  End obsolete instructions]
1608- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
1609  not just the *_STUB.txt files
1610- note on intltest: if collate/UCAConformanceTest fails, then
1611  utility/MultithreadTest/TestCollators will fail as well;
1612  fix the conformance test before looking into the multi-thread test
1613
1614*** Implement Cased & Case_Ignorable properties
1615- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
1616- Problem: These properties should be disjoint, but aren't
1617- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
1618- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
1619
1620*** Implement Changes_When_Xyz properties
1621- without stored data
1622
1623*** Implement Name_Alias property
1624- add it as another name field in unames.icu
1625- make it available via u_charName() and UCharNameChoice
1626- consider it in u_charFromName()
1627
1628*** Break iterators
1629
1630* Update break iterator rules to new UAX versions and new property values
1631* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
1632
1633*** new BidiTest file
1634- review format and data
1635- copy BidiTest.txt to source/test/testdata
1636- write test code using this data
1637- fix ICU code where it fails the conformance test
1638
1639*** Java
1640- generally, find and update code corresponding to C/C++
1641- UCharacter.UnicodeBlock constants:
1642  a) add an _ID integer per new block, update COUNT
1643  b) add a class instance per new block
1644     Visual Studio regex:
1645        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
1646        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1647- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
1648
1649- port test changes to Java
1650
1651*** LayoutEngine script information
1652
1653(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
1654
1655* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
1656ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
1657ScriptRunData.cpp, which is no longer needed.)
1658
1659The generated files have a current copyright date and "@draft" statement.
1660
1661-> Eric Mader wrote in email on 20090930:
1662    "I think the tool has been modified to update @draft to @stable for
1663     older scripts and to add @draft for new scripts.
1664     (I worked with an intern on this last year.)
1665     You should check the output after you run it."
1666
1667* copy the above files into <icu>/source/layout, replacing the old files.
1668* fix mixed line endings
1669* review the diffs and fix incorrect @draft and missing aliases
1670* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1671
1672Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
1673and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
1674
1675-> Eric Mader wrote in email on 20090930:
1676    "This is just a matter of making sure that all the per-script tables have
1677     entries for any new scripts that were added.
1678     If any new Indic characters were added, then the class tables in
1679     IndicClassTables.cpp should be updated to reflect this.
1680     John Emmons should know how to do this if it's required."
1681
1682* rebuild the layout and layoutex libraries.
1683
1684*** Documentation
1685- Update User Guide
1686  + Jamo_Short_Name, sfc->scf, binary property value aliases
1687
1688---------------------------------------------------------------------------- ***
1689
1690Unicode 5.1 update
1691
1692*** related ICU Trac tickets
1693
1694 5696 Update to Unicode 5.1
1695
1696*** Unicode version numbers
1697- makedata.mak
1698- uchar.h
1699- configure.in & configure
1700- update ucdVersion in gennames.c if an algorithmic range changes
1701
1702*** data files & enums & parser code
1703
1704* file preparation
1705- ucdstrip:
1706    DerivedCoreProperties.txt
1707    DerivedNormalizationProps.txt
1708    NormalizationTest.txt
1709    PropList.txt
1710    Scripts.txt
1711    GraphemeBreakProperty.txt
1712    SentenceBreakProperty.txt
1713    WordBreakProperty.txt
1714- ucdstrip and ucdmerge:
1715    EastAsianWidth.txt
1716    LineBreak.txt
1717
1718* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
1719copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
1720copy 5.1.0\ucd\Blocks.txt ..\unidata\
1721copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
1722copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
1723copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
1724copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
1725copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
1726copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
1727copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
1728copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
1729copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
1730copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
1731copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
1732
1733ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
1734ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
1735ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
1736ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
1737ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
1738ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
1739ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
1740ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
1741ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
1742ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
1743
1744* genpname
1745- run preparse.pl
1746  + cd \svn\icuproj\icu\uni51\source\tools\genpname
1747  + make sure that data.h is writable
1748  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
1749  + preparse.pl complains with errors like the following:
1750      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
1751    This is because ICU 3.8 had scripts from ISO 15924 which are now
1752    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
1753    and PropertyValueAliases.txt.
1754    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
1755       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
1756  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
1757      N/Y, No/Yes, F/T, False/True
1758    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
1759       It will use further values from the file if present.
1760
1761* uchar.h & uscript.h & uprops.h & uprops.c & genprops
1762- new block & script values
1763  + 17 new blocks
1764  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
1765    (removed from SyntheticPropertyValueAliases.txt)
1766  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
1767    (added to SyntheticPropertyValueAliases.txt)
1768- uprops.icu (uprops.h) only provides 7 bits for script codes.
1769  ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
1770  None of the codes above 127 is yet the script code of an
1771  assigned Unicode character, so ICU 4.0 uprops.icu does not store any
1772  script code values greater than 127.
1773  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
1774  in a parallel bit field, and that overflows now.
1775  Also, future values >=128 would be incompatible anyway.
1776  uprops.h is modified to move around several of the bit fields
1777  in the properties vector words, and now uses 8 bits for the script code.
1778  Two other bit fields also grow to accommodate future growth:
1779  Block (current count: 172) grows from 8 to 9 bits,
1780  and Word_Break grows from 4 to 5 bits.
1781- renamed property Simple_Case_Folding (sfc->scf)
1782  + nothing to be done: handled as normal alias
1783- new property JSN Jamo_Short_Name
1784  + no new API: only contributes to the Name property
1785- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
1786- new Joining Group (JG) value: Burushashki_Yeh_Barree
1787- new Sentence_Break (SB) values:
1788    SB ; CR        ; CR
1789    SB ; EX        ; Extend
1790    SB ; LF        ; LF
1791    SB ; SC        ; SContinue
1792- new Word_Break (WB) values:
1793    WB ; CR        ; CR
1794    WB ; Extend    ; Extend
1795    WB ; LF        ; LF
1796    WB ; MB        ; MidNumLet
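
The script-code overflow described above comes down to bit widths. The following Python sketch only reproduces that arithmetic with the numbers from the notes; it does not model the actual layout of the properties vector words, and the function name is hypothetical.

```python
def bits_needed(max_value):
    """Number of bits required to store integers 0..max_value."""
    return max(1, max_value.bit_length())

USCRIPT_CODE_LIMIT = 130               # from the note above
MAX_SCRIPT = USCRIPT_CODE_LIMIT - 1    # 129, stored in a parallel bit field
BLOCK_COUNT = 172                      # current block count from the note

print(bits_needed(127))          # largest value a 7-bit field can hold
print(bits_needed(MAX_SCRIPT))   # one bit more than the old 7-bit field
print(bits_needed(BLOCK_COUNT))  # still within 8 bits; 9 bits is headroom
```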
1797
1798* Further changes in the 2008-02-29 update:
1799- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
1800  because they should not normally be invisible.
1801- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
1802- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
1803- new Word_Break (WB) value: NL=Newline
1804
1805* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
1806- Unihan range end moves from 9FBB to 9FC3
1807  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
1808  + do change gennames.c

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest
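
To illustrate what such a test has to cover, here is a small Python sketch that
accepts all eight aliases named above. It is not ICU's implementation and not the
TestBinaryValues() code; parse_binary_value is a hypothetical name, and the
matching is simplified to ignoring case and surrounding whitespace.

```python
# The eight aliases come from the note above; the table itself is illustrative.
BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(alias):
    """Map a binary property value alias to a bool (simplified matching)."""
    key = alias.strip().lower()
    if key not in BINARY_VALUE_ALIASES:
        raise ValueError(f"not a binary property value alias: {alias!r}")
    return BINARY_VALUE_ALIASES[key]


# Every alias pair from the note should be accepted.
for false_alias, true_alias in [("N", "Y"), ("No", "Yes"), ("F", "T"), ("False", "True")]:
    print(false_alias, parse_binary_value(false_alias),
          true_alias, parse_binary_value(true_alias))
```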

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

* Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
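
Since the batch file has to be edited for every UCD version, the command list
could be generated instead. A sketch in Python, assuming the file lists stay as
shown above; ucd2unidata_commands, base_name and the dest parameter are
hypothetical names. The commands are only printed, not executed.

```python
# File lists copied from the batch file above, in the same order.
COPY_FILES = [
    "BidiMirroring.txt", "Blocks.txt", "CaseFolding.txt", "DerivedAge.txt",
    r"extracted\DerivedBidiClass.txt", r"extracted\DerivedJoiningGroup.txt",
    r"extracted\DerivedJoiningType.txt", r"extracted\DerivedNumericValues.txt",
    "NormalizationCorrections.txt", "PropertyAliases.txt",
    "PropertyValueAliases.txt", "SpecialCasing.txt", "UnicodeData.txt",
]
STRIP_FILES = [
    "DerivedCoreProperties.txt", "DerivedNormalizationProps.txt",
    "NormalizationTest.txt", "PropList.txt", "Scripts.txt",
    r"auxiliary\GraphemeBreakProperty.txt",
    r"auxiliary\SentenceBreakProperty.txt",
    r"auxiliary\WordBreakProperty.txt",
]
STRIP_AND_MERGE_FILES = ["EastAsianWidth.txt", "LineBreak.txt"]


def base_name(relative_path):
    """Drop any subdirectory (extracted\\, auxiliary\\) from a UCD file path."""
    return relative_path.rsplit("\\", 1)[-1]


def ucd2unidata_commands(version, dest=r"..\unidata"):
    """Build the copy/ucdstrip/ucdmerge command lines for one UCD version."""
    src = version + "\\ucd"
    commands = [f"copy {src}\\{name} {dest}\\" for name in COPY_FILES]
    for name in STRIP_FILES:
        target = f"{dest}\\{base_name(name)}"
        commands.append(f"ucdstrip < {src}\\{name} > {target}")
    for name in STRIP_AND_MERGE_FILES:
        target = f"{dest}\\{base_name(name)}"
        commands.append(f"ucdstrip < {src}\\{name} | ucdmerge > {target}")
    return commands


commands_5_0 = ucd2unidata_commands("5.0.0")
print("\n".join(commands_5_0))
```

Only the version string changes between updates; new or moved UCD files still
need a manual edit of the three lists.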

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
  + make sure that data.h is writable
  + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

* Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
  + Pattern_Syntax
  + Pattern_White_Space
- new enumerated properties
  + Grapheme_Cluster_Break
  + Sentence_Break
  + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
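
The case-ignorable condition quoted above can be read as a simple predicate. A
Python sketch, not the gencase code: is_case_ignorable is a hypothetical name,
and the caller is assumed to supply the General_Category and Word_Break values,
for example from UnicodeData.txt and WordBreakProperty.txt.

```python
# General categories named in the note above.
CASE_IGNORABLE_CATEGORIES = frozenset({"Mn", "Me", "Cf", "Lm", "Sk"})


def is_case_ignorable(general_category, word_break):
    """Case-ignorable per the 4.1 note: Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk."""
    return (
        word_break == "MidLetter"
        or general_category in CASE_IGNORABLE_CATEGORIES
    )


# Property values are passed in by hand here, only for illustration.
print(is_case_ignorable("Po", "MidLetter"))  # e.g. U+00B7 MIDDLE DOT
print(is_case_ignorable("Mn", "Extend"))     # e.g. a combining mark
print(is_case_ignorable("Lu", "ALetter"))    # e.g. an uppercase letter
```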

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties
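
The round-trip check on the mirroring data can be sketched in Python. This is a
stand-in for the real test, which calls the C function u_charMirror(); the
helper names are hypothetical, and the sample lines only imitate the
"code; mirror # comment" layout of BidiMirroring.txt.

```python
def parse_mirror_pairs(lines):
    """Parse 'code; mirror # comment' lines into a dict of int -> int."""
    pairs = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # skip blank lines and pure comment lines
        code, mirror = (int(field.strip(), 16) for field in data.split(";"))
        pairs[code] = mirror
    return pairs


def char_mirror(pairs, c):
    """Return the mirrored code point, or c itself if it has no mirror."""
    return pairs.get(c, c)


def non_round_tripping(pairs):
    """List code points c for which mirror(mirror(c)) != c."""
    return sorted(
        c for c in pairs
        if char_mirror(pairs, char_mirror(pairs, c)) != c
    )


sample = [
    "# sample data for illustration",
    "0028; 0029 # LEFT PARENTHESIS",
    "0029; 0028 # RIGHT PARENTHESIS",
    "003C; 003E # LESS-THAN SIGN",
    "003E; 003C # GREATER-THAN SIGN",
]
mirror_pairs = parse_mirror_pairs(sample)
print(non_round_tripping(mirror_pairs))
```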

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
  + do not modify BOCU/BOCSU code because that would change the encoding
    and break binary compatibility!
  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    NamePrepProfile.txt
  + ignore trietest.c: test data is arbitrary
  + ignore tstnorm.cpp: test optimization, not important
  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
  + do change line_th.txt and word_th.txt
    by replacing hardcoded ranges with the new property values
  + do change gennames.c

source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
  + put short name comment only on line with new constant
    for genpname perl script parser
- new binary properties
  + STerm
  + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
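
A parser that accepts both layouts of the quick-check lines might look like the
Python sketch below. It is not the gennorm.c code. parse_quick_check is a
hypothetical name, the sample lines are made up to match the two shapes named
above, and the MAYBE -> M mapping is an assumption by analogy with NO -> N.

```python
NORMALIZATION_FORMS = ("NFD", "NFC", "NFKD", "NFKC")
# Old single-field suffix -> new value field. "MAYBE" is an assumption here.
OLD_SUFFIX_TO_VALUE = {"NO": "N", "MAYBE": "M"}


def parse_quick_check(line):
    """Return (code_range, property, value), or None for other lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None  # blank or comment-only line
    fields = [field.strip() for field in data.split(";")]
    # New layout: two fields after the range, e.g. "NFD_QC; N".
    if len(fields) == 3 and fields[1].endswith("_QC"):
        return fields[0], fields[1], fields[2]
    # Old layout: one combined field after the range, e.g. "NFD_NO".
    if len(fields) == 2:
        form, _, suffix = fields[1].rpartition("_")
        if form in NORMALIZATION_FORMS and suffix in OLD_SUFFIX_TO_VALUE:
            return fields[0], form + "_QC", OLD_SUFFIX_TO_VALUE[suffix]
    return None


old_style = parse_quick_check("00C0..00C5 ; NFD_NO # sample comment")
new_style = parse_quick_check("00C0..00C5 ; NFD_QC; N # sample comment")
print(old_style)
print(new_style)
```

Both sample lines yield the same tuple, so code downstream of the parser would
not need to know which file version it read.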

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
  + assume that the type is always numeric for Han characters,
    and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
