* Copyright (C) 2016 and later: Unicode, Inc. and others.
* License & terms of use: http://www.unicode.org/copyright.html
* Copyright (C) 2004-2016, International Business Machines
* Corporation and others.  All Rights Reserved.
*
*   file name:  changes.txt
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2004may06
*   created by: Markus W. Scherer
*
* change log for Unicode updates

---------------------------------------------------------------------------- ***

* New ISO 15924 script codes

Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
until they are encoded in Unicode,
or can be assumed to be encoded in the next Unicode version.
Script enum constant names should follow the Unicode script property value aliases,
which are assigned only when the scripts are encoded.
When we add constants early and guess the names wrong, we end up with confusing enum constants
and have sometimes had to add aliases.

Variant script codes like Latf and Aran that are not subject to separate encoding
can be added at any time.
(For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)

We add script codes used in CLDR or in the spoof checker.
This includes combination/alias codes like Hanb and Jamo.
See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html

We add special Z* script codes like Zsye.

For new script codes see http://www.unicode.org/iso15924/codechanges.html

---------------------------------------------------------------------------- ***

Unicode 9.0 update for ICU 58

* Command-line environment setup

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt58b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata

http://www.unicode.org/review/pri323/  -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-9.0.0.html
http://www.unicode.org/versions/Unicode9.0.0/
http://www.unicode.org/reports/tr44/tr44-17.html

*** ICU Trac

- ticket:12526: integrate Unicode 9
- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b

*** CLDR Trac

- cldrbug 9414: UCA 9
- ^/branches/markus/uni90 at r11518 from trunk at r11517

- cldrbug 8745: Unicode 9.0 script metadata

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- only for manual diffs: remove version suffixes from the file names
  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
  (see https://sites.google.com/site/unicodetools/inputdata)
- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.

- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
  and copy to $UNIDATA
    cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA
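
As a rough illustration of the "remove version suffixes from the file names" step above, the following Python sketch strips a version suffix from a single file name. It is not the actual desuffixucd.py: the function name, the regular expression, and the sample file names are assumptions about how suffixed draft files are typically named.

```python
import re

# Assumed suffix shape: "-<major>.<minor>.<update>", optionally followed by
# a draft marker such as "d3", directly before the file extension.
_SUFFIX = re.compile(r"-\d+\.\d+\.\d+(?:d\d+)?(?=\.[A-Za-z0-9]+$)")


def strip_version_suffix(name):
    """Return the file name without its version suffix, if it has one."""
    return _SUFFIX.sub("", name)


if __name__ == "__main__":
    # Hypothetical sample names, for illustration only.
    for name in ("UnicodeData-9.0.0d3.txt", "Blocks-9.0.0.txt", "ReadMe.txt"):
        print(name, "->", strip_version_suffix(name))
```

Names without a suffix pass through unchanged, so such a helper could be applied to every file in a folder.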

* preparseucd.py changes
- remove or add new Unicode scripts from/to the
  only-in-ISO-15924 list according to the error messages:
    ValueError: remove ['Tang'] from _scripts_only_in_iso15924
    ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
    ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
    ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
- DerivedNumericValues.txt new numeric values
    0D58          ; 0.00625 ; ; 1/160 # No       MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
    0D59          ; 0.025 ; ; 1/40 # No       MALAYALAM FRACTION ONE FORTIETH
    0D5A          ; 0.0375 ; ; 3/80 # No       MALAYALAM FRACTION THREE EIGHTIETHS
    0D5B          ; 0.05 ; ; 1/20 # No       MALAYALAM FRACTION ONE TWENTIETH
    0D5D          ; 0.15 ; ; 3/20 # No       MALAYALAM FRACTION THREE TWENTIETHS
  -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
     uchar.c, UCharacterProperty.java
     to support a new series of values
- adjust preparseucd.py for Tangut algorithmic names
  in ppucd.txt:
    algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
  ->
    algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
- avoid block-compressing most String/Miscellaneous property values,
  triggered by genprops not coping with a multi-code point Case_Folding on
    block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
  keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors
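
The effect of the corrected algnamesrange line for Tangut can be sketched in a few lines of Python. The table and function names are made up for illustration; only the range 17000..187EC and the prefix come from the lines above. The sketch assumes that the name is the prefix followed by the code point in uppercase hexadecimal.

```python
# Hypothetical table mirroring the corrected ppucd.txt line:
#   algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
ALG_NAME_RANGES = [
    (0x17000, 0x187EC, "TANGUT IDEOGRAPH-"),
]


def algorithmic_name(cp):
    """Return prefix + hex code point if cp is in a listed range, else None."""
    for start, end, prefix in ALG_NAME_RANGES:
        if start <= cp <= end:
            return prefix + format(cp, "04X")
    return None


if __name__ == "__main__":
    print(algorithmic_name(0x17000))  # first code point of the range
    print(algorithmic_name(0x187EC))  # last code point of the range
    print(algorithmic_name(0x0041))   # outside the range: None
```

With the uncorrected prefix, the same code points would have been named as CJK unified ideographs, which is why the range line had to change.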

* PropertyAliases.txt changes
- 1 new property PCM=Prepended_Concatenation_Mark
  Ignore: Only useful for layout engines.
  Ok to list in ppucd.txt.

* PropertyValueAliases.txt new property values
    blk; Adlam                            ; Adlam
    blk; Bhaiksuki                        ; Bhaiksuki
    blk; Cyrillic_Ext_C                   ; Cyrillic_Extended_C
    blk; Glagolitic_Sup                   ; Glagolitic_Supplement
    blk; Ideographic_Symbols              ; Ideographic_Symbols_And_Punctuation
    blk; Marchen                          ; Marchen
    blk; Mongolian_Sup                    ; Mongolian_Supplement
    blk; Newa                             ; Newa
    blk; Osage                            ; Osage
    blk; Tangut                           ; Tangut
    blk; Tangut_Components                ; Tangut_Components
  -> add to uchar.h
    use long property names for enum constants
  -> add to UCharacter.UnicodeBlock IDs
    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
            replace  public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
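
The two Eclipse find/replace pairs above can be tried outside the IDE. The sketch below applies the same patterns with Python's re.sub, which also uses \1-style group references in the replacement. The function names and the sample input line are illustrative; the sample is only assumed to have the shape of a uchar.h enum line.

```python
import re

# Same patterns as in the Eclipse find/replace steps above.
ID_FIND = r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)"
ID_REPLACE = r"public static final int \1_ID = \2; \3"

OBJ_FIND = r"UBLOCK_([^ ]+) = [0-9]+, (/.+)"
OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'


def to_java_id(c_line):
    """Turn a C enum line into a Java int constant declaration."""
    return re.sub(ID_FIND, ID_REPLACE, c_line)


def to_java_object(c_line):
    """Turn a C enum line into a Java UnicodeBlock object declaration."""
    return re.sub(OBJ_FIND, OBJ_REPLACE, c_line)


if __name__ == "__main__":
    sample = "    UBLOCK_ADLAM = 263, /*[1E900]*/"  # illustrative input line
    print(to_java_id(sample))
    print(to_java_object(sample))
```

Lines that do not match the pattern are returned unchanged, so the functions can be mapped over a whole pasted enum block.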

    GCB; EB                               ; E_Base
    GCB; EBG                              ; E_Base_GAZ
    GCB; EM                               ; E_Modifier
    GCB; GAZ                              ; Glue_After_Zwj
    GCB; ZWJ                              ; ZWJ
  -> uchar.h & UCharacter.GraphemeClusterBreak

    jg ; African_Feh                      ; African_Feh
    jg ; African_Noon                     ; African_Noon
    jg ; African_Qaf                      ; African_Qaf
  -> uchar.h & UCharacter.JoiningGroup

    lb ; EB                               ; E_Base
    lb ; EM                               ; E_Modifier
    lb ; ZWJ                              ; ZWJ
  -> uchar.h & UCharacter.LineBreak

    sc ; Adlm                             ; Adlam
    sc ; Bhks                             ; Bhaiksuki
    sc ; Marc                             ; Marchen
    sc ; Newa                             ; Newa
    sc ; Osge                             ; Osage
    sc ; Tang                             ; Tangut
  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript

    WB ; EB                               ; E_Base
    WB ; EBG                              ; E_Base_GAZ
    WB ; EM                               ; E_Modifier
    WB ; GAZ                              ; Glue_After_Zwj
    WB ; ZWJ                              ; ZWJ
  -> uchar.h & UCharacter.WordBreak

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt

* generate normalization data files
  cd $ICU_ROOT/dbg
  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
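
The five gennorm2 invocations above share one pattern: every output is built from nfc.txt plus the additional source files it needs. The Python sketch below makes that pattern explicit by printing the commands (a dry run; nothing is executed). The table layout and the function name are illustrative; the file names and options are copied from the commands above.

```python
# (output file, source files, generate C source?) as in the commands above.
NORM2_TARGETS = [
    ("$ICU_SRC_DIR/source/common/norm2_nfc_data.h", ["nfc.txt"], True),
    ("$SRC_DATA_IN/nfc.nrm", ["nfc.txt"], False),
    ("$SRC_DATA_IN/nfkc.nrm", ["nfc.txt", "nfkc.txt"], False),
    ("$SRC_DATA_IN/nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"], False),
    ("$SRC_DATA_IN/uts46.nrm", ["nfc.txt", "uts46.txt"], False),
]


def gennorm2_command(output, sources, csource=False):
    """Build one gennorm2 command line as a string (not executed here)."""
    parts = ["bin/gennorm2", "-o", output, "-s", "$UNIDATA/norm2"] + list(sources)
    if csource:
        parts.append("--csource")
    return " ".join(parts)


if __name__ == "__main__":
    for output, sources, csource in NORM2_TARGETS:
        print(gennorm2_command(output, sources, csource))
```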

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

  # Location (--prefix) of where ICU was installed.
  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
  # Location of the ICU source tree.
  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)

  ~/svn.icutools/trunk/dbg/unicode/c$
    cmake ../../../src/unicode/c
    make

* generate core properties data files
  ~/svn.icutools/trunk/dbg/unicode/c$
    genprops/genprops $ICU_SRC_DIR
    genuca/genuca --hanOrder implicit $ICU_SRC_DIR
    genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..9.0: U+2260, U+226E, U+226F
- nothing new in 9.0, no test file to update
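
The grep step above can also be done with a short script. The following Python sketch scans mapping-table lines for the status "disallowed_STD3_valid" and reports only non-ASCII code points. The function name is made up, and the sample lines are abbreviated illustrations modeled on the semicolon-separated format, not copied from IdnaMappingTable.txt.

```python
def non_ascii_std3_valid(lines):
    """Yield non-ASCII code points whose status is disallowed_STD3_valid."""
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for cp in range(start, end + 1):
            if cp > 0x7F:
                yield cp


# Illustrative sample lines (assumed shape: code point or range ; status # comment).
SAMPLE = [
    "003D          ; disallowed_STD3_valid   # EQUALS SIGN",
    "00E0..00F6    ; valid                   # some valid range",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]

if __name__ == "__main__":
    print(["U+%04X" % cp for cp in non_ascii_std3_valid(SAMPLE)])
```

Comparing the result against the list already covered by the tests (U+2260, U+226E, U+226F) shows whether a new Unicode version requires a test update.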

* run & fix ICU4C tests
- Andy handles RBBI & spoof check test failures

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools, see
  https://sites.google.com/site/unicodetools/home#TOC-UCA
- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
    cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/

- cd (CLDR UCA branch)/common/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
    (note removing the underscore before "Rules")
    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt

- run genuca, see command line above;
  deal with
    Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
    FDD1 104B5;     [75 B8 02, 05, 05]      # Osage first primary (compressible)
        (add the character to genuca.cpp sampleCharsToScripts[])
  + look up the USCRIPT_ code for the new sample characters
    (should be obvious from the comment in the error output)
  + *add* mappings to sampleCharsToScripts[], do not replace them
    (in case the script sample characters flip-flop)
  + insert new scripts in DUCET script order, see the top_byte table
    at the beginning of FractionalUCA.txt
- rebuild ICU4C

* Unihan collators
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollators
  with VM arguments
    -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
    -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
    -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
    -DUVERSION=9.0.0
    -ea
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollatorFiles
  with the same arguments
- check CLDR diffs
    cd ~/svn.cldr/trunk
    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
- copy to CLDR
    cd ~/svn.cldr/trunk
    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
- commit to CLDR
- generate ICU zh collation data: run CLDR
    org.unicode.cldr.icu.NewLdml2IcuConverter
  with program arguments
    -t collation
    -s /home/mscherer/svn.cldr/trunk/common/collation
    -m /home/mscherer/svn.cldr/trunk/common/supplemental
    -d /home/mscherer/svn.icu/trunk/src/source/data/coll
    -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
    zh
  and VM arguments
    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
  (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt58l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd ~/svn.icu/trunk/dbg/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC_DIR/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

*** LayoutEngine script information

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.

  (It also generates ScriptRunData.cpp, which is no longer needed.)

  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
  (a plain text file)
  which maps ICU versions to the numbers of script/language constants
  that were added then.
  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)

  The generated files have a current copyright date and "@deprecated" statement.

* Review changes, fix Java tool if necessary, and copy to ICU4C
  cd ~/svn.icu4j/trunk/src
  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools & ICU tools are checked in
  http://www.unicode.org/utility/trac/log/trunk/unicodetools
  http://bugs.icu-project.org/trac/log/tools/trunk

---------------------------------------------------------------------------- ***

New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764

Adding
- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
- new combination/alias codes: Hanb, Jamo
  - used in CLDR 29 and in spoof checker
- new Z* code: Zsye

Add new codes to uscript.h & UScript.java, see Unicode update logs.
  -> com.ibm.icu.lang.UScript
    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
    replace  public static final int \1 = \2; \3

Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
add new script codes.
"Long" script names only where established in Unicode 9 PropertyValueAliases.txt.

Note: If we have to run preparseucd.py again before the Unicode 9 update,
then we need to manually keep/restore the new script codes.

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt57b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata

Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
see http://bugs.icu-project.org/trac/ticket/12141

make install, then icutools cmake & make, then
~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR

Generate Java data as usual, only update pnames.icu & uprops.icu.

*** LayoutEngine script information

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.

  (It also generates ScriptRunData.cpp, which is no longer needed.)

  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
  (a plain text file)
  which maps ICU versions to the numbers of script/language constants
  that were added then.
  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)

  The generated files have a current copyright date and "@deprecated" statement.

* Review changes, fix Java tool if necessary, and copy to ICU4C
  cd ~/svn.icu4j/trunk/src
  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout

---------------------------------------------------------------------------- ***

Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802

Edit preparseucd.py to add & parse new properties.
They share the UCD property namespace but are not listed in PropertyAliases.txt.

Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
Initial data from emoji/2.0/

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt56b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata

Add binary-property constants to uchar.h enum UProperty & UProperty.java.

~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
(Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)

Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java

make install, then icutools cmake & make, then
~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR

Generate Java data as usual, only update pnames.icu & uprops.icu.

---------------------------------------------------------------------------- ***

Unicode 8.0 update for ICU 56

* Command-line environment setup

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt56b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata

http://www.unicode.org/review/pri297/  -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://unicode.org/versions/beta-8.0.0.html
http://www.unicode.org/versions/Unicode8.0.0/
http://www.unicode.org/reports/tr44/tr44-15.html

*** ICU Trac

- ticket:11574: Unicode 8
- C++ branches/markus/uni80 at r37351 from trunk at r37343
- Java branches/markus/uni80 at r37352 from trunk at r37338

*** CLDR Trac

- cldrbug 8311: UCA 8
- branches/markus/uni80 at r11518 from trunk at r11517

- cldrbug 8109: Unicode 8.0 script metadata
- cldrbug 8418: Updated segmentation for Unicode 8.0

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- only for manual diffs: remove version suffixes from the file names
  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
  (see https://sites.google.com/site/unicodetools/inputdata)
- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.

- also: from http://unicode.org/Public/security/8.0.0/ download new
  confusables.txt & confusablesWholeScript.txt
  and copy to $UNIDATA
    ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
    ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA

* initial preparseucd.py changes
- remove new Unicode scripts from the
  only-in-ISO-15924 list according to the error message:
    ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
    from _scripts_only_in_iso15924
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
- property and file name change:
    IndicMatraCategory -> IndicPositionalCategory
- UnicodeData.txt unusual numeric values (fractions not in lowest terms)
    109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
    109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
    109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
    109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
    109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
    109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
    109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
    109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
    109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
    109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
  -> change preparseucd.py to map them to reduced fractions (e.g., 2/12 -> 1/6)
     which are listed in DerivedNumericValues.txt;
     keeps storage in data file simple
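
Mapping the Meroitic values to fractions such as 1/6 amounts to reducing each fraction to lowest terms. A minimal Python sketch of that step, using the standard fractions module, is shown here. It is an illustration of the idea rather than the code in preparseucd.py, and the function name is made up.

```python
from fractions import Fraction


def reduce_numeric_value(nv):
    """Return a numeric-value string with any fraction in lowest terms.

    Values without a slash (integers, empty strings) are returned unchanged.
    """
    if "/" not in nv:
        return nv
    return str(Fraction(nv))  # Fraction normalizes, e.g. 2/12 becomes 1/6


if __name__ == "__main__":
    for nv in ("1/12", "2/12", "3/12", "4/12", "6/12", "10/12"):
        print(nv, "->", reduce_numeric_value(nv))
```

Values that are already in lowest terms, such as 1/12, 5/12 and 7/12, come out unchanged.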

* PropertyValueAliases.txt changes
- 10 new Block (blk) values:
    blk; Ahom                             ; Ahom
    blk; Anatolian_Hieroglyphs            ; Anatolian_Hieroglyphs
    blk; Cherokee_Sup                     ; Cherokee_Supplement
    blk; CJK_Ext_E                        ; CJK_Unified_Ideographs_Extension_E
    blk; Early_Dynastic_Cuneiform         ; Early_Dynastic_Cuneiform
    blk; Hatran                           ; Hatran
    blk; Multani                          ; Multani
    blk; Old_Hungarian                    ; Old_Hungarian
    blk; Sup_Symbols_And_Pictographs      ; Supplemental_Symbols_And_Pictographs
    blk; Sutton_SignWriting               ; Sutton_SignWriting
  -> add to uchar.h
    use long property names for enum constants
  -> add to UCharacter.UnicodeBlock IDs
    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
            replace  public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 6 new Script (sc) values:
    sc ; Ahom                             ; Ahom
    sc ; Hatr                             ; Hatran
    sc ; Hluw                             ; Anatolian_Hieroglyphs
    sc ; Hung                             ; Old_Hungarian
    sc ; Mult                             ; Multani
    sc ; Sgnw                             ; SignWriting
  -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt

* generate normalization data files
  cd $ICU_ROOT/dbg
  bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
  bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
  bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
  bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
  bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

  $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

  # Location (--prefix) of where ICU was installed.
  set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
  # Location of the ICU source tree.
  set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)

  ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
  ~/svn.icutools/trunk/dbg/unicode/c$ make

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..8.0: U+2260, U+226E, U+226F
- nothing new in 8.0, no test file to update

* run & fix ICU4C tests
- bad Cherokee case folding due to difference in fallbacks:
  UCD case folding falls back to no mapping,
  ICU runtime case folding falls back to lowercasing;
  fixed casepropsbuilder.cpp to generate scf mappings to self
  when there is an slc mapping but no scf
- Andy handles RBBI & spoof check test failures
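
The Cherokee case folding problem noted above can be modeled with two small dictionaries. The Python sketch below is a simplified illustration, not the casepropsbuilder.cpp code: the function names and the dictionary-based lookup are assumptions. The sample data uses U+13A0 CHEROKEE LETTER A, which lowercases to U+AB70, while U+AB70 folds to U+13A0.

```python
def runtime_fold(cp, slc, scf):
    """Model of the runtime lookup: use scf if present, else fall back to slc."""
    if cp in scf:
        return scf[cp]
    return slc.get(cp, cp)


def add_missing_scf(slc, scf):
    """Return a copy of scf with self-mappings added where slc exists but scf does not."""
    result = dict(scf)
    for cp in slc:
        if cp not in result:
            result[cp] = cp
    return result


SLC = {0x13A0: 0xAB70}  # simple lowercase mapping
SCF = {0xAB70: 0x13A0}  # simple case folding as listed in the UCD

if __name__ == "__main__":
    # Without the fix, the fallback folds U+13A0 to its lowercase form.
    print("before: U+%04X" % runtime_fold(0x13A0, SLC, SCF))
    fixed = add_missing_scf(SLC, SCF)
    # With explicit self-mappings, U+13A0 folds to itself as the UCD intends.
    print("after:  U+%04X" % runtime_fold(0x13A0, SLC, fixed))
```

The explicit self-mapping keeps the runtime fallback from being reached, so both letters of the pair fold to the same code point.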

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools, see
  https://sites.google.com/site/unicodetools/home#TOC-UCA
- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
- cd (CLDR UCA branch)/common/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
  cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
    (note: the destination file name drops the underscore before "Rules")
    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
- run genuca, see command line above;
  deal with
    Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
        (add the character to genuca.cpp sampleCharsToScripts[])
  + look up the script for the new sample characters
    (e.g., in FractionalUCA.txt)
  + *add* mappings to sampleCharsToScripts[], do not replace them
    (in case the script sample characters flip-flop)
  + insert new scripts in DUCET script order, see the top_byte table
    at the beginning of FractionalUCA.txt
- rebuild ICU4C
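
  The renames implied by the cp commands above can be summarized as a table.
  This is only a sketch: the dictionary and function names are hypothetical,
  and the paths are the ones from the commands in this section.

```python
import posixpath

# CLDR common/uca/ file name -> path below the ICU4C source root ($ICU_SRC_DIR)
CLDR_TO_ICU = {
    "FractionalUCA_SHORT.txt":
        "source/data/unidata/FractionalUCA.txt",
    "UCA_Rules_SHORT.txt":
        "source/data/unidata/UCARules.txt",  # underscore before "Rules" is dropped
    "CollationTest_CLDR_NON_IGNORABLE_SHORT.txt":
        "source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt",
    "CollationTest_CLDR_SHIFTED_SHORT.txt":
        "source/test/testdata/CollationTest_SHIFTED_SHORT.txt",
}


def copy_plan(cldr_uca_dir, icu_src_dir):
    """Return (source, destination) pairs; nothing is copied here."""
    return [
        (posixpath.join(cldr_uca_dir, src), posixpath.join(icu_src_dir, dst))
        for src, dst in CLDR_TO_ICU.items()
    ]


for src, dst in copy_plan(".", "/path/to/icu/src"):
    print("cp", src, dst)
```

  The UCARules.txt copy still needs the manual "restore TODO diffs" step
  afterwards.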

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
  (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test
- fixed bug in CollationWeights::getWeightRanges()
  exposed by new data and CollationTest::TestRootElements

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt56l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd ~/svn.icu/trunk/dbg/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
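
  The cp/rm sequence above amounts to a file filter. As a rough sketch
  (hypothetical helper, not an ICU tool), the selection can be written as a
  predicate on paths relative to com/ibm/icu/impl/data/$ICUDT:

```python
from fnmatch import fnmatchcase


def is_unicode_data_file(rel_path):
    """True if the cp/rm commands above would leave this file in the /tmp/icu4j copy."""
    parts = rel_path.split("/")
    if len(parts) == 2:
        # coll/* and brkitr/* are copied wholesale (direct children only).
        return parts[0] in ("coll", "brkitr")
    if len(parts) != 1:
        return False
    name = parts[0]
    if name == "cnvalias.icu":
        return False  # copied by *.icu, then removed again
    return (name == "confusables.cfu"
            or fnmatchcase(name, "*.icu")
            or fnmatchcase(name, "*.nrm"))


for path in ["uprops.icu", "cnvalias.icu", "nfc.nrm", "coll/ucadata.icu", "zoneinfo64.res"]:
    print(path, is_unicode_data_file(path))
```

  Everything the predicate rejects stays untouched in icudata.jar, because
  "jar uf" only updates the entries that are present under /tmp/icu4j.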

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC_DIR/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

*** LayoutEngine script information

* ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
  because the layout engine was deprecated in ICU 54.
  Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
  to write lines that we used to add manually.

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.

  (It also generates ScriptRunData.cpp, which is no longer needed.)

  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
  (a plain text file)
  which maps ICU versions to the numbers of script/language constants
  that were added then.
  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)

  The generated files have a current copyright date and "@deprecated" statement.

* Review changes, fix Java tool if necessary, and copy to ICU4C
  cd ~/svn.icu4j/trunk/src
  meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
  cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
  cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools & ICU tools are checked in
  http://www.unicode.org/utility/trac/log/trunk/unicodetools
  http://bugs.icu-project.org/trac/log/tools/trunk

---------------------------------------------------------------------------- ***

Unicode 7.0 update for ICU 54

http://www.unicode.org/review/pri271/  -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
http://www.unicode.org/reports/tr44/tr44-13.html

*** ICU Trac

- ticket 10821: Unicode 7.0, UCA 7.0
- C++ branches/markus/uni70 at r35584 from trunk at r35580
- Java branches/markus/uni70 at r35587 from trunk at r35545

*** CLDR Trac

- ticket 7195: UCA 7.0 CLDR root collation
- branches/markus/uni70 at r10062 from trunk at r10061

- ticket 6762: script metadata for Unicode 7.0 new scripts

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- only for manual diffs: remove version suffixes from the file names
  ~/unidata/uni70/20140403$ ../../desuffixucd.py .
  (see https://sites.google.com/site/unicodetools/inputdata)
- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Restore TODO diffs in source/data/unidata/UCARules.txt
    cd $ICU_SRC_DIR
    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt

- also: from http://unicode.org/Public/security/7.0.0/ download new
  confusables.txt & confusablesWholeScript.txt
  and copy to $ICU_ROOT/src/source/data/unidata/

* initial preparseucd.py changes
- remove new Unicode scripts from the
  only-in-ISO-15924 list according to the error message:
    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
                        'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
                        'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
    from _scripts_only_in_iso15924
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
- NamesList.txt now has a heading with a non-ASCII character
  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
  + escape non-ASCII characters in heading comments
- gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013
  + get the copyright from the first file whose copyright line contains the current year
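
  The kind of consistency check behind the ValueError above can be sketched as
  follows. This is an illustration only, not the actual preparseucd.py code;
  the function name and arguments are hypothetical, and only the message
  wording follows the error quoted above.

```python
def check_scripts_only_in_iso15924(only_in_iso, unicode_scripts):
    """Complain about codes in the only-in-ISO list that Unicode now encodes."""
    stale = sorted(set(only_in_iso) & set(unicode_scripts))
    if stale:
        raise ValueError("remove %s from _scripts_only_in_iso15924" % stale)


try:
    # Hmng and Mani are encoded in Unicode 7.0; Ahom is (at this point) ISO-only.
    check_scripts_only_in_iso15924(["Hmng", "Ahom", "Mani"], ["Latn", "Mani", "Hmng"])
except ValueError as e:
    print(e)
```

  Once the listed codes are removed from the only-in-ISO list, the check passes
  silently.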

* PropertyValueAliases.txt changes
- 32 new Block (blk) values:
    blk; Bassa_Vah                        ; Bassa_Vah
    blk; Caucasian_Albanian               ; Caucasian_Albanian
    blk; Coptic_Epact_Numbers             ; Coptic_Epact_Numbers
    blk; Diacriticals_Ext                 ; Combining_Diacritical_Marks_Extended
    blk; Duployan                         ; Duployan
    blk; Elbasan                          ; Elbasan
    blk; Geometric_Shapes_Ext             ; Geometric_Shapes_Extended
    blk; Grantha                          ; Grantha
    blk; Khojki                           ; Khojki
    blk; Khudawadi                        ; Khudawadi
    blk; Latin_Ext_E                      ; Latin_Extended_E
    blk; Linear_A                         ; Linear_A
    blk; Mahajani                         ; Mahajani
    blk; Manichaean                       ; Manichaean
    blk; Mende_Kikakui                    ; Mende_Kikakui
    blk; Modi                             ; Modi
    blk; Mro                              ; Mro
    blk; Myanmar_Ext_B                    ; Myanmar_Extended_B
    blk; Nabataean                        ; Nabataean
    blk; Old_North_Arabian                ; Old_North_Arabian
    blk; Old_Permic                       ; Old_Permic
    blk; Ornamental_Dingbats              ; Ornamental_Dingbats
    blk; Pahawh_Hmong                     ; Pahawh_Hmong
    blk; Palmyrene                        ; Palmyrene
    blk; Pau_Cin_Hau                      ; Pau_Cin_Hau
    blk; Psalter_Pahlavi                  ; Psalter_Pahlavi
    blk; Shorthand_Format_Controls        ; Shorthand_Format_Controls
    blk; Siddham                          ; Siddham
    blk; Sinhala_Archaic_Numbers          ; Sinhala_Archaic_Numbers
    blk; Sup_Arrows_C                     ; Supplemental_Arrows_C
    blk; Tirhuta                          ; Tirhuta
    blk; Warang_Citi                      ; Warang_Citi
  -> add to uchar.h
    use long property names for enum constants
  -> add to UCharacter.UnicodeBlock IDs
    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
            replace  public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 28 new Joining_Group (jg) values:
    jg ; Manichaean_Aleph                 ; Manichaean_Aleph
    jg ; Manichaean_Ayin                  ; Manichaean_Ayin
    jg ; Manichaean_Beth                  ; Manichaean_Beth
    jg ; Manichaean_Daleth                ; Manichaean_Daleth
    jg ; Manichaean_Dhamedh               ; Manichaean_Dhamedh
    jg ; Manichaean_Five                  ; Manichaean_Five
    jg ; Manichaean_Gimel                 ; Manichaean_Gimel
    jg ; Manichaean_Heth                  ; Manichaean_Heth
    jg ; Manichaean_Hundred               ; Manichaean_Hundred
    jg ; Manichaean_Kaph                  ; Manichaean_Kaph
    jg ; Manichaean_Lamedh                ; Manichaean_Lamedh
    jg ; Manichaean_Mem                   ; Manichaean_Mem
    jg ; Manichaean_Nun                   ; Manichaean_Nun
    jg ; Manichaean_One                   ; Manichaean_One
    jg ; Manichaean_Pe                    ; Manichaean_Pe
    jg ; Manichaean_Qoph                  ; Manichaean_Qoph
    jg ; Manichaean_Resh                  ; Manichaean_Resh
    jg ; Manichaean_Sadhe                 ; Manichaean_Sadhe
    jg ; Manichaean_Samekh                ; Manichaean_Samekh
    jg ; Manichaean_Taw                   ; Manichaean_Taw
    jg ; Manichaean_Ten                   ; Manichaean_Ten
    jg ; Manichaean_Teth                  ; Manichaean_Teth
    jg ; Manichaean_Thamedh               ; Manichaean_Thamedh
    jg ; Manichaean_Twenty                ; Manichaean_Twenty
    jg ; Manichaean_Waw                   ; Manichaean_Waw
    jg ; Manichaean_Yodh                  ; Manichaean_Yodh
    jg ; Manichaean_Zayin                 ; Manichaean_Zayin
    jg ; Straight_Waw                     ; Straight_Waw
  -> uchar.h & UCharacter.JoiningGroup
- 23 new Script (sc) values:
    sc ; Aghb                             ; Caucasian_Albanian
    sc ; Bass                             ; Bassa_Vah
    sc ; Dupl                             ; Duployan
    sc ; Elba                             ; Elbasan
    sc ; Gran                             ; Grantha
    sc ; Hmng                             ; Pahawh_Hmong
    sc ; Khoj                             ; Khojki
    sc ; Lina                             ; Linear_A
    sc ; Mahj                             ; Mahajani
    sc ; Mani                             ; Manichaean
    sc ; Mend                             ; Mende_Kikakui
    sc ; Modi                             ; Modi
    sc ; Mroo                             ; Mro
    sc ; Narb                             ; Old_North_Arabian
    sc ; Nbat                             ; Nabataean
    sc ; Palm                             ; Palmyrene
    sc ; Pauc                             ; Pau_Cin_Hau
    sc ; Perm                             ; Old_Permic
    sc ; Phlp                             ; Psalter_Pahlavi
    sc ; Sidd                             ; Siddham
    sc ; Sind                             ; Khudawadi
    sc ; Tirh                             ; Tirhuta
    sc ; Wara                             ; Warang_Citi
  -> uscript.h (many were added before)
    comment "Mende Kikakui" for USCRIPT_MENDE
    add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
  -> com.ibm.icu.lang.UScript
    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
    replace  public static final int \1 = \2; \3
- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2012-11-01)
    Ahom        338     Ahom
    Hatr        127     Hatran
    Mult        323     Multani
  (added 2013-10-12)
    Modi        324     Modi
    Pauc        263     Pau Cin Hau
    Sidd        302     Siddham
  -> uscript.h (some overlap with additions from Unicode)
  -> com.ibm.icu.lang.UScript
    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
    replace  public static final int \1 = \2; \3
  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
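
  The Eclipse find/replace patterns in this list can be tried out with Python's
  re module, which accepts the same pattern and \1-style replacement syntax for
  these cases. This is a sketch for checking the patterns, not part of the
  update procedure; the function names are hypothetical and the sample input
  lines are illustrative, assumed to look like the enum entries in uchar.h and
  uscript.h.

```python
import re

# Patterns and replacements copied from the find/replace steps above.
BLOCK_ID = (r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)",
            r"public static final int \1_ID = \2; \3")
BLOCK_OBJECT = (r"UBLOCK_([^ ]+) = [0-9]+, (/.+)",
                r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2')
SCRIPT = (r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)",
          r"public static final int \1 = \2; \3")


def convert(rule, c_line):
    """Apply one (pattern, replacement) rule to a line of C enum source."""
    pattern, replacement = rule
    return re.sub(pattern, replacement, c_line)


block_line = "    UBLOCK_BASSA_VAH = 221, /*[16AD0]*/"          # illustrative input
script_line = "      USCRIPT_KHUDAWADI                     = 145,/* Sind */"  # illustrative input

print(convert(BLOCK_ID, block_line))
print(convert(BLOCK_OBJECT, block_line))
print(convert(SCRIPT, script_line))
```

  Lines that do not match a pattern pass through unchanged, so aliases and
  comments still need manual review.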

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt

* generate normalization data files
- cd $ICU_ROOT/dbg
- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
- UNIDATA=$ICU_SRC_DIR/source/data/unidata
- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
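
  The gennorm2 invocations above follow one pattern: each .nrm file is built
  from nfc.txt plus the files that layer on top of it. A small sketch that
  generates the same command lines from a table (hypothetical helper; it uses
  only the options shown above and does not run anything):

```python
# Output file -> input files under source/data/unidata/norm2, as listed above.
NORM2_OUTPUTS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(icu_src_dir):
    """Return the argument lists for the five gennorm2 runs, relative to the build dir."""
    src_data_in = icu_src_dir + "/source/data/in"
    norm2_dir = icu_src_dir + "/source/data/unidata/norm2"
    commands = [["bin/gennorm2", "-o", icu_src_dir + "/source/common/norm2_nfc_data.h",
                 "-s", norm2_dir, "nfc.txt", "--csource"]]
    for output, inputs in NORM2_OUTPUTS:
        commands.append(["bin/gennorm2", "-o", src_data_in + "/" + output,
                         "-s", norm2_dir] + inputs)
    return commands


for argv in gennorm2_commands("$ICU_SRC_DIR"):
    print(" ".join(argv))
```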

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
# Location of the ICU source tree.
set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)

~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
~/svn.icutools/trunk/dbg/unicode/c$ make

* genprops work
- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
  + add second array of Joining_Group values for at most 10800..10FFF
    icutools: unicode/c/genprops/bidipropsbuilder.cpp
    icu: source/common/ubidi_props.h/.c/_data.h
    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java
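
  The idea of a second Joining_Group array can be illustrated with a lookup
  over two compact ranges. This is a simplified model, not the ubidi_props
  code: the function name is hypothetical, and the array contents are
  truncated placeholder numbers rather than real enum values.

```python
def joining_group(cp, tables):
    """tables: sequence of (start, values); values[i] is the Joining_Group of start + i."""
    for start, values in tables:
        if start <= cp < start + len(values):
            return values[cp - start]
    return 0  # No_Joining_Group for everything outside the arrays


# Placeholder data: real arrays cover their whole ranges.
tables = [
    (0x0620, [4, 0, 0]),      # first array, starting in the Arabic block
    (0x10AC0, [60, 61, 62]),  # second array, Manichaean (10AC0..10AFF)
]

for cp in (0x0620, 0x10AC1, 0x0041):
    print("U+%04X -> %d" % (cp, joining_group(cp, tables)))
```

  Two short arrays avoid one long, mostly empty array spanning everything
  between the Arabic blocks and the Manichaean block.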

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..7.0: U+2260, U+226E, U+226F
- nothing new in 7.0, no test file to update

* run & fix ICU4C tests

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt53l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    ICUDT=icudt54b
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cd ~/svn.icu/uni70/dbg/data/out/icu4j
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
- refresh ICU4J
    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
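
  The data directory names used above (for example ICUDT=icudt54b) combine the
  ICU major version with a one-letter suffix, where "l" marks little-endian and
  "b" big-endian data; ICU4J uses the big-endian files. A tiny sketch of that
  naming scheme, with a hypothetical helper name:

```python
def icudt_name(major_version, endianness):
    """Build a data package name such as icudt54b from version and byte order."""
    suffix = {"little": "l", "big": "b"}[endianness]
    return "icudt%d%s" % (major_version, suffix)


print(icudt_name(54, "big"))
print(icudt_name(54, "little"))
```

  Setting $ICUDT from the version in use keeps the cp and jar commands above
  unchanged from one release to the next.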

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC_DIR/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA

- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
- output files are in ~/svn.unitools/Generated/uca/7.0.0/
- review data; compare files, use blankweights.sed or similar
  ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
- cd ~/svn.unitools/Generated/uca/7.0.0/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
  cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    (note: the destination file name drops the underscore before "Rules")
    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ICUDT=icudt54b
    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test
- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
  ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

* run & fix ICU4J tests

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.
  (It also generates ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@stable" statement.
  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
  for "born stable" Unicode API constants, and to stop parsing ICU version numbers
  which may not contain dots any more.

- diff current <icu>/source/layout files vs. generated ones
    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
  review and manually merge desired changes;
  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
- if you just copy the above files, then
  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

1160Unicode 6.3 update
1161
1162http://www.unicode.org/review/pri249/  -- beta review
1163http://www.unicode.org/reports/uax-proposed-updates.html
1164http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
1165http://www.unicode.org/reports/tr44/tr44-11.html
1166
1167*** ICU Trac
1168
1169- ticket 10128: update ICU to Unicode 6.3 beta
1170- ticket 10168: update ICU to Unicode 6.3 final
1171- C++ branches/markus/uni63 at r33552 from trunk at r33551
1172- Java branches/markus/uni63 at r33550 from trunk at r33553
1173
1174- ticket 10142: implement Unicode 6.3 bidi algorithm additions
1175
1176*** Unicode version numbers
1177- makedata.mak
1178- uchar.h
1179  (configure.in & configure: have been modified to extract the version from uchar.h)
1180- com.ibm.icu.util.VersionInfo
1181- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1182
1183- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
1184  so that the makefiles see the new version number.
1185
1186*** data files & enums & parser code
1187
1188* file preparation
1189
1190- download UCD, UCA & IDNA files
1191- make sure that the Unicode data folder passed into preparseucd.py
1192  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1193- modify preparseucd.py:
1194  parse new file BidiBrackets.txt
1195  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
1196- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
1197- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1198- Check test file diffs for previously commented-out, known-failing data lines;
1199  probably need to keep those commented out.
1200
1201* PropertyAliases.txt changes
1202- 1 new Enumerated Property
1203  bpt                      ; Bidi_Paired_Bracket_Type
1204  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
1205  -> ubidi_props.h & .c & UBiDiProps.java
1206  -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
1207  -> uprops.cpp
1208  -> change ubidi.icu format version from 2.0 to 2.1
1209- 1 new Miscellaneous Property
1210  bpb                      ; Bidi_Paired_Bracket
1211  -> uchar.h & UProperty.java
1212  -> ppucd.h & .cpp
1213
1214* PropertyValueAliases.txt changes
1215- 3 Bidi_Paired_Bracket_Type (bpt) values:
1216  bpt; c                                ; Close
1217  bpt; n                                ; None
1218  bpt; o                                ; Open
1219  -> uchar.h & UCharacter.BidiPairedBracketType
1220  -> ubidi_props.h & .c & UBiDiProps.java
1221  -> change ubidi.icu format version from 2.0 to 2.1
1222- 4 new Bidi_Class (bc) values:
1223  bc ; FSI                              ; First_Strong_Isolate
1224  bc ; LRI                              ; Left_To_Right_Isolate
1225  bc ; RLI                              ; Right_To_Left_Isolate
1226  bc ; PDI                              ; Pop_Directional_Isolate
1227  -> uchar.h & UCharacterEnums.ECharacterDirection
1228  -> until the bidi code gets updated,
1229     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
1230- 3 new Word_Break (WB) values:
1231  WB ; HL                               ; Hebrew_Letter
1232  WB ; SQ                               ; Single_Quote
1233  WB ; DQ                               ; Double_Quote
1234  -> uchar.h & UCharacter.WordBreak
1235  -> first time that the Word_Break numeric constants no longer fit into 4 bits (now 17 values; 4 bits hold at most 16)
1236- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1237  (added 2012-10-16)
1238  Aghb  239     Caucasian Albanian
1239  Mahj  314     Mahajani
1240  -> uscript.h
1241  -> com.ibm.icu.lang.UScript
1242    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1243    replace  public static final int \1 = \2;\3
1244  -> preparseucd.py _scripts_only_in_iso15924
1245  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1246      and in com.ibm.icu.dev.test.lang.TestUScript.java
1247  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
1248     (not strictly necessary for NOT_ENCODED scripts)
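
The find/replace step listed above for com.ibm.icu.lang.UScript can also be applied with a short script.
This is only a sketch: the helper name c_enum_to_java and the sample constant USCRIPT_EXAMPLE = 999
are made up for illustration; only the two regular-expression strings come from the notes above.

```python
import re

# Pattern and replacement copied from the find/replace step above.
USCRIPT_FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
USCRIPT_REPLACE = r"public static final int \1 = \2;\3"


def c_enum_to_java(line):
    """Turn one uscript.h enum line into a UScript.java constant (hypothetical helper)."""
    return USCRIPT_FIND.sub(USCRIPT_REPLACE, line)


# USCRIPT_EXAMPLE and 999 are placeholders, not real ICU constants.
sample = "    USCRIPT_EXAMPLE               = 999,/* Exmp */"
print(c_enum_to_java(sample))
```

Lines that do not match the pattern are returned unchanged, so the helper can be run over a whole copied enum block.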
1249
1250* generate normalization data files
1251- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
1252- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
1253- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
1254- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
1255- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
1256- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1257- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
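
The four gennorm2 invocations above differ only in the output file and the list of input files.
As a sketch, the same table can be written down once and expanded into command lines.
The helper name gennorm2_commands is hypothetical, and the code only builds argument lists; it does not run the tool.

```python
# Output file -> input files, as in the four gennorm2 invocations above.
NORM2_INPUTS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in, unidata, tool="bin/gennorm2"):
    """Return one argv list per .nrm file (hypothetical helper; runs nothing)."""
    return [
        [tool, "-o", f"{src_data_in}/{out}", "-s", f"{unidata}/norm2", *inputs]
        for out, inputs in NORM2_INPUTS.items()
    ]


for argv in gennorm2_commands("$SRC_DATA_IN", "$UNIDATA"):
    print(" ".join(argv))
```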
1258
1259* build ICU (make install)
1260  so that the tools build can pick up the new definitions from the installed header files.
1261
1262~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt
1263
1264* build Unicode tools using CMake+make
1265
1266~/svn.icutools/trunk/src/unicode/c/icudefs.txt:
1267
1268# Location (--prefix) of where ICU was installed.
1269set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
1270# Location of the ICU source tree.
1271set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)
1272
1273~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
1274~/svn.icutools/trunk/dbg/unicode/c$ make
1275
1276* generate core properties data files
1277- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
1278- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
1279- rebuild ICU (make install) & tools
1280- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1281- rebuild ICU (make install) & tools
1282
1283* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1284  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1285- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1286- Unicode 6.0..6.3: U+2260, U+226E, U+226F
1287- nothing new in 6.3, no test file to update
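
The grep step above can be sketched as a small filter. It assumes the semicolon-separated layout
"code point or range ; status ; ... # comment"; the sample lines are written in the style of
IdnaMappingTable.txt for illustration and are not copied from the file. The function name is hypothetical.

```python
def non_ascii_std3_valid(lines):
    """Yield non-ASCII code points whose status field is disallowed_STD3_valid."""
    for line in lines:
        data = line.split("#", 1)[0]                 # drop the trailing comment
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        bounds = fields[0].split("..")               # "XXXX" or "XXXX..YYYY"
        start, end = int(bounds[0], 16), int(bounds[-1], 16)
        # Only code points above U+007F are of interest here.
        yield from range(max(start, 0x80), end + 1)


# Illustrative input only.
sample = [
    "0000..002C    ; disallowed_STD3_valid    # <control-0000>..COMMA",
    "2260          ; disallowed_STD3_valid    # NOT EQUAL TO",
    "2261..226D    ; valid                    # IDENTICAL TO..NOT EQUIVALENT TO",
    "226E..226F    ; disallowed_STD3_valid    # NOT LESS-THAN..NOT GREATER-THAN",
]
print([f"U+{cp:04X}" for cp in non_ascii_std3_valid(sample)])
```

Any code point reported beyond the three already listed above would call for a new test case.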
1288
1289* update Java data files
1290- refresh just the UCD-related files, just to be safe
1291- see (ICU4C)/source/data/icu4j-readme.txt
1292- mkdir /tmp/icu4j
1293- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1294  output:
1295    ...
1296    Unicode .icu files built to ./out/build/icudt52l
1297    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
1298    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
1299    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1300    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
1301    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
1302    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
1303    mkdir -p /tmp/icu4j/main/shared/data
1304    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1305    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
1306    mkdir -p /tmp/icu4j/main/shared/data
1307    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1308    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
1309- copy the big-endian Unicode data files to another location,
1310  separate from the other data files
1311    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1312    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
1313    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
1314    ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
1315    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
1316    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1317    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
1318- refresh ICU4J
1319    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
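
The cp/rm commands above select a subset of the big-endian data folder. As a sketch, the same selection
can be expressed as a predicate on paths relative to the icudt52b folder. The function name and the
sample file names are assumptions for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_unicode_data_file(rel_path):
    """Return True if rel_path (relative to icudtNNb/) is kept by the copy steps above."""
    path = PurePosixPath(rel_path)
    parent, name = str(path.parent), path.name
    if parent == ".":
        # Top level: *.icu except cnvalias.icu, plus *.nrm.
        if name == "cnvalias.icu":
            return False
        return fnmatchcase(name, "*.icu") or fnmatchcase(name, "*.nrm")
    if parent == "coll":
        return fnmatchcase(name, "*.icu")   # coll/*.icu only
    if parent == "brkitr":
        return True                         # brkitr/*
    return False


# Sample names for illustration only.
for p in ["uprops.icu", "cnvalias.icu", "nfc.nrm", "coll/ucadata.icu", "coll/root.res", "brkitr/char.brk"]:
    print(p, is_unicode_data_file(p))
```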
1320
1321* refresh Java test .txt files
1322- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1323
1324* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files
1325
1326- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1327- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1328- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1329- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1330  (note removing the underscore before "Rules")
1331- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1332  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1333  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1334- check test file diffs for previously commented-out, known-failing data lines;
1335  probably need to keep those commented out
1336- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1337- run genuca, see command line above
1338- rebuild ICU4C
1339- refresh ICU4J collation data:
1340  (subset of instructions above for properties data refresh, except copies all coll/*)
1341    ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1342    ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1343    ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
1344    ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
1345- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1346- note on intltest: if collate/UCAConformanceTest fails, then
1347  utility/MultithreadTest/TestCollators will fail as well;
1348  fix the conformance test before looking into the multi-thread test
1349
1350* test ICU, fix test code where necessary
1351
1352* When refreshing all of ICU4J data from ICU4C
1353- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1354- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1355or
1356- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1357
1358*** LayoutEngine script information
1359- skipped for Unicode 6.3: no new scripts
1360
1361*** merge the Unicode update branches back onto the trunk
1362- do not merge the icudata.jar and testdata.jar,
1363  instead rebuild them from merged & tested ICU4C
1364
1365---------------------------------------------------------------------------- ***
1366
1367Unicode 6.2 update
1368
1369http://www.unicode.org/review/pri230/
1370http://www.unicode.org/versions/beta-6.2.0.html
1371http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
1372http://www.unicode.org/review/pri227/  Changes to Script Extensions Property Values
1373http://www.unicode.org/review/pri228/  Changing some common characters from Punctuation to Symbol
1374http://www.unicode.org/review/pri229/  Linebreaking Changes for Pictographic Symbols
1375http://www.unicode.org/reports/tr46/tr46-8.html  IDNA
1376http://unicode.org/Public/idna/6.2.0/
1377
1378*** ICU Trac
1379
1380- ticket 9515: Unicode 6.2: final ICU update
1381
1382- ticket 9514: UCA 6.2: fix UCARules.txt
1383
1384- ticket 9437: update ICU to Unicode 6.2
1385- C++ branches/markus/uni62 at r32050 from trunk at r32041
1386- Java branches/markus/uni62 at r32068 from trunk at r32066
1387
1388*** Unicode version numbers
1389- makedata.mak
1390- uchar.h
1391  (configure.in & configure have been modified to extract the version from uchar.h)
1392- com.ibm.icu.util.VersionInfo
1393- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_
1394
1395*** data files & enums & parser code
1396
1397* file preparation
1398
1399- download UCD, UCA & IDNA files
1400- make sure that the Unicode data folder passed into preparseucd.py
1401  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
1402- modify preparseucd.py: NamesList.txt is now in UTF-8
1403- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
1404- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1405- Check test file diffs for previously commented-out, known-failing data lines;
1406  probably need to keep those commented out.
1407
1408* PropertyValueAliases.txt changes
1409- 1 new Line_Break (lb) value:
1410  lb ; RI                               ; Regional_Indicator
1411  -> uchar.h & UCharacter.LineBreak
1412- 1 new Word_Break (WB) value:
1413  WB ; RI                               ; Regional_Indicator
1414  -> uchar.h & UCharacter.WordBreak
1415- 1 new Grapheme_Cluster_Break (GCB) value:
1416  GCB; RI                               ; Regional_Indicator
1417  -> uchar.h & UCharacter.GraphemeClusterBreak
1418
1419* 3 new numeric values
1420  The new value -1 was really supposed to be NaN, but that would have required
1421  new UnicodeData.txt syntax. The value -1 can already be represented as a "fraction" of -1/1,
1422  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
1423    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
1424    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
1425  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
1426    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
1427    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
1428  -> uprops.h, uchar.c & UCharacterProperty.java
1429  -> cucdtst.c & UCharacterTest.java
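
The notes above do not spell out the new encoding. The sketch below only shows the arithmetic behind
the three values: -1 is exactly the fraction -1/1, and both large values are small multiples of a power of 60.
split_base60 is a hypothetical helper; it is not the code in corepropsbuilder.cpp and says nothing about
the actual bit layout in uprops.h.

```python
from fractions import Fraction


def split_base60(value):
    """Split an integer into (mantissa, exponent) with value == mantissa * 60**exponent."""
    exponent = 0
    while value != 0 and value % 60 == 0:
        value //= 60
        exponent += 1
    return value, exponent


# -1 written as a numerator/denominator pair.
minus_one = Fraction(-1, 1)
print(minus_one.numerator, minus_one.denominator)

for nv in (216000, 432000):
    mantissa, exponent = split_base60(nv)
    print(f"{nv} = {mantissa} * 60**{exponent}")
```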
1430
1431* generate normalization data files
1432- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
1433- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
1434- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
1435- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
1436- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
1437- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1438- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
1439
1440* build ICU (make install)
1441  so that the tools build can pick up the new definitions from the installed header files.
1442* build Unicode tools using CMake+make
1443
1444* generate core properties data files
1445- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
1446- in initial bootstrapping, change the UCA version
1447  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1448- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
1449- rebuild ICU (make install) & tools
1450  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1451    check if the UCA version in FractionalUCA.txt matches the new Unicode version
1452    (see step above)
1453- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
1454- rebuild ICU (make install) & tools
1455
1456* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1457  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1458- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1459- Unicode 6.0..6.2: U+2260, U+226E, U+226F
1460- nothing new in 6.2, no test file to update
1461
1462* update Java data files
1463- refresh just the UCD-related files, just to be safe
1464- see (ICU4C)/source/data/icu4j-readme.txt
1465- mkdir /tmp/icu4j
1466- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1467  output:
1468    ...
1469    Unicode .icu files built to ./out/build/icudt50l
1470    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1471    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
1472    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1473    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
1474    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
1475    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
1476    mkdir -p /tmp/icu4j/main/shared/data
1477    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1478    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
1479    mkdir -p /tmp/icu4j/main/shared/data
1480    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1481    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
1482- copy the big-endian Unicode data files to another location,
1483  separate from the other data files
1484    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1485    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1486    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1487    ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
1488    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
1489    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1490    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
1491- refresh ICU4J
1492    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
1493
1494* refresh Java test .txt files
1495- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1496
1497* UCA
1498
1499- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
1500- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
1501- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1502- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
1503  (note removing the underscore before "Rules")
1504- update (ICU4C)/source/test/testdata/CollationTest_*.txt
1505  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1506  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1507- check test file diffs for previously commented-out, known-failing data lines;
1508  probably need to keep those commented out
1509- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1510- run genuca, see command line above
1511- rebuild ICU4C
1512- refresh ICU4J collation data:
1513  (subset of instructions above for properties data refresh, except copies all coll/*)
1514    ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1515    ~/svn.icu/uni62/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1516    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
1517    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
1518- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1519- note on intltest: if collate/UCAConformanceTest fails, then
1520  utility/MultithreadTest/TestCollators will fail as well;
1521  fix the conformance test before looking into the multi-thread test
1522
1523* test ICU, fix test code where necessary
1524
1525* When refreshing all of ICU4J data from ICU4C
1526- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1527- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1528or
1529- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1530
1531*** LayoutEngine script information
1532- skipped for Unicode 6.2: no new scripts
1533
1534*** merge the Unicode update branches back onto the trunk
1535- do not merge the icudata.jar and testdata.jar,
1536  instead rebuild them from merged & tested ICU4C
1537
1538---------------------------------------------------------------------------- ***
1539
1540Future Unicode update
1541
1542The tools have been simplified since the Unicode 6.1 update. See
1543- http://site.icu-project.org/design/props/ppucd
1544- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972
1545
1546* Unicode version numbers
1547- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates
1548
1549* file preparation
1550- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
1551- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
1552- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
1553- Check test file diffs for previously commented-out, known-failing data lines;
1554  probably need to keep those commented out.
1555
1556* PropertyValueAliases.txt changes
1557- Script codes that are in ISO 15924 but not in Unicode are now listed in
1558  preparseucd.py, in the _scripts_only_in_iso15924 variable.
1559  If there are new ISO codes, then add them.
1560  If Unicode adds some of them, then remove them from the .py variable.
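
The two maintenance rules above amount to a set update. This is a sketch with a hypothetical helper;
it does not reproduce how _scripts_only_in_iso15924 is actually stored in preparseucd.py, and the
script codes in the example are only sample input.

```python
def update_iso_only(iso_only, new_iso_codes, now_in_unicode):
    """Add new ISO 15924 codes, then drop the ones Unicode has since encoded."""
    return (set(iso_only) | set(new_iso_codes)) - set(now_in_unicode)


# Sample input: two codes become Unicode scripts, two new ISO codes arrive.
print(sorted(update_iso_only({"Cakm", "Merc"}, {"Khoj", "Tirh"}, {"Cakm", "Merc"})))
```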
1561
1562* UnicodeData.txt changes
1563- No more manual changes for CJK ranges for algorithmic names;
1564  those are now written to ppucd.txt and genprops reads them from there.
1565
1566* generate core properties data files (makeprops.sh was deleted)
1567- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src
1568
1569* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
1570- it is now generated by preparseucd.py
1571
1572* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
1573- it is now generated by preparseucd.py
1574- make sure that the Unicode data folder passed into preparseucd.py
1575  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1576  (can be in some subfolder)
1577
1578* generate normalization data files
1579- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
1580- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
1581- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
1582- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm     -s $UNIDATA/norm2 nfc.txt
1583- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm    -s $UNIDATA/norm2 nfc.txt nfkc.txt
1584- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
1585- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm   -s $UNIDATA/norm2 nfc.txt uts46.txt
1586
1587* build ICU (make install)
1588* build Unicode tools using CMake+make
1589
1590* new way to call genuca (makeuca.sh was deleted)
1591- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src
1592
1593---------------------------------------------------------------------------- ***
1594
1595Unicode 6.1 update
1596
1597*** ICU Trac
1598
1599- ticket 8995 final update to Unicode 6.1
1600- ticket 8994 regenerate source/layout/CanonData.cpp
1601
1602- ticket 8961 support Unicode "Age" value *names*
1603- ticket 8963 support multiple character name aliases & types
1604
1605- ticket 8827 "update ICU to Unicode 6.1"
1606- C++ branches/markus/uni61 at r30864 from trunk at r30843
1607- Java branches/markus/uni61 at r30865 from trunk at r30863
1608
1609*** Unicode version numbers
1610- makedata.mak
1611- uchar.h
1612  (configure.in & configure have been modified to extract the version from uchar.h)
1613- com.ibm.icu.util.VersionInfo
1614- icutools/unicode/makedefs.sh
1615  + also review & update other definitions in that file,
1616    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l
1617
1618*** data files & enums & parser code
1619
1620* file preparation
1621
1622~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
1623- This prepares both unidata and testdata files in respective output subfolders.
1624- Check test file diffs for previously commented-out, known-failing data lines;
1625  probably need to keep those commented out.
1626
1627* PropertyValueAliases.txt changes
1628- 11 new block names:
1629  Arabic_Extended_A
1630  Arabic_Mathematical_Alphabetic_Symbols
1631  Chakma
1632  Meetei_Mayek_Extensions
1633  Meroitic_Cursive
1634  Meroitic_Hieroglyphs
1635  Miao
1636  Sharada
1637  Sora_Sompeng
1638  Sundanese_Supplement
1639  Takri
1640  -> add to uchar.h
1641  -> add to UCharacter.UnicodeBlock IDs
1642    Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
1643            replace  public static final int \1_ID = \2; \3
1644  -> add to UCharacter.UnicodeBlock objects
1645    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1646            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1647- 1 new Joining_Group (jg) value:
1648  Rohingya_Yeh
1649  -> uchar.h & UCharacter.JoiningGroup
1650- 2 new Line_Break (lb) values:
1651  CJ=Conditional_Japanese_Starter
1652  HL=Hebrew_Letter
1653  -> uchar.h & UCharacter.LineBreak
1654- 7 new scripts:
1655  sc ; Cakm      ; Chakma
1656  sc ; Merc      ; Meroitic_Cursive
1657  sc ; Mero      ; Meroitic_Hieroglyphs
1658  sc ; Plrd      ; Miao
1659  sc ; Shrd      ; Sharada
1660  sc ; Sora      ; Sora_Sompeng
1661  sc ; Takr      ; Takri
1662  -> remove these from SyntheticPropertyValueAliases.txt
1663  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1664      and in com.ibm.icu.dev.test.lang.TestUScript.java
1665- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1666  (added 2011-06-21)
1667  Khoj        322     Khojki
1668  Tirh        326     Tirhuta
1669    and another one added 2011-12-09
1670  Hluw        080     Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
1671  -> uscript.h
1672  -> com.ibm.icu.lang.UScript
1673    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1674    replace  public static final int \1 = \2;\3
1675  -> SyntheticPropertyValueAliases.txt
1676  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1677      and in com.ibm.icu.dev.test.lang.TestUScript.java
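
The two Eclipse find/replace steps listed above for UCharacter.UnicodeBlock can be sketched as one helper
that emits both Java lines for each uchar.h line. The helper name and the sample constant UBLOCK_EXAMPLE = 999
are made up for illustration; the patterns are the ones from the notes, with \g<1> used where a group
reference is followed directly by "_ID".

```python
import re

# First pass: block ID constants.
BLOCK_ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
BLOCK_ID_REPLACE = r"public static final int \g<1>_ID = \2; \3"

# Second pass: UnicodeBlock objects.
BLOCK_OBJ_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
BLOCK_OBJ_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \g<1>_ID); \2'


def block_to_java(line):
    """Return (ID constant line, UnicodeBlock object line) for one uchar.h line."""
    return (
        BLOCK_ID_FIND.sub(BLOCK_ID_REPLACE, line),
        BLOCK_OBJ_FIND.sub(BLOCK_OBJ_REPLACE, line),
    )


# UBLOCK_EXAMPLE and 999 are placeholders, not real ICU constants.
for java_line in block_to_java("UBLOCK_EXAMPLE = 999, /*[0000]*/"):
    print(java_line)
```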
1678
1679* UnicodeData.txt changes
1680- the last Unihan code point changes from U+9FCB to U+9FCC
1681  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
1682  + do change gennames.c
1683  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java
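
The search described above can be sketched as follows. The regex is the one from the notes;
the helper name and the sample source lines are hypothetical.

```python
import re

# Matches the old end (9FCB) and the old limit (9FCC), in either letter case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[BC]", re.IGNORECASE)


def lines_to_review(lines):
    """Return (line number, text) pairs that mention the old end or limit."""
    return [
        (number, text)
        for number, text in enumerate(lines, 1)
        if UNIHAN_END_OR_LIMIT.search(text)
    ]


# Hypothetical source lines for illustration.
sample = [
    "#define CJK_END 0x9fcb",
    "if (c < 0x9FCC) {",
    "unrelated line",
]
for number, text in lines_to_review(sample):
    print(number, text)
```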
1684
1685* DerivedBidiClass.txt changes
1686- 2 new default-AL blocks:
1687#     Arabic Extended-A: U+08A0  -  U+08FF  (was default-R)
1688#     Arabic Mathematical Alphabetic Symbols:
1689#                       U+1EE00  - U+1EEFF  (was default-R)
1690- 2 new default-R blocks:
1691#     Meroitic Hieroglyphs:
1692#                        U+10980 - U+1099F
1693#     Meroitic Cursive:  U+109A0 - U+109FF
1694  -> should be picked up by the explicit data in the file
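
For a quick spot check of the new defaults, the four ranges listed above can be put into a lookup table.
This sketch covers only those four ranges and returns None for everything else; the function name is hypothetical.

```python
# (start, end, default Bidi_Class) for the blocks listed above.
DEFAULT_BIDI_RANGES = [
    (0x08A0, 0x08FF, "AL"),    # Arabic Extended-A
    (0x1EE00, 0x1EEFF, "AL"),  # Arabic Mathematical Alphabetic Symbols
    (0x10980, 0x1099F, "R"),   # Meroitic Hieroglyphs
    (0x109A0, 0x109FF, "R"),   # Meroitic Cursive
]


def default_bidi_class(code_point):
    """Return the default class for the listed blocks, or None outside them."""
    for start, end, bidi_class in DEFAULT_BIDI_RANGES:
        if start <= code_point <= end:
            return bidi_class
    return None


print(default_bidi_class(0x08A0), default_bidi_class(0x10980), default_bidi_class(0x0041))
```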
1695
1696* NameAliases.txt changes
1697- from
1698    # Each line has two fields
1699    # First field: Code point
1700    # Second field: Alias
1701- to
1702    # Each line has three fields, as described here:
1703    #
1704    # First field:  Code point
1705    # Second field: Alias
1706    # Third field:  Type
1707- Also, the file format previously allowed multiple aliases per code point, but only now does the file
1708  actually contain several, even several of the same type. For example,
1709    FEFF;BYTE ORDER MARK;alternate
1710    FEFF;BOM;abbreviation
1711    FEFF;ZWNBSP;abbreviation
1712- This breaks our gennames parser, unames.icu data structure, and API.
1713  Fix gennames to only pick up "correction" aliases.
1714  New ticket #8963 for further changes.
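
Restricting the parser to "correction" aliases, as described above, can be sketched like this.
It is not the gennames code; the function name is hypothetical, and the 01A2 line is assumed sample input
added next to the FEFF lines quoted above.

```python
def correction_aliases(lines):
    """Map code point -> list of aliases whose third field is "correction"."""
    result = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()   # skip comments and blank lines
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) != 3:
            continue                           # not in the three-field format
        code_point, alias, alias_type = fields
        if alias_type == "correction":
            result.setdefault(int(code_point, 16), []).append(alias)
    return result


sample = [
    "# First field:  Code point",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]
print(correction_aliases(sample))
```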
1715
1716* run genpname/preparse.pl (on Linux)
1717  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1718  + make sure that data.h is writable
1719  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1720  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1721
1722* build ICU (make install)
1723  so that the tools build can pick up the new definitions from the installed header files.
1724* build Unicode tools (at least genpname) using CMake+make
1725
1726* run genpname
1727  (builds both pnames.icu and propname_data.h)
1728- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1729- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1730
1731* build ICU (make install)
1732* build Unicode tools using CMake+make
1733
1734* update source/data/unidata/norm2/nfkc_cf.txt
1735- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
1736
1737* update source/data/unidata/norm2/uts46.txt
1738- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
1739  to ~/svn.icu/tools/trunk/src/unicode/py
1740- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
1741- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
1742- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
1743
1744* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
1745  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
1746- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
1747- Unicode 6.0..6.1: U+2260, U+226E, U+226F
1748- nothing new in 6.1, no test file to update
1749
1750* generate core properties data files
1751- in initial bootstrapping, change the UCA version
1752  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
1753- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1754- rebuild ICU & tools
1755  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
1756    check if the UCA version in FractionalUCA.txt matches the new Unicode version
1757    (see step above)
1758- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
1759  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1760- rebuild ICU & tools
1761
1762* update Java data files
1763- refresh just the UCD-related files, just to be safe
1764- see (ICU4C)/source/data/icu4j-readme.txt
1765- mkdir /tmp/icu4j
1766- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1767  output:
1768    ...
1769    Unicode .icu files built to ./out/build/icudt49l
1770    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1771    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
1772    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
1773    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
1774    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
1775    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
1776    mkdir -p /tmp/icu4j/main/shared/data
1777    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
1778    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
1779    mkdir -p /tmp/icu4j/main/shared/data
1780    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
1781    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
1782- copy the big-endian Unicode data files to another location,
1783  separate from the other data files
1784    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1785    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1786    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1787    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
1788    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
1789    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1790    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
1791- refresh ICU4J
1792    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1793
1794* refresh Java test .txt files
1795- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
1796
1797* test ICU so far, fix test code where necessary
1798- temporarily ignore collation issues that look like UCA/UCD mismatches,
1799  until UCA data is updated
1800
1801* UCA
1802
1803- get output from Mark's tools; look in
1804    http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
1805- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
1806- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note: remove the underscore before "Rules" when renaming the file)
1808- update (ICU)/source/test/testdata/CollationTest_*.txt
1809  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
1810  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
1811- check test file diffs for previously commented-out, known-failing data lines;
1812  probably need to keep those commented out
1813- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
1814- run makeuca.sh:
1815  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
1816- rebuild ICU4C
1817- refresh ICU4J collation data:
1818  (subset of instructions above for properties data refresh, except copies all coll/*)
1819    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1820    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1821    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
1822    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
1823- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
1824- note on intltest: if collate/UCAConformanceTest fails, then
1825  utility/MultithreadTest/TestCollators will fail as well;
1826  fix the conformance test before looking into the multi-thread test
1827
1828* When refreshing all of ICU4J data from ICU4C
1829- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1830- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
1831or
1832- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
1833
1834*** LayoutEngine script information
1835
1836(For details see the Unicode 5.2 change log below.)
1837
1838* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
1839  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
1840  in the working directory.
1841  (It also generates ScriptRunData.cpp, which is no longer needed.)
1842
1843  The generated files have a current copyright date and "@draft" statement.
1844
1845- diff current <icu>/source/layout files vs. generated ones
1846    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
1847  review and manually merge desired changes;
1848  fix gratuitous changes, incorrect @draft and missing aliases;
1849  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
1850- if you just copy the above files, then
1851  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
1852  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
1853
1854*** merge the Unicode update branches back onto the trunk
1855- do not merge the icudata.jar and testdata.jar,
1856  instead rebuild them from merged & tested ICU4C
1857
1858---------------------------------------------------------------------------- ***
1859
1860ICU 4.8 (no Unicode update, just new script codes)
1861
1862* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1863  (added 2010-12-21)
1864    Afak    439     Afaka
1865    Jurc    510     Jurchen
1866    Mroo    199     Mro, Mru
1867    Nshu    499     Nüshu
1868    Shrd    319     Sharada, Śāradā
1869    Sora    398     Sora Sompeng
1870    Takr    321     Takri, Ṭākrī, Ṭāṅkrī
1871    Tang    520     Tangut
1872    Wole    480     Woleai
1873  -> uscript.h
1874  -> com.ibm.icu.lang.UScript
1875    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1876    replace  public static final int \1 = \2;\3
1877  -> genpname/SyntheticPropertyValueAliases.txt
1878  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1879      and in com.ibm.icu.dev.test.lang.TestUScript.java
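
The find/replace pair above can be tried outside an editor. The following is a minimal Python sketch, assuming the enum lines in uscript.h look like the sample shown; the sample line and its numeric value are illustrative only, not copied from the header.

```python
import re

# Pattern and replacement as quoted in the notes above.
USCRIPT_LINE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_TEMPLATE = r"public static final int \1 = \2;\3"


def uscript_to_java(c_line: str) -> str:
    """Rewrite one C enum line into the Java constant form.

    Lines that do not match the pattern are returned unchanged.
    """
    return USCRIPT_LINE.sub(JAVA_TEMPLATE, c_line)


# Illustrative input line (assumed format, hypothetical value).
sample = "    USCRIPT_AFAKA = 147,/* Afak */"
print(uscript_to_java(sample))
```

The pattern requires at least one character after the comma, so an enum line without a trailing comment passes through untouched and has to be converted by hand.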
1880
1881* run genpname/preparse.pl (on Linux)
1882  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
1883  + make sure that data.h is writable
1884  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
1885  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
1886
1887* rebuild Unicode tools (at least genpname) using make
1888- You might first need to "make install" ICU so that the tools build can pick
1889  up the new definitions from the installed header files.
1890
1891* run genpname
1892  (builds both pnames.icu and propname_data.h)
1893- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
1894- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
1895- rebuild ICU & tools
1896
1897* run genprops
1898- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1899- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
1900- rebuild ICU & tools
1901
1902* update Java data files
1903- refresh just the UCD-related files, just to be safe
1904- see (ICU4C)/source/data/icu4j-readme.txt
1905- mkdir /tmp/icu4j
1906- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
1907- copy the big-endian Unicode data files to another location,
1908  separate from the other data files
1909    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1910    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1911    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
1912- refresh ICU4J
1913    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b
1914
1915* should have updated the layout engine script codes but forgot
1916
1917---------------------------------------------------------------------------- ***
1918
1919Unicode 6.0 update
1920
1921*** related ICU Trac tickets
1922
7264 Unicode 6.0 Update
1924
1925*** Unicode version numbers
1926- makedata.mak
1927- uchar.h
1928  (configure.in & configure: have been modified to extract the version from uchar.h)
1929- com.ibm.icu.util.VersionInfo
1930
1931*** data files & enums & parser code
1932
1933* file preparation
1934
1935~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
1936- This now prepares both unidata and testdata files in respective output subfolders.
1937
1938* PropertyAliases.txt changes
1939- new Script_Extensions property defined in the new ScriptExtensions.txt file
1940  but not listed in PropertyAliases.txt; reported to unicode.org;
1941  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
1942    scx; Script_Extensions
1943  -> uchar.h with new UProperty section
1944  -> com.ibm.icu.lang.UProperty, parallel with uchar.h
1945
1946* PropertyValueAliases.txt changes
1947- 12 new block names:
1948  Alchemical_Symbols
1949  Bamum_Supplement
1950  Batak
1951  Brahmi
1952  CJK_Unified_Ideographs_Extension_D
1953  Emoticons
1954  Ethiopic_Extended_A
1955  Kana_Supplement
1956  Mandaic
1957  Miscellaneous_Symbols_And_Pictographs
1958  Playing_Cards
1959  Transport_And_Map_Symbols
1960  -> add to uchar.h
1961  -> add to UCharacter.UnicodeBlock
1962    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
1963            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
1964- Joining_Group (jg) values:
1965  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
1966  -> uchar.h & UCharacter.JoiningGroup
1967- 3 new scripts:
1968  sc ; Batk      ; Batak
1969  sc ; Brah      ; Brahmi
1970  sc ; Mand      ; Mandaic
1971  -> remove these from SyntheticPropertyValueAliases.txt
1972  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
1973  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
1974      and in com.ibm.icu.dev.test.lang.TestUScript.java
1975- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
1976  (added 2009-11-11..2010-07-18)
1977  Bass        259     Bassa Vah
  Dupl        755     Duployan shorthand
1979  Elba        226     Elbasan
1980  Gran        343     Grantha
1981  Kpel        436     Kpelle
1982  Loma        437     Loma
1983  Mend        438     Mende
1984  Merc        101     Meroitic Cursive
1985  Narb        106     Old North Arabian
1986  Nbat        159     Nabataean
1987  Palm        126     Palmyrene
1988  Sind        318     Sindhi
1989  Wara        262     Warang Citi
1990  -> uscript.h
1991  -> com.ibm.icu.lang.UScript
1992    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
1993    replace  public static final int \1 = \2;\3
1994  -> SyntheticPropertyValueAliases.txt
1995  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
1996      and in com.ibm.icu.dev.test.lang.TestUScript.java
1997- ISO 15924 name change
1998  Mero        100     Meroitic Hieroglyphs (was Meroitic)
1999  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
2000- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
2001
2002* UnicodeData.txt changes
2003- new CJK block:
2004  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
2005  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
2006  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
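
For illustration, the First/Last pair above can be turned into a code point range with a few lines of Python. This is only a sketch of how such range markers can be read; it is not the logic in gennames.c, and the function name is made up for the example.

```python
def algorithmic_ranges(lines):
    """Pair '<Name, First>' and '<Name, Last>' UnicodeData.txt lines.

    Returns a list of (name, first_code_point, last_code_point) tuples.
    """
    ranges = []
    start = None  # (code point, range name) of a pending First line
    for line in lines:
        fields = line.split(";")
        code, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            # Drop the leading '<' and the trailing ', First>'.
            start = (code, name[1:-len(", First>")])
        elif name.endswith(", Last>") and start is not None:
            ranges.append((start[1], start[0], code))
            start = None
    return ranges


# The two lines quoted in the notes above.
lines = [
    "2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;",
    "2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;",
]
for name, first, last in algorithmic_ranges(lines):
    print(f"{name}: U+{first:04X}..U+{last:04X} ({last - first + 1} code points)")
```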
2007
2008* build Unicode tools using CMake+make
2009
2010* run genpname/preparse.pl (on Linux)
2011  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
2012  + make sure that data.h is writable
2013  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
2014  + preparse.pl shows no errors, out.txt Info and Warning lines look ok
2015
2016* rebuild Unicode tools (at least genpname) using make
2017- You might first need to "make install" ICU so that the tools build can pick
2018  up the new definitions from the installed header files.
2019
2020* run genpname
2021- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
2022- rebuild ICU & tools
2023
2024* update source/data/unidata/norm2/nfkc_cf.txt
2025- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
2026
2027* update source/data/unidata/norm2/uts46.txt
2028- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
2029  to ~/svn.icu/tools/trunk/src/unicode/py
2030- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
2031- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
2032- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
2033
2034* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
2035  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
2036- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
2037- Unicode 6.0: U+2260, U+226E, U+226F
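
The grep step above can also be scripted. The sketch below assumes the usual "code point or range ; status ; mapping # comment" layout; the sample lines are illustrative and are not copied from IdnaMappingTable.txt.

```python
def non_ascii_std3_valid(lines):
    """Return code points above U+007F whose status is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        # The first field is either a single code point or a start..end range.
        first, _, last = fields[0].partition("..")
        start, end = int(first, 16), int(last or first, 16)
        found.extend(cp for cp in range(start, end + 1) if cp > 0x7F)
    return found


# Illustrative lines in the assumed format.
sample = [
    "0000..002C    ; disallowed_STD3_valid  # ASCII range, ignored by the filter",
    "00A0          ; disallowed_STD3_mapped ; 0020 # different status, ignored",
    "2260          ; disallowed_STD3_valid  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid  # NOT LESS-THAN..NOT GREATER-THAN",
]
print([f"U+{cp:04X}" for cp in non_ascii_std3_valid(sample)])
```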
2038
2039* generate core properties data files
2040- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2041- rebuild ICU & tools
2042- run makeuca.sh so that genuca picks up the new nfc.nrm:
2043  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2044- rebuild ICU & tools
2045
2046* implement new Script_Extensions property (provisional)
2047- parser & generator: genprops & uprops.icu
2048- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
2049- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
2050
2051* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
2052- (one-time change)
2053- genbidi/gencase/genprops tools changes
2054- re-run makeprops.sh (see above)
2055- UCharacterProperty.java, UCharacterTypeIterator.java,
2056  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
2057  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
2058
2059* update Java data files
2060- refresh just the UCD-related files, just to be safe
2061- see (ICU4C)/source/data/icu4j-readme.txt
2062- mkdir /tmp/icu4j
2063- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2064  output:
2065    ...
2066    Unicode .icu files built to ./out/build/icudt45l
2067    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
2068    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
2069    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
2070    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
2071    mkdir -p /tmp/icu4j/main/shared/data
2072    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
2073- copy the big-endian Unicode data files to another location,
2074  separate from the other data files
2075    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2076    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
2077    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
2078    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
2079    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
2080    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2081    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
2082- refresh ICU4J
2083    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
2084
2085* refresh Java test .txt files
2086- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
2087
2088* un-hardcode normalization skippable (NF*_Inert) test data
2089- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
2090
2091* copy updated break iterator test files
2092- now handled by early ucdcopy.py and
2093  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
2094  (old instructions:
2095   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
2096   to ~/svn.icu/trunk/src/source/test/testdata)
2097- they are not used in ICU4J
2098
2099* UCA
2100
2101- get output from Mark's tools; look in
2102    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
2103    http://www.macchiato.com/unicode/utc/additional-uca-files
2104    http://www.unicode.org/Public/UCA/6.0.0/
2105    http://www.unicode.org/~mdavis/uca/
2106- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
2107- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
2108- update Han-implicit ranges for new CJK extensions:
2109  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
2110- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
2111  do not add it into invuca so that tailoring primary-after an ignorable works
2112- genuca: permit space between [variable top] bytes
2113- ucol.cpp: treat noncharacters like unassigned rather than ignorable
2114- run makeuca.sh:
2115  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
2116- rebuild ICU4C
2117- refresh ICU4J collation data:
2118  (subset of instructions above for properties data refresh, except copies all coll/*)
2119    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2120    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2121    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
2122    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
2123- update (ICU)/source/test/testdata/CollationTest_*.txt
2124  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
2125  with output from Mark's Unicode tools
2126- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
2127- note on intltest: if collate/UCAConformanceTest fails, then
2128  utility/MultithreadTest/TestCollators will fail as well;
2129  fix the conformance test before looking into the multi-thread test
2130
2131* When refreshing all of ICU4J data from ICU4C
2132- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
2133- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
2134or
2135- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
2136
2137*** LayoutEngine script information
2138
2139(For details see the Unicode 5.2 change log below.)
2140
2141* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2142ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2143ScriptRunData.cpp, which is no longer needed.)
2144
2145The generated files have a current copyright date and "@draft" statement.
2146
2147* copy the above files into <icu>/source/layout, replacing the old files.
2148* fix mixed line endings
2149* review the diffs and fix incorrect @draft and missing aliases;
2150  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
2151* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2152
2153---------------------------------------------------------------------------- ***
2154
2155Unicode 5.2 update
2156
2157*** related ICU Trac tickets
2158
7084 Unicode 5.2

7167 verify collation bytes
7235 Java test NAME_ALIAS
7236 Java DerivedCoreProperties.txt test
7237 Java BidiTest.txt
7238 UTrie2 in core unidata
7239 test for tailoring gaps
7240 Java fix CollationMiscTest
7243 update layout engine for Unicode 5.2
2169
2170*** Unicode version numbers
2171- makedata.mak
2172- uchar.h
2173- configure.in & configure
2174- update ucdVersion in gennames.c if an algorithmic range changes
2175
2176*** data files & enums & parser code
2177
2178* file preparation
2179
2180python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
2181- includes finding files regardless of version numbers,
2182  copying them, and performing the equivalent processing of the
2183  ucdstrip and ucdmerge tools on the desired set of files
2184
2185* notes on changes
2186- PropertyAliases.txt
2187  moved from numeric to enumerated:
2188    ccc       ; Canonical_Combining_Class
2189  new string properties:
2190    NFKC_CF   ; NFKC_Casefold
2191    Name_Alias; Name_Alias
2192  new binary properties:
2193    Cased     ; Cased
2194    CI        ; Case_Ignorable
2195    CWCF      ; Changes_When_Casefolded
2196    CWCM      ; Changes_When_Casemapped
2197    CWKCF     ; Changes_When_NFKC_Casefolded
2198    CWL       ; Changes_When_Lowercased
2199    CWT       ; Changes_When_Titlecased
2200    CWU       ; Changes_When_Uppercased
2201  new CJK Unihan properties (not supported by ICU)
2202- PropertyValueAliases.txt
2203  new block names
2204  new scripts
2205  one script code change:
2206    sc ; Qaai      ; Inherited
2207    ->
2208    sc ; Zinh      ; Inherited                        ; Qaai
2209  new Line_Break (lb) value:
2210    lb ; CP        ; Close_Parenthesis
2211  new Joining_Group (jg) values: Farsi_Yeh, Nya
2212  other new values:
2213    ccc; 214; ATA  ; Attached_Above
2214- DerivedBidiClass.txt
2215  new default-R range: U+1E800 - U+1EFFF
2216- UnicodeData.txt
2217  all of the ISO comments are gone
2218  new CJK block end:
2219    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
2220  new CJK block:
2221    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
2222    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
2223
2224* genpname
2225- run preparse.pl
2226  + cd \svn\icuproj\icu\trunk\source\tools\genpname
2227  + make sure that data.h is writable
2228  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
2229  + preparse.pl complains with errors like the following:
2230      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
2231    This is because ICU 4.0 had scripts from ISO 15924 which are now
2232    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
2233    and PropertyValueAliases.txt.
2234    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
2235       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
2236  + preparse.pl complains with errors about block names missing from uchar.h; add them
2237
2238* uchar.h & uscript.h & uprops.h & uprops.c & genprops
2239- new block & script values
2240  + 26 new blocks
2241    copy new blocks from Blocks.txt
2242    MS VC++ 2008 regular expression:
2243      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
2244      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
2245  + several new script values already added in ICU 4.0 for ISO 15924 coverage
2246    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
2247  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
2248  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
2249    (added to SyntheticPropertyValueAliases.txt)
2250- new Joining Group (JG) values: Farsi_Yeh, Nya
2251- new Line_Break (lb) value:
2252    lb ; CP        ; Close_Parenthesis
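
The Blocks.txt find/replace above uses Visual C++ regex syntax ({} for groups). A rough Python equivalent is sketched below. Unlike the editor replacement, it also upper-cases the name and turns spaces and hyphens into underscores, which is an addition for this example; the enum value is passed in by the caller, and 172 is only the placeholder from the replacement template above.

```python
import re

# Same shape as the editor pattern: start..end; Block Name
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def ublock_line(blocks_line: str, value: int) -> str:
    """Build a uchar.h-style enum line from one Blocks.txt data line."""
    m = BLOCK_LINE.match(blocks_line)
    if m is None:
        raise ValueError(f"not a Blocks.txt data line: {blocks_line!r}")
    start, _end, name = m.groups()
    # Assumption for this sketch: non-alphanumeric runs become one underscore.
    constant = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
    return f"    UBLOCK_{constant} = {value}, /*[{start}]*/"


print(ublock_line("A960..A97F; Hangul Jamo Extended-A", 172))
```

The numeric values still need to be assigned and checked by hand, as with the editor-based approach.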
2253
2254* hardcoded Unihan range end/limit
2255- Unihan range end moves from 9FC3 to 9FCB
2256  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
2257  + do change gennames.c
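
The case-insensitive search for the old range end and limit can be scripted as follows. This is a small sketch; the sample source lines are hypothetical and only show the two spellings the regex is meant to catch.

```python
import re

# Matches the old Unihan range end (9FC3) and limit (9FC4), in any case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[34]", re.IGNORECASE)


def find_hardcoded_unihan_end(lines):
    """Return (line number, line) pairs that mention 9FC3 or 9FC4."""
    return [
        (number, line)
        for number, line in enumerate(lines, 1)
        if UNIHAN_END_OR_LIMIT.search(line)
    ]


# Hypothetical source lines, not taken from ICU.
sample = [
    "if (0x4E00 <= c && c <= 0x9FC3) {    /* range check using the end */",
    "static const UChar32 limit = 0x9fc4; /* limit constant, lower case */",
    "/* unrelated line */",
]
for number, line in find_hardcoded_unihan_end(sample):
    print(number, line)
```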
2258
2259* Compare definitions of new binary properties with what we used to use
2260  in algorithms, to see if the definitions changed.
2261- Verified that definitions for Cased and Case_Ignorable are unchanged.
2262  The gencase tool now parses the newly public Case_Ignorable values
2263  in case the definition changes in the future.
2264
2265* uchar.c & uprops.h & uprops.c & genprops
2266- new numeric values that didn't exist in Unicode data before:
2267    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
2268  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
2269  therefore redesign the encoding of numeric types and values for formatVersion 6;
2270  design for simple numbers up to at least 144 ("one gross"),
2271  large values up to at least 10^20,
2272  and fractions with numerators -1..17 and denominators 1..16
2273  to cover current and expected future values
2274  (e.g., more Han numeric values, Meroitic twelfths)
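
To see that the stated fraction ranges fit into a small integer, here is one hypothetical packing in Python. It is not a description of the actual uprops.icu formatVersion 6 layout; the bit layout and function names are assumptions made for this sketch. With 19 possible numerators (-1..17) and 16 possible denominators (1..16), 304 codes are enough.

```python
from fractions import Fraction

NUM_MIN, NUM_MAX = -1, 17  # numerator range from the notes above
DEN_MIN, DEN_MAX = 1, 16   # denominator range from the notes above


def pack(numerator: int, denominator: int) -> int:
    """Pack a fraction into one integer (hypothetical layout).

    The low 4 bits hold denominator - 1; the remaining bits hold numerator + 1.
    """
    if not (NUM_MIN <= numerator <= NUM_MAX and DEN_MIN <= denominator <= DEN_MAX):
        raise ValueError("fraction outside the representable range")
    return ((numerator - NUM_MIN) << 4) | (denominator - DEN_MIN)


def unpack(code: int) -> Fraction:
    """Inverse of pack()."""
    return Fraction((code >> 4) + NUM_MIN, (code & 0xF) + DEN_MIN)


# The new numeric values listed in the notes above.
NEW_VALUES = [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]
for n, d in NEW_VALUES:
    print(f"{n}/{d} -> code {pack(n, d)} -> {unpack(pack(n, d))}")
```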
2275
2276* reimplement Hangul_Syllable_Type for new Jamo characters
2277- the old code assumed that all Jamo characters are in the 11xx block
2278- Unicode 5.2 fills holes there and adds new Jamo characters in
2279    A960..A97F; Hangul Jamo Extended-A
2280  and in
2281    D7B0..D7FF; Hangul Jamo Extended-B
2282- Hangul_Syllable_Type can be trivially derived from a subset of
2283  Grapheme_Cluster_Break values
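
The derivation mentioned above amounts to a lookup: the five Hangul-related Grapheme_Cluster_Break values carry over unchanged, and everything else is Not_Applicable. The following is a minimal Python sketch using the short property value aliases, not the ICU implementation.

```python
# Grapheme_Cluster_Break values that correspond directly to a
# Hangul_Syllable_Type value (short aliases).
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb: str) -> str:
    """Derive hst from gcb; every non-Hangul gcb value maps to NA."""
    return GCB_TO_HST.get(gcb, "NA")


for gcb in ("L", "LVT", "EX", "XX"):
    print(gcb, "->", hangul_syllable_type(gcb))
```

With this approach, Jamo in the Extended-A and Extended-B blocks need no special handling as long as their Grapheme_Cluster_Break values are available.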
2284
2285* build Unicode data source code for hardcoding core data
2286C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
2287
2288ICU data make path is \svn\icuproj\icu\trunk\source\data\
2289ICU root path is \svn\icuproj\icu\trunk
2290Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
2291Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
2292Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
2293Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
2294Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
2295Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
2296Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
2297Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
2298Creating data file for Unicode Property Names
2299Creating data file for Unicode Character Properties
2300Creating data file for Unicode Case Mapping Properties
2301Creating data file for Unicode BiDi/Shaping Properties
2302Creating data file for Unicode Normalization
2303Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
2304Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
2305
2306- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
2307  and rebuild the common library
2308
2309*** UCA
2310
2311- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
2312- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
2313- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
2314[ Begin obsolete instructions:
2315  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
2316    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
2317      on Windows:
2318        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
2319        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
2320  End obsolete instructions]
2321- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
2322  not just the *_STUB.txt files
2323- note on intltest: if collate/UCAConformanceTest fails, then
2324  utility/MultithreadTest/TestCollators will fail as well;
2325  fix the conformance test before looking into the multi-thread test
2326
2327*** Implement Cased & Case_Ignorable properties
2328- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
2329- Problem: These properties should be disjoint, but aren't
2330- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
2331- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
2332
2333*** Implement Changes_When_Xyz properties
2334- without stored data
2335
2336*** Implement Name_Alias property
2337- add it as another name field in unames.icu
- make it available via u_charName() and UCharNameChoice
- consider it in u_charFromName()
2340
2341*** Break iterators
2342
2343* Update break iterator rules to new UAX versions and new property values
2344* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
2345
2346*** new BidiTest file
2347- review format and data
2348- copy BidiTest.txt to source/test/testdata
2349- write test code using this data
2350- fix ICU code where it fails the conformance test
2351
2352*** Java
2353- generally, find and update code corresponding to C/C++
2354- UCharacter.UnicodeBlock constants:
2355  a) add an _ID integer per new block, update COUNT
2356  b) add a class instance per new block
2357     Visual Studio regex:
2358        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
2359        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
2360- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
2361
2362- port test changes to Java
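
The Visual Studio find/replace for UnicodeBlock constants can be reproduced in Python, with () groups in place of the {} groups. This is a sketch under the assumption that the uchar.h enum lines look like the sample; the sample line is illustrative only.

```python
import re

# Visual Studio {} groups written as Python () groups.
UBLOCK_LINE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
JAVA_TEMPLATE = (
    r'public static final UnicodeBlock \g<1> = '
    r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>'
)


def ublock_to_java(c_line: str) -> str:
    """Rewrite one uchar.h block enum line into a UnicodeBlock instance line."""
    return UBLOCK_LINE.sub(JAVA_TEMPLATE, c_line)


# Illustrative input line (assumed format).
sample = "    UBLOCK_SAMARITAN = 172, /*[0800]*/"
print(ublock_to_java(sample))
```

This covers only step b); the _ID integers and COUNT from step a) still have to be added separately.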
2363
2364*** LayoutEngine script information
2365
2366(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
2367
2368* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
2369ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
2370ScriptRunData.cpp, which is no longer needed.)
2371
2372The generated files have a current copyright date and "@draft" statement.
2373
2374-> Eric Mader wrote in email on 20090930:
2375    "I think the tool has been modified to update @draft to @stable for
2376     older scripts and to add @draft for new scripts.
2377     (I worked with an intern on this last year.)
2378     You should check the output after you run it."
2379
2380* copy the above files into <icu>/source/layout, replacing the old files.
2381* fix mixed line endings
2382* review the diffs and fix incorrect @draft and missing aliases
2383* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
2384
2385Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
2386and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
2387
2388-> Eric Mader wrote in email on 20090930:
2389    "This is just a matter of making sure that all the per-script tables have
2390     entries for any new scripts that were added.
2391     If any new Indic characters were added, then the class tables in
2392     IndicClassTables.cpp should be updated to reflect this.
2393     John Emmons should know how to do this if it's required."
2394
2395* rebuild the layout and layoutex libraries.
2396
2397*** Documentation
2398- Update User Guide
2399  + Jamo_Short_Name, sfc->scf, binary property value aliases
2400
2401---------------------------------------------------------------------------- ***
2402
2403Unicode 5.1 update
2404
2405*** related ICU Trac tickets
2406
5696 Update to Unicode 5.1
2408
2409*** Unicode version numbers
2410- makedata.mak
2411- uchar.h
2412- configure.in & configure
2413- update ucdVersion in gennames.c if an algorithmic range changes
2414
2415*** data files & enums & parser code
2416
2417* file preparation
2418- ucdstrip:
2419    DerivedCoreProperties.txt
2420    DerivedNormalizationProps.txt
2421    NormalizationTest.txt
2422    PropList.txt
2423    Scripts.txt
2424    GraphemeBreakProperty.txt
2425    SentenceBreakProperty.txt
2426    WordBreakProperty.txt
2427- ucdstrip and ucdmerge:
2428    EastAsianWidth.txt
2429    LineBreak.txt
2430
2431* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.1.0\ucd\Blocks.txt ..\unidata\
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
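
The two-stage pipeline above (ucdstrip, then ucdmerge for EastAsianWidth.txt and LineBreak.txt) can be pictured with the following Python sketch. It is not the code of the ICU tools. It assumes that ucdstrip drops trailing "#" comments and that ucdmerge joins adjacent code point ranges that carry the same value, neither of which this log spells out. All function names and the sample lines are made up for illustration.

```python
def strip_comment(line):
    """Drop a trailing '#' comment and surrounding whitespace (assumed ucdstrip behavior)."""
    return line.split("#", 1)[0].strip()

def parse_row(line):
    """Parse 'XXXX;value' or 'XXXX..YYYY;value' into (start, end, value)."""
    code_range, value = [field.strip() for field in line.split(";", 1)]
    start, _, end = code_range.partition("..")
    return int(start, 16), int(end or start, 16), value

def merge_rows(rows):
    """Join adjacent ranges that have equal values (assumed ucdmerge behavior)."""
    merged = []
    for start, end, value in rows:
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end, value)
        else:
            merged.append((start, end, value))
    return merged

def format_row(start, end, value):
    """Write a row back in the 'range;value' style."""
    code_range = f"{start:04X}" if start == end else f"{start:04X}..{end:04X}"
    return f"{code_range};{value}"

# Hypothetical input lines in the UCD "range;value # comment" style.
sample = [
    "0020;Na # Zs SPACE",
    "0021..0023;Na # Po [3] EXCLAMATION MARK..NUMBER SIGN",
    "# a comment-only line",
    "00A1;A # Po INVERTED EXCLAMATION MARK",
]
data = [s for s in map(strip_comment, sample) if s]
merged_lines = [format_row(*row) for row in merge_rows(map(parse_row, data))]
print(merged_lines)  # ['0020..0023;Na', '00A1;A']
```

Merging only when the previous range ends exactly one code point before the next one starts keeps gaps between ranges intact.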

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\uni51\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
  + preparse.pl complains with errors like the following:
      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    This is because ICU 3.8 already had script codes from ISO 15924 that are now
    encoded in Unicode 5.1, so preparse.pl finds the same aliases in both
    SyntheticPropertyValueAliases.txt and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
      N/Y, No/Yes, F/T, False/True
    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
       It will use further values from the file if present.

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 17 new blocks
  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt)
  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
  No script code above 127 is assigned to any Unicode character yet,
  so the ICU 4.0 uprops.icu does not store any
  script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that now overflows.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields also grow to leave room for future values:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
- renamed property Simple_Case_Folding (sfc->scf)
  + nothing to be done: handled as normal alias
- new property JSN Jamo_Short_Name
  + no new API: only contributes to the Name property
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
- new Joining Group (JG) value: Burushashki_Yeh_Barree
- new Sentence_Break (SB) values:
    SB ; CR        ; CR
    SB ; EX        ; Extend
    SB ; LF        ; LF
    SB ; SC        ; SContinue
- new Word_Break (WB) values:
    WB ; CR        ; CR
    WB ; Extend    ; Extend
    WB ; LF        ; LF
    WB ; MB        ; MidNumLet
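
The script-code bit-field overflow described in the list above comes down to simple arithmetic. The following Python sketch is not ICU code; the helper names bits_needed and fits are hypothetical, and the numbers are the ones quoted in the notes (USCRIPT_CODE_LIMIT=130, Block count 172).

```python
def bits_needed(max_value):
    """Number of bits needed to store every value in 0..max_value."""
    return max(1, max_value.bit_length())

def fits(value, width):
    """True if value can be stored in an unsigned bit field of the given width."""
    return 0 <= value < (1 << width)

USCRIPT_CODE_LIMIT = 130              # value quoted in the notes above
max_script = USCRIPT_CODE_LIMIT - 1   # 129, stored as the maximum script value

print(fits(127, 7))             # True: the largest value a 7-bit field can hold
print(fits(max_script, 7))      # False: 129 overflows 7 bits
print(bits_needed(max_script))  # 8

# 172 still fits in 8 bits, so widening Block to 9 bits is headroom,
# not an overflow fix.
print(fits(172, 8), fits(172, 9))  # True True
```

This also shows why storing only per-character script codes below 128 was not enough: the maximum value itself needs the eighth bit.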

* Further changes in the 2008-02-29 update:
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  because they should not normally be invisible.
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
- new Word_Break (WB) value: NL=Newline

* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
- Unihan range end moves from 9FBB to 9FC3
  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
  + do change gennames.c
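
The search step above distinguishes the inclusive range end (9FBB) from the exclusive limit (9FBC). A minimal Python sketch of that search follows, using the regex from the note. The sample lines are hypothetical and only demonstrate what the pattern does and does not match; the function name is made up.

```python
import re

OLD_END = 0x9FBB           # last code point of the old range (inclusive)
OLD_LIMIT = OLD_END + 1    # 0x9FBC, first code point after the old range
NEW_END = 0x9FC3           # new inclusive end from the note above

# The regex from the note: matches 9FBB and 9FBC in either letter case.
OLD_BOUNDARY = re.compile(r"9FB[BC]", re.IGNORECASE)

def find_old_boundaries(lines):
    """Return the 1-based numbers of lines that mention the old end or limit."""
    return [number for number, text in enumerate(lines, start=1)
            if OLD_BOUNDARY.search(text)]

# Hypothetical source lines, only for demonstration.
sample = [
    "    0x4e00, 0x9fbb,",      # inclusive end, lowercase hex
    "if (c < 0x9FBC) {",        # exclusive limit
    "    0x3400, 0x4db5,",      # unrelated range, must not match
]
print(find_old_boundaries(sample))  # [1, 2]
```

Each hit still needs a manual decision, since the 4.1 notes list places where the hardcoded value must stay unchanged.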

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest
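
What such a test checks can be sketched in a few lines of Python. This is not the TestBinaryValues() code from cintltst or intltest; the table and the function name parse_binary_value are hypothetical, and the matching is simplified to ignoring case and surrounding whitespace.

```python
# The eight spellings listed in the notes above, folded to lowercase.
BINARY_VALUE_ALIASES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}

def parse_binary_value(alias):
    """Map a boolean property value alias to a Python bool.

    Simplification: only case and surrounding whitespace are ignored here.
    """
    key = alias.strip().lower()
    if key not in BINARY_VALUE_ALIASES:
        raise ValueError(f"not a boolean property value alias: {alias!r}")
    return BINARY_VALUE_ALIASES[key]

# Walk the four pairs exactly as the notes write them.
for pair in ("N/Y", "No/Yes", "F/T", "False/True"):
    false_alias, true_alias = pair.split("/")
    assert parse_binary_value(false_alias) is False
    assert parse_binary_value(true_alias) is True
```

Rejecting unknown spellings with an error, rather than defaulting to False, makes a missing alias visible in a test run.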

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
  + make sure that data.h is writable
  + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
  + Pattern_Syntax
  + Pattern_White_Space
- new enumerated properties
  + Grapheme_Cluster_Break
  + Sentence_Break
  + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
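
The case-ignorable rule quoted above can be written as a small predicate. This Python sketch is an illustration, not gencase code. Python's unicodedata module reflects the Unicode version of the running interpreter (not 4.1) and has no Word_Break data, so the MidLetter set is passed in by the caller; the one-element set used here is a deliberately incomplete placeholder, and all names are hypothetical.

```python
import unicodedata

# General categories named in the note above.
CASE_IGNORABLE_CATEGORIES = frozenset({"Mn", "Me", "Cf", "Lm", "Sk"})

def is_case_ignorable(ch, mid_letter):
    """Apply the rule from the note: Word_Break=MidLetter, or gc in Mn/Me/Cf/Lm/Sk.

    mid_letter is the set of code points with Word_Break=MidLetter; the
    standard library has no Word_Break data, so the caller must supply it.
    """
    return (ord(ch) in mid_letter
            or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES)

# Placeholder, deliberately incomplete: U+00B7 MIDDLE DOT stands in for the
# real MidLetter set from WordBreakProperty.txt.
SAMPLE_MID_LETTER = frozenset({0x00B7})

print(is_case_ignorable("\u0301", SAMPLE_MID_LETTER))  # True  (Mn, combining acute accent)
print(is_case_ignorable("\u00B7", SAMPLE_MID_LETTER))  # True  (in the MidLetter set)
print(is_case_ignorable("a", SAMPLE_MID_LETTER))       # False (Ll)
```

A real implementation would load the full MidLetter set from the WordBreakProperty.txt of the Unicode version being built.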

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
  + do not modify BOCU/BOCSU code because that would change the encoding
    and break binary compatibility!
  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    NamePrepProfile.txt
  + ignore trietest.c: test data is arbitrary
  + ignore tstnorm.cpp: test optimization, not important
  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
  + do change line_th.txt and word_th.txt
    by replacing hardcoded ranges with the new property values
  + do change gennames.c

source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
  + put short name comment only on line with new constant
    for genpname perl script parser
- new binary properties
  + STerm
  + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
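
The two format changes above can be handled by one parser that accepts both the old and the new spelling. The following Python sketch is not the gennorm.c code. Only the "NFD_NO" mapping and the "FNC" rename come from the notes; the sample lines, the function name and the table names are hypothetical, and the other properties covered by "etc." would have to be added to the table.

```python
# Old single-field names mapped to the new (property, value) pair.
# Only the entry given in the notes is listed here.
OLD_SINGLE_FIELD = {"NFD_NO": ("NFD_QC", "N")}

# Property names that were renamed.
RENAMED = {"FNC": "FC_NFKC"}

def parse_line(line):
    """Return (code_range, property, value), or None for comment/empty lines."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    code_range, rest = fields[0], fields[1:]
    if not rest:
        raise ValueError(f"no property field in line: {line!r}")
    if len(rest) == 1:
        # Old style: one field carries both property and value.
        prop, value = OLD_SINGLE_FIELD.get(rest[0], (rest[0], None))
    else:
        # New style: separate property and value fields.
        prop, value = rest[0], rest[1]
    return code_range, RENAMED.get(prop, prop), value

# Hypothetical lines in the old and the new format.
print(parse_line("00C0..00C5 ; NFD_NO # old format"))   # ('00C0..00C5', 'NFD_QC', 'N')
print(parse_line("00C0..00C5 ; NFD_QC; N # new format"))  # ('00C0..00C5', 'NFD_QC', 'N')
print(parse_line("037A ; FNC; 0020 03B9"))              # ('037A', 'FC_NFKC', '0020 03B9')
```

Normalizing both spellings to the new names lets the rest of a tool work with a single representation.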

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
  + assume that the type is always numeric for Han characters,
    and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
