* Copyright (C) 2016 and later: Unicode, Inc. and others.
* License & terms of use: http://www.unicode.org/copyright.html
* Copyright (C) 2004-2016, International Business Machines
* Corporation and others. All Rights Reserved.
*
* file name: changes.txt
* encoding: US-ASCII
* tab size: 8 (not used)
* indentation:4
*
* created on: 2004may06
* created by: Markus W. Scherer
*
* change log for Unicode updates
*
* For each new Unicode version, during the beta period,
* I copy the change log for the previous version to the top of this file.
* I adjust the versions, tickets, URLs, and paths.
* I work my way through the steps listed in the log, top to bottom,
* adjusting the log as necessary.
* I report problems to the UTC and/or CLDR and/or ICU.
* Before the data is final, I "turn the crank" several more times,
* using appropriate subsets of the steps.

---------------------------------------------------------------------------- ***

* New ISO 15924 script codes

Starting with ICU 55, we do not add UScriptCode constants for new scripts any more
until they are encoded in Unicode,
or can be assumed to be encoded in the next Unicode version.
Script enum constant names want to follow the Unicode script property value aliases,
which are assigned only when the scripts are encoded.
When we encode scripts early and guess wrong, then we have confusing enum constants
and have sometimes added aliases.

Variant script codes like Latf and Aran that are not subject to separate encoding
can be added at any time.
(For example, Aran could be added as USCRIPT_ARABIC_NASTALIQ.)

We add script codes used in CLDR or in the spoof checker.
This includes combination/alias codes like Hanb and Jamo.
See http://unicode.org/reports/tr35/#unicode_script_subtag_validity
and look for "alias" on http://unicode.org/iso15924/iso15924-codes.html

We add special Z* script codes like Zsye.

For new script codes see http://www.unicode.org/iso15924/codechanges.html

---------------------------------------------------------------------------- ***

Unicode 10.0 update for ICU 60

http://www.unicode.org/versions/Unicode10.0.0/
http://www.unicode.org/versions/beta-10.0.0.html
http://blog.unicode.org/2017/03/unicode-100-beta-review.html
http://www.unicode.org/review/pri350/
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/reports/tr44/tr44-19.html

* Command-line environment setup

UNICODE_DATA=~/unidata/uni10/20170605
CLDR_SRC=~/svn.cldr/uni10
ICU_ROOT=~/svn.icu/uni10
ICU_SRC=$ICU_ROOT/src
ICUDT=icudt60b
ICU4C_DATA_IN=$ICU_SRC/icu4c/source/data/in
ICU4C_UNIDATA=$ICU_SRC/icu4c/source/data/unidata
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/icu4c/lib

*** ICU Trac

- ticket:12985: Unicode 10
- ticket:13061: undo hacks from emoji 5.0 update
- ticket:13062: add Emoji_Component property
- ^/branches/markus/uni10

*** CLDR Trac

- cldrbug 10055: Unicode 10
- cldrbug 9882: Unicode 10 script metadata
- cldrbug 10219: numbering systems for Unicode 10

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.
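
A minimal sketch, in Python, of a sanity check for the version-number step above:
before running "configure", confirm that the header text really carries the new
Unicode version. It assumes that uchar.h defines the version as a string macro of
the form #define U_UNICODE_VERSION "10.0". The function names are hypothetical and
are not part of the ICU tools; the sample header text is made up for illustration.

```python
import re

# Assumption: uchar.h defines the Unicode version as a string macro, e.g.
#   #define U_UNICODE_VERSION "10.0"
_VERSION_RE = re.compile(
    r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([0-9.]+)"', re.MULTILINE)


def unicode_version_in_header(header_text):
    """Returns the U_UNICODE_VERSION string found in header_text, or None."""
    match = _VERSION_RE.search(header_text)
    return match.group(1) if match else None


def check_version(header_text, expected):
    """Raises ValueError if the header does not carry the expected version."""
    found = unicode_version_in_header(header_text)
    if found != expected:
        raise ValueError(
            "uchar.h has Unicode version %r, expected %r" % (found, expected))
    return found


# Illustrative header fragment, not copied from the real uchar.h.
sample = '/* ... */\n#define U_UNICODE_VERSION "10.0"\n'
print(check_version(sample, "10.0"))
```

In practice one would read the text from
$ICU_SRC/icu4c/source/common/unicode/uscript.h's sibling uchar.h and pass the
version being integrated; a mismatch means the header was not updated before
"configure" was run.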

*** data files & enums & parser code

* download files
- mkdir -p $UNICODE_DATA
- download Unicode 10.0 files into $UNICODE_DATA
  + subfolders: ucd, uca, idna, security
  + inside ucd: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- download emoji 5.0 files into $UNICODE_DATA/emoji

* for manual diffs: remove version suffixes from the file names
    ~$ unidata/desuffixucd.py $UNICODE_DATA
    (see https://sites.google.com/site/unicodetools/inputdata)

* process and/or copy files
- $ICU_SRC/tools/unicode$ py/preparseucd.py $UNICODE_DATA $ICU_SRC
  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
  + For debugging, and tweaking how ppucd.txt is written,
    the tool has an --only_ppucd option:
    py/preparseucd.py $UNICODE_DATA --only_ppucd path/to/ppucd/outputfile

- cp $UNICODE_DATA/security/confusables.txt $ICU4C_UNIDATA

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

    $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date

* preparseucd.py changes
- remove or add new Unicode scripts from/to the
  only-in-ISO-15924 list according to the error messages:
    ValueError: remove ['Nshu'] from _scripts_only_in_iso15924
  -> adjust _scripts_only_in_iso15924 as indicated
- fix other errors
    Exception: no default values (@missing lines) for some Catalog or Enumerated properties: [u'vo']
  -> add vo=Vertical_Orientation to _ignored_properties
  -> later removed again, parsing the file, even though we do not yet store data for runtime use

* new constants for new property values
- preparseucd.py error:
    ValueError: missing uchar.h enum constants for some property values:
    [(u'blk', set([u'Zanabazar_Square', u'Nushu', u'CJK_Ext_F',
                   u'Kana_Ext_A', u'Syriac_Sup', u'Masaram_Gondi', u'Soyombo'])),
     (u'jg', set([u'Malayalam_Bha', u'Malayalam_Llla', u'Malayalam_Nya', u'Malayalam_Lla',
                  u'Malayalam_Nga', u'Malayalam_Ssa', u'Malayalam_Tta', u'Malayalam_Ra',
                  u'Malayalam_Nna', u'Malayalam_Ja', u'Malayalam_Nnna'])),
     (u'sc', set([u'Soyo', u'Gonm', u'Zanb']))]
  = PropertyValueAliases.txt new property values (diff old & new .txt files)
    blk; CJK_Ext_F ; CJK_Unified_Ideographs_Extension_F
    blk; Kana_Ext_A ; Kana_Extended_A
    blk; Masaram_Gondi ; Masaram_Gondi
    blk; Nushu ; Nushu
    blk; Soyombo ; Soyombo
    blk; Syriac_Sup ; Syriac_Supplement
    blk; Zanabazar_Square ; Zanabazar_Square
  -> add to uchar.h
     use long property names for enum constants,
     for the trailing comment get the block start code point: diff old & new Blocks.txt
  -> add to UCharacter.UnicodeBlock IDs
     Eclipse find     UBLOCK_([^ ]+) = ([0-9]+), (/.+)
             replace  public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
     Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
             replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2

    jg ; Malayalam_Bha ; Malayalam_Bha
    jg ; Malayalam_Ja ; Malayalam_Ja
    jg ; Malayalam_Lla ; Malayalam_Lla
    jg ; Malayalam_Llla ; Malayalam_Llla
    jg ; Malayalam_Nga ; Malayalam_Nga
    jg ; Malayalam_Nna ; Malayalam_Nna
    jg ; Malayalam_Nnna ; Malayalam_Nnna
    jg ; Malayalam_Nya ; Malayalam_Nya
    jg ; Malayalam_Ra ; Malayalam_Ra
    jg ; Malayalam_Ssa ; Malayalam_Ssa
    jg ; Malayalam_Tta ; Malayalam_Tta
  -> uchar.h & UCharacter.JoiningGroup

    sc ; Gonm ; Masaram_Gondi
    sc ; Nshu ; Nushu
    sc ; Soyo ; Soyombo
    sc ; Zanb ; Zanabazar_Square
  -> uscript.h & com.ibm.icu.lang.UScript
  -> Nushu had been added already
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java

* New properties as shown in PropertyValueAliases.txt changes
- boolean Emoji_Component from emoji 5
  -> uchar.h & UProperty.java
- boolean
    # Regional_Indicator (RI)

    RI ; N ; No ; F ; False
    RI ; Y ; Yes ; T ; True
  -> uchar.h & UProperty.java
  -> single immutable range, to be hardcoded
- boolean
    # Prepended_Concatenation_Mark (PCM)

    PCM; N ; No ; F ; False
    PCM; Y ; Yes ; T ; True
  -> was new in Unicode 9
  -> uchar.h & UProperty.java
- enumerated
    # Vertical_Orientation (vo)

    vo ; R ; Rotated
    vo ; Tr ; Transformed_Rotated
    vo ; Tu ; Transformed_Upright
    vo ; U ; Upright
  -> only pre-parsed for now, but not yet stored for runtime use

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
    $ICU_SRC/tools/unicode$ py/parsescriptmetadata.py $ICU_SRC/icu4c/source/common/unicode/uscript.h $CLDR_SRC/common/properties/scriptMetadata.txt

* generate normalization data files
    cd $ICU_ROOT/dbg/icu4c
    bin/gennorm2 -o $ICU_SRC/icu4c/source/common/norm2_nfc_data.h -s $ICU4C_UNIDATA/norm2 nfc.txt --csource
    bin/gennorm2 -o $ICU4C_DATA_IN/nfc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt
    bin/gennorm2 -o $ICU4C_DATA_IN/nfkc.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt
    bin/gennorm2 -o $ICU4C_DATA_IN/nfkc_cf.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
    bin/gennorm2 -o $ICU4C_DATA_IN/uts46.nrm -s $ICU4C_UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

    $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date

* build Unicode tools using CMake+make

$ICU_SRC/tools/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
# Location of the ICU4C source tree.
set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c)

    $ICU_ROOT/dbg/tools/unicode/c$
    cmake ../../../../src/tools/unicode/c
    make

* generate core properties data files
    $ICU_ROOT/dbg/tools/unicode/c$
    genprops/genprops $ICU_SRC/icu4c
    genuca/genuca --hanOrder implicit $ICU_SRC/icu4c
    genuca/genuca --hanOrder radical-stroke $ICU_SRC/icu4c
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..10.0: U+2260, U+226E, U+226F
- nothing new in this Unicode version, no test file to update

* run & fix ICU4C tests
- Andy handles RBBI & spoof check test failures

* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools, see
  https://sites.google.com/site/unicodetools/home#TOC-UCA
- CLDR root data files are checked into $CLDR_SRC/common/uca/
    cp (Unicode Tools UCA generated)/CollationAuxiliary/* $CLDR_SRC/common/uca/

- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp $CLDR_SRC/common/uca/FractionalUCA_SHORT.txt $ICU4C_UNIDATA/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp $ICU4C_UNIDATA/UCARules.txt /tmp/UCARules-old.txt
    (note removing the underscore before "Rules")
    cp $CLDR_SRC/common/uca/UCA_Rules_SHORT.txt $ICU4C_UNIDATA/UCARules.txt
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU4C_UNIDATA/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp $CLDR_SRC/common/uca/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp $CLDR_SRC/common/uca/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC/icu4c/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC/icu4c/source/test/testdata/CollationTest_*.txt $ICU_SRC/icu4j/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC/icu4c/source/data/coll/root.txt

- run genuca, see command line above;
  deal with
    Error: Unknown script for first-primary sample character U+11D10 on line 28117 of /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/unidata/FractionalUCA.txt:
        FDD1 11D10; [70 D5 02, 05, 05] # Masaram_Gondi first primary (compressible)
    (add the character to genuca.cpp sampleCharsToScripts[])
  + look up the USCRIPT_ code for the new sample characters
    (should be obvious from the comment in the error output)
  + *add* mappings to sampleCharsToScripts[], do not replace them
    (in case the script sample characters flip-flop)
  + insert new scripts in DUCET script order, see the top_byte table
    at the beginning of FractionalUCA.txt
- rebuild ICU4C

* Unihan collators
    https://sites.google.com/site/unicodetools/unihan
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollators
  with VM arguments
    -ea
    -DSVN_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools/trunk
    -DOTHER_WORKSPACE=/usr/local/google/home/mscherer/svn.unitools
    -DUCD_DIR=/usr/local/google/home/mscherer/svn.unitools/trunk/data
    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
    -DUVERSION=10.0.0
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollatorFiles
  with the same arguments
- check CLDR diffs
    cd $CLDR_SRC
    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
- copy to CLDR
    cd $CLDR_SRC
    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
- run CLDR unit tests, commit to CLDR
- generate ICU zh collation data: run CLDR
    org.unicode.cldr.icu.NewLdml2IcuConverter
  with program arguments
    -t collation
    -s /usr/local/google/home/mscherer/svn.cldr/uni10/common/collation
    -m /usr/local/google/home/mscherer/svn.cldr/uni10/common/supplemental
    -d /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/coll
    -p /usr/local/google/home/mscherer/svn.icu/uni10/src/icu4c/source/data/xml/collation
    zh
  and VM arguments
    -ea
    -DCLDR_DIR=/usr/local/google/home/mscherer/svn.cldr/uni10
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
  (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt60l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt60b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b
    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt60l.dat ./out/icu4j/icudt60b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt60l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt60b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt60b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt60b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt60b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt60b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/uni10/dbg/icu4c/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd $ICU_ROOT/dbg/icu4c/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uvf $ICU_SRC/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* When refreshing all of ICU4J data from ICU4C
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar $ICU_SRC/icu4j/main/shared/data
or
- $ICU_ROOT/dbg/icu4c$ make ICU4J_ROOT=$ICU_SRC/icu4j icu4j-data-install

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC/icu4c/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt IdnaTest.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp $UNICODE_DATA/ucd/CompositionExclusions.txt $ICU_SRC/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** CLDR numbering systems
- look for new sets of decimal digits (gc=ND & nv=4) and submit a CLDR ticket
  Unicode 10: http://unicode.org/cldr/trac/ticket/10219
  Unicode 9: http://unicode.org/cldr/trac/ticket/9692

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools are checked in:
  http://www.unicode.org/utility/trac/log/trunk/unicodetools

---------------------------------------------------------------------------- ***

Emoji 5.0 update for ICU 59
- ICU 59 mostly remains on Unicode 9.0
- except updates bidi and segmentation data to Unicode 10 beta

First run of tools on combined icu4c/icu4j/tools trunk after svn repository reorg.

* Command-line environment setup

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICU4C_SRC_DIR=$ICU_SRC_DIR/icu4c
ICUDT=icudt59b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU4C_SRC_DIR/source/data/in
UNIDATA=$ICU4C_SRC_DIR/source/data/unidata

*** ICU Trac

- ticket:12900: take Emoji 5.0 properties data into ICU 59 once it's released
- changes directly on trunk

*** data files & enums & parser code

* download files

- download Unicode 9.0 files into a uni90e50 folder: ucd, idna, security (skip uca)
- download emoji 5.0 beta files into the same uni90e50 folder
- download Unicode 10.0 beta files: ucd
  + copy Unicode 10 bidi files to the uni90e50/ucd folder:
    BidiBrackets.txt
    BidiCharacterTest.txt
    BidiMirroring.txt
    BidiTest.txt
    extracted/DerivedBidiClass.txt
  + copy Unicode 10 segmentation files to the uni90e50/ucd folder:
    LineBreak.txt
    auxiliary/*

* preparseucd.py changes
- adjust for combined trunks
- write new copyright lines
- ignore new Emoji_Component property for now

* process and/or copy files
- ~/svn.icu/trunk/src/tools/unicode$ py/preparseucd.py ~/unidata/uni90e50/20170322 $ICU_SRC_DIR
  + This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.

- cp ~/unidata/uni90e50/20170322/security/confusables.txt $UNIDATA

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

    $ICU_ROOT/dbg/icu4c$ echo;echo; make -j7 install > out.txt 2>&1 ; tail -n 30 out.txt ; date

* build Unicode tools using CMake+make

~/svn.icu/trunk/src/tools/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /usr/local/google/home/mscherer/svn.icu/trunk/inst/icu4c)
# Location of the ICU4C source tree.
set(ICU4C_SRC_DIR /usr/local/google/home/mscherer/svn.icu/trunk/src/icu4c)

    ~/svn.icu/trunk/dbg/tools/unicode/c$
    cmake ../../../../src/tools/unicode/c
    make

* generate core properties data files
    ~/svn.icu/trunk/dbg/tools/unicode/c$
    genprops/genprops $ICU4C_SRC_DIR
- rebuild ICU (make install) & tools

* run & fix ICU4C tests
- Andy handles RBBI & spoof check test failures

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt59l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt59b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b
    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt59l.dat ./out/icu4j/icudt59b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt59l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt59b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt59b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt59b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt59b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt59b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/usr/local/google/home/mscherer/svn.icu/trunk/dbg/icu4c/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd ~/svn.icu/trunk/dbg/icu4c/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uvf ~/svn.icu/trunk/src/icu4j/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu/trunk/src/icu4j/main/shared/data
or
- ~/svn.icu/trunk/dbg/icu4c$ make ICU4J_ROOT=~/svn.icu/trunk/src/icu4j icu4j-data-install

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU4C_SRC_DIR/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp ~/unidata/uni90e50/20170322/ucd/CompositionExclusions.txt ~/svn.icu/trunk/src/icu4j/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

---------------------------------------------------------------------------- ***

Unicode 9.0 update for ICU 58

* Command-line environment setup

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt58b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata

http://www.unicode.org/review/pri323/ -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-9.0.0.html
http://www.unicode.org/versions/Unicode9.0.0/
http://www.unicode.org/reports/tr44/tr44-17.html

*** ICU Trac

- ticket:12526: integrate Unicode 9
- C++ ^/icu/branches/markus/uni90, ^/icu/branches/markus/uni90b
- Java ^/icu4j/branches/markus/uni90, ^/icu4j/branches/markus/uni90b

*** CLDR Trac

- cldrbug 9414: UCA 9
- ^/branches/markus/uni90 at r11518 from trunk at r11517

- cldrbug 8745: Unicode 9.0 script metadata

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- only for manual diffs: remove version suffixes from the file names
    ~/unidata/uni70/20140403$ ../../desuffixucd.py .
    (see https://sites.google.com/site/unicodetools/inputdata)
- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni90/20160603 $ICU_SRC_DIR ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.

- also: from http://unicode.org/Public/security/9.0.0/ download new confusables.txt
  and copy to $UNIDATA
    cp ~/unidata/uni90/20160603/security/confusables.txt $UNIDATA

* preparseucd.py changes
- remove or add new Unicode scripts from/to the
  only-in-ISO-15924 list according to the error messages:
    ValueError: remove ['Tang'] from _scripts_only_in_iso15924
    ValueError: sc = Hanb (uchar.h USCRIPT_HAN_WITH_BOPOMOFO) not in the UCD
    ValueError: sc = Jamo (uchar.h USCRIPT_JAMO) not in the UCD
    ValueError: sc = Zsye (uchar.h USCRIPT_SYMBOLS_EMOJI) not in the UCD
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- DerivedNumericValues.txt new numeric values
    0D58 ; 0.00625 ; ; 1/160 # No MALAYALAM FRACTION ONE ONE-HUNDRED-AND-SIXTIETH
    0D59 ; 0.025 ; ; 1/40 # No MALAYALAM FRACTION ONE FORTIETH
    0D5A ; 0.0375 ; ; 3/80 # No MALAYALAM FRACTION THREE EIGHTIETHS
    0D5B ; 0.05 ; ; 1/20 # No MALAYALAM FRACTION ONE TWENTIETH
    0D5D ; 0.15 ; ; 3/20 # No MALAYALAM FRACTION THREE TWENTIETHS
  -> change uprops.h, corepropsbuilder.cpp/encodeNumericValue(),
     uchar.c, UCharacterProperty.java
     to support a new series of values
- adjust preparseucd.py for Tangut algorithmic names
  in ppucd.txt:
    algnamesrange;17000..187EC;han;CJK UNIFIED IDEOGRAPH-
  ->
    algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-
- avoid block-compressing most String/Miscellaneous property values,
  triggered by genprops not coping with a multi-code point Case_Folding on
    block;1C80..1C8F;...;Cased;cf=0442;CWCF;...
  keep block-compressing empty-string mappings NFKC_CF="" for tags and variation selectors

* PropertyAliases.txt changes
- 1 new property PCM=Prepended_Concatenation_Mark
  Ignore: Only useful for layout engines.
  Ok to list in ppucd.txt.
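
To illustrate why the Tangut algnamesrange fix above matters, here is a minimal
Python sketch of how a consumer might turn such a line into character names.
It assumes that the "han" range type means "prefix followed by the code point in
uppercase hex" (as for CJK unified ideographs); the function names are
hypothetical and this is not the parser used by genprops or preparseucd.py.

```python
def parse_algnamesrange(line):
    """Parses 'algnamesrange;START..END;TYPE;PREFIX' into a tuple."""
    fields = line.strip().split(";")
    if len(fields) != 4 or fields[0] != "algnamesrange":
        raise ValueError("not an algnamesrange line: %r" % line)
    start_hex, end_hex = fields[1].split("..")
    return int(start_hex, 16), int(end_hex, 16), fields[2], fields[3]


def algorithmic_name(code_point, ranges):
    """Returns the algorithmic name for code_point, or None if out of range.

    Assumption: only the 'han' type is handled here, as prefix + hex digits.
    """
    for start, end, name_type, prefix in ranges:
        if name_type == "han" and start <= code_point <= end:
            return prefix + "%04X" % code_point
    return None


# The corrected line from the change log above.
ranges = [parse_algnamesrange(
    "algnamesrange;17000..187EC;han;TANGUT IDEOGRAPH-")]
print(algorithmic_name(0x17000, ranges))
```

With the uncorrected line, the same code point would have been named with the
"CJK UNIFIED IDEOGRAPH-" prefix, which is the error the preparseucd.py
adjustment removes.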
620 621* PropertyValueAliases.txt new property values 622 blk; Adlam ; Adlam 623 blk; Bhaiksuki ; Bhaiksuki 624 blk; Cyrillic_Ext_C ; Cyrillic_Extended_C 625 blk; Glagolitic_Sup ; Glagolitic_Supplement 626 blk; Ideographic_Symbols ; Ideographic_Symbols_And_Punctuation 627 blk; Marchen ; Marchen 628 blk; Mongolian_Sup ; Mongolian_Supplement 629 blk; Newa ; Newa 630 blk; Osage ; Osage 631 blk; Tangut ; Tangut 632 blk; Tangut_Components ; Tangut_Components 633 -> add to uchar.h 634 use long property names for enum constants 635 -> add to UCharacter.UnicodeBlock IDs 636 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+) 637 replace public static final int \1_ID = \2; \3 638 -> add to UCharacter.UnicodeBlock objects 639 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+) 640 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 641 642 GCB; EB ; E_Base 643 GCB; EBG ; E_Base_GAZ 644 GCB; EM ; E_Modifier 645 GCB; GAZ ; Glue_After_Zwj 646 GCB; ZWJ ; ZWJ 647 -> uchar.h & UCharacter.GraphemeClusterBreak 648 649 jg ; African_Feh ; African_Feh 650 jg ; African_Noon ; African_Noon 651 jg ; African_Qaf ; African_Qaf 652 -> uchar.h & UCharacter.JoiningGroup 653 654 lb ; EB ; E_Base 655 lb ; EM ; E_Modifier 656 lb ; ZWJ ; ZWJ 657 -> uchar.h & UCharacter.LineBreak 658 659 sc ; Adlm ; Adlam 660 sc ; Bhks ; Bhaiksuki 661 sc ; Marc ; Marchen 662 sc ; Newa ; Newa 663 sc ; Osge ; Osage 664 sc ; Tang ; Tangut 665 -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript 666 667 WB ; EB ; E_Base 668 WB ; EBG ; E_Base_GAZ 669 WB ; EM ; E_Modifier 670 WB ; GAZ ; Glue_After_Zwj 671 WB ; ZWJ ; ZWJ 672 -> uchar.h & UCharacter.WordBreak 673 674* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata 675 (not strictly necessary for NOT_ENCODED scripts) 676 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt 677 678* generate 
normalization data files 679 cd $ICU_ROOT/dbg 680 bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource 681 bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt 682 bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt 683 bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt 684 bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt 685 686* build ICU (make install) 687 so that the tools build can pick up the new definitions from the installed header files. 688 689 $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 30 out.txt 690 691* build Unicode tools using CMake+make 692 693~/svn.icutools/trunk/src/unicode/c/icudefs.txt: 694 695 # Location (--prefix) of where ICU was installed. 696 set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst) 697 # Location of the ICU source tree. 698 set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src) 699 700 ~/svn.icutools/trunk/dbg/unicode/c$ 701 cmake ../../../src/unicode/c 702 make 703 704* generate core properties data files 705 ~/svn.icutools/trunk/dbg/unicode/c$ 706 genprops/genprops $ICU_SRC_DIR 707 genuca/genuca --hanOrder implicit $ICU_SRC_DIR 708 genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR 709- rebuild ICU (make install) & tools 710 711* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to 712 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) 713- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters 714- Unicode 6.0..9.0: U+2260, U+226E, U+226F 715- nothing new in 9.0, no test file to update 716 717* run & fix ICU4C tests 718- Andy handles RBBI & spoof check test failures 719 720* collation: CLDR collation root, UCA DUCET 721 722- UCA DUCET goes into Mark's Unicode tools, see 723 https://sites.google.com/site/unicodetools/home#TOC-UCA 724- CLDR root data files are checked 
into (CLDR UCA branch)/common/uca/
    cp (UCA generated)/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/

- cd (CLDR UCA branch)/common/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
    (note removing the underscore before "Rules")
    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt

- run genuca, see command line above;
  deal with
    Error: Unknown script for first-primary sample character U+104B5 on line 32599 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt:
        FDD1 104B5; [75 B8 02, 05, 05] # Osage first primary (compressible)
    (add the character to genuca.cpp sampleCharsToScripts[])
  + look up the USCRIPT_ code for the new sample characters
    (should be obvious from the comment in the error output)
  + *add* mappings to sampleCharsToScripts[], do not replace
them
    (in case the script sample characters flip-flop)
  + insert new scripts in DUCET script order, see the top_byte table
    at the beginning of FractionalUCA.txt
- rebuild ICU4C

* Unihan collators
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollators
  with VM arguments
    -DSVN_WORKSPACE=/home/mscherer/svn.unitools/trunk
    -DOTHER_WORKSPACE=/home/mscherer/svn.unitools
    -DUCD_DIR=/home/mscherer/svn.unitools/trunk/data
    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
    -DUVERSION=9.0.0
    -ea
- run Unicode Tools
    org.unicode.draft.GenerateUnihanCollatorFiles
  with the same arguments
- check CLDR diffs
    cd ~/svn.cldr/trunk
    meld common/collation/zh.xml ../Generated/cldr/han/replace/zh.xml
    meld common/transforms/Han-Latin.xml ../Generated/cldr/han/replace/Han-Latin.xml
- copy to CLDR
    cd ~/svn.cldr/trunk
    cp ../Generated/cldr/han/replace/zh.xml common/collation/zh.xml
    cp ../Generated/cldr/han/replace/Han-Latin.xml common/transforms/Han-Latin.xml
- commit to CLDR
- generate ICU zh collation data: run CLDR
    org.unicode.cldr.icu.NewLdml2IcuConverter
  with program arguments
    -t collation
    -s /home/mscherer/svn.cldr/trunk/common/collation
    -m /home/mscherer/svn.cldr/trunk/common/supplemental
    -d /home/mscherer/svn.icu/trunk/src/source/data/coll
    -p /home/mscherer/svn.icu/trunk/src/source/data/xml/collation
    zh
  and VM arguments
    -DCLDR_DIR=/home/mscherer/svn.cldr/trunk
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
  (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* update Java data files
- refresh just the UCD/UCA-related/derived
files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt58l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt58b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b
    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt58l.dat ./out/icu4j/icudt58b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt58l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt58b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt58b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt58b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt58b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt58b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd ~/svn.icu/trunk/dbg/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp
com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uvf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC_DIR/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp ~/unidata/uni90/20160603/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

*** LayoutEngine script information

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.

  (It also generates ScriptRunData.cpp, which is no longer needed.)

  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
  (a plain text file)
  which maps ICU versions to the numbers of script/language constants
  that were added then.
  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)

  The generated files have a current copyright date and "@deprecated" statement.

* Review changes, fix Java tool if necessary, and copy to ICU4C
    cd ~/svn.icu4j/trunk/src
    meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
    cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
    cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools & ICU tools are checked in
    http://www.unicode.org/utility/trac/log/trunk/unicodetools
    http://bugs.icu-project.org/trac/log/tools/trunk

---------------------------------------------------------------------------- ***

New script codes early in ICU 58: http://bugs.icu-project.org/trac/ticket/11764

Adding
- new scripts in Unicode 9: Adlm, Bhks, Marc, Newa, Osge
- new combination/alias codes: Hanb, Jamo
  - used in CLDR 29 and in spoof checker
- new Z* code: Zsye

Add new codes to uscript.h & UScript.java, see Unicode update logs.
    -> com.ibm.icu.lang.UScript
       find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
       replace public static final int \1 = \2; \3

Manually edit ppucd.txt and icutools:unicode/c/genprops/pnames_data.h,
add new script codes.
"Long" script names only where established in Unicode 9 PropertyValueAliases.txt.
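
The uscript.h -> UScript.java find/replace above can also be scripted instead of being run in an editor.
A minimal Python sketch: only the pattern and the replacement come from this log;
the function name and the sample enum line are made up for illustration.

```python
import re

# Pattern and replacement copied from the find/replace step in this log.
USCRIPT_ENUM = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_CONSTANT = r"public static final int \1 = \2; \3"


def to_java_constant(line):
    """Rewrite one uscript.h enum line; lines that do not match pass through unchanged."""
    return USCRIPT_ENUM.sub(JAVA_CONSTANT, line)


# Hypothetical enum line, not copied from uscript.h.
print(to_java_constant("    USCRIPT_EXAMPLE_SCRIPT = 999,/* Zzzz */"))
```

The pattern needs at least one character after the comma, so an enum line without a trailing comment stays unchanged and has to be edited by hand.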

Note: If we have to run preparseucd.py again before the Unicode 9 update,
then we need to manually keep/restore the new script codes.

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt57b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata

Adjust unicode/c/genprops/*builder.cpp for #ifndef/#ifdef changes in _data.h files,
see http://bugs.icu-project.org/trac/ticket/12141

make install, then icutools cmake & make, then
~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR

Generate Java data as usual, only update pnames.icu & uprops.icu.

*** LayoutEngine script information

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.

  (It also generates ScriptRunData.cpp, which is no longer needed.)

  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
  (a plain text file)
  which maps ICU versions to the numbers of script/language constants
  that were added then.
  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)

  The generated files have a current copyright date and "@deprecated" statement.

* Review changes, fix Java tool if necessary, and copy to ICU4C
    cd ~/svn.icu4j/trunk/src
    meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
    cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
    cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout

---------------------------------------------------------------------------- ***

Emoji properties added in ICU 57: http://bugs.icu-project.org/trac/ticket/11802

Edit preparseucd.py to add & parse new properties.
They share the UCD property namespace but are not listed in PropertyAliases.txt.

Add emoji-data.txt to the input files, from http://www.unicode.org/Public/emoji/
Initial data from emoji/2.0/

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt56b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata

Add binary-property constants to uchar.h enum UProperty & UProperty.java.

~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20151217 $ICU_SRC_DIR ~/svn.icutools/trunk/src
(Needs to be run after uchar.h additions, so that the new properties can be picked up by genprops.)

Data structure: uprops.h/.cpp, corepropsbuilder.cpp, UCharacterProperty.java

make install, then icutools cmake & make, then
~/svn.icutools/trunk/dbg/unicode/c$ make && genprops/genprops $ICU_SRC_DIR

Generate Java data as usual, only update pnames.icu & uprops.icu.
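
For orientation, a rough sketch of the kind of parsing that emoji-data.txt needs.
This is not the code in preparseucd.py. It assumes the usual UCD layout
"code or range ; property # comment" with well-formed lines;
the function name and the sample lines are illustrative only.

```python
def parse_emoji_data(lines):
    """Collect {property: [(first, last), ...]} from 'range ; property # comment' lines."""
    ranges_by_property = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the comment, skip blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first  # single code point: first == last
        ranges_by_property.setdefault(fields[1], []).append((first, last))
    return ranges_by_property


# Illustrative lines only, not copied from a released emoji-data.txt.
sample = [
    "# comment line",
    "0023          ; Emoji   # NUMBER SIGN",
    "1F600..1F64F  ; Emoji",
]
print(parse_emoji_data(sample))
```

Ranges are kept as (first, last) pairs rather than expanded, which matches how property ranges are usually fed into a builder.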

---------------------------------------------------------------------------- ***

Unicode 8.0 update for ICU 56

* Command-line environment setup

ICU_ROOT=~/svn.icu/trunk
ICU_SRC_DIR=$ICU_ROOT/src
ICUDT=icudt56b
export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
UNIDATA=$ICU_SRC_DIR/source/data/unidata

http://www.unicode.org/review/pri297/ -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://unicode.org/versions/beta-8.0.0.html
http://www.unicode.org/versions/Unicode8.0.0/
http://www.unicode.org/reports/tr44/tr44-15.html

*** ICU Trac

- ticket:11574: Unicode 8
- C++ branches/markus/uni80 at r37351 from trunk at r37343
- Java branches/markus/uni80 at r37352 from trunk at r37338

*** CLDR Trac

- cldrbug 8311: UCA 8
- branches/markus/uni80 at r11518 from trunk at r11517

- cldrbug 8109: Unicode 8.0 script metadata
- cldrbug 8418: Updated segmentation for Unicode 8.0

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- only for manual diffs: remove version suffixes from the file names
    ~/unidata/uni70/20140403$ ../../desuffixucd.py .
    (see https://sites.google.com/site/unicodetools/inputdata)
- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni80/20150415 $ICU_SRC_DIR ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.

- also: from http://unicode.org/Public/security/8.0.0/ download new
  confusables.txt & confusablesWholeScript.txt
  and copy to $UNIDATA
    ~/unidata$ cp uni80/20150415/security/confusables.txt $UNIDATA
    ~/unidata$ cp uni80/20150415/security/confusablesWholeScript.txt $UNIDATA

* initial preparseucd.py changes
- remove new Unicode scripts from the
  only-in-ISO-15924 list according to the error message:
    ValueError: remove ['Ahom', 'Hatr', 'Hluw', 'Hung', 'Mult', 'Sgnw']
    from _scripts_only_in_iso15924
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- property and file name change:
  IndicMatraCategory -> IndicPositionalCategory
- UnicodeData.txt unusual numeric values (fractions not in lowest terms)
    109F6;MEROITIC CURSIVE FRACTION ONE TWELFTH;No;0;R;;;;1/12;N;;;;;
    109F7;MEROITIC CURSIVE FRACTION TWO TWELFTHS;No;0;R;;;;2/12;N;;;;;
    109F8;MEROITIC CURSIVE FRACTION THREE TWELFTHS;No;0;R;;;;3/12;N;;;;;
    109F9;MEROITIC CURSIVE FRACTION FOUR TWELFTHS;No;0;R;;;;4/12;N;;;;;
    109FA;MEROITIC CURSIVE FRACTION FIVE TWELFTHS;No;0;R;;;;5/12;N;;;;;
    109FB;MEROITIC CURSIVE FRACTION SIX TWELFTHS;No;0;R;;;;6/12;N;;;;;
    109FC;MEROITIC CURSIVE FRACTION SEVEN TWELFTHS;No;0;R;;;;7/12;N;;;;;
    109FD;MEROITIC CURSIVE FRACTION EIGHT TWELFTHS;No;0;R;;;;8/12;N;;;;;
    109FE;MEROITIC CURSIVE FRACTION NINE TWELFTHS;No;0;R;;;;9/12;N;;;;;
    109FF;MEROITIC CURSIVE FRACTION TEN TWELFTHS;No;0;R;;;;10/12;N;;;;;
  -> change preparseucd.py to map them to fractions in lowest terms
     (e.g., 1/6)
     which are listed in DerivedNumericValues.txt;
     keeps storage in data file simple

* PropertyValueAliases.txt changes
- 10 new Block (blk) values:
    blk; Ahom ; Ahom
    blk; Anatolian_Hieroglyphs ; Anatolian_Hieroglyphs
    blk; Cherokee_Sup ; Cherokee_Supplement
    blk; CJK_Ext_E ; CJK_Unified_Ideographs_Extension_E
    blk; Early_Dynastic_Cuneiform ; Early_Dynastic_Cuneiform
    blk; Hatran ; Hatran
    blk; Multani ; Multani
    blk; Old_Hungarian ; Old_Hungarian
    blk; Sup_Symbols_And_Pictographs ; Supplemental_Symbols_And_Pictographs
    blk; Sutton_SignWriting ; Sutton_SignWriting
    -> add to uchar.h
       use long property names for enum constants
    -> add to UCharacter.UnicodeBlock IDs
       Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
               replace public static final int \1_ID = \2; \3
    -> add to UCharacter.UnicodeBlock objects
       Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
               replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 6 new Script (sc) values:
    sc ; Ahom ; Ahom
    sc ; Hatr ; Hatran
    sc ; Hluw ; Anatolian_Hieroglyphs
    sc ; Hung ; Old_Hungarian
    sc ; Mult ; Multani
    sc ; Sgnw ; SignWriting
    -> all of them had been added already to uscript.h & com.ibm.icu.lang.UScript

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
    ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt

* generate normalization data files
    cd $ICU_ROOT/dbg
    bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
    bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
    bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
    bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm
-s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
    bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

    $ICU_ROOT/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

    # Location (--prefix) of where ICU was installed.
    set(ICU_INST_DIR /home/mscherer/svn.icu/trunk/inst)
    # Location of the ICU source tree.
    set(ICU_SRC_DIR /home/mscherer/svn.icu/trunk/src)

    ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
    ~/svn.icutools/trunk/dbg/unicode/c$ make

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..8.0: U+2260, U+226E, U+226F
- nothing new in 8.0, no test file to update

* run & fix ICU4C tests
- bad Cherokee case folding due to difference in fallbacks:
  UCD case folding falls back to no mapping,
  ICU runtime case folding falls back to lowercasing;
  fixed casepropsbuilder.cpp to generate scf mappings to self
  when there is an slc mapping but no scf
- Andy handles RBBI & spoof check test failures

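The uts46 check above (grep for "disallowed_STD3_valid" on non-ASCII characters) is easy to get wrong by eye,
because most hits are ASCII ranges. A small Python filter, as a sketch only:
it assumes the "code or range ; status ; mapping # comment" layout of IdnaMappingTable.txt,
and the function name and sample lines are illustrative.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) ranges with status disallowed_STD3_valid that reach beyond ASCII."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0]  # drop the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:  # keep only entries that include non-ASCII code points
            hits.append((first, last))
    return hits


# Illustrative lines only; U+2260 is one of the characters named in this log.
sample = [
    "0000..002C    ; disallowed_STD3_valid    # ASCII range",
    "00A0          ; disallowed_STD3_mapped ; 0020",
    "2260          ; disallowed_STD3_valid    # NOT EQUAL TO",
]
print([f"U+{first:04X}" for first, _ in non_ascii_std3_valid(sample)])
```

Comparing the result against the list of known characters (U+2260, U+226E, U+226F) shows quickly whether a test file needs updating.
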
* collation: CLDR collation root, UCA DUCET

- UCA DUCET goes into Mark's Unicode tools, see
  https://sites.google.com/site/unicodetools/home#TOC-UCA
- CLDR root data files are checked into (CLDR UCA branch)/common/uca/
- cd (CLDR UCA branch)/common/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
    cp $ICU_SRC_DIR/source/data/unidata/UCARules.txt /tmp/UCARules-old.txt
    (note removing the underscore before "Rules")
    cp UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- restore TODO diffs in UCARules.txt
    meld /tmp/UCARules-old.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  from the CLDR root files (..._CLDR_..._SHORT.txt)
    cp CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
- if CLDR common/uca/unihan-index.txt changes, then update
  CLDR common/collation/root.xml <collation type="private-unihan">
  and regenerate (or update in parallel) $ICU_SRC_DIR/source/data/coll/root.txt
- run genuca, see command line above;
  deal with
    Error: Unknown script for first-primary sample character U+07d8 on line 23005 of /home/mscherer/svn.icu/trunk/src/source/data/unidata/FractionalUCA.txt
    (add the character to genuca.cpp sampleCharsToScripts[])
  + look up the script for the new sample characters
    (e.g., in FractionalUCA.txt)
  + *add* mappings to
sampleCharsToScripts[], do not replace them
    (in case the script sample characters flip-flop)
  + insert new scripts in DUCET script order, see the top_byte table
    at the beginning of FractionalUCA.txt
- rebuild ICU4C

* run & fix ICU4C tests, now with new CLDR collation root data
- run all tests with the collation test data *_SHORT.txt or the full files
  (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test
- fixed bug in CollationWeights::getWeightRanges()
  exposed by new data and CollationTest::TestRootElements

* update Java data files
- refresh just the UCD/UCA-related/derived files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt56l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt56b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b
    echo pnames.icu uprops.icu ucase.icu ubidi.icu nfc.nrm > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt56l.dat ./out/icu4j/icudt56b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt56l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt56b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt56b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt56b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt56b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt56b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files,
  and then refresh ICU4J
    cd ~/svn.icu/trunk/dbg/data/out/icu4j
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC_DIR/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp ~/unidata/uni80/20150415/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

* run & fix ICU4J tests

*** LayoutEngine script information

* ICU 56: Modify ScriptIDModuleWriter.java to not output @stable tags any more,
  because the layout engine was deprecated in ICU 54.
  Modify ScriptIDModuleWriter.java and ScriptTagModuleWriter.java
  to write lines that we used to add manually.

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.

  (It also generates ScriptRunData.cpp, which is no longer needed.)

  It also reads and regenerates tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguages
  (a plain text file)
  which maps ICU versions to the numbers of script/language constants
  that were added then.
  (This mapping is probably obsolete since we do not print "@stable ICU xy" any more.)

  The generated files have a current copyright date and "@deprecated" statement.

* Review changes, fix Java tool if necessary, and copy to ICU4C
    cd ~/svn.icu4j/trunk/src
    meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
    cp tools/misc/src/com/ibm/icu/dev/tool/layout/*.h $ICU_SRC_DIR/source/layout
    cp tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptAndLanguageTags.cpp $ICU_SRC_DIR/source/layout

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C
- make sure that changes to Unicode tools & ICU tools are checked in
    http://www.unicode.org/utility/trac/log/trunk/unicodetools
    http://bugs.icu-project.org/trac/log/tools/trunk

---------------------------------------------------------------------------- ***

Unicode 7.0 update for ICU 54

http://www.unicode.org/review/pri271/ -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
http://www.unicode.org/reports/tr44/tr44-13.html

*** ICU Trac

- ticket 10821: Unicode 7.0, UCA 7.0
- C++ branches/markus/uni70 at r35584 from trunk at r35580
- Java branches/markus/uni70 at r35587 from trunk at r35545

*** CLDR Trac

- ticket 7195: UCA 7.0 CLDR root collation
- branches/markus/uni70 at
r10062 from trunk at r10061

- ticket 6762: script metadata for Unicode 7.0 new scripts

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- only for manual diffs: remove version suffixes from the file names
    ~/unidata/uni70/20140403$ ../../desuffixucd.py .
    (see https://sites.google.com/site/unicodetools/inputdata)
- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Restore TODO diffs in source/data/unidata/UCARules.txt
    cd $ICU_SRC_DIR
    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt

- also: from http://unicode.org/Public/security/7.0.0/ download new
  confusables.txt & confusablesWholeScript.txt
  and copy to $ICU_ROOT/src/source/data/unidata/

* initial preparseucd.py changes
- remove new Unicode scripts from the
  only-in-ISO-15924 list according to the error message:
    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
    'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
    'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
    from _scripts_only_in_iso15924
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- NamesList.txt now has a heading with a non-ASCII character
  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
  + escape non-ASCII characters in heading comments
- gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013
  + get the copyright from the first file whose copyright line contains the current year

* PropertyValueAliases.txt changes
- 32 new Block (blk) values:
    blk; Bassa_Vah ; Bassa_Vah
    blk; Caucasian_Albanian ; Caucasian_Albanian
    blk; Coptic_Epact_Numbers ; Coptic_Epact_Numbers
    blk; Diacriticals_Ext ; Combining_Diacritical_Marks_Extended
    blk; Duployan ; Duployan
    blk; Elbasan ; Elbasan
    blk; Geometric_Shapes_Ext ; Geometric_Shapes_Extended
    blk; Grantha ; Grantha
    blk; Khojki ; Khojki
    blk; Khudawadi ; Khudawadi
    blk; Latin_Ext_E ; Latin_Extended_E
    blk; Linear_A ; Linear_A
    blk; Mahajani ; Mahajani
    blk; Manichaean ; Manichaean
    blk; Mende_Kikakui ; Mende_Kikakui
    blk; Modi ; Modi
    blk; Mro ; Mro
    blk; Myanmar_Ext_B ; Myanmar_Extended_B
    blk; Nabataean ; Nabataean
    blk; Old_North_Arabian ; Old_North_Arabian
    blk; Old_Permic ; Old_Permic
    blk; Ornamental_Dingbats ; Ornamental_Dingbats
    blk; Pahawh_Hmong ; Pahawh_Hmong
    blk; Palmyrene ; Palmyrene
    blk; Pau_Cin_Hau ; Pau_Cin_Hau
    blk; Psalter_Pahlavi ; Psalter_Pahlavi
    blk; Shorthand_Format_Controls ; Shorthand_Format_Controls
    blk; Siddham ; Siddham
    blk; Sinhala_Archaic_Numbers ; Sinhala_Archaic_Numbers
    blk; Sup_Arrows_C ; Supplemental_Arrows_C
    blk; Tirhuta ; Tirhuta
    blk; Warang_Citi ; Warang_Citi
  -> add to uchar.h
     use long property names for enum constants
  -> add to UCharacter.UnicodeBlock IDs
     Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
     replace public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
     Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
     replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 28 new Joining_Group (jg) values:
    jg ; Manichaean_Aleph ; Manichaean_Aleph
    jg ; Manichaean_Ayin ; Manichaean_Ayin
    jg ; Manichaean_Beth ; Manichaean_Beth
    jg ; Manichaean_Daleth ; Manichaean_Daleth
    jg ; Manichaean_Dhamedh ; Manichaean_Dhamedh
    jg ; Manichaean_Five ; Manichaean_Five
    jg ; Manichaean_Gimel ; Manichaean_Gimel
    jg ; Manichaean_Heth ; Manichaean_Heth
    jg ; Manichaean_Hundred ; Manichaean_Hundred
    jg ; Manichaean_Kaph ; Manichaean_Kaph
    jg ; Manichaean_Lamedh ; Manichaean_Lamedh
    jg ; Manichaean_Mem ; Manichaean_Mem
    jg ; Manichaean_Nun ; Manichaean_Nun
    jg ; Manichaean_One ; Manichaean_One
    jg ; Manichaean_Pe ; Manichaean_Pe
    jg ; Manichaean_Qoph ; Manichaean_Qoph
    jg ; Manichaean_Resh ; Manichaean_Resh
    jg ; Manichaean_Sadhe ; Manichaean_Sadhe
    jg ; Manichaean_Samekh ; Manichaean_Samekh
    jg ; Manichaean_Taw ; Manichaean_Taw
    jg ; Manichaean_Ten ; Manichaean_Ten
    jg ; Manichaean_Teth ; Manichaean_Teth
    jg ; Manichaean_Thamedh ; Manichaean_Thamedh
    jg ; Manichaean_Twenty ; Manichaean_Twenty
    jg ; Manichaean_Waw ; Manichaean_Waw
    jg ; Manichaean_Yodh ; Manichaean_Yodh
    jg ; Manichaean_Zayin ; Manichaean_Zayin
    jg ; Straight_Waw ; Straight_Waw
  -> uchar.h & UCharacter.JoiningGroup
- 23 new Script (sc) values:
    sc ; Aghb ; Caucasian_Albanian
    sc ; Bass ; Bassa_Vah
    sc ; Dupl ; Duployan
    sc ; Elba ; Elbasan
    sc ; Gran ; Grantha
    sc ; Hmng ; Pahawh_Hmong
    sc ; Khoj ; Khojki
    sc ; Lina ; Linear_A
    sc ; Mahj ; Mahajani
    sc ; Mani ; Manichaean
    sc ; Mend ; Mende_Kikakui
    sc ; Modi ; Modi
    sc ; Mroo ; Mro
    sc ; Narb ; Old_North_Arabian
    sc ; Nbat ; Nabataean
    sc ; Palm ; Palmyrene
    sc ; Pauc ; Pau_Cin_Hau
    sc ; Perm ; Old_Permic
    sc ; Phlp ; Psalter_Pahlavi
    sc ; Sidd ; Siddham
    sc ; Sind ; Khudawadi
    sc ; Tirh ; Tirhuta
    sc ; Wara ; Warang_Citi
  -> uscript.h (many were added before)
     comment "Mende Kikakui" for USCRIPT_MENDE
     add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2; \3
- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2012-11-01)
    Ahom 338 Ahom
    Hatr 127 Hatran
    Mult 323 Multani
  (added 2013-10-12)
    Modi 324 Modi
    Pauc 263 Pau Cin Hau
    Sidd 302 Siddham
  -> uscript.h (some overlap with additions from Unicode)
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2; \3
  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
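
The find/replace for com.ibm.icu.lang.UScript could also be scripted instead of
run in the editor. A minimal Python sketch, assuming each enum constant sits on
one line of uscript.h followed by its comment. The function name and the sample
line (USCRIPT_EXAMPLE = 999) are hypothetical; only the two regular expressions
are taken from the steps above.

```python
import re

# Same expression as the "find" pattern in the log.
USCRIPT_ENUM = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")

# Same expression as the "replace" pattern in the log.
JAVA_CONSTANT = r"public static final int \1 = \2; \3"


def uscript_h_to_java(line):
    """Rewrite one uscript.h enum line as a UScript.java int constant.

    Lines that do not match the pattern are returned unchanged.
    """
    return USCRIPT_ENUM.sub(JAVA_CONSTANT, line)


# Hypothetical input line, not a real ICU constant.
sample = "      USCRIPT_EXAMPLE = 999,/* Xmpl */"
print(uscript_h_to_java(sample))
```

The result still needs the same manual review as the editor output, for example
for alias constants that are defined in terms of another constant rather than a number.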

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
  (not strictly necessary for NOT_ENCODED scripts)
  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt

* generate normalization data files
- cd $ICU_ROOT/dbg
- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
- UNIDATA=$ICU_SRC_DIR/source/data/unidata
- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
# Location of the ICU source tree.
set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)

~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
~/svn.icutools/trunk/dbg/unicode/c$ make

* genprops work
- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
  + add second array of Joining_Group values for at most 10800..10FFF
    icutools: unicode/c/genprops/bidipropsbuilder.cpp
    icu: source/common/ubidi_props.h/.c/_data.h
    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..7.0: U+2260, U+226E, U+226F
- nothing new in 7.0, no test file to update

* run & fix ICU4C tests

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt53l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
  ICUDT=icudt54b
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
  cd ~/svn.icu/uni70/dbg/data/out/icu4j
  cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
  cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
  rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
  cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
  cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
  cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
- refresh ICU4J
  ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
    ICU4C/source/i18n/collationfcd.cpp to
    ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
  cd $ICU_SRC_DIR/source/data/unidata
  cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
  cd ../../test/testdata
  cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
  cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA

- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
- output files are in ~/svn.unitools/Generated/uca/7.0.0/
- review data; compare files, use blankweights.sed or similar
  ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
- cd ~/svn.unitools/Generated/uca/7.0.0/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
  cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note removing the underscore before "Rules")
  cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
  cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
  cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
  cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
  ICUDT=icudt54b
  ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
  ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
  ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test
- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
  ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

* run & fix ICU4J tests

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.
  (It also generates ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@stable" statement.
  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
  for "born stable" Unicode API constants, and to stop parsing ICU version numbers
  which may not contain dots any more.

- diff current <icu>/source/layout files vs. generated ones
  ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
  review and manually merge desired changes;
  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
- if you just copy the above files, then
  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Unicode 6.3 update

http://www.unicode.org/review/pri249/ -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
http://www.unicode.org/reports/tr44/tr44-11.html

*** ICU Trac

- ticket 10128: update ICU to Unicode 6.3 beta
- ticket 10168: update ICU to Unicode 6.3 final
- C++ branches/markus/uni63 at r33552 from trunk at r33551
- Java branches/markus/uni63 at r33550 from trunk at r33553

- ticket 10142: implement Unicode 6.3 bidi algorithm additions

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD, UCA & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- modify preparseucd.py:
  parse new file BidiBrackets.txt
  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.

* PropertyAliases.txt changes
- 1 new Enumerated Property
    bpt ; Bidi_Paired_Bracket_Type
  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
  -> ubidi_props.h & .c & UBiDiProps.java
     -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
  -> uprops.cpp
  -> change ubidi.icu format version from 2.0 to 2.1
- 1 new Miscellaneous Property
    bpb ; Bidi_Paired_Bracket
  -> uchar.h & UProperty.java
  -> ppucd.h & .cpp

* PropertyValueAliases.txt changes
- 3 Bidi_Paired_Bracket_Type (bpt) values:
    bpt; c ; Close
    bpt; n ; None
    bpt; o ; Open
  -> uchar.h & UCharacter.BidiPairedBracketType
  -> ubidi_props.h & .c & UBiDiProps.java
  -> change ubidi.icu format version from 2.0 to 2.1
- 4 new Bidi_Class (bc) values:
    bc ; FSI ; First_Strong_Isolate
    bc ; LRI ; Left_To_Right_Isolate
    bc ; RLI ; Right_To_Left_Isolate
    bc ; PDI ; Pop_Directional_Isolate
  -> uchar.h & UCharacterEnums.ECharacterDirection
  -> until the bidi code gets updated,
     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
- 3 new Word_Break (WB) values:
    WB ; HL ; Hebrew_Letter
    WB ; SQ ; Single_Quote
    WB ; DQ ; Double_Quote
  -> uchar.h & UCharacter.WordBreak
  -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2012-10-16)
    Aghb 239 Caucasian Albanian
    Mahj 314 Mahajani
  -> uscript.h
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2;\3
  -> preparseucd.py _scripts_only_in_iso15924
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
     (not strictly necessary for NOT_ENCODED scripts)

* generate normalization data files
- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
# Location of the ICU source tree.
set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)

~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
~/svn.icutools/trunk/dbg/unicode/c$ make

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.3: U+2260, U+226E, U+226F
- nothing new in 6.3, no test file to update

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt52l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
  ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
  ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu
  ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b
  ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
  ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr
- refresh ICU4J
  ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files

- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note removing the underscore before "Rules")
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
  ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
  ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll
  ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* test ICU, fix test code where necessary

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information
- skipped for Unicode 6.3: no new scripts

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Unicode 6.2 update

http://www.unicode.org/review/pri230/
http://www.unicode.org/versions/beta-6.2.0.html
http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0
http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values
http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol
http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols
http://www.unicode.org/reports/tr46/tr46-8.html IDNA
http://unicode.org/Public/idna/6.2.0/

*** ICU Trac

- ticket 9515: Unicode 6.2: final ICU update

- ticket 9514: UCA 6.2: fix UCARules.txt

- ticket 9437: update ICU to Unicode 6.2
- C++ branches/markus/uni62 at r32050 from trunk at r32041
- Java branches/markus/uni62 at r32068 from trunk at r32066

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

*** data files & enums & parser code

* file preparation

- download UCD, UCA & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- modify preparseucd.py: NamesList.txt is now in UTF-8
- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.

* PropertyValueAliases.txt changes
- 1 new Line_Break (lb) value:
    lb ; RI ; Regional_Indicator
  -> uchar.h & UCharacter.LineBreak
- 1 new Word_Break (WB) value:
    WB ; RI ; Regional_Indicator
  -> uchar.h & UCharacter.WordBreak
- 1 new Grapheme_Cluster_Break (GCB) value:
    GCB; RI ; Regional_Indicator
  -> uchar.h & UCharacter.GraphemeClusterBreak

* 3 new numeric values
  The new value -1, which was really supposed to be NaN but that would have required
  new UnicodeData.txt syntax, can already be represented as a "fraction" of -1/1,
  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
  -> uprops.h, uchar.c & UCharacterProperty.java
  -> cucdtst.c & UCharacterTest.java

* generate normalization data files
- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.
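
The four gennorm2 invocations in the normalization step above differ only in the
output file and in the list of input files. A minimal Python sketch that builds
those command lines from one table, assuming the same SRC_DATA_IN and UNIDATA
locations as in the step; the table and function names are hypothetical, and
the sketch only prints the commands, it does not run gennorm2.

```python
import posixpath

# Output file -> input files, exactly as listed in the gennorm2 step.
NORM2_INPUTS = [
    ("nfc.nrm", ["nfc.txt"]),
    ("nfkc.nrm", ["nfc.txt", "nfkc.txt"]),
    ("nfkc_cf.nrm", ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"]),
    ("uts46.nrm", ["nfc.txt", "uts46.txt"]),
]


def gennorm2_commands(src_data_in, unidata):
    """Return one argument list per .nrm file, in the order of the log."""
    commands = []
    for output, inputs in NORM2_INPUTS:
        commands.append(
            ["bin/gennorm2",
             "-o", posixpath.join(src_data_in, output),
             "-s", posixpath.join(unidata, "norm2")]
            + inputs)
    return commands


# Hypothetical locations; substitute the real source tree paths.
for command in gennorm2_commands("src/source/data/in", "src/source/data/unidata"):
    print(" ".join(command))
```

Keeping the input lists in one table makes it easy to see that every file builds
on nfc.txt, which is why nfc.nrm has to be regenerated whenever nfc.txt changes.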
* build Unicode tools using CMake+make

* generate core properties data files
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
- in initial bootstrapping, change the UCA version
  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
- rebuild ICU (make install) & tools
  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
    check if the UCA version in FractionalUCA.txt matches the new Unicode version
    (see step above)
- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.2: U+2260, U+226E, U+226F
- nothing new in 6.2, no test file to update

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt50l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
  ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
  ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
  ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
  ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
  ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
- refresh ICU4J
  ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA

- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note removing the underscore before "Rules")
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
  ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
  ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
  ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* test ICU, fix test code where necessary

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information
- skipped for Unicode 6.2: no new scripts

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Future Unicode update

Tools simplified since the Unicode 6.1 update. See
- http://site.icu-project.org/design/props/ppucd
- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972

* Unicode version numbers
- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates

* file preparation
- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.

* PropertyValueAliases.txt changes
- Script codes that are in ISO 15924 but not in Unicode are now listed in
  preparseucd.py, in the _scripts_only_in_iso15924 variable.
  If there are new ISO codes, then add them.
  If Unicode adds some of them, then remove them from the .py variable.
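
The bookkeeping for the _scripts_only_in_iso15924 variable can be sketched as follows.
This is an illustration only, not code from preparseucd.py: the function name and the
sample inputs are hypothetical, and it assumes the "sc ; Code ; Long_Name" line format
of PropertyValueAliases.txt.

```python
def scripts_only_in_iso15924(iso_codes, pva_lines):
    """Return the ISO 15924 codes that have no 'sc' line in PropertyValueAliases.txt.

    Hypothetical helper for illustration; not part of preparseucd.py.
    """
    unicode_codes = set()
    for line in pva_lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        # Script lines look like: sc ; Cakm ; Chakma
        if fields[0] == "sc" and len(fields) >= 3:
            unicode_codes.add(fields[1])
    return sorted(set(iso_codes) - unicode_codes)


# Hypothetical sample input, for illustration only.
sample_pva = [
    "# Script (sc)",
    "sc ; Cakm ; Chakma",
    "sc ; Takr ; Takri",
    "lb ; HL ; Hebrew_Letter",
]
sample_iso = ["Cakm", "Khoj", "Takr", "Tirh"]
print(scripts_only_in_iso15924(sample_iso, sample_pva))  # ['Khoj', 'Tirh']
```

Codes in the result are candidates for the .py variable; codes that drop out of the
result after a UCD update are candidates for removal from it.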

* UnicodeData.txt changes
- No more manual changes for CJK ranges for algorithmic names;
  those are now written to ppucd.txt and genprops reads them from there.

* generate core properties data files (makeprops.sh was deleted)
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src

* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
- it is now generated by preparseucd.py

* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
- it is now generated by preparseucd.py
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
  (can be in some subfolder)

* generate normalization data files
- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
* build Unicode tools using CMake+make

* new way to call genuca (makeuca.sh was deleted)
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src

---------------------------------------------------------------------------- ***

Unicode 6.1 update

*** ICU Trac

- ticket 8995 final update to Unicode 6.1
- ticket 8994 regenerate source/layout/CanonData.cpp

- ticket 8961 support Unicode "Age" value *names*
- ticket 8963 support multiple character name aliases & types

- ticket 8827 "update ICU to Unicode 6.1"
- C++ branches/markus/uni61 at r30864 from trunk at r30843
- Java branches/markus/uni61 at r30865 from trunk at r30863

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- icutools/unicode/makedefs.sh
  + also review & update other definitions in that file,
    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l

*** data files & enums & parser code

* file preparation

~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
- This prepares both unidata and testdata files in respective output subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.
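
The check for previously commented-out data lines can be sketched in Python.
This is only an illustration under stated assumptions: the helper name
reactivated_lines is hypothetical, it is not part of ucdcopy.py, and it assumes
that disabled data lines are marked with a leading '#'.

```python
def reactivated_lines(old_lines, new_lines, comment_prefix="#"):
    """List data lines that were commented out in the old test file
    but appear as active lines in the freshly copied one.

    Hypothetical helper for illustration; assumes '#' marks disabled lines.
    """
    disabled = set()
    for line in old_lines:
        stripped = line.strip()
        if stripped.startswith(comment_prefix):
            body = stripped[len(comment_prefix):].strip()
            if body:
                disabled.add(body)
    # Keep the order of the new file so that the report is easy to follow.
    return [line.strip() for line in new_lines if line.strip() in disabled]


# Hypothetical sample data, for illustration only.
old_file = ["# known failure", "#0041 0308; fails", "0042; ok"]
new_file = ["0041 0308; fails", "0042; ok"]
print(reactivated_lines(old_file, new_file))  # ['0041 0308; fails']
```

Each reported line is one that the update re-enabled; per the note above, it
probably needs to be commented out again in the new file.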

* PropertyValueAliases.txt changes
- 11 new block names:
    Arabic_Extended_A
    Arabic_Mathematical_Alphabetic_Symbols
    Chakma
    Meetei_Mayek_Extensions
    Meroitic_Cursive
    Meroitic_Hieroglyphs
    Miao
    Sharada
    Sora_Sompeng
    Sundanese_Supplement
    Takri
  -> add to uchar.h
  -> add to UCharacter.UnicodeBlock IDs
     Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
     replace public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
     Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
     replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 1 new Joining_Group (jg) value:
    Rohingya_Yeh
  -> uchar.h & UCharacter.JoiningGroup
- 2 new Line_Break (lb) values:
    CJ=Conditional_Japanese_Starter
    HL=Hebrew_Letter
  -> uchar.h & UCharacter.LineBreak
- 7 new scripts:
    sc ; Cakm ; Chakma
    sc ; Merc ; Meroitic_Cursive
    sc ; Mero ; Meroitic_Hieroglyphs
    sc ; Plrd ; Miao
    sc ; Shrd ; Sharada
    sc ; Sora ; Sora_Sompeng
    sc ; Takr ; Takri
  -> remove these from SyntheticPropertyValueAliases.txt
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2011-06-21)
    Khoj 322 Khojki
    Tirh 326 Tirhuta
  and another one added 2011-12-09
    Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
  -> uscript.h
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2;\3
  -> SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java

* UnicodeData.txt changes
- the last Unihan code point changes from U+9FCB to U+9FCC
  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
  + do change gennames.c
  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java

* DerivedBidiClass.txt changes
- 2 new default-AL blocks:
# Arabic Extended-A: U+08A0 - U+08FF (was default-R)
# Arabic Mathematical Alphabetic Symbols:
# U+1EE00 - U+1EEFF (was default-R)
- 2 new default-R blocks:
# Meroitic Hieroglyphs:
# U+10980 - U+1099F
# Meroitic Cursive: U+109A0 - U+109FF
  -> should be picked up by the explicit data in the file

* NameAliases.txt changes
- from
  # Each line has two fields
  # First field: Code point
  # Second field: Alias
- to
  # Each line has three fields, as described here:
  #
  # First field: Code point
  # Second field: Alias
  # Third field: Type
- Also, the file previously allowed multiple aliases but only now does it
  actually provide multiple, even multiple of the same type. For example,
  FEFF;BYTE ORDER MARK;alternate
  FEFF;BOM;abbreviation
  FEFF;ZWNBSP;abbreviation
- This breaks our gennames parser, unames.icu data structure, and API.
  Fix gennames to only pick up "correction" aliases.
  New ticket #8963 for further changes.

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.
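
Restricting a NameAliases.txt parser to "correction" aliases, as described for
gennames in the NameAliases.txt notes above, can be sketched in Python. This is
an illustration only and not the gennames code (which is C): the function name is
hypothetical, and the U+01A2 line is included merely as a sample entry of type
"correction".

```python
def correction_aliases(lines):
    """Map code point -> alias, using only aliases of type 'correction'.

    Hypothetical helper for illustration; assumes the three-field
    'code point;alias;type' format of NameAliases.txt.
    """
    result = {}
    for line in lines:
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        if len(fields) != 3:
            continue  # not in the three-field format
        code_point, alias, alias_type = fields
        if alias_type == "correction":
            # If a code point had several corrections, the last one would win.
            result[int(code_point, 16)] = alias
    return result


# Sample lines, for illustration only.
sample = [
    "# Each line has three fields",
    "01A2;LATIN CAPITAL LETTER GHA;correction",
    "FEFF;BYTE ORDER MARK;alternate",
    "FEFF;BOM;abbreviation",
    "FEFF;ZWNBSP;abbreviation",
]
print(correction_aliases(sample))  # {418: 'LATIN CAPITAL LETTER GHA'}
```

All three U+FEFF lines are skipped because none of them has the type "correction".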
* build Unicode tools (at least genpname) using CMake+make

* run genpname
  (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource

* build ICU (make install)
* build Unicode tools using CMake+make

* update source/data/unidata/norm2/nfkc_cf.txt
- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt

* update source/data/unidata/norm2/uts46.txt
- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
  to ~/svn.icu/tools/trunk/src/unicode/py
- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.1: U+2260, U+226E, U+226F
- nothing new in 6.1, no test file to update

* generate core properties data files
- in initial bootstrapping, change the UCA version
  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools
  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
    check if the UCA version in FractionalUCA.txt matches the new Unicode version
    (see step above)
- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt49l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
  ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
  ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
  ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
  ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
  ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
- refresh ICU4J
  ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* test ICU so far, fix test code where necessary
- temporarily ignore collation issues that look like UCA/UCD mismatches,
  until UCA data is updated

* UCA

- get output from Mark's tools; look in
  http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note removing the underscore before "Rules")
- update (ICU)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run makeuca.sh:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
  ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
  ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
  ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.
  (It also generates ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@draft" statement.

- diff current <icu>/source/layout files vs. generated ones
  ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
  review and manually merge desired changes;
  fix gratuitous changes, incorrect @draft and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
- if you just copy the above files, then
  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

ICU 4.8 (no Unicode update, just new script codes)

* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2010-12-21)
    Afak 439 Afaka
    Jurc 510 Jurchen
    Mroo 199 Mro, Mru
    Nshu 499 Nüshu
    Shrd 319 Sharada, Śāradā
    Sora 398 Sora Sompeng
    Takr 321 Takri, Ṭākrī, Ṭāṅkrī
    Tang 520 Tangut
    Wole 480 Woleai
  -> uscript.h
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2;\3
  -> genpname/SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
- You might first need to "make install" ICU so that the tools build can pick
  up the new definitions from the installed header files.

* run genpname
  (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
- rebuild ICU & tools

* run genprops
- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
- rebuild ICU & tools

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- copy the big-endian Unicode data files to another location,
  separate from the other data files
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
  ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
  ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
- refresh ICU4J
  ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b

* should have updated the layout engine script codes but forgot

---------------------------------------------------------------------------- ***

Unicode 6.0 update

*** related ICU Trac tickets

7264 Unicode 6.0 Update

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo

*** data files & enums & parser code

* file preparation

~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
- This now prepares both unidata and testdata files in respective output subfolders.

* PropertyAliases.txt changes
- new Script_Extensions property defined in the new ScriptExtensions.txt file
  but not listed in PropertyAliases.txt; reported to unicode.org;
  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
     scx; Script_Extensions
  -> uchar.h with new UProperty section
  -> com.ibm.icu.lang.UProperty, parallel with uchar.h

* PropertyValueAliases.txt changes
- 12 new block names:
    Alchemical_Symbols
    Bamum_Supplement
    Batak
    Brahmi
    CJK_Unified_Ideographs_Extension_D
    Emoticons
    Ethiopic_Extended_A
    Kana_Supplement
    Mandaic
    Miscellaneous_Symbols_And_Pictographs
    Playing_Cards
    Transport_And_Map_Symbols
  -> add to uchar.h
  -> add to UCharacter.UnicodeBlock
     Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
     replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- Joining_Group (jg) values:
    Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
  -> uchar.h & UCharacter.JoiningGroup
- 3 new scripts:
    sc ; Batk ; Batak
    sc ; Brah ; Brahmi
    sc ; Mand ; Mandaic
  -> remove these from SyntheticPropertyValueAliases.txt
  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2009-11-11..2010-07-18)
    Bass 259 Bassa Vah
    Dupl 755 Duployan shorthand
    Elba 226 Elbasan
    Gran 343 Grantha
    Kpel 436 Kpelle
    Loma 437 Loma
    Mend 438 Mende
    Merc 101 Meroitic Cursive
    Narb 106 Old North Arabian
    Nbat 159 Nabataean
    Palm 126 Palmyrene
    Sind 318 Sindhi
    Wara 262 Warang Citi
  -> uscript.h
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2;\3
  -> SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- ISO 15924 name change
    Mero 100 Meroitic Hieroglyphs (was Meroitic)
  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt

* UnicodeData.txt changes
- new CJK block:
    2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
    2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion

* build Unicode tools using CMake+make

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
- You might first need to "make install" ICU so that the tools build can pick
  up the new definitions from the installed header files.

* run genpname
- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- rebuild ICU & tools

* update source/data/unidata/norm2/nfkc_cf.txt
- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt

* update source/data/unidata/norm2/uts46.txt
- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
  to ~/svn.icu/tools/trunk/src/unicode/py
- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0: U+2260, U+226E, U+226F

* generate core properties data files
- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools
- run makeuca.sh so that genuca picks up the new nfc.nrm:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools

* implement new Script_Extensions property (provisional)
- parser & generator: genprops & uprops.icu
- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java

* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
- (one-time change)
- genbidi/gencase/genprops tools changes
- re-run makeprops.sh (see above)
- UCharacterProperty.java, UCharacterTypeIterator.java,
  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt45l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
- copy the big-endian Unicode data files to another location,
  separate from the other data files
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
  ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
  ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
  ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
  ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
  ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
- refresh ICU4J
  ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* un-hardcode normalization skippable (NF*_Inert) test data
- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools

* copy updated break iterator test files
- now handled by early ucdcopy.py and
  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
  (old instructions:
  copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
  to ~/svn.icu/trunk/src/source/test/testdata)
- they are not used in ICU4J

* UCA

- get output from Mark's tools; look in
  http://www.unicode.org/~book/incoming/mark/uca6.0.0/
  http://www.macchiato.com/unicode/utc/additional-uca-files
  http://www.unicode.org/Public/UCA/6.0.0/
  http://www.unicode.org/~mdavis/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
- update Han-implicit ranges for new CJK extensions:
  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
  do not add it into invuca so that tailoring primary-after an ignorable works
- genuca: permit space between [variable top] bytes
- ucol.cpp: treat noncharacters like unassigned rather than ignorable
- run makeuca.sh:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
  ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
  ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
  ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
- update (ICU)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

---------------------------------------------------------------------------- ***

Unicode 5.2 update

*** related ICU Trac tickets

7084 Unicode 5.2

7167 verify collation bytes
7235 Java test NAME_ALIAS
7236 Java DerivedCoreProperties.txt test
7237 Java BidiTest.txt
7238 UTrie2 in core unidata
7239 test for tailoring gaps
7240 Java fix CollationMiscTest
7243 update layout engine for Unicode 5.2

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation

python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
- includes finding files regardless of version numbers,
  copying them, and performing the equivalent processing of the
  ucdstrip and ucdmerge tools on the desired set of files

* notes on changes
- PropertyAliases.txt
  moved from numeric to enumerated:
    ccc ; Canonical_Combining_Class
  new string properties:
    NFKC_CF ; NFKC_Casefold
    Name_Alias; Name_Alias
  new binary properties:
    Cased ; Cased
    CI ; Case_Ignorable
    CWCF ; Changes_When_Casefolded
    CWCM ; Changes_When_Casemapped
    CWKCF ; Changes_When_NFKC_Casefolded
    CWL ; Changes_When_Lowercased
    CWT ; Changes_When_Titlecased
    CWU ; Changes_When_Uppercased
  new CJK Unihan properties (not supported by ICU)
- PropertyValueAliases.txt
  new block names
  new scripts
  one script code change:
    sc ; Qaai ; Inherited
    ->
    sc ; Zinh ; Inherited ; Qaai
  new Line_Break (lb) value:
    lb ; CP ; Close_Parenthesis
  new Joining_Group (jg) values: Farsi_Yeh, Nya
  other new values:
    ccc; 214; ATA ; Attached_Above
- DerivedBidiClass.txt
  new default-R range: U+1E800 - U+1EFFF
- UnicodeData.txt
  all of the ISO comments are gone
  new CJK block end:
    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
  new CJK block:
    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\trunk\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
  + preparse.pl complains with errors like the following:
    Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
    This is because ICU 4.0 had scripts from ISO 15924 which are now
    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
  + preparse.pl complains with errors about block names missing from uchar.h; add them

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 26 new blocks
    copy new blocks from Blocks.txt
    MS VC++ 2008 regular expression:
    find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
    replace with " UBLOCK_\3 = 172, /*[\1]*/"
  + several new script values already added in ICU 4.0 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
    (added to SyntheticPropertyValueAliases.txt)
- new Joining Group (JG) values: Farsi_Yeh, Nya
- new Line_Break (lb) value:
    lb ; CP ; Close_Parenthesis

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FC3 to 9FCB
  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
  + do change gennames.c

* Compare definitions of new binary properties with what we used to use
  in algorithms, to see if the definitions changed.
- Verified that definitions for Cased and Case_Ignorable are unchanged.
  The gencase tool now parses the newly public Case_Ignorable values
  in case the definition changes in the future.
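
The search for the hardcoded Unihan range end/limit (regex 9FC[34], case-insensitive) can be scripted instead of run by hand in an editor. The following is only a minimal sketch: the function name, the default extension list and the assumption that all sources are readable as text are illustrative and not part of the ICU tools.

```python
import os
import re

# Old range end (9FC3) and limit (9FC4), matched case-insensitively,
# following the "regex 9FC[34]" note in the log.
OLD_UNIHAN_END = re.compile(r"9FC[34]", re.IGNORECASE)


def find_hardcoded_ranges(root, pattern=OLD_UNIHAN_END,
                          extensions=(".c", ".cpp", ".h", ".java", ".txt")):
    """Return (path, line number, line) for every source line matching pattern.

    The extension list is an assumption; adjust it to the tree being searched.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on files that are not UTF-8.
            with open(path, encoding="utf-8", errors="replace") as f:
                for number, line in enumerate(f, start=1):
                    if pattern.search(line):
                        hits.append((path, number, line.rstrip("\n")))
    return hits
```

Every hit still needs a manual decision: the pattern also matches longer hex strings that merely contain these digits, and some real occurrences must stay unchanged (see the Unicode 4.1 notes on BOCU and GB 18030).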

* uchar.c & uprops.h & uprops.c & genprops
- new numeric values that didn't exist in Unicode data before:
  1/7, 1/9, 1/10, 3/10, 1/16, 3/16
  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
  therefore redesign the encoding of numeric types and values for formatVersion 6;
  design for simple numbers up to at least 144 ("one gross"),
  large values up to at least 10^20,
  and fractions with numerators -1..17 and denominators 1..16
  to cover current and expected future values
  (e.g., more Han numeric values, Meroitic twelfths)

* reimplement Hangul_Syllable_Type for new Jamo characters
- the old code assumed that all Jamo characters are in the 11xx block
- Unicode 5.2 fills holes there and adds new Jamo characters in
  A960..A97F; Hangul Jamo Extended-A
  and in
  D7B0..D7FF; Hangul Jamo Extended-B
- Hangul_Syllable_Type can be trivially derived from a subset of
  Grapheme_Cluster_Break values

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data

ICU data make path is \svn\icuproj\icu\trunk\source\data\
ICU root path is \svn\icuproj\icu\trunk
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
Creating data file for Unicode Property Names
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
  and rebuild the common library

*** UCA

- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
[ Begin obsolete instructions:
  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
  - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
    on Windows:
    python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
    python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
  End obsolete instructions]
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
  not just the *_STUB.txt files
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

*** Implement Cased & Case_Ignorable properties
- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
- Problem: These properties should be disjoint, but aren't
- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
- change ucase.icu to be able to store any combination of Cased and Case_Ignorable

*** Implement Changes_When_Xyz properties
- without stored data

*** Implement Name_Alias property
- add it as another name field in unames.icu
- make it available via u_charName() and UCharNameChoice and
- consider it in u_charFromName()

*** Break iterators

* Update break iterator rules to new UAX versions and new property values
* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary

*** new BidiTest file
- review format and data
- copy BidiTest.txt to source/test/testdata
- write test code using this data
- fix ICU code where it fails the conformance test

*** Java
- generally, find and update code corresponding to C/C++
- UCharacter.UnicodeBlock constants:
  a) add an _ID integer per new block, update COUNT
  b) add a class instance per new block
     Visual Studio regex:
     find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
     replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()

- port test changes to Java

*** LayoutEngine script information

(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

-> Eric Mader wrote in email on 20090930:
   "I think the tool has been modified to update @draft to @stable for
   older scripts and to add @draft for new scripts.
   (I worked with an intern on this last year.)
   You should check the output after you run it."

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

-> Eric Mader wrote in email on 20090930:
   "This is just a matter of making sure that all the per-script tables have
   entries for any new scripts that were added.
   If any new Indic characters were added, then the class tables in
   IndicClassTables.cpp should be updated to reflect this.
   John Emmons should know how to do this if it's required."

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.1 update

*** related ICU Trac tickets

5696 Update to Unicode 5.1

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.1.0\ucd\Blocks.txt ..\unidata\
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\uni51\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
  + preparse.pl complains with errors like the following:
    Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    This is because ICU 3.8 had scripts from ISO 15924 which are now
    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
    N/Y, No/Yes, F/T, False/True
    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
       It will use further values from the file if present.

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 17 new blocks
  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt)
  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
  There is none above 127 yet which is the script code for an
  assigned Unicode character, so ICU 4.0 uprops.icu does not store any
  script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that overflows now.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields also grow to accommodate future growth:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
- renamed property Simple_Case_Folding (sfc->scf)
  + nothing to be done: handled as normal alias
- new property JSN Jamo_Short_Name
  + no new API: only contributes to the Name property
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
- new Joining Group (JG) value: Burushashki_Yeh_Barree
- new Sentence_Break (SB) values:
    SB ; CR ; CR
    SB ; EX ; Extend
    SB ; LF ; LF
    SB ; SC ; SContinue
- new Word_Break (WB) values:
    WB ; CR ; CR
    WB ; Extend ; Extend
    WB ; LF ; LF
    WB ; MB ; MidNumLet

* Further changes in the 2008-02-29 update:
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  because they should not normally be invisible.
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
- new Word_Break (WB) value: NL=Newline

* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
- Unihan range end moves from 9FBB to 9FC3
  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
  + do change gennames.c

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.
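
For the "Test suites" item of this update: the boolean values N/Y, No/Yes, F/T, False/True all have to be accepted wherever property value aliases are parsed. The sketch below only illustrates that requirement. It is not ICU's implementation, the function name is hypothetical, and the loose matching (ignoring case, whitespace, '_' and '-') is an assumption about how aliases are compared.

```python
# Boolean property value aliases that PropertyValueAliases.txt lists
# explicitly since Unicode 5.1: N/Y, No/Yes, F/T, False/True.
_BINARY_VALUES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(alias):
    """Map a binary property value alias to a bool, or raise ValueError.

    Assumption: matching ignores case, whitespace, '_' and '-'.
    """
    key = "".join(
        ch for ch in alias if not (ch.isspace() or ch in "_-")
    ).lower()
    try:
        return _BINARY_VALUES[key]
    except KeyError:
        raise ValueError("not a binary property value alias: %r" % alias)
```

A test in the spirit of TestBinaryValues() would loop over all eight aliases and check that each pair yields opposite results.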

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
  + make sure that data.h is writable
  + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
  + Pattern_Syntax
  + Pattern_White_Space
- new enumerated properties
  + Grapheme_Cluster_Break
  + Sentence_Break
  + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties
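
The round-trip requirement for u_charMirror() can also be checked on the input data before it reaches ICU. The sketch below assumes the BidiMirroring.txt layout "code; mirrored code # comment" and tests only the data file, not u_charMirror() itself. Both function names are hypothetical.

```python
def parse_bidi_mirroring(lines):
    """Parse BidiMirroring.txt-style lines into {code point: mirrored code point}."""
    mapping = {}
    for line in lines:
        # Drop comments and skip lines that are empty afterwards.
        data = line.split("#", 1)[0].strip()
        if not data:
            continue
        source, target = (int(field.strip(), 16) for field in data.split(";")[:2])
        mapping[source] = target
    return mapping


def non_round_tripping(mapping):
    """Return the sorted code points c for which mirror(mirror(c)) != c."""
    return sorted(c for c, m in mapping.items() if mapping.get(m) != c)
```

An empty result from non_round_tripping() means that every listed mapping has its inverse in the file.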

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
  + do not modify BOCU/BOCSU code because that would change the encoding
    and break binary compatibility!
  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    NamePrepProfile.txt
  + ignore trietest.c: test data is arbitrary
  + ignore tstnorm.cpp: test optimization, not important
  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
  + do change line_th.txt and word_th.txt
    by replacing hardcoded ranges with the new property values
  + do change gennames.c

source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
  + put short name comment only on line with new constant
    for genpname perl script parser
- new binary properties
  + STerm
  + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
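
The DerivedNormalizationProps.txt field changes tracked in gennorm.c can be illustrated with a small parser that accepts both spellings. This is a sketch, not the gennorm.c code. The function names are hypothetical, and it assumes that the other old quick-check names follow the same <form>_NO / <form>_MAYBE pattern as the "NFD_NO" example, with MAYBE mapping to the value M.

```python
import re

# Old single-field quick-check names such as NFD_NO (assumed pattern).
_OLD_QUICK_CHECK = re.compile(r"^(NFK?[CD])_(NO|MAYBE)$")


def normalize_fields(fields):
    """Rewrite old-style property fields to the new-style names."""
    fields = [field.strip() for field in fields]
    if not fields:
        return fields
    if fields[0] == "FNC":
        # "FNC" -> "FC_NFKC"; the mapping value field is unchanged.
        return ["FC_NFKC"] + fields[1:]
    match = _OLD_QUICK_CHECK.match(fields[0])
    if match:
        # single field "NFD_NO" -> two fields "NFD_QC; N"
        value = "N" if match.group(2) == "NO" else "M"
        return [match.group(1) + "_QC", value] + fields[1:]
    return fields


def parse_line(line):
    """Return (code point or range, fields), or None for comments and blanks."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    parts = data.split(";")
    return parts[0].strip(), normalize_fields(parts[1:])
```

With this, an old line "00C0..00C5 ; NFD_NO" and a new line "00C0..00C5 ; NFD_QC; N" yield the same result.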

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
  + assume that the type is always numeric for Han characters,
    and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ