* Copyright (C) 2004-2015, International Business Machines
* Corporation and others. All Rights Reserved.
*
* file name: changes.txt
* encoding: US-ASCII
* tab size: 8 (not used)
* indentation:4
*
* created on: 2004may06
* created by: Markus W. Scherer
*
* change log for Unicode updates

---------------------------------------------------------------------------- ***

* New ISO 15924 script codes

Starting with ICU 55, we do not add UScriptCode constants any more until their scripts
are encoded in Unicode, or can be assumed to be encoded in the next Unicode version.
Script enum constant names want to follow the Unicode script property value aliases,
which are assigned only when the scripts are encoded.
When we encode scripts early and guess wrong, then we have confusing enum constants
and have sometimes added aliases.

Exception: Script codes like Latf and Aran that are not subject to separate encoding
can be added at any time.

Script codes not yet in ICU: http://www.unicode.org/iso15924/codechanges.html

Added 2014-11-15, see http://bugs.icu-project.org/trac/ticket/11561
- Adlm 166 Adlam
- Aran 161 Arabic (Nastaliq variant)
- Kitl 505 Khitan large script
- Kits 288 Khitan small script
- Marc 332 Marchen
- Osge 219 Osage

Aran can be added as USCRIPT_ARABIC_NASTALIQ at any time.

Adlam, Marchen, and Osage are expected to go into Unicode 9;
we should assign Unicode script property value aliases for them
soon after Unicode 8 is released, and add them in ICU 56.

Khitan scripts will be encoded later.

---------------------------------------------------------------------------- ***

Unicode 8.0 update for ICU ??

* UCA issue from 7.0

- U+1DE9 COMBINING LATIN SMALL LETTER BETA
  sorts with Greek Beta, should sort with Latin B?
  + Ken says:
    No, it was deliberate:

    03B2;GREEK SMALL LETTER BETA;Ll;;;;0392;;0392
    1D5D;MODIFIER LETTER SMALL BETA;Lm;<super> 03B2;;;;;
    1DE9;COMBINING LATIN SMALL LETTER BETA;Mn;<sort> 03B2;;;;;
    1D66;GREEK SUBSCRIPT SMALL LETTER BETA;Ll;<sub> 03B2;;;;;

    Note the relationship to U+1D5D.

    When the disunified *Latin* beta base letter shows up in Unicode 8.0:

    U+A7B4 LATIN CAPITAL LETTER BETA
    U+A7B5 LATIN SMALL LETTER BETA

    we could re-evaluate what U+1DE9 equates to, for collation,
    but currently there isn't any Latin beta to serve that function
    in Unicode 7.0.

- ICU_ROOT=~/svn.icu/trunk
- ICU_SRC_DIR=$ICU_ROOT/src
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR


---------------------------------------------------------------------------- ***

Unicode 7.0 update for ICU 54

http://www.unicode.org/review/pri271/ -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-7.0.0.html#notable_issues
http://www.unicode.org/reports/tr44/tr44-13.html

*** ICU Trac

- ticket 10821: Unicode 7.0, UCA 7.0
- C++ branches/markus/uni70 at r35584 from trunk at r35580
- Java branches/markus/uni70 at r35587 from trunk at r35545

*** CLDR Trac

- ticket 7195: UCA 7.0 CLDR root collation
- branches/markus/uni70 at r10062 from trunk at r10061

- ticket 6762: script metadata for Unicode 7.0 new scripts

*** Unicode version numbers
- makedata.mak
- uchar.h
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.
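
A minimal sketch of a sanity check for the version-number step above, assuming
that uchar.h carries the version as a line of the form
#define U_UNICODE_VERSION "7.0". The function name and the sample header text
are hypothetical and not part of the ICU build; the idea is only to confirm
the header was updated before running "configure".

```python
import re


def unicode_version_from_header(header_text):
    """Return the U_UNICODE_VERSION string found in uchar.h-style text, or None."""
    match = re.search(
        r'^\s*#\s*define\s+U_UNICODE_VERSION\s+"([0-9.]+)"',
        header_text,
        re.MULTILINE,
    )
    return match.group(1) if match else None


# Hypothetical excerpt standing in for source/common/unicode/uchar.h.
SAMPLE_HEADER = '''
/* illustrative excerpt only */
#define U_UNICODE_VERSION "7.0"
'''

if __name__ == "__main__":
    print(unicode_version_from_header(SAMPLE_HEADER))
```

In practice one would read the real header file and compare the result with
the version expected for the update branch.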

*** data files & enums & parser code

* file preparation

- download UCD & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- only for manual diffs: remove version suffixes from the file names
    ~/unidata/uni70/20140403$ ../../desuffixucd.py .
  (see https://sites.google.com/site/unicodetools/inputdata)
- only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Restore TODO diffs in source/data/unidata/UCARules.txt
    cd $ICU_SRC_DIR
    meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt
- Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt

- also: from http://unicode.org/Public/security/7.0.0/ download new
    confusables.txt & confusablesWholeScript.txt
  and copy to $ICU_ROOT/src/source/data/unidata/

* initial preparseucd.py changes
- remove new Unicode scripts from the
  only-in-ISO-15924 list according to the error message:
    ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass',
    'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm',
    'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj']
    from _scripts_only_in_iso15924
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- NamesList.txt now has a heading with a non-ASCII character
  + keep ppucd.txt in platform charset, rather than changing tool/test parsers
  + escape non-ASCII characters in heading comments
- gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013
  + get the copyright from the first file whose copyright line contains the current year

* PropertyValueAliases.txt changes
- 32 new Block (blk) values:
    blk; Bassa_Vah ; Bassa_Vah
    blk; Caucasian_Albanian ; Caucasian_Albanian
    blk; Coptic_Epact_Numbers ; Coptic_Epact_Numbers
    blk; Diacriticals_Ext ; Combining_Diacritical_Marks_Extended
    blk; Duployan ; Duployan
    blk; Elbasan ; Elbasan
    blk; Geometric_Shapes_Ext ; Geometric_Shapes_Extended
    blk; Grantha ; Grantha
    blk; Khojki ; Khojki
    blk; Khudawadi ; Khudawadi
    blk; Latin_Ext_E ; Latin_Extended_E
    blk; Linear_A ; Linear_A
    blk; Mahajani ; Mahajani
    blk; Manichaean ; Manichaean
    blk; Mende_Kikakui ; Mende_Kikakui
    blk; Modi ; Modi
    blk; Mro ; Mro
    blk; Myanmar_Ext_B ; Myanmar_Extended_B
    blk; Nabataean ; Nabataean
    blk; Old_North_Arabian ; Old_North_Arabian
    blk; Old_Permic ; Old_Permic
    blk; Ornamental_Dingbats ; Ornamental_Dingbats
    blk; Pahawh_Hmong ; Pahawh_Hmong
    blk; Palmyrene ; Palmyrene
    blk; Pau_Cin_Hau ; Pau_Cin_Hau
    blk; Psalter_Pahlavi ; Psalter_Pahlavi
    blk; Shorthand_Format_Controls ; Shorthand_Format_Controls
    blk; Siddham ; Siddham
    blk; Sinhala_Archaic_Numbers ; Sinhala_Archaic_Numbers
    blk; Sup_Arrows_C ; Supplemental_Arrows_C
    blk; Tirhuta ; Tirhuta
    blk; Warang_Citi ; Warang_Citi
  -> add to uchar.h
     use long property names for enum constants
  -> add to UCharacter.UnicodeBlock IDs
     Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+)
             replace public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
     Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
             replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 28 new Joining_Group (jg) values:
    jg ; Manichaean_Aleph ; Manichaean_Aleph
    jg ; Manichaean_Ayin ; Manichaean_Ayin
    jg ; Manichaean_Beth ; Manichaean_Beth
    jg ; Manichaean_Daleth ; Manichaean_Daleth
    jg ; Manichaean_Dhamedh ; Manichaean_Dhamedh
    jg ; Manichaean_Five ; Manichaean_Five
    jg ; Manichaean_Gimel ; Manichaean_Gimel
    jg ; Manichaean_Heth ; Manichaean_Heth
    jg ; Manichaean_Hundred ; Manichaean_Hundred
    jg ; Manichaean_Kaph ; Manichaean_Kaph
    jg ; Manichaean_Lamedh ; Manichaean_Lamedh
    jg ; Manichaean_Mem ; Manichaean_Mem
    jg ; Manichaean_Nun ; Manichaean_Nun
    jg ; Manichaean_One ; Manichaean_One
    jg ; Manichaean_Pe ; Manichaean_Pe
    jg ; Manichaean_Qoph ; Manichaean_Qoph
    jg ; Manichaean_Resh ; Manichaean_Resh
    jg ; Manichaean_Sadhe ; Manichaean_Sadhe
    jg ; Manichaean_Samekh ; Manichaean_Samekh
    jg ; Manichaean_Taw ; Manichaean_Taw
    jg ; Manichaean_Ten ; Manichaean_Ten
    jg ; Manichaean_Teth ; Manichaean_Teth
    jg ; Manichaean_Thamedh ; Manichaean_Thamedh
    jg ; Manichaean_Twenty ; Manichaean_Twenty
    jg ; Manichaean_Waw ; Manichaean_Waw
    jg ; Manichaean_Yodh ; Manichaean_Yodh
    jg ; Manichaean_Zayin ; Manichaean_Zayin
    jg ; Straight_Waw ; Straight_Waw
  -> uchar.h & UCharacter.JoiningGroup
- 23 new Script (sc) values:
    sc ; Aghb ; Caucasian_Albanian
    sc ; Bass ; Bassa_Vah
    sc ; Dupl ; Duployan
    sc ; Elba ; Elbasan
    sc ; Gran ; Grantha
    sc ; Hmng ; Pahawh_Hmong
    sc ; Khoj ; Khojki
    sc ; Lina ; Linear_A
    sc ; Mahj ; Mahajani
    sc ; Mani ; Manichaean
    sc ; Mend ; Mende_Kikakui
    sc ; Modi ; Modi
    sc ; Mroo ; Mro
    sc ; Narb ; Old_North_Arabian
    sc ; Nbat ; Nabataean
    sc ; Palm ; Palmyrene
    sc ; Pauc ; Pau_Cin_Hau
    sc ; Perm ; Old_Permic
    sc ; Phlp ; Psalter_Pahlavi
    sc ; Sidd ; Siddham
    sc ; Sind ; Khudawadi
    sc ; Tirh ; Tirhuta
    sc ; Wara ; Warang_Citi
  -> uscript.h (many were added before)
     comment "Mende Kikakui" for USCRIPT_MENDE
     add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2; \3
- 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2012-11-01)
    Ahom 338 Ahom
    Hatr 127 Hatran
    Mult 323 Multani
  (added 2013-10-12)
    Modi 324 Modi
    Pauc 263 Pau Cin Hau
    Sidd 302 Siddham
  -> uscript.h (some overlap with additions from Unicode)
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2; \3
  -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java

* update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
    (not strictly necessary for NOT_ENCODED scripts)
  ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt

* generate normalization data files
- cd $ICU_ROOT/dbg
- export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib
- SRC_DATA_IN=$ICU_SRC_DIR/source/data/in
- UNIDATA=$ICU_SRC_DIR/source/data/unidata
- bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource
- bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst)
# Location of the ICU source tree.
set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src)

~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
~/svn.icutools/trunk/dbg/unicode/c$ make

* genprops work
- new code point range for Joining_Group values: 10AC0..10AFF Manichaean
  + add second array of Joining_Group values for at most 10800..10FFF
    icutools: unicode/c/genprops/bidipropsbuilder.cpp
    icu: source/common/ubidi_props.h/.c/_data.h
    icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..7.0: U+2260, U+226E, U+226F
- nothing new in 7.0, no test file to update

* run & fix ICU4C tests

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt53l
    echo timestamp > uni-core-data
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    ICUDT=icudt54b
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
    cd ~/svn.icu/uni70/dbg/data/out/icu4j
    cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu
    cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT
    cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr
- refresh ICU4J
    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT

* update CollationFCD.java
  + copy & paste the initializers of lcccIndex[] etc. from
      ICU4C/source/i18n/collationfcd.cpp to
      ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd $ICU_SRC_DIR/source/data/unidata
    cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cd ../../test/testdata
    cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode
    cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA

- download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/
- run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata)
- update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/
- run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA
- output files are in ~/svn.unitools/Generated/uca/7.0.0/
- review data; compare files, use blankweights.sed or similar
    ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt
- cd ~/svn.unitools/Generated/uca/7.0.0/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
    cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note removing the underscore before "Rules")
    cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
    cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt
    cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt
    cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ICUDT=icudt54b
    ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll
    ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test
- copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors
- copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch
    ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

* run & fix ICU4J tests

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.
  (It also generates ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@stable" statement.
  ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java
  for "born stable" Unicode API constants, and to stop parsing ICU version numbers
  which may not contain dots any more.

- diff current <icu>/source/layout files vs. generated ones
    ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
  review and manually merge desired changes;
  fix gratuitous changes, incorrect @draft/@stable and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
- if you just copy the above files, then
  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

*** API additions
- send notice to icu-design about new born-@stable API (enum constants etc.)
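
The new born-@stable enum constants in this update were ported from uscript.h
to com.ibm.icu.lang.UScript with the find/replace pattern
"USCRIPT_([^ ]+) *= ([0-9]+),(.+)" recorded earlier in this section. A minimal
sketch of the same substitution outside Eclipse, using Python's re module; the
function name and the sample input line (including the constant name and its
value) are hypothetical.

```python
import re

# Patterns copied from the find/replace notes in this change log.
FIND = r'USCRIPT_([^ ]+) *= ([0-9]+),(.+)'
REPLACE = r'public static final int \1 = \2; \3'


def c_enum_to_java(text):
    """Rewrite uscript.h-style enum lines as Java int constants."""
    return re.sub(FIND, REPLACE, text)


# Hypothetical input line; not a real UScriptCode constant.
SAMPLE = "      USCRIPT_EXAMPLE               = 999,/* Exmp */"

if __name__ == "__main__":
    print(c_enum_to_java(SAMPLE))
```

Lines that do not match the pattern pass through unchanged, so the function
can be applied to a whole pasted enum body; the result still needs manual
review before it goes into UScript.java.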

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Unicode 6.3 update

http://www.unicode.org/review/pri249/ -- beta review
http://www.unicode.org/reports/uax-proposed-updates.html
http://www.unicode.org/versions/beta-6.3.0.html#notable_issues
http://www.unicode.org/reports/tr44/tr44-11.html

*** ICU Trac

- ticket 10128: update ICU to Unicode 6.3 beta
- ticket 10168: update ICU to Unicode 6.3 final
- C++ branches/markus/uni63 at r33552 from trunk at r33551
- Java branches/markus/uni63 at r33550 from trunk at r33553

- ticket 10142: implement Unicode 6.3 bidi algorithm additions

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_

- Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h
  so that the makefiles see the new version number.

*** data files & enums & parser code

* file preparation

- download UCD, UCA & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- modify preparseucd.py:
  parse new file BidiBrackets.txt
  with new properties bpb=Bidi_Paired_Bracket and bpt=Bidi_Paired_Bracket_Type
- ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni63/20130425 ~/svn.icu/uni63/src ~/svn.icutools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.
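
A minimal sketch of what parsing BidiBrackets.txt involves, assuming the
three-field UCD layout "code point; Bidi_Paired_Bracket; Bidi_Paired_Bracket_Type
# comment". This is not the actual preparseucd.py code; the function name and
the sample lines are hypothetical.

```python
def parse_bidi_brackets(lines):
    """Map code point -> (bpb code point, bpt letter) from BidiBrackets.txt-style lines."""
    result = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [field.strip() for field in data.split(";")]
        if len(fields) != 3:
            raise ValueError("expected 3 fields: %r" % line)
        code_point = int(fields[0], 16)
        paired = int(fields[1], 16)
        bracket_type = fields[2]
        if bracket_type not in ("o", "c", "n"):
            raise ValueError("unknown bpt value: %r" % line)
        result[code_point] = (paired, bracket_type)
    return result


# Illustrative lines in the assumed BidiBrackets.txt layout.
SAMPLE = [
    "# illustrative excerpt only",
    "",
    "0028; 0029; o # LEFT PARENTHESIS",
    "0029; 0028; c # RIGHT PARENTHESIS",
]

if __name__ == "__main__":
    for cp, (bpb, bpt) in sorted(parse_bidi_brackets(SAMPLE).items()):
        print("%04X bpb=%04X bpt=%s" % (cp, bpb, bpt))
```

The mapping gives one entry per listed bracket; code points that are absent
from the file take the default bpt value n (None).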

* PropertyAliases.txt changes
- 1 new Enumerated Property
    bpt ; Bidi_Paired_Bracket_Type
  -> uchar.h & UProperty.java & UCharacter.BidiPairedBracketType
  -> ubidi_props.h & .c & UBiDiProps.java
     -> remember to write the max value at UBIDI_MAX_VALUES_INDEX
  -> uprops.cpp
  -> change ubidi.icu format version from 2.0 to 2.1
- 1 new Miscellaneous Property
    bpb ; Bidi_Paired_Bracket
  -> uchar.h & UProperty.java
  -> ppucd.h & .cpp

* PropertyValueAliases.txt changes
- 3 Bidi_Paired_Bracket_Type (bpt) values:
    bpt; c ; Close
    bpt; n ; None
    bpt; o ; Open
  -> uchar.h & UCharacter.BidiPairedBracketType
  -> ubidi_props.h & .c & UBiDiProps.java
  -> change ubidi.icu format version from 2.0 to 2.1
- 4 new Bidi_Class (bc) values:
    bc ; FSI ; First_Strong_Isolate
    bc ; LRI ; Left_To_Right_Isolate
    bc ; RLI ; Right_To_Left_Isolate
    bc ; PDI ; Pop_Directional_Isolate
  -> uchar.h & UCharacterEnums.ECharacterDirection
  -> until the bidi code gets updated,
     Roozbeh suggests mapping the new bc values to ON (Other_Neutral)
- 3 new Word_Break (WB) values:
    WB ; HL ; Hebrew_Letter
    WB ; SQ ; Single_Quote
    WB ; DQ ; Double_Quote
  -> uchar.h & UCharacter.WordBreak
  -> first time Word_Break numeric constants exceed 4 bits (now 17 values)
- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2012-10-16)
    Aghb 239 Caucasian Albanian
    Mahj 314 Mahajani
  -> uscript.h
  -> com.ibm.icu.lang.UScript
     find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
     replace public static final int \1 = \2;\3
  -> preparseucd.py _scripts_only_in_iso15924
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
  -> update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata
     (not strictly necessary for NOT_ENCODED scripts)

* generate normalization data files
- ~/svn.icu/uni63/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni63/dbg/lib
- ~/svn.icu/uni63/dbg$ SRC_DATA_IN=~/svn.icu/uni63/src/source/data/in
- ~/svn.icu/uni63/dbg$ UNIDATA=~/svn.icu/uni63/src/source/data/unidata
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/uni63/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.

~/svn.icu/uni63/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt

* build Unicode tools using CMake+make

~/svn.icutools/trunk/src/unicode/c/icudefs.txt:

# Location (--prefix) of where ICU was installed.
set(ICU_INST_DIR /home/mscherer/svn.icu/uni63/inst)
# Location of the ICU source tree.
set(ICU_SRC_DIR /home/mscherer/svn.icu/uni63/src)

~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c
~/svn.icutools/trunk/dbg/unicode/c$ make

* generate core properties data files
- ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops ~/svn.icu/uni63/src
- ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca -i ~/svn.icu/uni63/dbg/data/out/build/icudt52l ~/svn.icu/uni63/src
- rebuild ICU (make install) & tools
- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.3: U+2260, U+226E, U+226F
- nothing new in 6.3, no test file to update

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
583 Unicode .icu files built to ./out/build/icudt52l 584 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt52b 585 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b 586 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt 587 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt52l.dat ./out/icu4j/icudt52b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt52l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt52b 588 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt52b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt52b" 589 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt52b/ 590 mkdir -p /tmp/icu4j/main/shared/data 591 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data 592 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt52b/ 593 mkdir -p /tmp/icu4j/main/shared/data 594 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data 595 make[1]: Leaving directory `/home/mscherer/svn.icu/uni63/dbg/data' 596- copy the big-endian Unicode data files to another location, 597 separate from the other data files 598 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll 599 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr 600 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b 601 ~/svn.icu/uni63/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/cnvalias.icu 602 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt52b 603 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll 604 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp 
com/ibm/icu/impl/data/icudt52b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/brkitr 605- refresh ICU4J 606 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b 607 608* refresh Java test .txt files 609- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode 610 611* UCA -- mostly skipped for ICU 52 / Unicode 6.3, except update coll/* files 612 613- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/ 614- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that 615- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt 616- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt 617 (note removing the underscore before "Rules") 618- update (ICU4C)/source/test/testdata/CollationTest_*.txt 619 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt 620 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt) 621- check test file diffs for previously commented-out, known-failing data lines; 622 probably need to keep those commented out 623- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani 624- run genuca, see command line above 625- rebuild ICU4C 626- refresh ICU4J collation data: 627 (subset of instructions above for properties data refresh, except copies all coll/*) 628 ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 629 ~/svn.icu/uni63/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll 630 ~/svn.icu/uni63/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt52b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt52b/coll 631 ~/svn.icu/uni63/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt52b 632- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging) 633- note on intltest: if 
collate/UCAConformanceTest fails, then 634 utility/MultithreadTest/TestCollators will fail as well; 635 fix the conformance test before looking into the multi-thread test 636 637* test ICU, fix test code where necessary 638 639* When refreshing all of ICU4J data from ICU4C 640- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install 641- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data 642or 643- ~/svn.icu/uni63/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install 644 645*** LayoutEngine script information 646- skipped for Unicode 6.3: no new scripts 647 648*** merge the Unicode update branches back onto the trunk 649- do not merge the icudata.jar and testdata.jar, 650 instead rebuild them from merged & tested ICU4C 651 652---------------------------------------------------------------------------- *** 653 654Unicode 6.2 update 655 656http://www.unicode.org/review/pri230/ 657http://www.unicode.org/versions/beta-6.2.0.html 658http://www.unicode.org/reports/tr44/tr44-9.html#Unicode_6.2.0 659http://www.unicode.org/review/pri227/ Changes to Script Extensions Property Values 660http://www.unicode.org/review/pri228/ Changing some common characters from Punctuation to Symbol 661http://www.unicode.org/review/pri229/ Linebreaking Changes for Pictographic Symbols 662http://www.unicode.org/reports/tr46/tr46-8.html IDNA 663http://unicode.org/Public/idna/6.2.0/ 664 665*** ICU Trac 666 667- ticket 9515: Unicode 6.2: final ICU update 668 669- ticket 9514: UCA 6.2: fix UCARules.txt 670 671- ticket 9437: update ICU to Unicode 6.2 672- C++ branches/markus/uni62 at r32050 from trunk at r32041 673- Java branches/markus/uni62 at r32068 from trunk at r32066 674 675*** Unicode version numbers 676- makedata.mak 677- uchar.h 678 (configure.in & configure: have been modified to extract the version from uchar.h) 679- com.ibm.icu.util.VersionInfo 680- com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_ 681 682*** data files & enums & parser 
code

* file preparation

- download UCD, UCA & IDNA files
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder)
- modify preparseucd.py: NamesList.txt is now in UTF-8
- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni62/20120816 ~/svn.icu/uni62/src ~/svn.icu/tools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.

* PropertyValueAliases.txt changes
- 1 new Line_Break (lb) value:
    lb ; RI ; Regional_Indicator
  -> uchar.h & UCharacter.LineBreak
- 1 new Word_Break (WB) value:
    WB ; RI ; Regional_Indicator
  -> uchar.h & UCharacter.WordBreak
- 1 new Grapheme_Cluster_Break (GCB) value:
    GCB; RI ; Regional_Indicator
  -> uchar.h & UCharacter.GraphemeClusterBreak

* 3 new numeric values
  The new value -1, which was really supposed to be NaN but that would have required
  new UnicodeData.txt syntax, can already be represented as a "fraction" of -1/1,
  but encodeNumericValue() in corepropsbuilder.cpp had to be fixed.
    cp;12456;na=CUNEIFORM NUMERIC SIGN NIGIDAMIN;nv=-1
    cp;12457;na=CUNEIFORM NUMERIC SIGN NIGIDAESH;nv=-1
  The two new values 216000 and 432000 require an addition to the encoding of numeric values.
    cp;12432;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS DISH;nv=216000
    cp;12433;na=CUNEIFORM NUMERIC SIGN SHAR2 TIMES GAL PLUS MIN;nv=432000
  -> uprops.h, uchar.c & UCharacterProperty.java
  -> cucdtst.c & UCharacterTest.java

* generate normalization data files
- ~/svn.icu/uni62/dbg$ export LD_LIBRARY_PATH=~/svn.icu/uni62/dbg/lib
- ~/svn.icu/uni62/dbg$ SRC_DATA_IN=~/svn.icu/uni62/src/source/data/in
- ~/svn.icu/uni62/dbg$ UNIDATA=~/svn.icu/uni62/src/source/data/unidata
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/uni62/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.
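
The four gennorm2 calls in the normalization step above differ only in the output file and the list of input files. As a rough sketch (not part of the ICU tools; the function name, the table name and the example paths are assumptions made for illustration), the same command lines could be assembled from a small table. The sketch only builds argument lists and does not run gennorm2.

```python
import posixpath
import shlex

# Output .nrm file -> input .txt files, copied from the four gennorm2 calls above.
NORM2_TARGETS = {
    "nfc.nrm": ["nfc.txt"],
    "nfkc.nrm": ["nfc.txt", "nfkc.txt"],
    "nfkc_cf.nrm": ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"],
    "uts46.nrm": ["nfc.txt", "uts46.txt"],
}


def gennorm2_commands(src_data_in, unidata, gennorm2="bin/gennorm2"):
    """Return one argument list per .nrm file, in the order of the table."""
    norm2_dir = posixpath.join(unidata, "norm2")
    return [
        [gennorm2, "-o", posixpath.join(src_data_in, out), "-s", norm2_dir, *inputs]
        for out, inputs in NORM2_TARGETS.items()
    ]


if __name__ == "__main__":
    # Hypothetical checkout location; adjust to the local update branch.
    data = "/home/user/svn.icu/uni62/src/source/data"
    for cmd in gennorm2_commands(data + "/in", data + "/unidata"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Printing the commands first and running them by hand keeps the step as reviewable as the manual list.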
* build Unicode tools using CMake+make

* generate core properties data files
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/uni62/src
- in initial bootstrapping, change the UCA version
  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/uni62/dbg/data/out/build/icudt50l ~/svn.icu/uni62/src
- rebuild ICU (make install) & tools
  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
    check if the UCA version in FractionalUCA.txt matches the new Unicode version
    (see step above)
- run genuca again (see step above) so that it picks up the new case mappings and nfc.nrm
- rebuild ICU (make install) & tools

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.2: U+2260, U+226E, U+226F
- nothing new in 6.2, no test file to update

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt50l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt50b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt50l.dat ./out/icu4j/icudt50b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt50l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt50b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt50b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt50b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt50b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt50b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/uni62/dbg/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
    ~/svn.icu/uni62/dbg/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/cnvalias.icu
    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt50b
    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
    ~/svn.icu/uni62/dbg/data/out/icu4j$ cp
com/ibm/icu/impl/data/icudt50b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/brkitr
- refresh ICU4J
    ~/svn.icu/uni62/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* UCA

- get output from Mark's tools; look in http://www.unicode.org/Public/UCA/<beta version>/
- CLDR root files for ICU are in CollationAuxiliary.zip; unpack that
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note removing the underscore before "Rules")
- update (ICU4C)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run genuca, see command line above
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/uni62/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    ~/svn.icu/uni62/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
    ~/svn.icu/uni62/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt50b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt50b/coll
    ~/svn.icu/uni62/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt50b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* test ICU, fix test code where necessary

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/uni62/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information
- skipped for Unicode 6.2: no new scripts

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

Future Unicode update

Tools simplified since the Unicode 6.1 update. See
- http://site.icu-project.org/design/props/ppucd
- http://bugs.icu-project.org/trac/wiki/Markus/ReviewTicket8972

* Unicode version numbers
- icutools/unicode/makedefs.sh was deleted, so one fewer place for version & path updates

* file preparation
- ucdcopy.py, idna2nrm.py and genpname/preparse.pl replaced by preparseucd.py:
- ~/svn.icu/tools/trunk/src/unicode$ py/preparseucd.py ~/uni61/20120118 ~/svn.icu/trunk/src ~/svn.icu/tools/trunk/src
- This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.

* PropertyValueAliases.txt changes
- Script codes that are in ISO 15924 but not in Unicode are now listed in
  preparseucd.py, in the _scripts_only_in_iso15924 variable.
  If there are new ISO codes, then add them.
  If Unicode adds some of them, then remove them from the .py variable.
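
The add/remove rule for the ISO-only script list can be written as two set operations. The sketch below is only an illustration of that rule: the function name and the plain-list inputs are assumptions, and the actual shape of the _scripts_only_in_iso15924 variable in preparseucd.py is not shown in this log.

```python
def reconcile_iso_only_scripts(iso_only, new_iso_codes, unicode_scripts):
    """Apply the two rules above to a list of ISO-only script codes.

    iso_only        -- codes currently treated as ISO 15924-only
    new_iso_codes   -- codes newly registered in ISO 15924
    unicode_scripts -- codes that now have a Unicode script property value
    Returns (codes to keep as ISO-only, codes to drop from the list).
    """
    merged = sorted(set(iso_only) | set(new_iso_codes))  # rule 1: add new ISO codes
    encoded = set(unicode_scripts)
    keep = [code for code in merged if code not in encoded]
    dropped = [code for code in merged if code in encoded]  # rule 2: Unicode took them over
    return keep, dropped


if __name__ == "__main__":
    # Codes taken from the ICU 4.8 and Unicode 6.1 entries of this log.
    keep, dropped = reconcile_iso_only_scripts(
        iso_only=["Afak", "Jurc", "Shrd", "Sora", "Takr"],
        new_iso_codes=["Khoj", "Tirh"],
        unicode_scripts=["Shrd", "Sora", "Takr"],
    )
    print("keep:", keep)
    print("drop:", dropped)
```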

* UnicodeData.txt changes
- No more manual changes for CJK ranges for algorithmic names;
  those are now written to ppucd.txt and genprops reads them from there.

* generate core properties data files (makeprops.sh was deleted)
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genprops/genprops ~/svn.icu/trunk/src

* no more manual updates of source/data/unidata/norm2/nfkc_cf.txt
- it is now generated by preparseucd.py

* no more separate idna2nrm.py run and manual copying to generate source/data/unidata/norm2/uts46.txt
- it is now generated by preparseucd.py
- make sure that the Unicode data folder passed into preparseucd.py
  includes a copy of http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
  (can be in some subfolder)

* generate normalization data files
- ~/svn.icu/trunk/dbg$ export LD_LIBRARY_PATH=~/svn.icu/trunk/dbg/lib
- ~/svn.icu/trunk/dbg$ SRC_DATA_IN=~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/trunk/dbg$ UNIDATA=~/svn.icu/trunk/src/source/data/unidata
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt
- ~/svn.icu/trunk/dbg$ bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt

* build ICU (make install)
* build Unicode tools using CMake+make

* new way to call genuca (makeuca.sh was deleted)
- ~/svn.icu/tools/trunk/dbg/unicode$ c/genuca/genuca -i ~/svn.icu/trunk/dbg/data/out/build/icudt49l ~/svn.icu/trunk/src

---------------------------------------------------------------------------- ***

Unicode 6.1 update

*** ICU Trac

- ticket 8995 final update to Unicode 6.1
- ticket 8994 regenerate source/layout/CanonData.cpp

- ticket 8961 support Unicode "Age" value *names*
- ticket 8963 support multiple character name aliases & types

- ticket 8827 "update ICU to Unicode 6.1"
- C++ branches/markus/uni61 at r30864 from trunk at r30843
- Java branches/markus/uni61 at r30865 from trunk at r30863

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo
- icutools/unicode/makedefs.sh
  + also review & update other definitions in that file,
    e.g. the ICU version in this path: BLD_DATA_FILES=$ICU_BLD/data/out/build/icudt49l

*** data files & enums & parser code

* file preparation

~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni61/20111205/ucd ~/uni61/processed
- This prepares both unidata and testdata files in respective output subfolders.
- Check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out.

* PropertyValueAliases.txt changes
- 11 new block names:
    Arabic_Extended_A
    Arabic_Mathematical_Alphabetic_Symbols
    Chakma
    Meetei_Mayek_Extensions
    Meroitic_Cursive
    Meroitic_Hieroglyphs
    Miao
    Sharada
    Sora_Sompeng
    Sundanese_Supplement
    Takri
  -> add to uchar.h
  -> add to UCharacter.UnicodeBlock IDs
      Eclipse find    UBLOCK_([^ ]+) = ([0-9]+), (/.+)
              replace public static final int \1_ID = \2; \3
  -> add to UCharacter.UnicodeBlock objects
      Eclipse find    UBLOCK_([^ ]+) = [0-9]+, (/.+)
              replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- 1 new Joining_Group (jg) value:
    Rohingya_Yeh
  -> uchar.h & UCharacter.JoiningGroup
- 2 new Line_Break (lb) values:
    CJ=Conditional_Japanese_Starter
    HL=Hebrew_Letter
  -> uchar.h & UCharacter.LineBreak
- 7 new scripts:
    sc ; Cakm ; Chakma
    sc ; Merc ; Meroitic_Cursive
    sc ; Mero ; Meroitic_Hieroglyphs
    sc ; Plrd ; Miao
    sc
; Shrd ; Sharada
    sc ; Sora ; Sora_Sompeng
    sc ; Takr ; Takri
  -> remove these from SyntheticPropertyValueAliases.txt
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- 2 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2011-06-21)
    Khoj 322 Khojki
    Tirh 326 Tirhuta
  and another one added 2011-12-09
    Hluw 080 Anatolian Hieroglyphs (Luwian Hieroglyphs, Hittite Hieroglyphs)
  -> uscript.h
  -> com.ibm.icu.lang.UScript
      find    USCRIPT_([^ ]+) *= ([0-9]+),(.+)
      replace public static final int \1 = \2;\3
  -> SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java

* UnicodeData.txt changes
- the last Unihan code point changes from U+9FCB to U+9FCC
  search for both 9FCB (end) and 9FCC (limit) (regex 9FC[BC], case-insensitive)
  + do change gennames.c
  + do change swapCJK() in ucol.cpp & ImplicitCEGenerator.java

* DerivedBidiClass.txt changes
- 2 new default-AL blocks:
# Arabic Extended-A: U+08A0 - U+08FF (was default-R)
# Arabic Mathematical Alphabetic Symbols:
# U+1EE00 - U+1EEFF (was default-R)
- 2 new default-R blocks:
# Meroitic Hieroglyphs:
# U+10980 - U+1099F
# Meroitic Cursive: U+109A0 - U+109FF
  -> should be picked up by the explicit data in the file

* NameAliases.txt changes
- from
    # Each line has two fields
    # First field: Code point
    # Second field: Alias
- to
    # Each line has three fields, as described here:
    #
    # First field: Code point
    # Second field: Alias
    # Third field: Type
- Also, the file previously allowed multiple aliases but only now does it
  actually provide multiple, even multiple of the same type.
  For example,
    FEFF;BYTE ORDER MARK;alternate
    FEFF;BOM;abbreviation
    FEFF;ZWNBSP;abbreviation
- This breaks our gennames parser, unames.icu data structure, and API.
  Fix gennames to only pick up "correction" aliases.
  New ticket #8963 for further changes.

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* build ICU (make install)
  so that the tools build can pick up the new definitions from the installed header files.
* build Unicode tools (at least genpname) using CMake+make

* run genpname
  (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource

* build ICU (make install)
* build Unicode tools using CMake+make

* update source/data/unidata/norm2/nfkc_cf.txt
- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt

* update source/data/unidata/norm2/uts46.txt
- download http://www.unicode.org/Public/idna/6.1.0/IdnaMappingTable.txt
  to ~/svn.icu/tools/trunk/src/unicode/py
- adjust idna2nrm.py to remove "; NV8": For UTS #46, we do not care about "not valid in IDNA2008".
- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0..6.1: U+2260, U+226E, U+226F
- nothing new in 6.1, no test file to update

* generate core properties data files
- in initial bootstrapping, change the UCA version
  in source/data/unidata/FractionalUCA.txt to match the new Unicode version
- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools
  + if genrb fails to build coll/root.res with a U_INVALID_FORMAT_ERROR,
    check if the UCA version in FractionalUCA.txt matches the new Unicode version
    (see step above)
- run makeuca.sh so that genuca picks up the new case mappings and nfc.nrm:
    ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt49l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt49b
    mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b
    echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt49l.dat ./out/icu4j/icudt49b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt49l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt49b
    mv ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt49b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt49b"
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt49b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
    jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt49b/
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data
    make[1]: Leaving directory `/home/mscherer/svn.icu/trunk/bld/data'
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/cnvalias.icu
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt49b
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp
com/ibm/icu/impl/data/icudt49b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/brkitr
- refresh ICU4J
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt49b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* test ICU so far, fix test code where necessary
- temporarily ignore collation issues that look like UCA/UCD mismatches,
  until UCA data is updated

* UCA

- get output from Mark's tools; look in
  http://www.unicode.org/Public/UCA/6.1.0/CollationAuxiliary-<dev. version>.txt
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
  (note removing the underscore before "Rules")
- update (ICU)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt)
- check test file diffs for previously commented-out, known-failing data lines;
  probably need to keep those commented out
- check FractionalUCA.txt for manual changes of lead bytes from IMPLICIT to Hani
- run makeuca.sh:
    ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    ~/svn.icu/trunk/bld$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt49b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt49b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j
com/ibm/icu/impl/data/icudt49b
- run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder.
  This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp
  in the working directory.
  (It also generates ScriptRunData.cpp, which is no longer needed.)

  The generated files have a current copyright date and "@draft" statement.

- diff current <icu>/source/layout files vs. generated ones
    ~/svn.icu4j/trunk/src$ kdiff3 ~/svn.icu/trunk/src/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout
  review and manually merge desired changes;
  fix gratuitous changes, incorrect @draft and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
- if you just copy the above files, then
  fix mixed line endings, review the diffs as above and restore changes to API tags etc.;
  manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

*** merge the Unicode update branches back onto the trunk
- do not merge the icudata.jar and testdata.jar,
  instead rebuild them from merged & tested ICU4C

---------------------------------------------------------------------------- ***

ICU 4.8 (no Unicode update, just new script codes)

* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2010-12-21)
    Afak 439 Afaka
    Jurc 510 Jurchen
    Mroo 199 Mro, Mru
    Nshu 499 Nüshu
    Shrd 319 Sharada, Śāradā
    Sora 398 Sora Sompeng
    Takr 321 Takri, Ṭākrī, Ṭāṅkrī
    Tang 520 Tangut
    Wole 480 Woleai
  -> uscript.h
  -> com.ibm.icu.lang.UScript
      find    USCRIPT_([^ ]+) *= ([0-9]+),(.+)
      replace public static final int \1 = \2;\3
  -> genpname/SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
- You might first need to "make install" ICU so that the tools build can pick
  up the new definitions from the installed header files.
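
The find/replace pair listed above for turning uscript.h enum lines into com.ibm.icu.lang.UScript constants also works outside an editor. A minimal sketch, assuming the enum lines have been pasted into a string; the function name and the sample input lines are illustrative and not copied from uscript.h:

```python
import re

# Same "find" expression as in the step above. Python's \1-style group
# references in the replacement match the syntax used there.
USCRIPT_ENUM = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_CONSTANT = r"public static final int \1 = \2;\3"


def uscript_to_java(text):
    """Rewrite every matching C enum line in text as a Java int constant."""
    return USCRIPT_ENUM.sub(JAVA_CONSTANT, text)


if __name__ == "__main__":
    # Illustrative input in the shape of uscript.h enum entries.
    sample = (
        "      USCRIPT_AFAKA                         = 147,/* Afak */\n"
        "      USCRIPT_JURCHEN                       = 148,/* Jurc */\n"
    )
    print(uscript_to_java(sample), end="")
```

Lines without a trailing comma and comment, such as a final limit constant, do not match the pattern and pass through unchanged, so they still need manual attention.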

* run genpname
  (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
- rebuild ICU & tools

* run genprops
- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
- rebuild ICU & tools

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
- refresh ICU4J
    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b

* should have updated the layout engine script codes but forgot

---------------------------------------------------------------------------- ***

Unicode 6.0 update

*** related ICU Trac tickets

7264 Unicode 6.0 Update

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from
uchar.h)
- com.ibm.icu.util.VersionInfo

*** data files & enums & parser code

* file preparation

~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
- This now prepares both unidata and testdata files in respective output subfolders.

* PropertyAliases.txt changes
- new Script_Extensions property defined in the new ScriptExtensions.txt file
  but not listed in PropertyAliases.txt; reported to unicode.org;
  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
       scx; Script_Extensions
  -> uchar.h with new UProperty section
  -> com.ibm.icu.lang.UProperty, parallel with uchar.h

* PropertyValueAliases.txt changes
- 12 new block names:
    Alchemical_Symbols
    Bamum_Supplement
    Batak
    Brahmi
    CJK_Unified_Ideographs_Extension_D
    Emoticons
    Ethiopic_Extended_A
    Kana_Supplement
    Mandaic
    Miscellaneous_Symbols_And_Pictographs
    Playing_Cards
    Transport_And_Map_Symbols
  -> add to uchar.h
  -> add to UCharacter.UnicodeBlock
      Eclipse find    UBLOCK_([^ ]+) = [0-9]+, (/.+)
              replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- Joining_Group (jg) values:
    Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
  -> uchar.h & UCharacter.JoiningGroup
- 3 new scripts:
    sc ; Batk ; Batak
    sc ; Brah ; Brahmi
    sc ; Mand ; Mandaic
  -> remove these from SyntheticPropertyValueAliases.txt
  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2009-11-11..2010-07-18)
    Bass 259 Bassa Vah
    Dupl 755 Duployan shorthand
    Elba 226 Elbasan
    Gran 343 Grantha
    Kpel 436 Kpelle
    Loma 437 Loma
    Mend 438 Mende
    Merc 101 Meroitic Cursive
    Narb 106 Old North Arabian
    Nbat 159 Nabataean
    Palm 126 Palmyrene
    Sind 318 Sindhi
    Wara 262 Warang Citi
  -> uscript.h
  -> com.ibm.icu.lang.UScript
      find    USCRIPT_([^ ]+) *= ([0-9]+),(.+)
      replace public static final int \1 = \2;\3
  -> SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
     and in com.ibm.icu.dev.test.lang.TestUScript.java
- ISO 15924 name change
    Mero 100 Meroitic Hieroglyphs (was Meroitic)
  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt

* UnicodeData.txt changes
- new CJK block:
    2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
    2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion

* build Unicode tools using CMake+make

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
- You might first need to "make install" ICU so that the tools build can pick
  up the new definitions from the installed header files.
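
The First/Last pair listed under the UnicodeData.txt changes above describes a whole code point range in two rows. The following sketch shows how such pairs can be collapsed into ranges, for example to cross-check the range that gets added to gennames.c. It is an illustration only, not the parser used by the ICU tools, and the function name is an assumption.

```python
def parse_ranges(lines):
    """Pair up '<Name, First>' / '<Name, Last>' rows into (name, start, end)."""
    ranges = []
    pending = None  # (name, start) of a First row still waiting for its Last row
    for line in lines:
        fields = line.split(";")
        code, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            # Strip the leading "<" and the trailing ", First>".
            pending = (name[1:-len(", First>")], code)
        elif name.endswith(", Last>") and pending is not None:
            ranges.append((pending[0], pending[1], code))
            pending = None
    return ranges


if __name__ == "__main__":
    rows = [
        "2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;",
        "2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;",
    ]
    for name, start, end in parse_ranges(rows):
        print(f"{name}: U+{start:04X}..U+{end:04X} ({end - start + 1} code points)")
```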

* run genpname
- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- rebuild ICU & tools

* update source/data/unidata/norm2/nfkc_cf.txt
- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt

* update source/data/unidata/norm2/uts46.txt
- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
  to ~/svn.icu/tools/trunk/src/unicode/py
- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0: U+2260, U+226E, U+226F

* generate core properties data files
- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools
- run makeuca.sh so that genuca picks up the new nfc.nrm:
    ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools

* implement new Script_Extensions property (provisional)
- parser & generator: genprops & uprops.icu
- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java

* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
- (one-time change)
- genbidi/gencase/genprops tools changes
- re-run makeprops.sh (see above)
- UCharacterProperty.java, UCharacterTypeIterator.java,
  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt45l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
- refresh ICU4J
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* un-hardcode normalization skippable (NF*_Inert) test data
- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools

* copy updated break iterator test files
- now handled by early ucdcopy.py and
  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
  (old instructions:
   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
   to ~/svn.icu/trunk/src/source/test/testdata)
- they are not used in ICU4J

* UCA

- get output from Mark's tools; look in
  http://www.unicode.org/~book/incoming/mark/uca6.0.0/
  http://www.macchiato.com/unicode/utc/additional-uca-files
  http://www.unicode.org/Public/UCA/6.0.0/
  http://www.unicode.org/~mdavis/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
- update Han-implicit ranges for new CJK extensions:
  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
  do not add it into invuca so that tailoring primary-after an ignorable works
- genuca: permit space between [variable top] bytes
- ucol.cpp: treat noncharacters like unassigned rather than ignorable
- run makeuca.sh:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
- update (ICU)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

---------------------------------------------------------------------------- ***

Unicode 5.2 update

*** related ICU Trac tickets

7084 Unicode 5.2

7167 verify collation bytes
7235 Java test NAME_ALIAS
7236 Java DerivedCoreProperties.txt test
7237 Java BidiTest.txt
7238 UTrie2 in core unidata
7239 test for tailoring gaps
7240 Java fix CollationMiscTest
7243 update layout engine for Unicode 5.2

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation

python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
- includes finding files regardless of version numbers,
  copying them, and performing the equivalent processing of the
  ucdstrip and ucdmerge tools on the desired set of files

* notes on changes
- PropertyAliases.txt
  moved from numeric to enumerated:
    ccc ; Canonical_Combining_Class
  new string properties:
    NFKC_CF ; NFKC_Casefold
    Name_Alias; Name_Alias
  new binary properties:
    Cased ; Cased
    CI ; Case_Ignorable
    CWCF ; Changes_When_Casefolded
    CWCM ; Changes_When_Casemapped
    CWKCF ; Changes_When_NFKC_Casefolded
    CWL ; Changes_When_Lowercased
    CWT ; Changes_When_Titlecased
    CWU ; Changes_When_Uppercased
  new CJK Unihan properties (not supported by ICU)
- PropertyValueAliases.txt
  new block names
  new scripts
  one script code change:
    sc ; Qaai ; Inherited
    ->
    sc ; Zinh ; Inherited ; Qaai
  new Line_Break (lb) value:
    lb ; CP ; Close_Parenthesis
  new Joining_Group (jg) values: Farsi_Yeh, Nya
  other new values:
    ccc; 214; ATA ; Attached_Above
- DerivedBidiClass.txt
  new default-R range: U+1E800 - U+1EFFF
- UnicodeData.txt
  all of the ISO comments are gone
  new CJK block end:
    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
  new CJK block:
    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\trunk\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
  + preparse.pl complains with errors like the following:
    Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
    This is because ICU 4.0 had scripts from ISO 15924 which are now
    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
  + preparse.pl complains with errors about block names missing from uchar.h; add them

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 26 new blocks
    copy new blocks from Blocks.txt
    MS VC++ 2008 regular expression:
    find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
    replace with " UBLOCK_\3 = 172, /*[\1]*/"
  + several new script values already added in ICU 4.0 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
    (added to SyntheticPropertyValueAliases.txt)
- new Joining Group (JG) values: Farsi_Yeh, Nya
- new Line_Break (lb) value:
    lb ; CP ; Close_Parenthesis

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FC3 to 9FCB
  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
  + do change gennames.c

* Compare definitions of new binary properties with what we used to use
  in algorithms, to see if the definitions changed.
- Verified that definitions for Cased and Case_Ignorable are unchanged.
  The gencase tool now parses the newly public Case_Ignorable values
  in case the definition changes in the future.
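
The search for the hardcoded Unihan range end and limit (regex 9FC[34], case-insensitive)
described above can be sketched in Python. This is only an illustration: the function
name and the sample lines are made up, and a real run would read the ICU source files
instead of a list of strings.

```python
import re

# Old Unihan range end (9FC3) and limit (9FC4), matched case-insensitively.
UNIHAN_OLD_RE = re.compile(r"9FC[34]", re.IGNORECASE)


def find_hardcoded(lines):
    """Return (line number, text) pairs that still mention the old end/limit."""
    return [
        (number, text)
        for number, text in enumerate(lines, start=1)
        if UNIHAN_OLD_RE.search(text)
    ]


# Illustrative input, not copied from any ICU file.
sample = [
    "    0x4e00, 0x9fc3,",
    "    0x3400, 0x4db5,",
    "    limit = 0x9FC4;",
]

for number, text in find_hardcoded(sample):
    print(number, text.strip())
```

Each hit still needs a manual decision; the Unicode 4.1 notes list places where the
old values must stay (for example BOCU/BOCSU and the GB 18030 range data).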

* uchar.c & uprops.h & uprops.c & genprops
- new numeric values that didn't exist in Unicode data before:
  1/7, 1/9, 1/10, 3/10, 1/16, 3/16
  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
  therefore redesign the encoding of numeric types and values for formatVersion 6;
  design for simple numbers up to at least 144 ("one gross"),
  large values up to at least 10^20,
  and fractions with numerators -1..17 and denominators 1..16
  to cover current and expected future values
  (e.g., more Han numeric values, Meroitic twelfths)

* reimplement Hangul_Syllable_Type for new Jamo characters
- the old code assumed that all Jamo characters are in the 11xx block
- Unicode 5.2 fills holes there and adds new Jamo characters in
  A960..A97F; Hangul Jamo Extended-A
  and in
  D7B0..D7FF; Hangul Jamo Extended-B
- Hangul_Syllable_Type can be trivially derived from a subset of
  Grapheme_Cluster_Break values

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data

ICU data make path is \svn\icuproj\icu\trunk\source\data\
ICU root path is \svn\icuproj\icu\trunk
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
Creating data file for Unicode Property Names
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
  and rebuild the common library

*** UCA

- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
[ Begin obsolete instructions:
  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
  - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
    on Windows:
    python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
    python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
  End obsolete instructions]
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
  not just the *_STUB.txt files
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

*** Implement Cased & Case_Ignorable properties
- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
- Problem: These properties should be disjoint, but aren't
- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
- change ucase.icu to be able to store any combination of Cased and Case_Ignorable

*** Implement Changes_When_Xyz properties
- without stored data

*** Implement Name_Alias property
- add it as another name field in unames.icu
- make it available via u_charName() and UCharNameChoice and
- consider it in u_charFromName()

*** Break iterators

* Update break iterator rules to new UAX versions and new property values
* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary

*** new BidiTest file
- review format and data
- copy BidiTest.txt to source/test/testdata
- write test code using this data
- fix ICU code where it fails the conformance test

*** Java
- generally, find and update code corresponding to C/C++
- UCharacter.UnicodeBlock constants:
  a) add an _ID integer per new block, update COUNT
  b) add a class instance per new block
     Visual Studio regex:
     find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
     replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()

- port test changes to Java

*** LayoutEngine script information

(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

-> Eric Mader wrote in email on 20090930:
   "I think the tool has been modified to update @draft to @stable for
   older scripts and to add @draft for new scripts.
   (I worked with an intern on this last year.)
   You should check the output after you run it."

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

-> Eric Mader wrote in email on 20090930:
   "This is just a matter of making sure that all the per-script tables have
   entries for any new scripts that were added.
   If any new Indic characters were added, then the class tables in
   IndicClassTables.cpp should be updated to reflect this.
   John Emmons should know how to do this if it's required."

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.1 update

*** related ICU Trac tickets

5696 Update to Unicode 5.1

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.1.0\ucd\Blocks.txt ..\unidata\
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\uni51\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
  + preparse.pl complains with errors like the following:
    Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    This is because ICU 3.8 had scripts from ISO 15924 which are now
    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
    N/Y, No/Yes, F/T, False/True
    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
       It will use further values from the file if present.

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 17 new blocks
  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt)
  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
  There is none above 127 yet which is the script code for an
  assigned Unicode character, so ICU 4.0 uprops.icu does not store any
  script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that overflows now.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields also grow to accommodate future growth:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
- renamed property Simple_Case_Folding (sfc->scf)
  + nothing to be done: handled as normal alias
- new property JSN Jamo_Short_Name
  + no new API: only contributes to the Name property
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
- new Joining Group (JG) value: Burushashki_Yeh_Barree
- new Sentence_Break (SB) values:
    SB ; CR ; CR
    SB ; EX ; Extend
    SB ; LF ; LF
    SB ; SC ; SContinue
- new Word_Break (WB) values:
    WB ; CR ; CR
    WB ; Extend ; Extend
    WB ; LF ; LF
    WB ; MB ; MidNumLet

* Further changes in the 2008-02-29 update:
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  because they should not normally be invisible.
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
- new Word_Break (WB) value: NL=Newline

* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
- Unihan range end moves from 9FBB to 9FC3
  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
  + do change gennames.c

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
  + make sure that data.h is writable
  + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
  + Pattern_Syntax
  + Pattern_White_Space
- new enumerated properties
  + Grapheme_Cluster_Break
  + Sentence_Break
  + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties
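
The round-trip check for u_charMirror() listed above can be approximated at the data
level. The Python sketch below does not call ICU; it assumes input lines in the
"code point; mirrored code point # comment" layout used by BidiMirroring.txt, and the
function names and sample lines are illustrative only.

```python
def parse_mirroring(lines):
    """Parse 'XXXX; YYYY # comment' lines into a {code point: mirror} dict."""
    pairs = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if not data:
            continue  # comment-only or empty line
        source, mirror = (int(field.strip(), 16) for field in data.split(";"))
        pairs[source] = mirror
    return pairs


def non_round_tripping(pairs):
    """Return code points c for which mirror(mirror(c)) != c in this table."""
    return sorted(c for c, m in pairs.items() if pairs.get(m) != c)


# Illustrative data; the last entry deliberately lacks its reverse mapping.
sample = [
    "# comment line",
    "0028; 0029 # LEFT PARENTHESIS",
    "0029; 0028 # RIGHT PARENTHESIS",
    "003C; 003E # LESS-THAN SIGN",
]

print([hex(c) for c in non_round_tripping(parse_mirroring(sample))])
```

An empty result for the real data file would only show that the table is symmetric;
the actual test still has to exercise u_charMirror() itself.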

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
    search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
    + do not modify BOCU/BOCSU code because that would change the encoding
      and break binary compatibility!
    + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
      NamePrepProfile.txt
    + ignore trietest.c: test data is arbitrary
    + ignore tstnorm.cpp: test optimization, not important
    + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
    + do change line_th.txt and word_th.txt
      by replacing hardcoded ranges with the new property values
    + do change gennames.c

source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,

* case mappings
- compare new special casing context conditions with previous ones
    see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
    according to PRI #26
    http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
    instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
    formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
    + put short name comment only on line with new constant
      for genpname perl script parser
- new binary properties
    + STerm
    + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
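
The DerivedNormalizationProps.txt format change noted above can be illustrated with a small
parser that accepts either spelling and returns the new two-field form. This is a hedged
sketch in Python, not the gennorm.c code: the function name, the lookup tables, and the sample
lines are assumptions. Only the renames "FNC" -> "FC_NFKC" and "NFD_NO" -> "NFD_QC; N" come
from the notes; the other old names are assumed to follow the same pattern.

```python
# Old single-field names -> (property, value) in the two-field form.
# Only NFD_NO is named in the notes; the rest are assumed to follow the same pattern.
OLD_QUICK_CHECK = {
    "NFD_NO": ("NFD_QC", "N"),
    "NFC_NO": ("NFC_QC", "N"),
    "NFC_MAYBE": ("NFC_QC", "M"),
    "NFKD_NO": ("NFKD_QC", "N"),
    "NFKC_NO": ("NFKC_QC", "N"),
    "NFKC_MAYBE": ("NFKC_QC", "M"),
}
# Old property name -> new property name.
OLD_PROPERTY_NAMES = {"FNC": "FC_NFKC"}


def parse_line(line):
    """Return (range, property, value) for one data line, or None if it holds no data."""
    data = line.split("#", 1)[0].strip()  # drop the trailing comment
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    code_range, prop = fields[0], fields[1]
    value = fields[2] if len(fields) > 2 else None
    if prop in OLD_QUICK_CHECK:
        # Old format: a single field such as NFD_NO carries property and value.
        prop, value = OLD_QUICK_CHECK[prop]
    prop = OLD_PROPERTY_NAMES.get(prop, prop)  # old format: FNC
    return code_range, prop, value


# Illustrative lines only, one in each format.
old_style = parse_line("00C0..00C5 ; NFD_NO # sample comment")
new_style = parse_line("00C0..00C5 ; NFD_QC; N # sample comment")
print(old_style == new_style)
```

Normalizing at parse time keeps the rest of a tool independent of which UCD version
produced the file; whether gennorm.c took this approach or simply switched to the new
format is not stated here.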

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
    + assume that the type is always numeric for Han characters,
      and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
    /tsutil/cucdtst/TestUnicodeData
    http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
