*   Copyright (C) 2004-2010, International Business Machines
*   Corporation and others. All Rights Reserved.
*
*   file name: changes.txt
*   encoding: US-ASCII
*   tab size: 8 (not used)
*   indentation:4
*
*   created on: 2004may06
*   created by: Markus W. Scherer
*
*   change log for Unicode updates

---------------------------------------------------------------------------- ***

Unicode 5.2 update

*** related ICU Trac tickets

7084 Unicode 5.2

7167 verify collation bytes
7235 Java test NAME_ALIAS
7236 Java DerivedCoreProperties.txt test
7237 Java BidiTest.txt
7238 UTrie2 in core unidata
7239 test for tailoring gaps
7240 Java fix CollationMiscTest
7243 update layout engine for Unicode 5.2

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation

python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
- includes finding files regardless of version numbers,
  copying them, and performing the equivalent processing of the
  ucdstrip and ucdmerge tools on the desired set of files

* notes on changes
- PropertyAliases.txt
    moved from numeric to enumerated:
        ccc ; Canonical_Combining_Class
    new string properties:
        NFKC_CF ; NFKC_Casefold
        Name_Alias; Name_Alias
    new binary properties:
        Cased ; Cased
        CI ; Case_Ignorable
        CWCF ; Changes_When_Casefolded
        CWCM ; Changes_When_Casemapped
        CWKCF ; Changes_When_NFKC_Casefolded
        CWL ; Changes_When_Lowercased
        CWT ; Changes_When_Titlecased
        CWU ; Changes_When_Uppercased
    new CJK Unihan properties (not supported by ICU)
- PropertyValueAliases.txt
    new block names
    new scripts
    one script code change:
        sc ; Qaai ; Inherited
        ->
        sc ; Zinh ; Inherited ; Qaai
    new Line_Break (lb) value:
        lb ; CP ; Close_Parenthesis
    new Joining_Group (jg) values: Farsi_Yeh, Nya
    other new values:
        ccc; 214; ATA ; Attached_Above
- DerivedBidiClass.txt
    new default-R range: U+1E800 - U+1EFFF
- UnicodeData.txt
    all of the ISO comments are gone
    new CJK block end:
        9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
    new CJK block:
        2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
        2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;

* genpname
- run preparse.pl
    + cd \svn\icuproj\icu\trunk\source\tools\genpname
    + make sure that data.h is writable
    + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
    + preparse.pl complains with errors like the following:
        Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
      This is because ICU 4.0 had scripts from ISO 15924 which are now
      added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
      and PropertyValueAliases.txt.
        -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
           Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
    + preparse.pl complains with errors about block names missing from uchar.h; add them

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
    + 26 new blocks
      copy new blocks from Blocks.txt
      MS VC++ 2008 regular expression:
        find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
        replace with " UBLOCK_\3 = 172, /*[\1]*/"
    + several new script values already added in ICU 4.0 for ISO 15924 coverage
      (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
    + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
    + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
      (added to SyntheticPropertyValueAliases.txt)
- new Joining Group (JG) values: Farsi_Yeh, Nya
- new Line_Break (lb) value:
    lb ; CP ; Close_Parenthesis

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FC3 to 9FCB
  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
    + do change gennames.c

* Compare definitions of new binary properties with what we used to use
  in algorithms, to see if the definitions changed.
- Verified that definitions for Cased and Case_Ignorable are unchanged.
  The gencase tool now parses the newly public Case_Ignorable values
  in case the definition changes in the future.
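
Note on the Blocks.txt step in the uchar.h notes above: the MS VC++ find/replace
can also be sketched as a small Python script. This is only an illustration, not
part of the ICU tools. The helper names block_constant_name() and
blocks_to_enum_lines() are hypothetical, and unlike the editor regex (which emits
the raw block name and the placeholder value 172 on every line) this sketch
upper-cases the name and numbers the constants consecutively, which is an
assumption about the manual cleanup that follows.

```python
import re

# Same three tagged groups as the MS VC++ pattern: start, end, block name.
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def block_constant_name(name):
    """Hypothetical helper: 'Hangul Jamo Extended-A' -> 'HANGUL_JAMO_EXTENDED_A'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", name).upper()


def blocks_to_enum_lines(lines, first_value):
    """Turn Blocks.txt data lines into uchar.h-style enum lines.

    Lines that do not look like 'XXXX..YYYY; Name' (comments, blanks) are skipped.
    Numbering starts at first_value and increases by one per block (assumption).
    """
    out = []
    value = first_value
    for line in lines:
        m = BLOCK_LINE.match(line.strip())
        if m is None:
            continue
        start, _end, name = m.groups()
        out.append("    UBLOCK_%s = %d, /*[%s]*/"
                   % (block_constant_name(name), value, start))
        value += 1
    return out


if __name__ == "__main__":
    sample = [
        "# Blocks.txt excerpt (illustrative)",
        "A960..A97F; Hangul Jamo Extended-A",
        "D7B0..D7FF; Hangul Jamo Extended-B",
    ]
    for enum_line in blocks_to_enum_lines(sample, 172):
        print(enum_line)
```

The generated lines still need review against uchar.h: the numeric values must
continue from the last existing UBLOCK_ constant, and UBLOCK_COUNT has to be
updated by hand.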

* uchar.c & uprops.h & uprops.c & genprops
- new numeric values that didn't exist in Unicode data before:
  1/7, 1/9, 1/10, 3/10, 1/16, 3/16
  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
  therefore redesign the encoding of numeric types and values for formatVersion 6;
  design for simple numbers up to at least 144 ("one gross"),
  large values up to at least 10^20,
  and fractions with numerators -1..17 and denominators 1..16
  to cover current and expected future values
  (e.g., more Han numeric values, Meroitic twelfths)

* reimplement Hangul_Syllable_Type for new Jamo characters
- the old code assumed that all Jamo characters are in the 11xx block
- Unicode 5.2 fills holes there and adds new Jamo characters in
  A960..A97F; Hangul Jamo Extended-A
  and in
  D7B0..D7FF; Hangul Jamo Extended-B
- Hangul_Syllable_Type can be trivially derived from a subset of
  Grapheme_Cluster_Break values

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data

ICU data make path is \svn\icuproj\icu\trunk\source\data\
ICU root path is \svn\icuproj\icu\trunk
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
Creating data file for Unicode Property Names
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
  and rebuild the common library

*** UCA

- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
[ Begin obsolete instructions:
  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
  - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
    on Windows:
    python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
    python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
  End obsolete instructions]
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
  not just the *_STUB.txt files
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

*** Implement Cased & Case_Ignorable properties
- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
- Problem: These properties should be disjoint, but aren't
- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
- change ucase.icu to be able to store any combination of Cased and Case_Ignorable

*** Implement Changes_When_Xyz properties
- without stored data

*** Implement Name_Alias property
- add it as another name field in unames.icu
- make it available via u_charName() and UCharNameChoice and
- consider it in u_charFromName()

*** Break iterators

* Update break iterator rules to new UAX versions and new property values
* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary

*** new BidiTest file
- review format and data
- copy BidiTest.txt to source/test/testdata
- write test code using this data
- fix ICU code where it fails the conformance test

*** Java
- generally, find and update code corresponding to C/C++
- UCharacter.UnicodeBlock constants:
  a) add an _ID integer per new block, update COUNT
  b) add a class instance per new block
     Visual Studio regex:
       find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
       replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()

- port test changes to Java

*** LayoutEngine script information

(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

-> Eric Mader wrote in email on 20090930:
   "I think the tool has been modified to update @draft to @stable for
   older scripts and to add @draft for new scripts.
   (I worked with an intern on this last year.)
   You should check the output after you run it."

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

-> Eric Mader wrote in email on 20090930:
   "This is just a matter of making sure that all the per-script tables have
   entries for any new scripts that were added.
   If any new Indic characters were added, then the class tables in
   IndicClassTables.cpp should be updated to reflect this.
   John Emmons should know how to do this if it's required."

* rebuild the layout and layoutex libraries.
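
Note on the Java UnicodeBlock step above: the Visual Studio find/replace that
turns uchar.h enum lines into UCharacter.UnicodeBlock constants can be sketched
in Python as follows. This is a rough equivalent for illustration, not an ICU
tool; the function name enum_line_to_java() and the sample input line are
hypothetical.

```python
import re

# Python form of the Visual Studio pattern
#   UBLOCK_{[^ ]+} = [0-9]+, {/.+}
# group 1 is the block constant name, group 2 is the trailing comment.
ENUM_LINE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")

JAVA_TEMPLATE = (
    r"public static final UnicodeBlock \g<1> = "
    r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>'
)


def enum_line_to_java(line):
    """Rewrite one C enum line into a Java constant declaration.

    Text outside the match (such as leading indentation) is left untouched,
    as with an editor find/replace. Lines without a match are returned as-is.
    """
    return ENUM_LINE.sub(JAVA_TEMPLATE, line)


if __name__ == "__main__":
    # Illustrative input only; the value and comment are made up for the demo.
    print(enum_line_to_java("    UBLOCK_EXAMPLE_BLOCK = 172, /*[0800]*/"))
```

The matching _ID integer constants from step a) still have to be added
separately, since the replacement only references them.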

*** Documentation
- Update User Guide
    + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.1 update

*** related ICU Trac tickets

5696 Update to Unicode 5.1

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.1.0\ucd\Blocks.txt ..\unidata\
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* genpname
- run preparse.pl
    + cd \svn\icuproj\icu\uni51\source\tools\genpname
    + make sure that data.h is writable
    + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
    + preparse.pl complains with errors like the following:
        Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
      This is because ICU 3.8 had scripts from ISO 15924 which are now
      added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
      and PropertyValueAliases.txt.
        -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
           Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
    + PropertyValueAliases.txt now explicitly contains values for boolean properties:
      N/Y, No/Yes, F/T, False/True
        -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
           It will use further values from the file if present.

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
    + 17 new blocks
    + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
      (removed from SyntheticPropertyValueAliases.txt)
    + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
      (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
  No script code above 127 is used yet for an assigned Unicode character,
  so ICU 4.0 uprops.icu does not store any
  script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that overflows now.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields also grow to accommodate future growth:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
- renamed property Simple_Case_Folding (sfc->scf)
    + nothing to be done: handled as normal alias
- new property JSN Jamo_Short_Name
    + no new API: only contributes to the Name property
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
- new Joining Group (JG) value: Burushashki_Yeh_Barree
- new Sentence_Break (SB) values:
    SB ; CR ; CR
    SB ; EX ; Extend
    SB ; LF ; LF
    SB ; SC ; SContinue
- new Word_Break (WB) values:
    WB ; CR ; CR
    WB ; Extend ; Extend
    WB ; LF ; LF
    WB ; MB ; MidNumLet

* Further changes in the 2008-02-29 update:
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  because they should not normally be invisible.
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
- new Word_Break (WB) value: NL=Newline

* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
- Unihan range end moves from 9FBB to 9FC3
  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
    + do change gennames.c

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.
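
Note on the uprops.h bit fields discussed earlier in this update: the overflow
of the 7-bit script field can be checked with a few lines of arithmetic. The
sketch below is illustrative only; bits_needed() and fits() are hypothetical
helper names, and the constants are the ones quoted in the notes
(USCRIPT_CODE_LIMIT=130, Block count 172).

```python
def bits_needed(max_value):
    """Bits required for an unsigned field that must hold 0..max_value."""
    return max(1, max_value.bit_length())


def fits(max_value, field_bits):
    """True if max_value can be stored in an unsigned field of field_bits bits."""
    return max_value < (1 << field_bits)


USCRIPT_CODE_LIMIT = 130             # from the notes for this update
MAX_SCRIPT = USCRIPT_CODE_LIMIT - 1  # maximum script value stored in uprops.icu
BLOCK_COUNT = 172                    # "current count" from the notes

if __name__ == "__main__":
    print("script bits needed:", bits_needed(MAX_SCRIPT))
    print("fits in old 7-bit field:", fits(MAX_SCRIPT, 7))
    print("fits in new 8-bit field:", fits(MAX_SCRIPT, 8))
    print("block count fits in 8 bits:", fits(BLOCK_COUNT, 8))
    print("block count fits in 9 bits:", fits(BLOCK_COUNT, 9))
```

A 7-bit field holds at most 127, so the maximum script value 129 needs 8 bits.
The Block field still fits in 8 bits at a count of 172; the move to 9 bits is
headroom for future blocks rather than a fix for an overflow.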

*** Documentation
- Update User Guide
    + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
    + make sure that data.h is writable
    + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
    + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
    + Pattern_Syntax
    + Pattern_White_Space
- new enumerated properties
    + Grapheme_Cluster_Break
    + Sentence_Break
    + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
    + do not modify BOCU/BOCSU code because that would change the encoding
      and break binary compatibility!
    + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
      NamePrepProfile.txt
    + ignore trietest.c: test data is arbitrary
    + ignore tstnorm.cpp: test optimization, not important
    + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
    + do change line_th.txt and word_th.txt
      by replacing hardcoded ranges with the new property values
    + do change gennames.c

source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
    + put short name comment only on line with new constant
      for genpname perl script parser
- new binary properties
    + STerm
    + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
    + assume that the type is always numeric for Han characters,
      and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
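
Note on the hardcoded Unihan range searches described in the Unicode 4.1, 5.1
and 5.2 updates above (regex 9FA[56], 9FB[BC], 9FC[34], case-insensitive): the
search can be sketched as a small Python script. This is an illustration under
stated assumptions, not an ICU tool. The function name
find_hardcoded_range_ends() and the default list of file extensions are
hypothetical; adjust them to the source tree being searched.

```python
import os
import re


def find_hardcoded_range_ends(root, end_hex,
                              extensions=(".c", ".cpp", ".h", ".txt")):
    """Yield (path, line_number, text) for lines mentioning the old range end.

    end_hex is the old range end as a hex string, e.g. "9FA5". The search
    also covers the limit (end + 1), e.g. 9FA6, and ignores case, which
    corresponds to the regex 9FA[56] used in the notes.
    """
    end = int(end_hex, 16)
    pattern = re.compile("%04X|%04X" % (end, end + 1), re.IGNORECASE)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for number, text in enumerate(f, 1):
                    if pattern.search(text):
                        yield path, number, text.rstrip("\n")


if __name__ == "__main__":
    # Search the current directory for the Unicode 4.0.1 range end and limit.
    for path, number, text in find_hardcoded_range_ends(".", "9FA5"):
        print("%s(%d): %s" % (path, number, text))
```

Each hit still needs the manual review listed in the notes: some occurrences
must not be changed (BOCU/BOCSU, GB 18030 range data, NamePrepProfile.txt),
while others, such as gennames.c, do need the new range end.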