* Copyright (C) 2004-2010, International Business Machines
* Corporation and others.  All Rights Reserved.
*
*   file name:  changes.txt
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2004may06
*   created by: Markus W. Scherer
*
* change log for Unicode updates

---------------------------------------------------------------------------- ***

Unicode 6.0 update

*** related ICU Trac tickets

7264 Unicode 6.0 Update

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo

*** data files & enums & parser code

* file preparation

~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
- This now prepares both unidata and testdata files in respective output subfolders.

* PropertyAliases.txt changes
- new Script_Extensions property defined in the new ScriptExtensions.txt file
  but not listed in PropertyAliases.txt; reported to unicode.org;
  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
    scx; Script_Extensions
  -> uchar.h with new UProperty section
  -> com.ibm.icu.lang.UProperty, parallel with uchar.h

* PropertyValueAliases.txt changes
- 12 new block names:
  Alchemical_Symbols
  Bamum_Supplement
  Batak
  Brahmi
  CJK_Unified_Ideographs_Extension_D
  Emoticons
  Ethiopic_Extended_A
  Kana_Supplement
  Mandaic
  Miscellaneous_Symbols_And_Pictographs
  Playing_Cards
  Transport_And_Map_Symbols
  -> add to uchar.h
  -> add to UCharacter.UnicodeBlock
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- Joining_Group (jg) values:
  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
  -> uchar.h & UCharacter.JoiningGroup
- 3 new scripts:
  sc ; Batk      ; Batak
  sc ; Brah      ; Brahmi
  sc ; Mand      ; Mandaic
  -> remove these from SyntheticPropertyValueAliases.txt
  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2009-11-11..2010-07-18)
  Bass        259     Bassa Vah
  Dupl        755     Duployan shorthand
  Elba        226     Elbasan
  Gran        343     Grantha
  Kpel        436     Kpelle
  Loma        437     Loma
  Mend        438     Mende
  Merc        101     Meroitic Cursive
  Narb        106     Old North Arabian
  Nbat        159     Nabataean
  Palm        126     Palmyrene
  Sind        318     Sindhi
  Wara        262     Warang Citi
  -> uscript.h
  -> com.ibm.icu.lang.UScript
    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
    replace  public static final int \1 = \2;\3
  -> SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
- ISO 15924 name change
  Mero        100     Meroitic Hieroglyphs (was Meroitic)
  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt

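Illustration (not part of the original procedure): the two find/replace patterns in the list above can also be applied by a script instead of an editor. A minimal Python sketch, assuming the C enum lines have exactly the shape the patterns expect. The function name c_enum_to_java and the sample enum values are made up for this example and are not copied from uchar.h or uscript.h.

```python
import re

# Patterns and replacements transcribed from the find/replace notes.
UBLOCK_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
UBLOCK_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'

USCRIPT_FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
USCRIPT_REPLACE = r"public static final int \1 = \2;\3"


def c_enum_to_java(line):
    """Rewrite one C enum line as a Java constant; other lines pass through unchanged."""
    line = UBLOCK_FIND.sub(UBLOCK_REPLACE, line)
    return USCRIPT_FIND.sub(USCRIPT_REPLACE, line)


# Hypothetical sample input: the numeric values are placeholders.
print(c_enum_to_java("    UBLOCK_BATAK = 999, /*[1BC0]*/"))
print(c_enum_to_java("    USCRIPT_BASSA_VAH = 998,/* Bass */"))
```

The generated Java lines still need review, for example the _ID constants that the UnicodeBlock replacement refers to.
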
* UnicodeData.txt changes
- new CJK block:
  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion

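Illustration: a First/Last pair in UnicodeData.txt stands for a whole range whose character names are generated algorithmically, which is why a new range has to be added to gennames.c. A minimal Python sketch of such a lookup, not the gennames.c code. The table is deliberately partial: the Extension C/D limits and the 9FCB end appear in this change log, the 4E00 start of the main block is assumed, and the other CJK ranges are omitted. CJK_RANGES and algorithmic_cjk_name are names invented for this example.

```python
# Inclusive (first, last) ranges; partial table, see the note above.
CJK_RANGES = [
    (0x4E00, 0x9FCB),    # CJK Ideograph (end moved to 9FCB in Unicode 5.2)
    (0x2A700, 0x2B734),  # CJK Ideograph Extension C
    (0x2B740, 0x2B81D),  # CJK Ideograph Extension D (new in Unicode 6.0)
]


def algorithmic_cjk_name(code_point):
    """Return the generated name for a code point in a listed range, else None."""
    for first, last in CJK_RANGES:
        if first <= code_point <= last:
            return "CJK UNIFIED IDEOGRAPH-%04X" % code_point
    return None


print(algorithmic_cjk_name(0x2B740))
print(algorithmic_cjk_name(0x2B81E))
```
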
* build Unicode tools using CMake+make

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
- You might first need to "make install" ICU so that the tools build can pick
  up the new definitions from the installed header files.

* run genpname
- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- rebuild ICU & tools

* update source/data/unidata/norm2/nfkc_cf.txt
- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt

* update source/data/unidata/norm2/uts46.txt
- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
  to ~/svn.icu/tools/trunk/src/unicode/py
- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0: U+2260, U+226E, U+226F

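Illustration: the grep step above can be scripted. A minimal Python sketch, assuming IdnaMappingTable.txt lines have the layout "code point(s) ; status ; optional mapping # comment". It does not cover the uts46.txt format. The function name and the sample lines are hand-written for this example, not copied from the real file.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) ranges with status disallowed_STD3_valid above U+007F."""
    result = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        first, _, last = fields[0].partition("..")
        first = int(first, 16)
        last = int(last, 16) if last else first
        if first > 0x7F:
            result.append((first, last))
    return result


# Hand-written sample lines in the assumed layout.
SAMPLE = [
    "# comment-only line",
    "0000..002C    ; disallowed_STD3_valid            # ASCII range",
    "00C0          ; mapped                 ; 00E0    # mapped line",
    "2260          ; disallowed_STD3_valid            # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid            # NOT LESS-THAN..NOT GREATER-THAN",
]

for first, last in non_ascii_std3_valid(SAMPLE):
    print("U+%04X..U+%04X" % (first, last))
```
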
* generate core properties data files
- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools
- run makeuca.sh so that genuca picks up the new nfc.nrm:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools

* implement new Script_Extensions property (provisional)
- parser & generator: genprops & uprops.icu
- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java

* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
- (one-time change)
- genbidi/gencase/genprops tools changes
- re-run makeprops.sh (see above)
- UCharacterProperty.java, UCharacterTypeIterator.java,
  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt45l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
- refresh ICU4J
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b

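Illustration: the mkdir/cp/rm sequence above selects a subset of the data files for staging. A minimal Python sketch of the same selection, assuming src and dst are the two icudt45b directories used in those commands. The function name stage_unicode_data is invented here, and the sketch is not a replacement for the documented steps.

```python
import glob
import os
import shutil


def stage_unicode_data(src, dst):
    """Copy *.icu (except cnvalias.icu), *.nrm, coll/*.icu and brkitr/* from src to dst."""
    os.makedirs(os.path.join(dst, "coll"), exist_ok=True)
    os.makedirs(os.path.join(dst, "brkitr"), exist_ok=True)
    patterns = [("*.icu", ""), ("*.nrm", ""), ("coll/*.icu", "coll"), ("brkitr/*", "brkitr")]
    copied = []
    for pattern, subdir in patterns:
        for path in sorted(glob.glob(os.path.join(src, pattern))):
            name = os.path.basename(path)
            # The steps above remove cnvalias.icu again, so it is skipped here.
            if subdir == "" and name == "cnvalias.icu":
                continue
            if os.path.isfile(path):
                shutil.copy(path, os.path.join(dst, subdir, name))
                copied.append(os.path.join(subdir, name))
    return copied
```

The function returns the relative paths it copied, which can be compared against the expected file list before running the jar uf step.
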
* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* un-hardcode normalization skippable (NF*_Inert) test data
- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools

* copy updated break iterator test files
- now handled by early ucdcopy.py and
  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
  (old instructions:
   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
   to ~/svn.icu/trunk/src/source/test/testdata)
- they are not used in ICU4J

* UCA

- get output from Mark's tools; look in
    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
    http://www.macchiato.com/unicode/utc/additional-uca-files
    http://www.unicode.org/Public/UCA/6.0.0/
    http://www.unicode.org/~mdavis/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
- update Han-implicit ranges for new CJK extensions:
  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
  do not add it into invuca so that tailoring primary-after an ignorable works
- genuca: permit space between [variable top] bytes
- ucol.cpp: treat noncharacters like unassigned rather than ignorable
- run makeuca.sh:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
- update (ICU)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

---------------------------------------------------------------------------- ***

Unicode 5.2 update

*** related ICU Trac tickets

7084 Unicode 5.2

7167 verify collation bytes
7235 Java test NAME_ALIAS
7236 Java DerivedCoreProperties.txt test
7237 Java BidiTest.txt
7238 UTrie2 in core unidata
7239 test for tailoring gaps
7240 Java fix CollationMiscTest
7243 update layout engine for Unicode 5.2

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation

python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
- includes finding files regardless of version numbers,
  copying them, and performing the equivalent processing of the
  ucdstrip and ucdmerge tools on the desired set of files

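Illustration: "finding files regardless of version numbers" can be done by stripping a version suffix from the file name before comparing it with the expected name. A minimal Python sketch. The suffix pattern is an assumption (a "-x.y.z" version, optionally followed by a draft marker such as "d3", directly before the extension) and is not taken from ucdcopy.py; unversioned_name is a name invented here.

```python
import re

# Assumed shape of a version suffix, e.g. "-5.2.0" or "-5.2.0d3", before the extension.
_VERSION_SUFFIX = re.compile(r"-[0-9]+(\.[0-9]+)*(d[0-9]+)?(?=\.[A-Za-z]+$)")


def unversioned_name(filename):
    """Return filename without the version suffix; unversioned names pass through."""
    return _VERSION_SUFFIX.sub("", filename)


print(unversioned_name("LineBreakTest-5.2.0.txt"))
print(unversioned_name("UnicodeData.txt"))
```
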
* notes on changes
- PropertyAliases.txt
  moved from numeric to enumerated:
    ccc       ; Canonical_Combining_Class
  new string properties:
    NFKC_CF   ; NFKC_Casefold
    Name_Alias; Name_Alias
  new binary properties:
    Cased     ; Cased
    CI        ; Case_Ignorable
    CWCF      ; Changes_When_Casefolded
    CWCM      ; Changes_When_Casemapped
    CWKCF     ; Changes_When_NFKC_Casefolded
    CWL       ; Changes_When_Lowercased
    CWT       ; Changes_When_Titlecased
    CWU       ; Changes_When_Uppercased
  new CJK Unihan properties (not supported by ICU)
- PropertyValueAliases.txt
  new block names
  new scripts
  one script code change:
    sc ; Qaai      ; Inherited
    ->
    sc ; Zinh      ; Inherited                        ; Qaai
  new Line_Break (lb) value:
    lb ; CP        ; Close_Parenthesis
  new Joining_Group (jg) values: Farsi_Yeh, Nya
  other new values:
    ccc; 214; ATA  ; Attached_Above
- DerivedBidiClass.txt
  new default-R range: U+1E800 - U+1EFFF
- UnicodeData.txt
  all of the ISO comments are gone
  new CJK block end:
    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
  new CJK block:
    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\trunk\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
  + preparse.pl complains with errors like the following:
      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
    This is because ICU 4.0 had scripts from ISO 15924 which are now
    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
  + preparse.pl complains with errors about block names missing from uchar.h; add them

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 26 new blocks
    copy new blocks from Blocks.txt
    MS VC++ 2008 regular expression:
      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
  + several new script values already added in ICU 4.0 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
    (added to SyntheticPropertyValueAliases.txt)
- new Joining Group (JG) values: Farsi_Yeh, Nya
- new Line_Break (lb) value:
    lb ; CP        ; Close_Parenthesis

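Illustration: the MS VC++ find/replace for Blocks.txt lines, rewritten with Python's re module. This sketch goes a little further than the editor regex, which leaves the block name as-is and writes the same value 172 on every line: it assumes the constant name is the block name upper-cased with spaces and hyphens turned into underscores, and it numbers the constants consecutively from a caller-supplied start value. Both are conveniences of this example; the result still needs manual review against uchar.h.

```python
import re

# Same shape as the editor pattern: "FIRST..LAST; Block Name".
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def blocks_to_enum(lines, first_value):
    """Turn Blocks.txt data lines into UBLOCK_ enum lines; other lines are skipped."""
    out = []
    value = first_value
    for line in lines:
        match = BLOCK_LINE.match(line)
        if not match:
            continue
        # Assumed naming convention, see the note above.
        name = re.sub(r"[ -]", "_", match.group(3)).upper()
        out.append("    UBLOCK_%s = %d, /*[%s]*/" % (name, value, match.group(1)))
        value += 1
    return out


# Block lines as quoted elsewhere in this change log; 172 is a placeholder start value.
for enum_line in blocks_to_enum(["A960..A97F; Hangul Jamo Extended-A",
                                 "D7B0..D7FF; Hangul Jamo Extended-B"], 172):
    print(enum_line)
```
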
* hardcoded Unihan range end/limit
- Unihan range end moves from 9FC3 to 9FCB
  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
  + do change gennames.c

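Illustration: the case-insensitive search for the old end and limit values, as a small Python helper. The function name and the sample lines are made up; in practice the input would be the lines of each source file being checked.

```python
import re

# Old range end (9FC3) and limit (9FC4), matched case-insensitively.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[34]", re.IGNORECASE)


def find_hardcoded_limits(lines):
    """Return (line number, line) pairs that mention the old end or limit value."""
    return [(number, line)
            for number, line in enumerate(lines, 1)
            if UNIHAN_END_OR_LIMIT.search(line)]


# Hypothetical source lines, for demonstration only.
sample = ["if (c <= 0x9fc3) {", "#define LIMIT 0x9FC4", "unrelated = 0x9FCB;"]
for number, line in find_hardcoded_limits(sample):
    print(number, line)
```
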
* Compare definitions of new binary properties with what we used to use
  in algorithms, to see if the definitions changed.
- Verified that definitions for Cased and Case_Ignorable are unchanged.
  The gencase tool now parses the newly public Case_Ignorable values
  in case the definition changes in the future.

* uchar.c & uprops.h & uprops.c & genprops
- new numeric values that didn't exist in Unicode data before:
    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
  therefore redesign the encoding of numeric types and values for formatVersion 6;
  design for simple numbers up to at least 144 ("one gross"),
  large values up to at least 10^20,
  and fractions with numerators -1..17 and denominators 1..16
  to cover current and expected future values
  (e.g., more Han numeric values, Meroitic twelfths)

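Illustration: how much room the fraction part of such a design needs. Numerators -1..17 are 19 values (5 bits) and denominators 1..16 are 16 values (4 bits), so a fraction fits in 9 bits. The packing below is a hypothetical layout chosen for this example; it is not claimed to be the uprops.icu formatVersion 6 encoding.

```python
NUM_MIN, NUM_MAX = -1, 17   # numerator range from the notes above
DEN_MIN, DEN_MAX = 1, 16    # denominator range from the notes above


def pack_fraction(numerator, denominator):
    """Pack a fraction into an integer: offset numerator in the high bits, denominator in the low 4."""
    if not (NUM_MIN <= numerator <= NUM_MAX and DEN_MIN <= denominator <= DEN_MAX):
        raise ValueError("fraction out of range: %d/%d" % (numerator, denominator))
    return ((numerator - NUM_MIN) << 4) | (denominator - DEN_MIN)


def unpack_fraction(packed):
    """Inverse of pack_fraction."""
    return (packed >> 4) + NUM_MIN, (packed & 0xF) + DEN_MIN


# The new numeric values listed above all round-trip.
for num, den in [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]:
    assert unpack_fraction(pack_fraction(num, den)) == (num, den)
```
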
* reimplement Hangul_Syllable_Type for new Jamo characters
- the old code assumed that all Jamo characters are in the 11xx block
- Unicode 5.2 fills holes there and adds new Jamo characters in
    A960..A97F; Hangul Jamo Extended-A
  and in
    D7B0..D7FF; Hangul Jamo Extended-B
- Hangul_Syllable_Type can be trivially derived from a subset of
  Grapheme_Cluster_Break values

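Illustration: the derivation described above, written as a lookup on short property value names. This is a Python sketch of the idea, not the ICU implementation; the names GCB_TO_HST and hangul_syllable_type are invented here.

```python
# Grapheme_Cluster_Break values that correspond directly to a Hangul_Syllable_Type value.
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb_value):
    """Map a Grapheme_Cluster_Break short name to Hangul_Syllable_Type; default is NA."""
    return GCB_TO_HST.get(gcb_value, "NA")


print(hangul_syllable_type("LV"))
print(hangul_syllable_type("SM"))
```

Because the lookup depends only on the Grapheme_Cluster_Break value, it covers the new Jamo blocks without any hardcoded code point ranges.
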
* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data

ICU data make path is \svn\icuproj\icu\trunk\source\data\
ICU root path is \svn\icuproj\icu\trunk
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
Creating data file for Unicode Property Names
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
  and rebuild the common library

*** UCA

- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
[ Begin obsolete instructions:
  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
      on Windows:
        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
  End obsolete instructions]
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
  not just the *_STUB.txt files
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

*** Implement Cased & Case_Ignorable properties
- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
- Problem: These properties should be disjoint, but aren't
- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
- change ucase.icu to be able to store any combination of Cased and Case_Ignorable

*** Implement Changes_When_Xyz properties
- without stored data

*** Implement Name_Alias property
- add it as another name field in unames.icu
- make it available via u_charName() and UCharNameChoice and
- consider it in u_charFromName()

*** Break iterators

* Update break iterator rules to new UAX versions and new property values
* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary

*** new BidiTest file
- review format and data
- copy BidiTest.txt to source/test/testdata
- write test code using this data
- fix ICU code where it fails the conformance test

*** Java
- generally, find and update code corresponding to C/C++
- UCharacter.UnicodeBlock constants:
  a) add an _ID integer per new block, update COUNT
  b) add a class instance per new block
     Visual Studio regex:
        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()

- port test changes to Java

*** LayoutEngine script information

(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

-> Eric Mader wrote in email on 20090930:
    "I think the tool has been modified to update @draft to @stable for
     older scripts and to add @draft for new scripts.
     (I worked with an intern on this last year.)
     You should check the output after you run it."

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

-> Eric Mader wrote in email on 20090930:
    "This is just a matter of making sure that all the per-script tables have
     entries for any new scripts that were added.
     If any new Indic characters were added, then the class tables in
     IndicClassTables.cpp should be updated to reflect this.
     John Emmons should know how to do this if it's required."

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.1 update

*** related ICU Trac tickets

5696 Update to Unicode 5.1

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.1.0\ucd\Blocks.txt ..\unidata\
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

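Illustration: a rough Python model of the strip-then-merge pipeline used for EastAsianWidth.txt and LineBreak.txt. The behavior is assumed, not taken from the tool sources: stripping is modeled as removing trailing comments from data lines while keeping comment-only lines, and merging as joining adjacent code point ranges that carry the same value. All function names are invented for this sketch.

```python
def strip_data_comments(lines):
    """Drop trailing '#' comments from data lines; keep comment-only lines."""
    out = []
    for line in lines:
        if line.lstrip().startswith("#"):
            out.append(line.rstrip())
        else:
            out.append(line.split("#", 1)[0].rstrip())
    return out


def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (first, last) pair."""
    first, _, last = field.strip().partition("..")
    first = int(first, 16)
    return first, (int(last, 16) if last else first)


def merge_adjacent(lines):
    """Merge consecutive 'range;value' lines with equal values and touching ranges."""
    out = []
    pending = None  # [first, last, value] of the range currently being extended

    def flush():
        nonlocal pending
        if pending:
            first, last, value = pending
            text = "%04X" % first if first == last else "%04X..%04X" % (first, last)
            out.append("%s;%s" % (text, value))
            pending = None

    for line in lines:
        if ";" not in line or line.lstrip().startswith("#"):
            flush()
            out.append(line)
            continue
        field, value = line.split(";", 1)
        first, last = parse_range(field)
        value = value.strip()
        if pending and pending[2] == value and pending[1] + 1 == first:
            pending[1] = last
        else:
            flush()
            pending = [first, last, value]
    flush()
    return out
```

Chaining the two, merge_adjacent(strip_data_comments(lines)), mirrors the "ucdstrip | ucdmerge" pipe in the batch file.
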
* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\uni51\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
  + preparse.pl complains with errors like the following:
      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    This is because ICU 3.8 had scripts from ISO 15924 which are now
    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
      N/Y, No/Yes, F/T, False/True
    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
       It will use further values from the file if present.

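Illustration: the eight boolean property value names as a lookup table. A minimal Python sketch, not the preparse.pl code. Matching here is case-insensitive only, which is a simplification assumed for this example; parse_binary_value is an invented name.

```python
# The eight value names listed above, lower-cased for lookup.
_BINARY_VALUES = {
    "n": False, "no": False, "f": False, "false": False,
    "y": True, "yes": True, "t": True, "true": True,
}


def parse_binary_value(alias):
    """Map a boolean property value name to True/False; raise ValueError if unknown."""
    try:
        return _BINARY_VALUES[alias.strip().lower()]
    except KeyError:
        raise ValueError("not a binary property value name: %r" % alias)


print(parse_binary_value("Yes"))
print(parse_binary_value("F"))
```
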
* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 17 new blocks
  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt)
  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
  There is none above 127 yet which is the script code for an
  assigned Unicode character, so ICU 4.0 uprops.icu does not store any
  script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that overflows now.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields also grow to accommodate future growth:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
- renamed property Simple_Case_Folding (sfc->scf)
  + nothing to be done: handled as normal alias
- new property JSN Jamo_Short_Name
  + no new API: only contributes to the Name property
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
- new Joining Group (JG) value: Burushashki_Yeh_Barree
- new Sentence_Break (SB) values:
    SB ; CR        ; CR
    SB ; EX        ; Extend
    SB ; LF        ; LF
    SB ; SC        ; SContinue
- new Word_Break (WB) values:
    WB ; CR        ; CR
    WB ; Extend    ; Extend
    WB ; LF        ; LF
    WB ; MB        ; MidNumLet

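Illustration: the bit-width arithmetic behind the uprops.h change described in the list above. A 7-bit field holds 0..127, so the maximum script value 129 no longer fits. The Python sketch below only checks the widths; it assumes block values run from 0 to count-1 and does not model the actual properties vector layout.

```python
def bits_needed(max_value):
    """Number of bits needed to store any integer in 0..max_value."""
    return max(1, max_value.bit_length())


USCRIPT_CODE_LIMIT = 130  # from the notes above (ICU 4.0)
BLOCK_COUNT = 172         # "current count" from the notes above

print(bits_needed(127))                     # largest value a 7-bit field can hold
print(bits_needed(USCRIPT_CODE_LIMIT - 1))  # maximum script value 129
print(bits_needed(BLOCK_COUNT - 1))         # largest block value under the stated assumption
```
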
* Further changes in the 2008-02-29 update:
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  because they should not normally be invisible.
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
- new Word_Break (WB) value: NL=Newline

* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
- Unihan range end moves from 9FBB to 9FC3
  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
  + do change gennames.c

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
698copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
699copy 5.0.0\ucd\Blocks.txt ..\unidata\
700copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
701copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
702copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
703copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
704copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
705copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
706copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
707copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
708copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
709copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
710copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
711
712ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
713ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
714ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
715ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
716ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
717ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
718ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
719ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
720ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
721ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
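
For readers without the ucdstrip and ucdmerge tools at hand, the following
Python sketch shows the kind of processing the pipeline above performs. It
is an illustration under assumptions, not the tools' actual source: it
assumes ucdstrip drops "#" comments and empty lines, and that ucdmerge
joins adjacent code point ranges that carry the same value. The function
names are made up for this example.

```python
def strip_comments(lines):
    """Drop '#' comments and empty lines (assumed ucdstrip behavior)."""
    for line in lines:
        data = line.split("#", 1)[0].strip()
        if data:
            yield data


def _parse(line):
    # "0041..005A;AL" -> (0x41, 0x5A, "AL"); a single code point is a
    # range whose start equals its end.
    rng, value = (field.strip() for field in line.split(";", 1))
    start, _, end = rng.partition("..")
    return int(start, 16), int(end or start, 16), value


def merge_ranges(lines):
    """Join adjacent ranges with the same value (assumed ucdmerge behavior)."""
    merged = []
    for start, end, value in map(_parse, lines):
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    for start, end, value in merged:
        rng = f"{start:04X}" if start == end else f"{start:04X}..{end:04X}"
        yield f"{rng};{value}"
```

Chaining the two, as in list(merge_ranges(strip_comments(lines))), mirrors
the "ucdstrip | ucdmerge" lines used for EastAsianWidth.txt and
LineBreak.txt.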

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
  + make sure that data.h is writable
  + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
  + Pattern_Syntax
  + Pattern_White_Space
- new enumerated properties
  + Grapheme_Cluster_Break
  + Sentence_Break
  + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
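
The D47a condition quoted above can be written as a predicate. The Python
sketch below is only an illustration, not the gencase implementation.
MID_LETTER_SAMPLE is a hand-picked, incomplete stand-in (assumed to be
MidLetter in Unicode 4.1) for data that would really be read from
WordBreakProperty.txt, and Python's unicodedata module reflects whatever
Unicode version the interpreter ships with, not 4.1.

```python
import unicodedata

# General categories named in the D47a definition.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

# Illustrative subset only (assumption): APOSTROPHE and RIGHT SINGLE
# QUOTATION MARK. The full set comes from WordBreakProperty.txt.
MID_LETTER_SAMPLE = {0x0027, 0x2019}


def is_case_ignorable(ch: str) -> bool:
    """Word_Break=MidLetter or general category Mn, Me, Cf, Lm, Sk."""
    return (ord(ch) in MID_LETTER_SAMPLE
            or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES)
```

For example, U+0301 COMBINING ACUTE ACCENT (Mn) and U+00AD SOFT HYPHEN (Cf)
are case-ignorable under this definition, while the letter "a" is not.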

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
  + do not modify BOCU/BOCSU code because that would change the encoding
    and break binary compatibility!
  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    NamePrepProfile.txt
  + ignore trietest.c: test data is arbitrary
  + ignore tstnorm.cpp: test optimization, not important
  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
  + do change line_th.txt and word_th.txt
    by replacing hardcoded ranges with the new property values
  + do change gennames.c

source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,
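
The search that produced hits like the ones above can be approximated in a
few lines of Python. This is a sketch, not the tool that was actually used;
find_hardcoded_ranges() is a name invented for this example. It applies the
regex 9FA[56] case-insensitively and reports matching lines in the same
"file(line): text" spirit.

```python
import re

# Matches both the old range end (9FA5) and the old limit (9FA6).
UNIHAN_END_RE = re.compile(r"9FA[56]", re.IGNORECASE)


def find_hardcoded_ranges(name, text):
    """Return (name, line_number, stripped_line) for every matching line."""
    return [
        (name, number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if UNIHAN_END_RE.search(line)
    ]
```

Each hit still needs the manual review listed above, because some matches
(BOCU/BOCSU, GB 18030 range data, NamePrepProfile.txt) must not be changed.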

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
  + put short name comment only on line with new constant
    for genpname perl script parser
- new binary properties
  + STerm
  + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
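
The two format changes above can be pictured with a Python sketch of a
parser that accepts both the old and the new line shapes. It is not the
gennorm.c code. The list of old single-field names other than NFD_NO and
the sample line layouts are assumptions made for illustration, and
parse_norm_prop_line() is a hypothetical name.

```python
# Old single-field quick-check names mapped to (new property, value).
# Only "NFD_NO" -> "NFD_QC; N" is stated in the notes; the other entries
# are assumed to follow the same pattern.
OLD_QUICK_CHECK = {
    "NFD_NO": ("NFD_QC", "N"),
    "NFC_NO": ("NFC_QC", "N"),
    "NFC_MAYBE": ("NFC_QC", "M"),
    "NFKD_NO": ("NFKD_QC", "N"),
    "NFKC_NO": ("NFKC_QC", "N"),
    "NFKC_MAYBE": ("NFKC_QC", "M"),
}


def parse_norm_prop_line(line):
    """Return (range, property, values) in the new naming, or None."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None  # comment-only or empty line
    fields = [field.strip() for field in data.split(";")]
    rng, prop, values = fields[0], fields[1], fields[2:]
    if prop == "FNC":  # renamed property
        prop = "FC_NFKC"
    elif prop in OLD_QUICK_CHECK:  # one field becomes two
        prop, value = OLD_QUICK_CHECK[prop]
        values = [value]
    return rng, prop, values
```

With this normalization, an old-style "NFD_NO" line and a new-style
"NFD_QC; N" line yield the same tuple, which is the equivalence the parser
update has to preserve.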

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
  + assume that the type is always numeric for Han characters,
    and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
