* Copyright (C) 2004-2011, International Business Machines
* Corporation and others.  All Rights Reserved.
*
*   file name:  changes.txt
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2004may06
*   created by: Markus W. Scherer
*
* change log for Unicode updates

---------------------------------------------------------------------------- ***

Unicode 6.1 update

(TODO: Copy and adjust most of the 6.0 update instructions,
 except retain this following section in this new form.
 So far, this just documents the new procedure for building the property names data.)

* run genpname
  (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
- rebuild ICU & tools

---------------------------------------------------------------------------- ***

ICU 4.8 (no Unicode update, just new script codes)

* 9 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2010-12-21)
    Afak    439     Afaka
    Jurc    510     Jurchen
    Mroo    199     Mro, Mru
    Nshu    499     Nüshu
    Shrd    319     Sharada, Śāradā
    Sora    398     Sora Sompeng
    Takr    321     Takri, Ṭākrī, Ṭāṅkrī
    Tang    520     Tangut
    Wole    480     Woleai
  -> uscript.h
  -> com.ibm.icu.lang.UScript
    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
    replace  public static final int \1 = \2;\3
  -> genpname/SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
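
The find/replace pair above can also be applied outside an editor. The following is a minimal Python sketch, assuming the uscript.h enum lines have the shape "USCRIPT_NAME = number, trailing comment"; the sample line and its numeric value are illustrative, not copied from uscript.h.

```python
import re

# Pattern and replacement are taken from the find/replace pair in the notes;
# Python's re module accepts the same group and backreference syntax.
USCRIPT_LINE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
JAVA_TEMPLATE = r"public static final int \1 = \2;\3"


def uscript_to_java(line: str) -> str:
    """Rewrite one C enum line as a Java int constant; other lines pass through."""
    return USCRIPT_LINE.sub(JAVA_TEMPLATE, line)


if __name__ == "__main__":
    # Illustrative input line (hypothetical value).
    print(uscript_to_java("    USCRIPT_AFAKA = 147,/* Afak */"))
```

Because the pattern ends in (.+), a line with nothing after the comma is left unchanged and would still need a manual edit.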

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
- You might first need to "make install" ICU so that the tools build can pick
  up the new definitions from the installed header files.

* run genpname
  (builds both pnames.icu and propname_data.h)
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- ~/svn.icu/tools/trunk/bld/unicode/c$ genpname/genpname -v -d ~/svn.icu/trunk/src/source/common --csource
- rebuild ICU & tools

* run genprops
- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/data/in -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
- ~/svn.icu/tools/trunk/bld/unicode/c$ genprops/genprops -d ~/svn.icu/trunk/src/source/common --csource -s ~/svn.icu/trunk/src/source/data/unidata -i ~/svn.icu/trunk/dbg/data/out/build/icudt48l -u 6.0
- rebuild ICU & tools

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/pnames.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
    ~/svn.icu/trunk/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt48b/uprops.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt48b
- refresh ICU4J
    ~/svn.icu/trunk/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt48b

---------------------------------------------------------------------------- ***

Unicode 6.0 update

*** related ICU Trac tickets

7264 Unicode 6.0 Update

*** Unicode version numbers
- makedata.mak
- uchar.h
  (configure.in & configure: have been modified to extract the version from uchar.h)
- com.ibm.icu.util.VersionInfo

*** data files & enums & parser code

* file preparation

~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
- This now prepares both unidata and testdata files in respective output subfolders.

* PropertyAliases.txt changes
- new Script_Extensions property defined in the new ScriptExtensions.txt file
  but not listed in PropertyAliases.txt; reported to unicode.org;
  -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
    scx; Script_Extensions
  -> uchar.h with new UProperty section
  -> com.ibm.icu.lang.UProperty, parallel with uchar.h

* PropertyValueAliases.txt changes
- 12 new block names:
  Alchemical_Symbols
  Bamum_Supplement
  Batak
  Brahmi
  CJK_Unified_Ideographs_Extension_D
  Emoticons
  Ethiopic_Extended_A
  Kana_Supplement
  Mandaic
  Miscellaneous_Symbols_And_Pictographs
  Playing_Cards
  Transport_And_Map_Symbols
  -> add to uchar.h
  -> add to UCharacter.UnicodeBlock
    Eclipse find     UBLOCK_([^ ]+) = [0-9]+, (/.+)
            replace  public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- Joining_Group (jg) values:
  Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
  -> uchar.h & UCharacter.JoiningGroup
- 3 new scripts:
  sc ; Batk      ; Batak
  sc ; Brah      ; Brahmi
  sc ; Mand      ; Mandaic
  -> remove these from SyntheticPropertyValueAliases.txt
  -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
  -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
  (added 2009-11-11..2010-07-18)
  Bass        259     Bassa Vah
  Dupl        755     Duployan shorthand
  Elba        226     Elbasan
  Gran        343     Grantha
  Kpel        436     Kpelle
  Loma        437     Loma
  Mend        438     Mende
  Merc        101     Meroitic Cursive
  Narb        106     Old North Arabian
  Nbat        159     Nabataean
  Palm        126     Palmyrene
  Sind        318     Sindhi
  Wara        262     Warang Citi
  -> uscript.h
  -> com.ibm.icu.lang.UScript
    find     USCRIPT_([^ ]+) *= ([0-9]+),(.+)
    replace  public static final int \1 = \2;\3
  -> SyntheticPropertyValueAliases.txt
  -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
      and in com.ibm.icu.dev.test.lang.TestUScript.java
- ISO 15924 name change
  Mero        100     Meroitic Hieroglyphs (was Meroitic)
  -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt

* UnicodeData.txt changes
- new CJK block:
  2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
  2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
  -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion

* build Unicode tools using CMake+make

* run genpname/preparse.pl (on Linux)
  + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
  + make sure that data.h is writable
  + perl preparse.pl ~/svn.icu/trunk/src > out.txt
  + preparse.pl shows no errors, out.txt Info and Warning lines look ok

* rebuild Unicode tools (at least genpname) using make
- You might first need to "make install" ICU so that the tools build can pick
  up the new definitions from the installed header files.

* run genpname
- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
- rebuild ICU & tools

* update source/data/unidata/norm2/nfkc_cf.txt
- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt

* update source/data/unidata/norm2/uts46.txt
- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
  to ~/svn.icu/tools/trunk/src/unicode/py
- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2

* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
  sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
- Unicode 6.0: U+2260, U+226E, U+226F
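
The grep step above can be approximated with a short script. This is a sketch only: it assumes IdnaMappingTable.txt uses semicolon-separated fields (code point or range, then status) with "#" comments, and the sample lines are hand-written for illustration rather than copied from the file.

```python
def non_ascii_std3_valid(lines):
    """Return code points above U+007F whose status is disallowed_STD3_valid."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0].strip()   # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        hits.extend(cp for cp in range(first, last + 1) if cp > 0x7F)
    return hits


# Illustrative lines in the assumed format (not copied from the real table).
sample = [
    "0000..002C    ; disallowed_STD3_valid                  # <control>..COMMA",
    "2260          ; disallowed_STD3_valid                  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid                  # NOT LESS-THAN..NOT GREATER-THAN",
    "00C0          ; mapped                 ; 00E0          # LATIN CAPITAL LETTER A WITH GRAVE",
]

if __name__ == "__main__":
    print(["U+%04X" % cp for cp in non_ascii_std3_valid(sample)])
```

Any code point reported here is a candidate for the extra test cases in uts46test.cpp and UTS46Test.java.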

* generate core properties data files
- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools
- run makeuca.sh so that genuca picks up the new nfc.nrm:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU & tools

* implement new Script_Extensions property (provisional)
- parser & generator: genprops & uprops.icu
- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java

* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
- (one-time change)
- genbidi/gencase/genprops tools changes
- re-run makeprops.sh (see above)
- UCharacterProperty.java, UCharacterTypeIterator.java,
  UBiDiProps.java, UCaseProps.java, and several others with minor changes;
  UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java

* update Java data files
- refresh just the UCD-related files, just to be safe
- see (ICU4C)/source/data/icu4j-readme.txt
- mkdir /tmp/icu4j
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
  output:
    ...
    Unicode .icu files built to ./out/build/icudt45l
    mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
    LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH  ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
    jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
    mkdir -p /tmp/icu4j/main/shared/data
    cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
- copy the big-endian Unicode data files to another location,
  separate from the other data files
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
- refresh ICU4J
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b

* refresh Java test .txt files
- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode

* un-hardcode normalization skippable (NF*_Inert) test data
- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools

* copy updated break iterator test files
- now handled by early ucdcopy.py and
  copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
  (old instructions:
   copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
   to ~/svn.icu/trunk/src/source/test/testdata)
- they are not used in ICU4J

* UCA

- get output from Mark's tools; look in
    http://www.unicode.org/~book/incoming/mark/uca6.0.0/
    http://www.macchiato.com/unicode/utc/additional-uca-files
    http://www.unicode.org/Public/UCA/6.0.0/
    http://www.unicode.org/~mdavis/uca/
- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
- update Han-implicit ranges for new CJK extensions:
  swapCJK() in ucol.cpp & ImplicitCEGenerator.java
- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
  do not add it into invuca so that tailoring primary-after an ignorable works
- genuca: permit space between [variable top] bytes
- ucol.cpp: treat noncharacters like unassigned rather than ignorable
- run makeuca.sh:
  ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
- rebuild ICU4C
- refresh ICU4J collation data:
  (subset of instructions above for properties data refresh, except copies all coll/*)
    ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
    mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
    ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
- update (ICU)/source/test/testdata/CollationTest_*.txt
  and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
  with output from Mark's Unicode tools
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

* When refreshing all of ICU4J data from ICU4C
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
or
- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install

*** LayoutEngine script information

(For details see the Unicode 5.2 change log below.)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases;
  Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

---------------------------------------------------------------------------- ***

Unicode 5.2 update

*** related ICU Trac tickets

7084 Unicode 5.2

7167 verify collation bytes
7235 Java test NAME_ALIAS
7236 Java DerivedCoreProperties.txt test
7237 Java BidiTest.txt
7238 UTrie2 in core unidata
7239 test for tailoring gaps
7240 Java fix CollationMiscTest
7243 update layout engine for Unicode 5.2

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation

python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
- includes finding files regardless of version numbers,
  copying them, and performing the equivalent processing of the
  ucdstrip and ucdmerge tools on the desired set of files
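
For readers unfamiliar with the two tools, the sketch below illustrates the kind of processing meant here. It rests on an assumption these notes do not spell out: that ucdstrip removes "#" comments and that ucdmerge joins adjacent code point ranges carrying the same value. The function names and sample lines are hypothetical, and this is not the code in ucdcopy.py.

```python
def strip_comments(lines):
    """Drop '#' comments and empty lines (assumed ucdstrip-like behavior)."""
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            yield data


def merge_ranges(lines):
    """Join adjacent ranges that carry the same value (assumed ucdmerge-like behavior)."""
    merged = []  # entries are [first, last, value]
    for line in lines:
        rng, value = (part.strip() for part in line.split(";", 1))
        start, _, end = rng.partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == first:
            merged[-1][1] = last          # extend the previous range
        else:
            merged.append([first, last, value])
    return [
        ("%04X" % s if s == e else "%04X..%04X" % (s, e)) + ";" + v
        for s, e, v in merged
    ]


# Illustrative EastAsianWidth-style lines (not copied from the UCD).
sample = [
    "# header comment",
    "0020;Na # SPACE",
    "0021..0023;Na # EXCLAMATION MARK..NUMBER SIGN",
    "00A1;A # INVERTED EXCLAMATION MARK",
]

if __name__ == "__main__":
    print(merge_ranges(strip_comments(sample)))
```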

* notes on changes
- PropertyAliases.txt
  moved from numeric to enumerated:
    ccc       ; Canonical_Combining_Class
  new string properties:
    NFKC_CF   ; NFKC_Casefold
    Name_Alias; Name_Alias
  new binary properties:
    Cased     ; Cased
    CI        ; Case_Ignorable
    CWCF      ; Changes_When_Casefolded
    CWCM      ; Changes_When_Casemapped
    CWKCF     ; Changes_When_NFKC_Casefolded
    CWL       ; Changes_When_Lowercased
    CWT       ; Changes_When_Titlecased
    CWU       ; Changes_When_Uppercased
  new CJK Unihan properties (not supported by ICU)
- PropertyValueAliases.txt
  new block names
  new scripts
  one script code change:
    sc ; Qaai      ; Inherited
    ->
    sc ; Zinh      ; Inherited                        ; Qaai
  new Line_Break (lb) value:
    lb ; CP        ; Close_Parenthesis
  new Joining_Group (jg) values: Farsi_Yeh, Nya
  other new values:
    ccc; 214; ATA  ; Attached_Above
- DerivedBidiClass.txt
  new default-R range: U+1E800 - U+1EFFF
- UnicodeData.txt
  all of the ISO comments are gone
  new CJK block end:
    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
  new CJK block:
    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\trunk\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
  + preparse.pl complains with errors like the following:
      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
    This is because ICU 4.0 had scripts from ISO 15924 which are now
    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
  + preparse.pl complains with errors about block names missing from uchar.h; add them
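
The conflict that preparse.pl reports can be found ahead of time by comparing the two alias files. The sketch below assumes both files use "sc ; Code ; Long_Name" lines for scripts; the helper names and the sample entries are hypothetical, and the script is not part of preparse.pl.

```python
def script_codes(lines):
    """Collect short script codes from 'sc ; Code ; Long_Name' lines."""
    codes = set()
    for line in lines:
        fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes


def duplicated_scripts(synthetic_lines, official_lines):
    """Codes defined in both files; these should leave the synthetic file."""
    return sorted(script_codes(synthetic_lines) & script_codes(official_lines))


# Hypothetical file contents for illustration only.
synthetic = ["sc ; Egyp ; Egyp", "sc ; Java ; Java", "sc ; Xabc ; Xabc"]
official = [
    "sc ; Egyp      ; Egyptian_Hieroglyphs",
    "sc ; Java      ; Javanese",
    "gc ; Lu        ; Uppercase_Letter",
]

if __name__ == "__main__":
    print(duplicated_scripts(synthetic, official))
```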

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 26 new blocks
    copy new blocks from Blocks.txt
    MS VC++ 2008 regular expression:
      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
  + several new script values already added in ICU 4.0 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
    (added to SyntheticPropertyValueAliases.txt)
- new Joining Group (JG) values: Farsi_Yeh, Nya
- new Line_Break (lb) value:
    lb ; CP        ; Close_Parenthesis
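
The Visual Studio find/replace for Blocks.txt translates directly to other regex engines, where the {...} tagged expressions become (...) groups. The sketch below goes one step beyond the editor replacement, as an assumption for illustration: it also upper-cases the block name, turns spaces and hyphens into underscores, and numbers the entries sequentially. The starting value passed in is illustrative.

```python
import re

# Same shape as the editor pattern, with {...} written as (...) groups.
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def blocks_to_enum(lines, first_value):
    """Turn 'start..end; Block Name' lines into UBLOCK_ enum entries."""
    entries = []
    value = first_value
    for line in lines:
        match = BLOCK_LINE.match(line)
        if not match:
            continue                      # skip comments and blank lines
        start, _end, name = match.groups()
        constant = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
        entries.append("    UBLOCK_%s = %d, /*[%s]*/" % (constant, value, start))
        value += 1
    return entries


if __name__ == "__main__":
    sample = ["# Blocks.txt excerpt", "A960..A97F; Hangul Jamo Extended-A"]
    for entry in blocks_to_enum(sample, first_value=172):
        print(entry)
```

The generated values still need to be checked against the existing enum in uchar.h before they are pasted in.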

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FC3 to 9FCB
  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
  + do change gennames.c
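
The case-insensitive search can be scripted so that no hardcoded occurrence is missed. A minimal sketch follows; the sample source lines are hypothetical and only show the two spellings the regex is meant to catch.

```python
import re

# Old range end (9FC3) and old limit (9FC4), in either letter case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[34]", re.IGNORECASE)


def find_hardcoded_ranges(text):
    """Return (line_number, line) pairs that mention the old end or limit."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if UNIHAN_END_OR_LIMIT.search(line)
    ]


# Hypothetical source lines for illustration.
sample = "\n".join([
    "if (c <= 0x9fc3) {",
    "#define CJK_LIMIT 0x9FC4",
    "int unrelated = 0x9FA5;",
])

if __name__ == "__main__":
    for number, line in find_hardcoded_ranges(sample):
        print(number, line)
```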

* Compare definitions of new binary properties with what we used to use
  in algorithms, to see if the definitions changed.
- Verified that definitions for Cased and Case_Ignorable are unchanged.
  The gencase tool now parses the newly public Case_Ignorable values
  in case the definition changes in the future.

* uchar.c & uprops.h & uprops.c & genprops
- new numeric values that didn't exist in Unicode data before:
    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
  therefore redesign the encoding of numeric types and values for formatVersion 6;
  design for simple numbers up to at least 144 ("one gross"),
  large values up to at least 10^20,
  and fractions with numerators -1..17 and denominators 1..16
  to cover current and expected future values
  (e.g., more Han numeric values, Meroitic twelfths)
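
To make the fraction ranges concrete, here is one way numerators -1..17 and denominators 1..16 could be packed into a single small integer. The layout (denominator minus one in the low four bits, numerator plus twelve above them, giving a range that starts at 0xB0) is an assumption chosen to fit the stated ranges; the authoritative layout is whatever uprops.h defines for formatVersion 6.

```python
from fractions import Fraction

FRACTION_START = 0xB0  # assumed first value of the fraction range


def encode_fraction(numerator, denominator):
    """Pack a fraction into one integer (hypothetical layout)."""
    if not -1 <= numerator <= 17:
        raise ValueError("numerator out of range -1..17")
    if not 1 <= denominator <= 16:
        raise ValueError("denominator out of range 1..16")
    return ((numerator + 12) << 4) | (denominator - 1)


def decode_fraction(value):
    """Inverse of encode_fraction."""
    return Fraction((value >> 4) - 12, (value & 0xF) + 1)


if __name__ == "__main__":
    # The smallest encodable fraction lands exactly on the assumed range start.
    assert encode_fraction(-1, 1) == FRACTION_START
    # The new Unicode 5.2 values listed in the notes all round-trip.
    for n, d in [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]:
        assert decode_fraction(encode_fraction(n, d)) == Fraction(n, d)
        print("%d/%d -> 0x%X" % (n, d, encode_fraction(n, d)))
```

With a four-bit denominator field, denominators up to 16 fit, which covers the new sixteenths as well as the anticipated twelfths.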

* reimplement Hangul_Syllable_Type for new Jamo characters
- the old code assumed that all Jamo characters are in the 11xx block
- Unicode 5.2 fills holes there and adds new Jamo characters in
    A960..A97F; Hangul Jamo Extended-A
  and in
    D7B0..D7FF; Hangul Jamo Extended-B
- Hangul_Syllable_Type can be trivially derived from a subset of
  Grapheme_Cluster_Break values
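
The derivation mentioned in the last item amounts to a small lookup. The sketch below uses the short property value names (L, V, T, LV, LVT for both properties, NA for Not_Applicable); the function name is hypothetical and this is not the ICU implementation.

```python
# Grapheme_Cluster_Break values that carry Hangul information map one-to-one
# to Hangul_Syllable_Type values; everything else is Not_Applicable (NA).
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb_value):
    """Derive Hangul_Syllable_Type from a Grapheme_Cluster_Break short name."""
    return GCB_TO_HST.get(gcb_value, "NA")


if __name__ == "__main__":
    for gcb in ["L", "LV", "LVT", "CR", "XX"]:
        print(gcb, "->", hangul_syllable_type(gcb))
```

Because the lookup depends only on the Grapheme_Cluster_Break data, new Jamo blocks such as Extended-A and Extended-B need no block-specific code.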

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data

ICU data make path is \svn\icuproj\icu\trunk\source\data\
ICU root path is \svn\icuproj\icu\trunk
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
Creating data file for Unicode Property Names
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
  and rebuild the common library

*** UCA

- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
[ Begin obsolete instructions:
  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
      on Windows:
        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
  End obsolete instructions]
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
  not just the *_STUB.txt files
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

*** Implement Cased & Case_Ignorable properties
- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
- Problem: These properties should be disjoint, but aren't
- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
- change ucase.icu to be able to store any combination of Cased and Case_Ignorable

*** Implement Changes_When_Xyz properties
- without stored data

*** Implement Name_Alias property
- add it as another name field in unames.icu
- make it available via u_charName() and UCharNameChoice
- consider it in u_charFromName()

*** Break iterators

* Update break iterator rules to new UAX versions and new property values
* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary

*** new BidiTest file
- review format and data
- copy BidiTest.txt to source/test/testdata
- write test code using this data
- fix ICU code where it fails the conformance test

*** Java
- generally, find and update code corresponding to C/C++
- UCharacter.UnicodeBlock constants:
  a) add an _ID integer per new block, update COUNT
  b) add a class instance per new block
     Visual Studio regex:
        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()

- port test changes to Java

*** LayoutEngine script information

(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

-> Eric Mader wrote in email on 20090930:
    "I think the tool has been modified to update @draft to @stable for
     older scripts and to add @draft for new scripts.
     (I worked with an intern on this last year.)
     You should check the output after you run it."

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

-> Eric Mader wrote in email on 20090930:
    "This is just a matter of making sure that all the per-script tables have
     entries for any new scripts that were added.
     If any new Indic characters were added, then the class tables in
     IndicClassTables.cpp should be updated to reflect this.
     John Emmons should know how to do this if it's required."

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.1 update

*** related ICU Trac tickets

5696 Update to Unicode 5.1

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.1.0\ucd\Blocks.txt ..\unidata\
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\uni51\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
  + preparse.pl complains with errors like the following:
      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    This is because ICU 3.8 had scripts from ISO 15924 which are now
    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
      N/Y, No/Yes, F/T, False/True
    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
       It will use further values from the file if present.

642* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 17 new blocks
  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt)
  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  In ICU 4.0 there are now USCRIPT_CODE_LIMIT=130 script codes.
  None of the codes above 127 is the script code of an assigned
  Unicode character yet, so ICU 4.0 uprops.icu does not store any
  script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that overflows now.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields also grow to leave room for future values:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
- renamed property Simple_Case_Folding (sfc->scf)
  + nothing to be done: handled as normal alias
- new property JSN Jamo_Short_Name
  + no new API: only contributes to the Name property
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
- new Joining Group (JG) value: Burushashki_Yeh_Barree
- new Sentence_Break (SB) values:
    SB ; CR        ; CR
    SB ; EX        ; Extend
    SB ; LF        ; LF
    SB ; SC        ; SContinue
- new Word_Break (WB) values:
    WB ; CR        ; CR
    WB ; Extend    ; Extend
    WB ; LF        ; LF
    WB ; MB        ; MidNumLet
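
The bit-width arithmetic behind the uprops.h change can be checked with a
small sketch. This is illustrative Python, not ICU code: the helper name
bits_needed is hypothetical, and the numbers are the ones quoted in the
notes above.

```python
def bits_needed(max_value: int) -> int:
    """Return how many bits are needed to store values 0..max_value."""
    return max(1, max_value.bit_length())

USCRIPT_CODE_LIMIT = 130                 # ICU 4.0 value quoted in the notes
max_script = USCRIPT_CODE_LIMIT - 1      # 129, stored in the parallel bit field

old_width = bits_needed(127)             # largest value that fits the old field
new_width = bits_needed(max_script)      # width required for 129
block_width = bits_needed(172)           # current block count from the notes

print(old_width, new_width, block_width)
```

The block count still fits in 8 bits, which matches the note that the
Block and Word_Break fields grow only to leave room for later versions.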

* Further changes in the 2008-02-29 update:
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  because they should not normally be invisible.
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
- new Word_Break (WB) value: NL=Newline

* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
- Unihan range end moves from 9FBB to 9FC3
  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
  + do change gennames.c
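
The search described above can be sketched as follows. This is a minimal
Python illustration, not a tool from the ICU tree: the function name
find_hardcoded and the sample lines are hypothetical, and only the regex
9FB[BC] with case-insensitive matching comes from the notes.

```python
import re

# Matches the old Unihan range end (9FBB) and limit (9FBC) in any letter case.
UNIHAN_OLD = re.compile(r"9FB[BC]", re.IGNORECASE)

def find_hardcoded(lines):
    """Yield (line_number, text) for lines mentioning the old end or limit."""
    for number, text in enumerate(lines, start=1):
        if UNIHAN_OLD.search(text):
            yield number, text

sample = [
    "    0x4e00, 0x9fbb,",   # hypothetical line with the old range end
    "if (c < 0x9FBC) {",     # hypothetical line with the old limit
    "    0x3400, 0x4db5,",   # unrelated range, not matched
]
hits = list(find_hardcoded(sample))
for number, text in hits:
    print(number, text.strip())
```

Each hit still needs a manual review, because the Unicode 4.1 notes list
several places where the old value must be left unchanged.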

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest
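
What such a test has to cover can be sketched in a few lines. This is an
illustrative Python model, not the TestBinaryValues() code and not an ICU
API: the function parse_binary_value is hypothetical, and treating the
aliases as case-insensitive is an assumption made for the sketch.

```python
# The eight aliases listed in PropertyValueAliases.txt for boolean properties.
TRUE_ALIASES = {"y", "yes", "t", "true"}
FALSE_ALIASES = {"n", "no", "f", "false"}

def parse_binary_value(alias: str) -> bool:
    """Map a boolean property value alias to True or False."""
    key = alias.strip().lower()      # case-insensitivity is an assumption here
    if key in TRUE_ALIASES:
        return True
    if key in FALSE_ALIASES:
        return False
    raise ValueError(f"not a boolean property value alias: {alias!r}")

# Every alias pair from the notes must be accepted.
for false_name, true_name in [("N", "Y"), ("No", "Yes"), ("F", "T"), ("False", "True")]:
    assert parse_binary_value(false_name) is False
    assert parse_binary_value(true_name) is True
```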

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
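
The ucdstrip | ucdmerge pipeline can be modeled roughly as below. This is
a Python sketch under stated assumptions, not the actual tools: it assumes
that ucdstrip drops comments and empty lines and that ucdmerge joins
adjacent code point ranges that carry the same value. The function names
and the sample lines are hypothetical.

```python
def strip_comments(lines):
    """Drop '#' comments and empty lines (assumed ucdstrip behavior)."""
    for line in lines:
        text = line.split("#", 1)[0].rstrip()
        if text:
            yield text

def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (first, last) pair."""
    start, _, end = field.partition("..")
    first = int(start, 16)
    return first, (int(end, 16) if end else first)

def merge_ranges(lines):
    """Join adjacent ranges with equal values (assumed ucdmerge behavior)."""
    merged = []                                   # entries: [first, last, value]
    for line in lines:
        code, value = (part.strip() for part in line.split(";", 1))
        first, last = parse_range(code)
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == first:
            merged[-1][1] = last                  # extend the previous range
        else:
            merged.append([first, last, value])
    for first, last, value in merged:
        code = f"{first:04X}" if first == last else f"{first:04X}..{last:04X}"
        yield f"{code};{value}"

sample = [
    "# hypothetical excerpt in the style of EastAsianWidth.txt",
    "0020;Na # SPACE",
    "0021..0023;Na # EXCLAMATION MARK..NUMBER SIGN",
    "00A1;A # INVERTED EXCLAMATION MARK",
    "",
]
result = list(merge_ranges(strip_comments(sample)))
print("\n".join(result))
```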

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
  + make sure that data.h is writable
  + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
  + Pattern_Syntax
  + Pattern_White_Space
- new enumerated properties
  + Grapheme_Cluster_Break
  + Sentence_Break
  + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
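
The D47a condition above can be written as a small predicate. This is a
Python sketch, not the gencase code: the function name is hypothetical,
the Word_Break value is passed in by the caller because Python's
unicodedata module does not expose it, and unicodedata reflects the
Unicode version bundled with Python rather than 4.1.

```python
import unicodedata

# General categories named in the D47a definition quoted above.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

def is_case_ignorable(ch: str, word_break: str) -> bool:
    """Return True if ch is case-ignorable per the D47a wording.

    word_break is the caller-supplied Word_Break value of ch.
    """
    return (word_break == "MidLetter"
            or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES)

print(is_case_ignorable("\u0301", "Extend"))     # COMBINING ACUTE ACCENT, gc=Mn
print(is_case_ignorable("a", "ALetter"))         # plain letter, gc=Ll
```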

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties
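
The round-trip check can be sketched as follows. This Python model does
not call ICU: char_mirror is a hypothetical stand-in for u_charMirror(),
and the small table holds only a few well-known bracket pairs rather than
the full BidiMirroring.txt data.

```python
# Tiny stand-in for the mirroring data: a few ASCII bracket pairs.
MIRROR = {
    0x0028: 0x0029, 0x0029: 0x0028,   # ( )
    0x003C: 0x003E, 0x003E: 0x003C,   # < >
    0x005B: 0x005D, 0x005D: 0x005B,   # [ ]
    0x007B: 0x007D, 0x007D: 0x007B,   # { }
}

def char_mirror(c: int) -> int:
    """Return the mirrored code point, or c itself if there is none."""
    return MIRROR.get(c, c)

def round_trip_failures(code_points):
    """Return the code points for which mirroring twice does not give c back."""
    return [c for c in code_points if char_mirror(char_mirror(c)) != c]

failures = round_trip_failures(range(0x80))
print(failures)
```

A one-directional entry in the table, such as a pair with only one of its
two mappings present, would show up in the returned list.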

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
  + do not modify BOCU/BOCSU code because that would change the encoding
    and break binary compatibility!
  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    NamePrepProfile.txt
  + ignore trietest.c: test data is arbitrary
  + ignore tstnorm.cpp: test optimization, not important
  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
  + do change line_th.txt and word_th.txt
    by replacing hardcoded ranges with the new property values
  + do change gennames.c

source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
  + put short name comment only on line with new constant
    for genpname perl script parser
- new binary properties
  + STerm
  + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
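
The quick-check field change can be illustrated with a parser that accepts
both layouts. This is a Python sketch, not the gennorm.c code: the function
name and the sample lines are hypothetical, and the old "MAYBE" suffix
mapping to "M" is an assumption, since the note above only spells out
"NFD_NO".

```python
# Map old single-field suffixes to new quick-check values.
# "NO" -> "N" follows the note above; "MAYBE" -> "M" is an assumption.
OLD_SUFFIXES = {"NO": "N", "MAYBE": "M"}

def parse_quick_check(line):
    """Return (code_points, property, value) for an old- or new-style line."""
    fields = [f.strip() for f in line.split("#", 1)[0].split(";")]
    if len(fields) == 3:                       # new: "00C0..00C5 ; NFD_QC; N"
        return tuple(fields)
    if len(fields) == 2:                       # old: "00C0..00C5 ; NFD_NO"
        form, _, suffix = fields[1].rpartition("_")
        if suffix in OLD_SUFFIXES:
            return fields[0], form + "_QC", OLD_SUFFIXES[suffix]
    raise ValueError(f"unrecognized line: {line!r}")

old_style = parse_quick_check("00C0..00C5 ; NFD_NO")
new_style = parse_quick_check("00C0..00C5 ; NFD_QC; N # illustrative sample line")
print(old_style == new_style)
```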

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
  + assume that the type is always numeric for Han characters,
    and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
