* Copyright (C) 2004-2010, International Business Machines
* Corporation and others.  All Rights Reserved.
*
*   file name:  changes.txt
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2004may06
*   created by: Markus W. Scherer
*
* change log for Unicode updates

---------------------------------------------------------------------------- ***

Unicode 5.2 update

*** related ICU Trac tickets

7084 Unicode 5.2

7167 verify collation bytes
7235 Java test NAME_ALIAS
7236 Java DerivedCoreProperties.txt test
7237 Java BidiTest.txt
7238 UTrie2 in core unidata
7239 test for tailoring gaps
7240 Java fix CollationMiscTest
7243 update layout engine for Unicode 5.2

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation

python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
- includes finding files regardless of version numbers,
  copying them, and performing the equivalent processing of the
  ucdstrip and ucdmerge tools on the desired set of files
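
For illustration only, a minimal Python sketch of the three steps named above.
It is not the code of ucdcopy.py. The versioned file name pattern
(e.g. "LineBreak-5.2.0d5.txt") and all function names are assumptions, and the
stripping and merging are reduced to their simplest form: drop '#' comments,
then merge adjacent ranges that carry the same value.

```python
import re

# Assumed naming pattern for versioned UCD files; hypothetical, for illustration.
_VERSIONED = re.compile(r"^(.+?)-\d+(?:\.\d+)*(?:d\d+)?\.txt$")

def base_name(filename):
    """Map a versioned file name to its plain name; leave other names alone."""
    m = _VERSIONED.match(filename)
    return m.group(1) + ".txt" if m else filename

def strip_comments(lines):
    """Drop '#' comments and empty lines (a minimal ucdstrip equivalent)."""
    out = []
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            out.append(data)
    return out

def merge_ranges(lines):
    """Merge adjacent code point ranges with equal values (a minimal ucdmerge)."""
    merged = []  # entries are [start, end, value]
    for line in lines:
        rng, value = [field.strip() for field in line.split(";", 1)]
        start, _, end = rng.partition("..")
        start = int(start, 16)
        end = int(end, 16) if end else start
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end  # extend the previous range
        else:
            merged.append([start, end, value])
    result = []
    for start, end, value in merged:
        rng = "%04X" % start if start == end else "%04X..%04X" % (start, end)
        result.append("%s;%s" % (rng, value))
    return result
```

The sample values used to exercise this sketch are made up; real input would be
files such as EastAsianWidth.txt and LineBreak.txt.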

* notes on changes
- PropertyAliases.txt
  moved from numeric to enumerated:
    ccc       ; Canonical_Combining_Class
  new string properties:
    NFKC_CF   ; NFKC_Casefold
    Name_Alias; Name_Alias
  new binary properties:
    Cased     ; Cased
    CI        ; Case_Ignorable
    CWCF      ; Changes_When_Casefolded
    CWCM      ; Changes_When_Casemapped
    CWKCF     ; Changes_When_NFKC_Casefolded
    CWL       ; Changes_When_Lowercased
    CWT       ; Changes_When_Titlecased
    CWU       ; Changes_When_Uppercased
  new CJK Unihan properties (not supported by ICU)
- PropertyValueAliases.txt
  new block names
  new scripts
  one script code change:
    sc ; Qaai      ; Inherited
    ->
    sc ; Zinh      ; Inherited                        ; Qaai
  new Line_Break (lb) value:
    lb ; CP        ; Close_Parenthesis
  new Joining_Group (jg) values: Farsi_Yeh, Nya
  other new values:
    ccc; 214; ATA  ; Attached_Above
- DerivedBidiClass.txt
  new default-R range: U+1E800 - U+1EFFF
- UnicodeData.txt
  all of the ISO comments are gone
  new CJK block end:
    9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
  new CJK block:
    2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
    2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\trunk\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
  + preparse.pl complains with errors like the following:
      Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
    This is because ICU 4.0 had scripts from ISO 15924 which are now
    added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
  + preparse.pl complains with errors about block names missing from uchar.h; add them

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 26 new blocks
    copy new blocks from Blocks.txt
    MS VC++ 2008 regular expression:
      find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
      replace with "    UBLOCK_\3 = 172, /*[\1]*/"
  + several new script values already added in ICU 4.0 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
  + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
  + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
    (added to SyntheticPropertyValueAliases.txt)
- new Joining Group (JG) values: Farsi_Yeh, Nya
- new Line_Break (lb) value:
    lb ; CP        ; Close_Parenthesis
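
The Blocks.txt find/replace step above can also be scripted. The following is a
rough Python equivalent of the MS VC++ regular expression, not a tool that
exists in ICU. It goes one step beyond the editor replacement in two ways that
are assumptions of this sketch: it turns the block name into an upper-case
identifier, and it numbers the constants consecutively from a caller-supplied
first value instead of writing the same placeholder number on every line.

```python
import re

# Same shape as the editor expression: start..end; Block Name
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")

def block_enum_lines(blocks_txt_lines, first_value):
    """Turn Blocks.txt data lines into UBLOCK_ enum constant lines."""
    out = []
    value = first_value
    for line in blocks_txt_lines:
        m = BLOCK_LINE.match(line.strip())
        if not m:
            continue  # comment, empty line, or unexpected format
        start, _end, name = m.groups()
        # "Hangul Jamo Extended-A" -> "HANGUL_JAMO_EXTENDED_A"
        ident = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
        out.append("    UBLOCK_%s = %d, /*[%s]*/" % (ident, value, start))
        value += 1
    return out
```

The output still needs manual review, just like the editor replacement does.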

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FC3 to 9FCB
  search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
  + do change gennames.c
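
A small Python sketch of the search described above, as an alternative to an
editor's find-in-files. The function name and the list of file extensions are
assumptions of this sketch; the pattern is the one from the note.

```python
import os
import re

# The regex from the note above: old range end (9FC3) or limit (9FC4).
PATTERN = re.compile(r"9FC[34]", re.IGNORECASE)

def find_hardcoded(root, pattern=PATTERN,
                   extensions=(".c", ".cpp", ".h", ".txt", ".java")):
    """Return (path, line number, line) for every line that matches."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    if pattern.search(line):
                        hits.append((path, lineno, line.rstrip()))
    return hits
```

Each hit still has to be judged by hand, since not every occurrence is a
Unihan range boundary that should change.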

* Compare definitions of new binary properties with what we used to use
  in algorithms, to see if the definitions changed.
- Verified that definitions for Cased and Case_Ignorable are unchanged.
  The gencase tool now parses the newly public Case_Ignorable values
  in case the definition changes in the future.

* uchar.c & uprops.h & uprops.c & genprops
- new numeric values that didn't exist in Unicode data before:
    1/7, 1/9, 1/10, 3/10, 1/16, 3/16
  the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
  therefore redesign the encoding of numeric types and values for formatVersion 6;
  design for simple numbers up to at least 144 ("one gross"),
  large values up to at least 10^20,
  and fractions with numerators -1..17 and denominators 1..16
  to cover current and expected future values
  (e.g., more Han numeric values, Meroitic twelfths)

* reimplement Hangul_Syllable_Type for new Jamo characters
- the old code assumed that all Jamo characters are in the 11xx block
- Unicode 5.2 fills holes there and adds new Jamo characters in
    A960..A97F; Hangul Jamo Extended-A
  and in
    D7B0..D7FF; Hangul Jamo Extended-B
- Hangul_Syllable_Type can be trivially derived from a subset of
  Grapheme_Cluster_Break values
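
To illustrate the last point: the five Hangul-related Grapheme_Cluster_Break
values map one-to-one to Hangul_Syllable_Type values, and everything else is
NA. The Python sketch below is not the ICU implementation. Its hardcoded
Grapheme_Cluster_Break ranges are an assumption standing in for the real data
from GraphemeBreakProperty.txt, and it only knows about Hangul.

```python
# GCB values that correspond directly to Hangul_Syllable_Type values.
GCB_TO_HST = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}

def gcb_for_hangul(c):
    """Simplified GCB lookup covering only Hangul; hardcoded for illustration."""
    if 0x1100 <= c <= 0x115F or 0xA960 <= c <= 0xA97C:
        return "L"
    if 0x1160 <= c <= 0x11A7 or 0xD7B0 <= c <= 0xD7C6:
        return "V"
    if 0x11A8 <= c <= 0x11FF or 0xD7CB <= c <= 0xD7FB:
        return "T"
    if 0xAC00 <= c <= 0xD7A3:
        # Every 28th precomposed syllable has no trailing consonant.
        return "LV" if (c - 0xAC00) % 28 == 0 else "LVT"
    return "XX"  # Other

def hangul_syllable_type(c):
    """Derive hst from GCB; any non-Hangul GCB value yields NA."""
    return GCB_TO_HST.get(gcb_for_hangul(c), "NA")
```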

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data

ICU data make path is \svn\icuproj\icu\trunk\source\data\
ICU root path is \svn\icuproj\icu\trunk
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
Creating data file for Unicode Property Names
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
  and rebuild the common library

*** UCA

- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
[ Begin obsolete instructions:
  Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
    - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
      on Windows:
        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
        python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
  End obsolete instructions]
- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
  not just the *_STUB.txt files
- note on intltest: if collate/UCAConformanceTest fails, then
  utility/MultithreadTest/TestCollators will fail as well;
  fix the conformance test before looking into the multi-thread test

*** Implement Cased & Case_Ignorable properties
- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
- Problem: These properties should be disjoint, but aren't
- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
- change ucase.icu to be able to store any combination of Cased and Case_Ignorable

*** Implement Changes_When_Xyz properties
- without stored data
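
"Without stored data" works because these properties are defined in terms of
existing operations: a character has Changes_When_Lowercased if lowercasing
its NFD form changes that form, and likewise for uppercasing and case folding.
A minimal Python sketch of that idea follows. It is not the ICU code, the
function names are made up, and it uses the Unicode version built into the
Python interpreter rather than Unicode 5.2. Changes_When_Titlecased and
Changes_When_NFKC_Casefolded are left out to keep it short.

```python
import unicodedata

def _nfd(s):
    return unicodedata.normalize("NFD", s)

def changes_when_lowercased(ch):
    """True if lowercasing the NFD form of ch changes it."""
    d = _nfd(ch)
    return d.lower() != d

def changes_when_uppercased(ch):
    """True if uppercasing the NFD form of ch changes it."""
    d = _nfd(ch)
    return d.upper() != d

def changes_when_casefolded(ch):
    """True if case-folding the NFD form of ch changes it."""
    d = _nfd(ch)
    return d.casefold() != d
```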

*** Implement Name_Alias property
- add it as another name field in unames.icu
- make it available via u_charName() and UCharNameChoice
- consider it in u_charFromName()

*** Break iterators

* Update break iterator rules to new UAX versions and new property values
* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary

*** new BidiTest file
- review format and data
- copy BidiTest.txt to source/test/testdata
- write test code using this data
- fix ICU code where it fails the conformance test

*** Java
- generally, find and update code corresponding to C/C++
- UCharacter.UnicodeBlock constants:
  a) add an _ID integer per new block, update COUNT
  b) add a class instance per new block
     Visual Studio regex:
        find            UBLOCK_{[^ ]+} = [0-9]+, {/.+}
        replace with    public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()

- port test changes to Java
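
The Visual Studio replacement for step b) of the UnicodeBlock constants can be
reproduced with Python's re module, roughly as follows. This is a sketch of
the same substitution, not an existing ICU tool, and the function name is an
assumption.

```python
import re

# UBLOCK_NAME = number, /*[start]*/
C_ENUM = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")

JAVA_DECL = (r'public static final UnicodeBlock \g<1> = '
             r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>')

def java_block_instance(c_enum_line):
    """Rewrite one uchar.h UBLOCK_ enum line as a Java UnicodeBlock instance."""
    return C_ENUM.sub(JAVA_DECL, c_enum_line)
```

Lines that do not match the pattern are returned unchanged.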

*** LayoutEngine script information

(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)

* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

-> Eric Mader wrote in email on 20090930:
    "I think the tool has been modified to update @draft to @stable for
     older scripts and to add @draft for new scripts.
     (I worked with an intern on this last year.)
     You should check the output after you run it."

* copy the above files into <icu>/source/layout, replacing the old files.
* fix mixed line endings
* review the diffs and fix incorrect @draft and missing aliases
* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

-> Eric Mader wrote in email on 20090930:
    "This is just a matter of making sure that all the per-script tables have
     entries for any new scripts that were added.
     If any new Indic characters were added, then the class tables in
     IndicClassTables.cpp should be updated to reflect this.
     John Emmons should know how to do this if it's required."

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.1 update

*** related ICU Trac tickets

5696 Update to Unicode 5.1

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in & configure
- update ucdVersion in gennames.c if an algorithmic range changes

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.1.0\ucd\Blocks.txt ..\unidata\
copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.1.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* genpname
- run preparse.pl
  + cd \svn\icuproj\icu\uni51\source\tools\genpname
  + make sure that data.h is writable
  + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
  + preparse.pl complains with errors like the following:
      Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
    This is because ICU 3.8 had scripts from ISO 15924 which are now
    added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
    and PropertyValueAliases.txt.
    -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
       Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
  + PropertyValueAliases.txt now explicitly contains values for boolean properties:
      N/Y, No/Yes, F/T, False/True
    -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
       It will use further values from the file if present.

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + 17 new blocks
  + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
    (removed from SyntheticPropertyValueAliases.txt)
  + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
    (added to SyntheticPropertyValueAliases.txt)
- uprops.icu (uprops.h) only provides 7 bits for script codes.
  ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes.
  None of the codes above 127 is yet the script code of an
  assigned Unicode character, so ICU 4.0 uprops.icu does not store any
  script code values greater than 127.
  However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
  in a parallel bit field, and that overflows now.
  Also, future values >=128 would be incompatible anyway.
  uprops.h is modified to move around several of the bit fields
  in the properties vector words, and now uses 8 bits for the script code.
  Two other bit fields also grow to accommodate future growth:
  Block (current count: 172) grows from 8 to 9 bits,
  and Word_Break grows from 4 to 5 bits.
- renamed property Simple_Case_Folding (sfc->scf)
  + nothing to be done: handled as normal alias
- new property JSN Jamo_Short_Name
  + no new API: only contributes to the Name property
- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
- new Joining Group (JG) value: Burushashki_Yeh_Barree
- new Sentence_Break (SB) values:
    SB ; CR        ; CR
    SB ; EX        ; Extend
    SB ; LF        ; LF
    SB ; SC        ; SContinue
- new Word_Break (WB) values:
    WB ; CR        ; CR
    WB ; Extend    ; Extend
    WB ; LF        ; LF
    WB ; MB        ; MidNumLet
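
The script code overflow described in the uprops.icu item above is plain bit
arithmetic: a field of n bits holds the values 0..2^n-1, so the maximum script
value 129 no longer fits into 7 bits. A small Python check, with made-up
function names:

```python
def bits_needed(max_value):
    """Number of bits needed to store values 0..max_value."""
    return max(1, max_value.bit_length())

def fits(value_count, bits):
    """True if value_count distinct values (0..value_count-1) fit into bits."""
    return value_count <= (1 << bits)
```

With USCRIPT_CODE_LIMIT=130, fits(130, 7) is false and fits(130, 8) is true.
The 172 blocks would still fit into 8 bits; that field grows to 9 bits only to
leave room for later additions.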

* Further changes in the 2008-02-29 update:
- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
  because they should not normally be invisible.
- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
- new Word_Break (WB) value: NL=Newline

* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
- Unihan range end moves from 9FBB to 9FC3
  search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
  + do change gennames.c

* build Unicode data source code for hardcoding core data
C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data

ICU data make path is \svn\icuproj\icu\uni51\source\data\
ICU root path is \svn\icuproj\icu\uni51
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"

- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
  and rebuild the common library

*** Break iterators

* Update break iterator rules to new UAX versions and new property values

*** UCA

* update FractionalUCA.txt and UCARules.txt with new canonical closure

*** Test suites
- Test that APIs using Unicode property value aliases (like UnicodeSet)
  support all of the boolean values N/Y, No/Yes, F/T, False/True
  -> TestBinaryValues() tests in both cintltst and intltest
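
As an illustration of what such a test has to cover, here is a minimal Python
sketch that accepts all eight boolean value aliases. It is not the code of
TestBinaryValues(), the function name is made up, and matching
case-insensitively is an assumption of this sketch.

```python
_TRUE_ALIASES = {"y", "yes", "t", "true"}
_FALSE_ALIASES = {"n", "no", "f", "false"}

def parse_binary_value(alias):
    """Map a boolean property value alias to True or False."""
    key = alias.strip().lower()
    if key in _TRUE_ALIASES:
        return True
    if key in _FALSE_ALIASES:
        return False
    raise ValueError("not a binary property value alias: %r" % alias)
```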

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

*** Documentation
- Update User Guide
  + Jamo_Short_Name, sfc->scf, binary property value aliases

---------------------------------------------------------------------------- ***

Unicode 5.0 update

*** related Jitterbugs

5084 RFE: Update to Unicode 5.0

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    PropList.txt
    Scripts.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
copy 5.0.0\ucd\Blocks.txt ..\unidata\
copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
copy 5.0.0\ucd\UnicodeData.txt ..\unidata\

ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- run preparse.pl
  + make sure that data.h is writable
  + perl preparse.pl \cvs\oss\icu > out.txt

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new block & script values
  + script values already added in ICU 3.6 because all of ISO 15924 is now covered

* build Unicode data source code for hardcoding core data
C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data

ICU data make path is \cvs\oss\icu\source\data\
ICU root path is \cvs\oss\icu
Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
[etc.]
Creating data file for Unicode Character Properties
Creating data file for Unicode Case Mapping Properties
Creating data file for Unicode BiDi/Shaping Properties
Creating data file for Unicode Normalization
Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"

- copy the .c source files to C:\cvs\oss\icu\source\common
  and rebuild the common library

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** LayoutEngine script information
* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
ScriptRunData.cpp, which is no longer needed.)

The generated files have a current copyright date and "@draft" statement.

* copy the above files into <icu>/source/layout, replacing the old files.

Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)

* rebuild the layout and layoutex libraries.

---------------------------------------------------------------------------- ***

Unicode 4.1 update

*** related Jitterbugs

4332 RFE: Update to Unicode 4.1
4157 RBBI, TR29 4.1 updates

*** data files & enums & parser code

* file preparation
- ucdstrip:
    DerivedCoreProperties.txt
    DerivedNormalizationProps.txt
    NormalizationTest.txt
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt
- ucdstrip and ucdmerge:
    EastAsianWidth.txt
    LineBreak.txt

* add new files to the repository
    GraphemeBreakProperty.txt
    SentenceBreakProperty.txt
    WordBreakProperty.txt

* update FractionalUCA.txt and UCARules.txt with new canonical closure

* genpname
- handle new enumerated properties in sub read_uchar
- run preparse.pl

* uchar.h & uscript.h & uprops.h & uprops.c & genprops
- new binary properties
  + Pattern_Syntax
  + Pattern_White_Space
- new enumerated properties
  + Grapheme_Cluster_Break
  + Sentence_Break
  + Word_Break
- new block & script & line break values

* gencase
- case-ignorable changes
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
  now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- verify that u_charMirror() round-trips
- test all new properties and some new values of old properties

*** other code

* hardcoded Unihan range end/limit
- Unihan range end moves from 9FA5 to 9FBB
  search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
  + do not modify BOCU/BOCSU code because that would change the encoding
    and break binary compatibility!
  + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
    NamePrepProfile.txt
  + ignore trietest.c: test data is arbitrary
  + ignore tstnorm.cpp: test optimization, not important
  + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
  + do change line_th.txt and word_th.txt
    by replacing hardcoded ranges with the new property values
  + do change gennames.c

source\data\brkitr\line_th.txt(229):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\data\brkitr\word_th.txt(23):        \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
source\tools\gennames\gennames.c(971):        0x4e00, 0x9fa5,

* case mappings
- compare new special casing context conditions with previous ones
  see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods

* genpname
- consider storing only the short name if it is the same as the long name

*** other reviews
- UAX #29 changes (grapheme/word/sentence breaks)
- UAX #14 changes (line breaks)
- Pattern_Syntax & Pattern_White_Space

---------------------------------------------------------------------------- ***

Unicode 4.0.1 update

*** related Jitterbugs

3170 RFE: Update to Unicode 4.0.1
3171 Add new Unicode 4.0.1 properties
3520 use Unicode 4.0.1 updates for break iteration

*** data files & enums & parser code

* file preparation
- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt

* file fixes
- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
  according to PRI #26
  http://www.unicode.org/review/resolved-pri.html#pri26
- undone again because no corrigendum in sight;
  instead modified tests to not check consistency on this for Unicode 4.0.1

* ucdterms.txt
- update from http://www.unicode.org/copyright.html
  formatted for plain text

* uchar.h & uprops.h & uprops.c & genprops
- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
- add U_LB_INSEPARABLE due to a spelling fix
  + put short name comment only on line with new constant
    for genpname perl script parser
- new binary properties
  + STerm
  + Variation_Selector

* genpname
- fix genpname perl script so that it doesn't choke on more than 2 names per property value
- perl script: correctly calculate the maximum number of fields per row

* uscript.h
- new script code Hrkt=Katakana_Or_Hiragana

* gennorm.c track changes in DerivedNormalizationProps.txt
- "FNC" -> "FC_NFKC"
- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
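
A parser that has to read both layouts can normalize the old single-field
names to the new property/value pairs. The Python sketch below shows the idea
only and is not the gennorm.c code. The function name is made up, and the list
of old names other than "NFD_NO" and "FNC" is reconstructed by analogy (the
"etc." above) and should be checked against an old copy of the file.

```python
# Old single-field names mapped to (property, value) pairs.
# Only NFD_NO is spelled out in the notes above; the rest are assumed by analogy.
OLD_QUICK_CHECK = {
    "NFD_NO": ("NFD_QC", "N"),
    "NFC_NO": ("NFC_QC", "N"),
    "NFC_MAYBE": ("NFC_QC", "M"),
    "NFKD_NO": ("NFKD_QC", "N"),
    "NFKC_NO": ("NFKC_QC", "N"),
    "NFKC_MAYBE": ("NFKC_QC", "M"),
}

def parse_line(line):
    """Parse one data line into (start, end, property, value), or None."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None  # comment or empty line
    fields = [field.strip() for field in data.split(";")]
    start, _, end = fields[0].partition("..")
    start = int(start, 16)
    end = int(end, 16) if end else start
    prop = fields[1]
    value = fields[2] if len(fields) > 2 else None
    if prop in OLD_QUICK_CHECK:
        prop, value = OLD_QUICK_CHECK[prop]  # old one-field form
    elif prop == "FNC":
        prop = "FC_NFKC"  # renamed property
    return (start, end, prop, value)
```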

* genprops/props2.c track changes in DerivedNumericValues.txt
- changed from 3 columns to 2, dropping the numeric type
  + assume that the type is always numeric for Han characters,
    and that only those are added in addition to what UnicodeData.txt lists

*** Unicode version numbers
- makedata.mak
- uchar.h
- configure.in

*** tests
- update test of default bidi classes according to PRI #28
  /tsutil/cucdtst/TestUnicodeData
  http://www.unicode.org/review/resolved-pri.html#pri28
- bidi tests: change exemplar character for ES depending on Unicode version
- change hardcoded expected property values where they change

*** other code

* name matching
- read UCD.html

* scripts
- use new Hrkt=Katakana_Or_Hiragana

* ZWJ & ZWNJ
- are now part of combining character sequences
- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ