• Home
  • Line#
  • Scopes#
  • Navigate#
  • Raw
  • Download
1# Copyright (c) 2012-2015 International Business Machines
2# Corporation and others. All Rights Reserved.
3#
4# This file should be in UTF-8 with a signature byte sequence ("BOM").
5#
6# collationtest.txt: Collation test data.
7#
8# created on: 2012apr13
9# created by: Markus W. Scherer
10
11# A line with "** test: description" is used for verbose and error output.
12
13# A collator can be set with "@ root" or "@ locale language-tag",
14# for example "@ locale de-u-co-phonebk".
15# An old-style locale ID can also be used, for example "@ locale de@collation=phonebook".
16
17# A collator can be built with "@ rules".
18# An "@ rules" line is followed by one or more lines with the tailoring rules.
19
20# A collator can be modified with "% attribute=value".
21
22# "* compare" tests the order (= or <) of the following strings.
23# The relation can be "=" or "<" (the level of the difference is not specified)
24# or "<1", "<2", "<c", "<3", "<4" (indicating the level of the difference).
25
26# Test sections ("* compare") are terminated by
27# definitions of new collators, changing attributes, or new test sections.
28
29** test: simple CEs & expansions
30# Many types of mappings are tested elsewhere, including via the UCA conformance tests.
31# Here we mostly cover a few unusual mappings.
32@ rules
33&\x01                           # most control codes are ignorable
34<<<\u0300                       # tertiary CE
35&9<\x00                         # NUL not ignorable
36&\uA00A\uA00B=\uA002            # two long-primary CEs
37&\uA00A\uA00B\u00050005=\uA003  # three CEs, require 64 bits
38
39* compare
40=  \x01
41=  \x02
42<3 \u0300
43<1 9
44<1 \x00
45=  \x01\x00\x02
46<1 a
47<3 a\u0300
48<2 a\u0308
49=  ä
50<1 b
51<1 か        # Hiragana Ka (U+304B)
52<2 か\u3099  # plus voiced sound mark
53=  が        # Hiragana Ga (U+304C)
54<1 \uA00A\uA00B
55=  \uA002
56<1 \uA00A\uA00B\u00050004
57<1 \uA00A\uA00B\u00050005
58=  \uA003
59<1 \uA00A\uA00B\u00050006
60
61** test: contractions
62# Create some interesting mappings, and map some normalization-inert characters
63# (which are not subject to canonical reordering)
64# to some of the same CEs to check the sequence of CEs.
65@ rules
66
67# Contractions starting with 'a' should not continue with any character < U+0300
68# so that we can test a shortcut for that.
69&a=ⓐ
70&b<bz=ⓑ
71&d<dz\u0301=ⓓ           # d+z+acute
72&z
73<a\u0301=Ⓐ              # a+acute sorts after z
74<a\u0301\u0301=Ⓑ        # a+acute+acute
75<a\u0301\u0301\u0358=Ⓒ  # a+acute+acute+dot above right
76<a\u030a=Ⓓ              # a+ring
77<a\u0323=Ⓔ              # a+dot below
78<a\u0323\u0358=Ⓕ        # a+dot below+dot above right
79<a\u0327\u0323\u030a=Ⓖ  # a+cedilla+dot below+ring
80<a\u0327\u0323bz=Ⓗ      # a+cedilla+dot below+b+z
81
82&\U0001D158=⁰           # musical notehead black (has a symbol primary)
83<\U0001D158\U0001D165=¼ # musical quarter note
84
85# deliberately missing prefix contractions:
86# dz
87# a\u0327
88# a\u0327\u0323
89# a\u0327\u0323b
90
91&\x01
92<<<\U0001D165=¹         # musical stem (ccc=216)
93<<<\U0001D16D=²         # musical augmentation dot (ccc=226)
94<<<\U0001D165\U0001D16D=³  # stem+dot (ccc=216 226)
95&\u0301=❶               # acute (ccc=230)
96&\u030a=❷               # ring (ccc=230)
97&\u0308=❸               # diaeresis (ccc=230)
98<<\u0308\u0301=❹        # diaeresis+acute (=dialytika tonos) (ccc=230 230)
99&\u0327=❺               # cedilla (ccc=202)
100&\u0323=❻               # dot below (ccc=220)
101&\u0331=❼               # macron below (ccc=220)
102<<\u0331\u0358=❽        # macron below+dot above right (ccc=220 232)
103&\u0334=❾               # tilde overlay (ccc=1)
104&\u0358=❿               # dot above right (ccc=232)
105
106&\u0f71=①               # tibetan vowel sign aa
107&\u0f72=②               # tibetan vowel sign i
108#  \u0f71\u0f72         # tibetan vowel sign aa + i = ii = U+0F73
109&\u0f73=③               # tibetan vowel sign ii (ccc=0 but lccc=129)
110
111** test: simple contractions
112
113# Some strings are chosen to cause incremental contiguous contraction matching to
114# go into partial matches for prefixes of contractions
115# (where the prefixes are deliberately not also contractions).
116# When there is no complete match, then the matching code must back out of those
117# so that discontiguous contractions work as specified.
118
119* compare
120# contraction starter with no following text, or mismatch, or blocked
121<1 a
122=  ⓐ
123<1 aa
124=  ⓐⓐ
125<1 ab
126=  ⓐb
127<1 az
128=  ⓐz
129
130* compare
131<1 a
132<2 a\u0308\u030a  # ring blocked by diaeresis
133=  ⓐ❸❷
134<2 a\u0327
135=  ⓐ❺
136
137* compare
138<2 \u0308
139=  ❸
140<2 \u0308\u030a\u0301  # acute blocked by ring
141=  ❸❷❶
142
143* compare
144<1 \U0001D158
145=  ⁰
146<1 \U0001D158\U0001D165
147=  ¼
148
149# no discontiguous contraction because of missing prefix contraction d+z,
150# and a starter ('z') after the 'd'
151* compare
152<1 dz\u0323\u0301
153=  dz❻❶
154
155# contiguous contractions
156* compare
157<1 abz
158=  ⓐⓑ
159<1 abzz
160=  ⓐⓑz
161
162* compare
163<1 a
164<1 z
165<1 a\u0301
166=  Ⓐ
167<1 a\u0301\u0301
168=  Ⓑ
169<1 a\u0301\u0301\u0358
170=  Ⓒ
171<1 a\u030a
172=  Ⓓ
173<1 a\u0323\u0358
174=  Ⓕ
175<1 a\u0327\u0323\u030a  # match despite missing prefix
176=  Ⓖ
177<1 a\u0327\u0323bz
178=  Ⓗ
179
180* compare
181<2 \u0308\u0308\u0301  # acute blocked from first diaeresis, contracts with second
182=  ❸❹
183
184* compare
185<1 \U0001D158\U0001D165
186=  ¼
187
188* compare
189<3 \U0001D165\U0001D16D
190=  ³
191
192** test: discontiguous contractions
193* compare
194<1 a\u0327\u030a                # a+ring skips cedilla
195=  Ⓓ❺
196<2 a\u0327\u0327\u030a          # a+ring skips 2 cedillas
197=  Ⓓ❺❺
198<2 a\u0327\u0327\u0327\u030a    # a+ring skips 3 cedillas
199=  Ⓓ❺❺❺
200<2 a\u0334\u0327\u0327\u030a    # a+ring skips tilde overlay & 2 cedillas
201=  Ⓓ❾❺❺
202<1 a\u0327\u0323                # a+dot below skips cedilla
203=  Ⓔ❺
204<1 a\u0323\u0301\u0358          # a+dot below+dot ab.r.: 2-char match, then skips acute
205=  Ⓕ❶
206<2 a\u0334\u0323\u0358          # a+dot below skips tilde overlay
207=  Ⓕ❾
208
209* compare
210<2 \u0331\u0331\u0358           # macron below+dot ab.r. skips the second macron below
211=  ❽❼
212
213* compare
214<1 a\u0327\u0331\u0323\u030a    # a+ring skips cedilla, macron below, dot below (dot blocked by macron)
215=  Ⓓ❺❼❻
216<1 a\u0327\u0323\U0001D16D\u030a  # a+dot below skips cedilla
217=  Ⓔ❺²❷
218<2 a\u0327\u0327\u0323\u030a    # a+dot below skips 2 cedillas
219=  Ⓔ❺❺❷
220<2 a\u0327\u0323\u0323\u030a    # a+dot below skips cedilla
221=  Ⓔ❺❻❷
222<2 a\u0334\u0327\u0323\u030a    # a+dot below skips tilde overlay & cedilla
223=  Ⓔ❾❺❷
224
225* compare
226<1 \U0001D158\u0327\U0001D165   # quarter note skips cedilla
227=  ¼❺
228<1 a\U0001D165\u0323            # a+dot below skips stem
229=  Ⓔ¹
230
231# partial contiguous match, backs up, matches discontiguous contraction
232<1 a\u0327\u0323b
233=  Ⓔ❺b
234<1 a\u0327\u0323ba
235=  Ⓔ❺bⓐ
236
237# a+acute+acute+dot above right skips cedilla, continues matching 2 same-ccc combining marks
238* compare
239<1 a\u0327\u0301\u0301\u0358
240=  Ⓒ❺
241
242# FCD but not NFD
243* compare
244<1 a\u0f73\u0301                # a+acute skips tibetan ii
245=  Ⓐ③
246
247# FCD but the 0f71 inside the 0f73 must be skipped
248# to match the discontiguous contraction of the first 0f71 with the trailing 0f72 inside the 0f73
249* compare
250<1 \u0f71\u0f73                 # == \u0f73\u0f71 == \u0f71\u0f71\u0f72
251=  ③①
252
253** test: discontiguous contractions with nested contractions
254* compare
255<1 a\u0323\u0308\u0301\u0358
256=  Ⓕ❹
257<2 a\u0323\u0308\u0301\u0308\u0301\u0358
258=  Ⓕ❹❹
259
260** test: discontiguous contractions with interleaved contractions
261* compare
262# a+ring & cedilla & macron below+dot above right
263<1 a\u0327\u0331\u030a\u0358
264=  Ⓓ❺❽
265
266# a+ring & 1x..3x macron below+dot above right
267<2 a\u0331\u030a\u0358
268=  Ⓓ❽
269<2 a\u0331\u0331\u030a\u0358\u0358
270=  Ⓓ❽❽
271# also skips acute
272<2 a\u0331\u0331\u0331\u030a\u0301\u0358\u0358\u0358
273=  Ⓓ❽❽❽❶
274
275# a+dot below & stem+augmentation dot, followed by contiguous d+z+acute
276<1 a\U0001D165\u0323\U0001D16Ddz\u0301
277=  Ⓔ³ⓓ
278
279** test: some simple string comparisons
280@ root
281* compare
282# first string compares against ""
283= \u0000
284< a
285<1 b
286<3 B
287= \u0000B\u0000
288
289** test: compare with strength=primary
290% strength=primary
291* compare
292<1 a
293<1 b
294= B
295
296** test: compare with strength=secondary
297% strength=secondary
298* compare
299<1 a
300<1 b
301= B
302
303** test: compare with strength=tertiary
304% strength=tertiary
305* compare
306<1 a
307<1 b
308<3 B
309
310** test: compare with strength=quaternary
311% strength=quaternary
312* compare
313<1 a
314<1 b
315<3 B
316
317** test: compare with strength=identical
318% strength=identical
319* compare
320<1 a
321<1 b
322<3 B
323
324** test: côté with forwards secondary
325@ root
326* compare
327<1 cote
328<2 coté
329<2 côte
330<2 côté
331
332** test: côté with forwards secondary vs. U+FFFE merge separator
333# Merged sort keys: On each level, any difference in the first segment
334# must trump any further difference.
335* compare
336<1 cote\uFFFEcôté
337<2 coté\uFFFEcôte
338<2 côte\uFFFEcoté
339<2 côté\uFFFEcote
340
341** test: côté with backwards secondary
342% backwards=on
343* compare
344<1 cote
345<2 côte
346<2 coté
347<2 côté
348
349** test: côté with backwards secondary vs. U+FFFE merge separator
350# Merged sort keys: On each level, any difference in the first segment
351# must trump any further difference.
352* compare
353<1 cote\uFFFEcôté
354<2 côte\uFFFEcoté
355<2 coté\uFFFEcôte
356<2 côté\uFFFEcote
357
358** test: U+FFFE on identical level
359@ root
360% strength=identical
361* compare
362# All of these control codes are completely-ignorable, so that
363# their low code points are compared with the merge separator.
364# The merge separator must compare less than any other character.
365<1 \uFFFE\u0001\u0002\u0003
366<i \u0001\uFFFE\u0002\u0003
367<i \u0001\u0002\uFFFE\u0003
368<i \u0001\u0002\u0003\uFFFE
369
370* compare
371# The merge separator must even compare less than U+0000.
372<1 \uFFFE\u0000\u0000
373<i \u0000\uFFFE\u0000
374<i \u0000\u0000\uFFFE
375
376** test: Hani < surrogates < U+FFFD
377# Note: compareUTF8() treats unpaired surrogates like U+FFFD,
378# so with that the strings with surrogates will compare equal to each other
379# and equal to the string with U+FFFD.
380@ root
381% strength=identical
382* compare
383<1 abz
384<1 a\u4e00z
385<1 a\U00020000z
386<1 a\ud800z
387<1 a\udbffz
388<1 a\udc00z
389<1 a\udfffz
390<1 a\ufffdz
391
392** test: script reordering
393@ root
394% reorder Hani Zzzz digit
395* compare
396<1 ?
397<1 +
398<1 丂
399<1 a
400<1 α
401<1 5
402
403% reorder default
404* compare
405<1 ?
406<1 +
407<1 5
408<1 a
409<1 α
410<1 丂
411
412** test: empty rules
413@ rules
414* compare
415<1 a
416<2 ä
417<3 Ä
418<1 b
419
420** test: very simple rules
421@ rules
422&a=e<<<<q<<<<r<x<<<X<<y<<<Y;z,Z
423% strength=quaternary
424* compare
425<1 a
426=  e
427<4 q
428<4 r
429<1 x
430<3 X
431<2 y
432<3 Y
433<2 z
434<3 Z
435
436** test: tailoring twice before a root position: primary
437@ rules
438&[before 1]b<p
439&[before 1]b<q
440* compare
441<1 a
442<1 p
443<1 q
444<1 b
445
446** test: tailoring twice before a root position: secondary
447@ rules
448&[before 2]ſ<<p
449&[before 2]ſ<<q
450* compare
451<1 s
452<2 p
453<2 q
454<2 ſ
455
456# secondary-before common weight
457@ rules
458&[before 2]b<<p
459&[before 2]b<<q
460* compare
461<1 a
462<1 p
463<2 q
464<2 b
465
466** test: tailoring twice before a root position: tertiary
467@ rules
468&[before 3]B<<<p
469&[before 3]B<<<q
470* compare
471<1 b
472<3 p
473<3 q
474<3 B
475
476# tertiary-before common weight
477@ rules
478&[before 3]b<<<p
479&[before 3]b<<<q
480* compare
481<1 a
482<1 p
483<3 q
484<3 b
485
486@ rules
487&[before 2]b<<s
488&[before 3]s<<<p
489&[before 3]s<<<q
490* compare
491<1 a
492<1 p
493<3 q
494<3 s
495<2 b
496
497** test: tailor after completely ignorable
498@ rules
499&\x00<<<x<<y
500* compare
501= \x00
502= \x1F
503<3 x
504<2 y
505
506** test: secondary tailoring gaps, ICU ticket 9362
507@ rules
508&[before 2]s<<'_'
509&s<<r  # secondary between s and ſ (long s)
510&ſ<<*a-q  # more than 15 between ſ and secondary CE boundary
511&[before 2][first primary ignorable]<<u<<v  # between secondary CE boundary & lowest secondary CE
512&[last primary ignorable]<<y<<z
513
514* compare
515<2 u
516<2 v
517<2 \u0332  # lowest secondary CE
518<2 \u0308
519<2 y
520<2 z
521<1 s_
522<2 ss
523<2 sr
524<2 sſ
525<2 sa
526<2 sb
527<2 sp
528<2 sq
529<2 sus
530<2 svs
531<2 rs
532
533** test: tertiary tailoring gaps, ICU ticket 9362
534@ rules
535&[before 3]t<<<'_'
536&t<<<r  # tertiary between t and fullwidth t
537&ᵀ<<<*a-q  # more than 15 between ᵀ (modifier letter T) and tertiary CE boundary
538&[before 3][first secondary ignorable]<<<u<<<v  # between tertiary CE boundary & lowest tertiary CE
539&[last secondary ignorable]<<<y<<<z
540
541* compare
542<3 u
543<3 v
544# Note: The root collator currently does not map any characters to tertiary CEs.
545<3 y
546<3 z
547<1 t_
548<3 tt
549<3 tr
550<3 tt
551<3 tᵀ
552<3 ta
553<3 tb
554<3 tp
555<3 tq
556<3 tut
557<3 tvt
558<3 rt
559
560** test: secondary & tertiary around root character
561@ rules
562&[before 2]m<<r
563&m<<s
564&[before 3]m<<<u
565&m<<<v
566* compare
567<1 l
568<1 r
569<2 u
570<3 m
571<3 v
572<2 s
573<1 n
574
575** test: secondary & tertiary around tailored item
576@ rules
577&m<x
578&[before 2]x<<r
579&x<<s
580&[before 3]x<<<u
581&x<<<v
582* compare
583<1 m
584<1 r
585<2 u
586<3 x
587<3 v
588<2 s
589<1 n
590
591** test: more nesting of secondary & tertiary before
592@ rules
593&[before 3]m<<<u
594&[before 2]m<<r
595&[before 3]r<<<q
596&m<<<w
597&m<<t
598&[before 3]w<<<v
599&w<<<x
600&w<<s
601* compare
602<1 l
603<1 q
604<3 r
605<2 u
606<3 m
607<3 v
608<3 w
609<3 x
610<2 s
611<2 t
612<1 n
613
614** test: case bits
615@ rules
616&w<x  # tailored CE getting case bits
617  =uv=uV=Uv=UV  # 2 chars -> 1 CE
618&ae=ch=cH=Ch=CH  # 2 chars -> 2 CEs
619&rst=yz=yZ=Yz=YZ  # 2 chars -> 3 CEs
620% caseFirst=lower
621* compare
622<1 ae
623=  ch
624<3 cH
625<3 Ch
626<3 CH
627<1 rst
628=  yz
629<3 yZ
630<3 Yz
631<3 YZ
632<1 w
633<1 x
634=  uv
635<3 uV
636=  Uv  # mixed case on single CE cannot distinguish variations
637<3 UV
638
639** test: tertiary CEs, tertiary, caseLevel=off, caseFirst=lower
640@ rules
641&\u0001<<<t<<<T  # tertiary CEs
642% caseFirst=lower
643* compare
644<1 aa
645<3 aat
646<3 aaT
647<3 aA
648<3 aAt
649<3 ata
650<3 aTa
651
652** test: tertiary CEs, tertiary, caseLevel=off, caseFirst=upper
653% caseFirst=upper
654* compare
655<1 aA
656<3 aAt
657<3 aa
658<3 aat
659<3 aaT
660<3 ata
661<3 aTa
662
663** test: reset on expansion, ICU tickets 9415 & 9593
664@ rules
665&æ<x    # tailor the last primary CE so that x sorts between ae and af
666&æb=bæ  # copy all reset CEs to make bæ sort the same
667&각<h    # copy/tailor 3 CEs to make h sort before the next Hangul syllable 갂
668&⒀<<y   # copy/tailor 4 CEs to make y sort with only a secondary difference
669&l·=z   # handle the pre-context for · when fetching reset CEs
670   <<u  # copy/tailor 2 CEs
671
672* compare
673<1 ae
674<2 æ
675<1 x
676<1 af
677
678* compare
679<1 aeb
680<2 æb
681=  bæ
682
683* compare
684<1 각
685<1 h
686<1 갂
687<1 갃
688
689* compare
690<1 ·    # by itself: primary CE
691<1 l
692<2 l·   # l+middle dot has only a secondary difference from l
693=  z
694<2 u
695
696* compare
697<1 (13)
698<3 ⒀  # DUCET sets special tertiary weights in all CEs
699<2 y
700<1 (13[
701
702% alternate=shifted
703* compare
704<1 (13)
705=  13
706<3 ⒀
707=  y  # alternate=shifted removes the tailoring difference on the last CE
708<1 14
709
710** test: contraction inside extension, ICU ticket 9378
711@ rules
712&а<<х/й     # all letters are Cyrillic
713* compare
714<1 ай
715<2 х
716
717** test: no duplicate tailored CEs for different reset positions with same CEs, ICU ticket 10104
718@ rules
719&t<x &ᵀ<y           # same primary weights
720&q<u &[before 1]ꝗ<v # q and ꝗ are primary adjacent
721* compare
722<1 q
723<1 u
724<1 v
725<1 ꝗ
726<1 t
727<3 ᵀ
728<1 y
729<1 x
730
731# Principle: Each rule builds on the state of preceding rules and ignores following rules.
732
733** test: later rule does not affect earlier reset position, ICU ticket 10105
734@ rules
735&a < u < v < w  &ov < x  &b < v
736* compare
737<1 oa
738<1 ou
739<1 x    # CE(o) followed by CE between u and w
740<1 ow
741<1 ob
742<1 ov
743
744** test: later rule does not affect earlier extension (1), ICU ticket 10105
745@ rules
746&a=x/b &v=b
747% strength=secondary
748* compare
749<1 B
750<1 c
751<1 v
752=  b
753* compare
754<1 AB
755=  x
756<1 ac
757<1 av
758=  ab
759
760** test: later rule does not affect earlier extension (2), ICU ticket 10105
761@ rules
762&a <<< c / e &g <<< e / l
763% strength=secondary
764* compare
765<1 AE
766=  c
767<2 æ
768<1 agl
769=  ae
770
771** test: later rule does not affect earlier extension (3), ICU ticket 10105
772@ rules
773&a = b / c  &d = c / e
774% strength=secondary
775* compare
776<1 AC  # C is still only tertiary different from the original c
777=  b
778<1 ade
779=  ac
780
781** test: extension contains tailored character, ICU ticket 10105
782@ rules
783&a=e &b=u/e
784* compare
785<1 a
786=  e
787<1 ba
788=  be
789=  u
790
791** test: add simple mappings for characters with root context
792@ rules
793&z=·    # middle dot has a prefix mapping in the CLDR root
794&n=и    # и (U+0438) has contractions in the root
795* compare
796<1 l
797<2 l·   # root mapping for l|· still works
798<1 z
799=  ·
800* compare
801<1 n
802=  и
803<1 И
804<1 и\u0306  # root mapping for й=и\u0306 still works
805=  й
806<3 Й
807
808** test: add context mappings around characters with root context
809@ rules
810&z=·h   # middle dot has a prefix mapping in the CLDR root
811&n=ә|и  # и (U+0438) has contractions in the root
812* compare
813<1 l
814<2 l·   # root mapping for l|· still works
815<1 z
816=  ·h
817* compare
818<1 и
819<3 И
820<1 и\u0306  # root mapping for й=и\u0306 still works
821=  й
822* compare
823<1 әn
824=  әи
825<1 әo
826
827** test: many secondary CEs at the top of their range
828@ rules
829&[last primary ignorable]<<*\u2801-\u28ff
830* compare
831<2 \u0308
832<2 \u2801
833<2 \u2802
834<2 \u2803
835<2 \u2804
836<2 \u28fd
837<2 \u28fe
838<2 \u28ff
839<1 \x20
840
841** test: many tertiary CEs at the top of their range
842@ rules
843&[last secondary ignorable]<<<*a-z
844* compare
845<3 a
846<3 b
847<3 c
848<3 d
849# e..w
850<3 x
851<3 y
852<3 z
853<2 \u0308
854
855** test: tailor contraction together with nearly equivalent prefix, ICU ticket 10101
856@ rules
857&a=p|x &b=px &c=op
858* compare
859<1 b
860=  px
861<3 B
862<1 c
863=  op
864<3 C
865* compare
866<1 ca
867=  opx  # first contraction op, then prefix p|x
868<3 cA
869<3 Ca
870
871** test: reset position with prefix (pre-context), ICU ticket 10102
872@ rules
873&a=p|x &px=y
874* compare
875<1 pa
876=  px
877=  y
878<3 pA
879<1 q
880<1 x
881
882** test: prefix+contraction together (1), ICU ticket 10071
883@ rules
884&x=a|bc
885* compare
886<1 ab
887<1 Abc
888<1 abd
889<1 ac
890<1 aw
891<1 ax
892=  abc
893<3 aX
894<3 Ax
895<1 b
896<1 bb
897<1 bc
898<3 bC
899<3 Bc
900<1 bd
901
902** test: prefix+contraction together (2), ICU ticket 10071
903@ rules
904&w=bc &x=a|b
905* compare
906<1 w
907=  bc
908<3 W
909* compare
910<1 aw
911<1 ax
912=  ab
913<3 aX
914<1 axb
915<1 axc
916=  abc  # prefix match a|b takes precedence over contraction match bc
917<3 abC
918<1 abd
919<1 ay
920
921** test: prefix+contraction together (3), ICU ticket 10071
922@ rules
923&x=a|b &w=bc    # reverse order of rules as previous test, order should not matter here
924* compare       # same "compare" sequences as previous test
925<1 w
926=  bc
927<3 W
928* compare
929<1 aw
930<1 ax
931=  ab
932<3 aX
933<1 axb
934<1 axc
935=  abc  # prefix match a|b takes precedence over contraction match bc
936<3 abC
937<1 abd
938<1 ay
939
940** test: no mapping p|c, falls back to contraction ch, CLDR ticket 5962
941@ rules
942&d=ch &v=p|ci
943* compare
944<1 pc
945<3 pC
946<1 pcH
947<1 pcI
948<1 pd
949=  pch  # no-prefix contraction ch matches
950<3 pD
951<1 pv
952=  pci  # prefix+contraction p|ci matches
953<3 pV
954
955** test: tailor in & around compact ranges of root primaries
956# The Ogham characters U+1681..U+169A are in simple ascending order of primary CEs
957# which should be reliably encoded as one range in the root elements data.
958@ rules
959&[before 1]ᚁ<a
960&ᚁ<b
961&[before 1]ᚂ<c
962&ᚂ<d
963&[before 1]ᚚ<y
964&ᚚ<z
965&[before 2]ᚁ<<r
966&ᚁ<<s
967&[before 3]ᚚ<<<t
968&ᚚ<<<u
969* compare
970<1 ᣵ    # U+18F5 last Canadian Aboriginal
971<1 a
972<1 r
973<2 ᚁ
974<2 s
975<1 b
976<1 c
977<1 ᚂ
978<1 d
979<1 ᚃ
980<1 ᚙ
981<1 y
982<1 t
983<3 ᚚ
984<3 u
985<1 z
986<1 ᚠ    # U+16A0 first Runic
987
988** test: suppressContractions
989@ rules
990&z<ch<әж [suppressContractions [·cә]]
991* compare
992<1 ch
993<3 cH   # ch was suppressed
994<1 l
995<1 l·   # primary difference, not secondary, because l|· was suppressed
996<1 ә
997<2 ә\u0308  # secondary difference, not primary, because contractions for ә were suppressed
998<1 әж
999<3 әЖ
1000
1001** test: Hangul & Jamo
1002@ rules
1003&L=\u1100  # first Jamo L
1004&V=\u1161  # first Jamo V
1005&T=\u11A8  # first Jamo T
1006&\uAC01<<*\u4E00-\u4EFF  # first Hangul LVT syllable & lots of secondary diffs
1007* compare
1008<1 Lv
1009<3 LV
1010=  \u1100\u1161
1011=  \uAC00
1012<1 LVt
1013<3 LVT
1014=  \u1100\u1161\u11A8
1015=  \uAC00\u11A8
1016=  \uAC01
1017<2 LVT\u0308
1018<2 \u4E00
1019<2 \u4E01
1020<2 \u4E80
1021<2 \u4EFF
1022<2 LV\u0308T
1023<1 \uAC02
1024
1025** test: adjust special reset positions according to previous rules, CLDR ticket 6070
1026@ rules
1027&[last variable]<x
1028[maxVariable space]  # has effect only after building, no effect on following rules
1029&[last variable]<y
1030&[before 1][first regular]<z
1031* compare
1032<1 ?  # some punctuation
1033<1 x
1034<1 y
1035<1 z
1036<1 $  # some symbol
1037
1038@ rules
1039&[last primary ignorable]<<x<<<y
1040&[last primary ignorable]<<z
1041* compare
1042<2 \u0358
1043<2 x
1044<3 y
1045<2 z
1046<1 \x20
1047
1048@ rules
1049&[last secondary ignorable]<<<x
1050&[last secondary ignorable]<<<y
1051* compare
1052<3 x
1053<3 y
1054<2 \u0358
1055
1056@ rules
1057&[before 2][first variable]<<z
1058&[before 2][first variable]<<y
1059&[before 3][first variable]<<<x
1060&[before 3][first variable]<<<w
1061&[before 1][first variable]<v
1062&[before 2][first variable]<<u
1063&[before 3][first variable]<<<t
1064&[before 2]\uFDD1\xA0<<s  # FractionalUCA.txt: FDD1 00A0, SPACE first primary
1065* compare
1066<2 \u0358
1067<1 s
1068<2 \uFDD1\xA0
1069<1 t
1070<3 u
1071<2 v
1072<1 w
1073<3 x
1074<3 y
1075<2 z
1076<2 \t
1077
1078@ rules
1079&[before 2][first regular]<<z
1080&[before 3][first regular]<<<y
1081&[before 1][first regular]<x
1082&[before 3][first regular]<<<w
1083&[before 2]\uFDD1\u263A<<v  # FractionalUCA.txt: FDD1 263A, SYMBOL first primary
1084&[before 3][first regular]<<<u
1085&[before 1][first regular]<p  # primary before the boundary: becomes variable
1086&[before 3][first regular]<<<t  # not affected by p
1087&[last variable]<q              # after p!
1088* compare
1089<1 ?
1090<1 p
1091<1 q
1092<1 t
1093<3 u
1094<3 v
1095<1 w
1096<3 x
1097<1 y
1098<3 z
1099<1 $
1100
1101# check that p & q are indeed variable
1102% alternate=shifted
1103* compare
1104=  ?
1105=  p
1106=  q
1107<1 t
1108<3 u
1109<3 v
1110<1 w
1111<3 x
1112<1 y
1113<3 z
1114<1 $
1115
1116@ rules
1117&[before 2][first trailing]<<z
1118&[before 1][first trailing]<y
1119&[before 3][first trailing]<<<x
1120* compare
1121<1 \u4E00  # first Han, first implicit
1122<1 \uFDD1\uFDD0  # FractionalUCA.txt: unassigned first primary
1123# Note: The root collator currently does not map any characters to the trailing first boundary primary.
1124<1 x
1125<3 y
1126<1 z
1127<2 \uFFFD  # The root collator currently maps U+FFFD to the first real trailing primary.
1128
1129@ rules
1130&[before 2][first primary ignorable]<<z
1131&[before 2][first primary ignorable]<<y
1132&[before 3][first primary ignorable]<<<x
1133&[before 3][first primary ignorable]<<<w
1134* compare
1135=  \x01
1136<2 w
1137<3 x
1138<3 y
1139<2 z
1140<2 \u0301
1141
1142@ rules
1143&[before 3][first secondary ignorable]<<<y
1144&[before 3][first secondary ignorable]<<<x
1145* compare
1146=  \x01
1147<3 x
1148<3 y
1149<2 \u0301
1150
1151** test: canonical closure
1152@ rules
1153&X=A &U=Â
1154* compare
1155<1 U
1156=  Â
1157=  A\u0302
1158<2 Ú  # U with acute
1159=  U\u0301
1160=  Ấ  # A with circumflex & acute
1161=  Â\u0301
1162=  A\u0302\u0301
1163<1 X
1164=  A
1165<2 X\u030A  # with ring above
1166=  Å
1167=  A\u030A
1168=  \u212B  # Angstrom sign
1169
1170@ rules
1171&x=\u5140\u55C0
1172* compare
1173<1 x
1174=  \u5140\u55C0
1175=  \u5140\uFA0D
1176=  \uFA0C\u55C0
1177=  \uFA0C\uFA0D  # CJK compatibility characters
1178<3 X
1179
1180# canonical closure on prefix rules, ICU ticket 9444
1181@ rules
1182&x=ä|ŝ
1183* compare
1184<1 äs  # not tailored
1185<1 äx
1186=  äŝ
1187=  a\u0308s\u0302
1188=  a\u0308ŝ
1189=  äs\u0302
1190<3 äX
1191
1192** test: conjoining Jamo map to expansions
1193@ rules
1194&gg=\u1101  # Jamo Lead consonant GG
1195&nj=\u11AC  # Jamo Trail consonant NJ
1196* compare
1197<1 gg\u1161nj
1198=  \u1101\u1161\u11AC
1199=  \uAE4C\u11AC
1200=  \uAE51
1201<3 gg\u1161nJ
1202<1 \u1100\u1100
1203
1204** test: canonical tail closure, ICU ticket 5913
1205@ rules
1206&a<â
1207* compare
1208<1 a
1209<1 â              # tailored
1210=  a\u0302
1211<2 a\u0323\u0302  # discontiguous contraction
1212=  ạ\u0302        # equivalent
1213=  ậ              # equivalent
1214<1 b
1215
1216@ rules
1217&a<ạ
1218* compare
1219<1 a
1220<1 ạ              # tailored
1221=  a\u0323
1222<2 a\u0323\u0302  # contiguous contraction plus extra diacritic
1223=  ạ\u0302        # equivalent
1224=  ậ              # equivalent
1225<1 b
1226
1227# Tail closure should work even if there is a prefix and/or contraction.
1228@ rules
1229&a<\u5140|câ
1230# In order to find discontiguous contractions for \u5140|câ
1231# there must exist a mapping for \u5140|ca, regardless of what it maps to.
1232# (This follows from the UCA spec.)
1233&x=\u5140|ca
1234* compare
1235<1 \u5140a
1236=  \uFA0Ca
1237<1 \u5140câ              # tailored
1238=  \uFA0Ccâ
1239=  \u5140ca\u0302
1240=  \uFA0Cca\u0302
1241<2 \u5140ca\u0323\u0302  # discontiguous contraction
1242=  \uFA0Cca\u0323\u0302
1243=  \u5140cạ\u0302
1244=  \uFA0Ccạ\u0302
1245=  \u5140cậ
1246=  \uFA0Ccậ
1247<1 \u5140b
1248=  \uFA0Cb
1249<1 \u5140x
1250=  \u5140ca
1251
1252# Double-check that without the extra mapping there will be no discontiguous match.
1253@ rules
1254&a<\u5140|câ
1255* compare
1256<1 \u5140a
1257=  \uFA0Ca
1258<1 \u5140câ              # tailored
1259=  \uFA0Ccâ
1260=  \u5140ca\u0302
1261=  \uFA0Cca\u0302
1262<1 \u5140b
1263=  \uFA0Cb
1264<1 \u5140ca\u0323\u0302  # no discontiguous contraction
1265=  \uFA0Cca\u0323\u0302
1266=  \u5140cạ\u0302
1267=  \uFA0Ccạ\u0302
1268=  \u5140cậ
1269=  \uFA0Ccậ
1270
1271@ rules
1272&a<cạ
1273* compare
1274<1 a
1275<1 cạ              # tailored
1276=  ca\u0323
1277<2 ca\u0323\u0302  # contiguous contraction plus extra diacritic
1278=  cạ\u0302        # equivalent
1279=  cậ              # equivalent
1280<1 b
1281
1282# ᾢ = U+1FA2 GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA AND YPOGEGRAMMENI
1283#   = 03C9 0313 0300 0345
1284# ccc = 0, 230, 230, 240
1285@ rules
1286&δ=αῳ
1287# In order to find discontiguous contractions for αῳ
1288# there must exist a mapping for αω, regardless of what it maps to.
1289# (This follows from the UCA spec.)
1290&ε=αω
1291* compare
1292<1 δ
1293=  αῳ
1294=  αω\u0345
1295<2 αω\u0313\u0300\u0345  # discontiguous contraction
1296=  αὠ\u0300\u0345
1297=  αὢ\u0345
1298=  αᾢ
1299<2 αω\u0300\u0313\u0345
1300=  αὼ\u0313\u0345
1301=  αῲ\u0313  # not FCD
1302<1 ε
1303=  αω
1304
1305# Double-check that without the extra mapping there will be no discontiguous match.
1306@ rules
1307&δ=αῳ
1308* compare
1309<1 αω\u0313\u0300\u0345  # no discontiguous contraction
1310=  αὠ\u0300\u0345
1311=  αὢ\u0345
1312=  αᾢ
1313<2 αω\u0300\u0313\u0345
1314=  αὼ\u0313\u0345
1315=  αῲ\u0313  # not FCD
1316<1 δ
1317=  αῳ
1318=  αω\u0345
1319
1320# Add U+0315 COMBINING COMMA ABOVE RIGHT which has ccc=232.
1321# Tests code paths where the tailored string has a combining mark
1322# that does not occur in any composite's decomposition.
1323@ rules
1324&δ=αὼ\u0315
1325* compare
1326<1 αω\u0313\u0300\u0315  # Not tailored: The grave accent blocks the comma above.
1327=  αὠ\u0300\u0315
1328=  αὢ\u0315
1329<1 δ
1330=  αὼ\u0315
1331=  αω\u0300\u0315
1332<2 αω\u0300\u0315\u0345
1333=  αὼ\u0315\u0345
1334=  αῲ\u0315  # not FCD
1335
1336** test: danish a+a vs. a-umlaut, ICU ticket 9319
1337@ rules
1338&z<aa
1339* compare
1340<1 z
1341<1 aa
1342<2 aa\u0308
1343=  aä
1344
1345** test: Jamo L with and in prefix
1346# Useful for the Korean "searchjl" tailoring (instead of contractions of pairs of Jamo L).
1347@ rules
1348# Jamo Lead consonant G after G or GG
1349&[last primary ignorable]<<\u1100|\u1100=\u1101|\u1100
1350# Jamo Lead consonant GG sorts like G+G
1351&\u1100\u1100=\u1101
1352# Note: Making G|GG and GG|GG sort the same as G|G+G
1353# would require the ability to reset on G|G+G,
1354# or we could make G-after-G equal to some secondary-CE character,
1355# and reset on a pair of those.
1356# (It does not matter much if there are at most two G in a row in real text.)
1357* compare
1358<1 \u1100
1359<2 \u1100\u1100  # only one primary from a sequence of G lead consonants
1360=  \u1101
1361<2 \u1100\u1100\u1100
1362=  \u1101\u1100
1363# but not = \u1100\u1101, see above
1364<1 \u1100\u1161
1365=  \uAC00
1366<2 \u1100\u1100\u1161
1367=  \u1100\uAC00  # prefix match from the L of the LV syllable
1368=  \u1101\u1161
1369=  \uAE4C
1370
1371** test: proposed Korean "searchjl" tailoring with prefixes, CLDR ticket 6546
1372@ rules
1373# Low secondary CEs for Jamo V & T.
1374# Note: T should sort before V for proper syllable order.
1375&\u0332  # COMBINING LOW LINE (first primary ignorable)
1376<<\u1161<<\u1162
1377
1378# Korean Jamo lead consonant search rules, part 2:
1379# Make modern compound L jamo primary equivalent to non-compound forms.
1380
1381# Secondary CEs for Jamo L-after-L, greater than Jamo V & T.
1382&\u0313  # COMBINING COMMA ABOVE (second primary ignorable)
1383=\u1100|\u1100
1384=\u1103|\u1103
1385=\u1107|\u1107
1386=\u1109|\u1109
1387=\u110C|\u110C
1388
1389# Compound L Jamo map to equivalent expansions of primary+secondary CE.
1390&\u1100\u0313=\u1101<<<\u3132  # HANGUL CHOSEONG SSANGKIYEOK, HANGUL LETTER SSANGKIYEOK
1391&\u1103\u0313=\u1104<<<\u3138  # HANGUL CHOSEONG SSANGTIKEUT, HANGUL LETTER SSANGTIKEUT
1392&\u1107\u0313=\u1108<<<\u3143  # HANGUL CHOSEONG SSANGPIEUP, HANGUL LETTER SSANGPIEUP
1393&\u1109\u0313=\u110A<<<\u3146  # HANGUL CHOSEONG SSANGSIOS, HANGUL LETTER SSANGSIOS
1394&\u110C\u0313=\u110D<<<\u3149  # HANGUL CHOSEONG SSANGCIEUC, HANGUL LETTER SSANGCIEUC
1395
1396* compare
1397<1 \u1100\u1161
1398=  \uAC00
1399<2 \u1100\u1162
1400=  \uAC1C
1401<2 \u1100\u1100\u1161
1402=  \u1100\uAC00
1403=  \u1101\u1161
1404=  \uAE4C
1405<3 \u3132\u1161
1406
1407** test: Hangul syllables in prefix & in the interior of a contraction
1408@ rules
1409&x=\u1100\u1161|a\u1102\u1162z
1410* compare
1411<1 \u1100\u1161x
1412=  \u1100\u1161a\u1102\u1162z
1413=  \u1100\u1161a\uB0B4z
1414=  \uAC00a\u1102\u1162z
1415=  \uAC00a\uB0B4z
1416
1417** test: digits are unsafe-backwards when numeric=on
1418@ root
1419% numeric=on
1420* compare
1421# If digits are not unsafe, then numeric collation sees "1"=="01" and "b">"a".
1422# We need to back up before the identical prefix "1" and compare the full numbers.
1423<1 11b
1424<1 101a
1425
1426** test: simple locale data test
1427@ locale de
1428* compare
1429<1 a
1430<2 ä
1431<1 ae
1432<2 æ
1433
1434@ locale de-u-co-phonebk
1435* compare
1436<1 a
1437<1 ae
1438<2 ä
1439<2 æ
1440
1441# The following test cases were moved here from ICU 52's DataDrivenCollationTest.txt.
1442
1443** test: DataDrivenCollationTest/TestMorePinyin
1444# Testing the primary strength.
1445@ locale zh
1446% strength=primary
1447* compare
1448< lā
1449= lĀ
1450= Lā
1451= LĀ
1452< lān
1453= lĀn
1454< lē
1455= lĒ
1456= Lē
1457= LĒ
1458< lēn
1459= lĒn
1460
1461** test: DataDrivenCollationTest/TestLithuanian
1462# Lithuanian sort order.
1463@ locale lt
1464* compare
1465< cz
1466< č
1467< d
1468< iz
1469< j
1470< sz
1471< š
1472< t
1473< zz
1474< ž
1475
1476** test: DataDrivenCollationTest/TestLatvian
1477# Latvian sort order.
1478@ locale lv
1479* compare
1480< cz
1481< č
1482< d
1483< gz
1484< ģ
1485< h
1486< iz
1487< j
1488< kz
1489< ķ
1490< l
1491< lz
1492< ļ
1493< m
1494< nz
1495< ņ
1496< o
1497< rz
1498< ŗ
1499< s
1500< sz
1501< š
1502< t
1503< zz
1504< ž
1505
1506** test: DataDrivenCollationTest/TestEstonian
1507# Estonian sort order.
1508@ locale et
1509* compare
1510< sy
1511< š
1512< šy
1513< z
1514< zy
1515< ž
1516< v
1517< va
1518< w
1519< õ
1520< õy
1521< ä
1522< äy
1523< ö
1524< öy
1525< ü
1526< üy
1527< x
1528
1529** test: DataDrivenCollationTest/TestAlbanian
1530# Albanian sort order.
1531@ locale sq
1532* compare
1533< cz
1534< ç
1535< d
1536< dz
1537< dh
1538< e
1539< ez
1540< ë
1541< f
1542< gz
1543< gj
1544< h
1545< lz
1546< ll
1547< m
1548< nz
1549< nj
1550< o
1551< rz
1552< rr
1553< s
1554< sz
1555< sh
1556< t
1557< tz
1558< th
1559< u
1560< xz
1561< xh
1562< y
1563< zz
1564< zh
1565
1566** test: DataDrivenCollationTest/TestSimplifiedChineseOrder
1567# Sorted file has different order.
1568@ root
1569# normalization=on turned on & off automatically.
1570* compare
1571< \u5F20
1572< \u5F20\u4E00\u8E3F
1573
1574** test: DataDrivenCollationTest/TestTibetanNormalizedIterativeCrash
1575# This pretty much crashes.
1576@ root
1577* compare
1578< \u0f71\u0f72\u0f80\u0f71\u0f72
1579< \u0f80
1580
1581** test: DataDrivenCollationTest/TestThaiPartialSortKeyProblems
1582# These are examples of strings that caused trouble in partial sort key testing.
1583@ locale th-TH
1584* compare
1585< \u0E01\u0E01\u0E38\u0E18\u0E20\u0E31\u0E13\u0E11\u0E4C
1586< \u0E01\u0E01\u0E38\u0E2A\u0E31\u0E19\u0E42\u0E18
1587* compare
1588< \u0E01\u0E07\u0E01\u0E32\u0E23
1589< \u0E01\u0E07\u0E42\u0E01\u0E49
1590* compare
1591< \u0E01\u0E23\u0E19\u0E17\u0E32
1592< \u0E01\u0E23\u0E19\u0E19\u0E40\u0E0A\u0E49\u0E32
1593* compare
1594< \u0E01\u0E23\u0E30\u0E40\u0E08\u0E35\u0E22\u0E27
1595< \u0E01\u0E23\u0E30\u0E40\u0E08\u0E35\u0E4A\u0E22\u0E27
1596* compare
1597< \u0E01\u0E23\u0E23\u0E40\u0E0A\u0E2D
1598< \u0E01\u0E23\u0E23\u0E40\u0E0A\u0E49\u0E32
1599
1600** test: DataDrivenCollationTest/TestJavaStyleRule
1601# java.text allows rules to start as '<<<x<<<y...'
1602# we emulate this by assuming a &[first tertiary ignorable] in this case.
1603@ rules
1604&\u0001=equal<<<z<<x<<<w &[first tertiary ignorable]=a &[first primary ignorable]=b
1605* compare
1606= a
1607= equal
1608< z
1609< x
1610= b  # x had become the new first primary ignorable
1611< w
1612
1613** test: DataDrivenCollationTest/TestShiftedIgnorable
1614# The UCA states that primary ignorables should be completely
1615# ignorable when following a shifted code point.
1616@ root
1617% alternate=shifted
1618% strength=quaternary
1619* compare
1620< a\u0020b
1621= a\u0020\u0300b
1622= a\u0020\u0301b
1623< a_b
1624= a_\u0300b
1625= a_\u0301b
1626< A\u0020b
1627= A\u0020\u0300b
1628= A\u0020\u0301b
1629< A_b
1630= A_\u0300b
1631= A_\u0301b
1632< a\u0301b
1633< A\u0301b
1634< a\u0300b
1635< A\u0300b
1636
1637** test: DataDrivenCollationTest/TestNShiftedIgnorable
1638# The UCA states that primary ignorables should be completely
1639# ignorable when following a shifted code point.
1640@ root
1641% alternate=non-ignorable
1642% strength=tertiary
1643* compare
1644< a\u0020b
1645< A\u0020b
1646< a\u0020\u0301b
1647< A\u0020\u0301b
1648< a\u0020\u0300b
1649< A\u0020\u0300b
1650< a_b
1651< A_b
1652< a_\u0301b
1653< A_\u0301b
1654< a_\u0300b
1655< A_\u0300b
1656< a\u0301b
1657< A\u0301b
1658< a\u0300b
1659< A\u0300b
1660
1661** test: DataDrivenCollationTest/TestSafeSurrogates
1662# It turned out that surrogates were not skipped properly
1663# when iterating backwards if they were in the middle of a
1664# contraction. This test assures that this is fixed.
1665@ rules
1666&a < x\ud800\udc00b
1667* compare
1668< a
1669< x\ud800\udc00b
1670
1671** test: DataDrivenCollationTest/da_TestPrimary
1672# This test goes through primary strength cases
1673@ locale da
1674% strength=primary
1675* compare
1676< Lvi
1677< Lwi
1678* compare
1679< L\u00e4vi
1680< L\u00f6wi
1681* compare
1682< L\u00fcbeck
1683= Lybeck
1684
1685** test: DataDrivenCollationTest/da_TestTertiary
1686# This test goes through tertiary strength cases
1687@ locale da
1688% strength=tertiary
1689* compare
1690< Luc
1691< luck
1692* compare
1693< luck
1694< L\u00fcbeck
1695* compare
1696< lybeck
1697< L\u00fcbeck
1698* compare
1699< L\u00e4vi
1700< L\u00f6we
1701* compare
1702< L\u00f6ww
1703< mast
1704
1705* compare
1706< A/S
1707< ANDRE
1708< ANDR\u00c9
1709< ANDREAS
1710< AS
1711< CA
1712< \u00c7A
1713< CB
1714< \u00c7C
1715< D.S.B.
1716< DA
1717< \u00d0A
1718< DB
1719< \u00d0C
1720< DSB
1721< DSC
1722< EKSTRA_ARBEJDE
1723< EKSTRABUD0
1724< H\u00d8ST
1725< HAAG
1726< H\u00c5NDBOG
1727< HAANDV\u00c6RKSBANKEN
1728< Karl
1729< karl
1730< NIELS\u0020J\u00d8RGEN
1731< NIELS-J\u00d8RGEN
1732< NIELSEN
1733< R\u00c9E,\u0020A
1734< REE,\u0020B
1735< R\u00c9E,\u0020L
1736< REE,\u0020V
1737< SCHYTT,\u0020B
1738< SCHYTT,\u0020H
1739< SCH\u00dcTT,\u0020H
1740< SCHYTT,\u0020L
1741< SCH\u00dcTT,\u0020M
1742< SS
1743< \u00df
1744< SSA
1745< STORE\u0020VILDMOSE
1746< STOREK\u00c6R0
1747< STORM\u0020PETERSEN
1748< STORMLY
1749< THORVALD
1750< THORVARDUR
1751< \u00feORVAR\u00d0UR
1752< THYGESEN
1753< VESTERG\u00c5RD,\u0020A
1754< VESTERGAARD,\u0020A
1755< VESTERG\u00c5RD,\u0020B
1756< \u00c6BLE
1757< \u00c4BLE
1758< \u00d8BERG
1759< \u00d6BERG
1760
1761* compare
1762< andere
1763< chaque
1764< chemin
1765< cote
1766< cot\u00e9
1767< c\u00f4te
1768< c\u00f4t\u00e9
1769< \u010du\u010d\u0113t
1770< Czech
1771< hi\u0161a
1772< irdisch
1773< lie
1774< lire
1775< llama
1776< l\u00f5ug
1777< l\u00f2za
1778< lu\u010d
1779< luck
1780< L\u00fcbeck
1781< lye
1782< l\u00e4vi
1783< L\u00f6wen
1784< m\u00e0\u0161ta
1785< m\u00eer
1786< myndig
1787< M\u00e4nner
1788< m\u00f6chten
1789< pi\u00f1a
1790< pint
1791< pylon
1792< \u0161\u00e0ran
1793< savoir
1794< \u0160erb\u016bra
1795< Sietla
1796< \u015blub
1797< subtle
1798< symbol
1799< s\u00e4mtlich
1800< verkehrt
1801< vox
1802< v\u00e4ga
1803< waffle
1804< wood
1805< yen
1806< yuan
1807< yucca
1808< \u017eal
1809< \u017eena
1810< \u017den\u0113va
1811< zoo0
1812< Zviedrija
1813< Z\u00fcrich
1814< zysk0
1815< \u00e4ndere
1816
1817** test: DataDrivenCollationTest/hi_TestNewRules
1818# This test goes through new rules and tests against old rules
1819@ locale hi
1820* compare
1821< कॐ
1822< कं
1823< कँ
1824< कः
1825
1826** test: DataDrivenCollationTest/ro_TestNewRules
1827# This test goes through new rules and tests against old rules
1828@ locale ro
1829* compare
1830< xAx
1831< xă
1832< xĂ
1833< Xă
1834< XĂ
1835< xăx
1836< xĂx
1837< xâ
1838< xÂ
1839< Xâ
1840< XÂ
1841< xâx
1842< xÂx
1843< xb
1844< xIx
1845< xî
1846< xÎ
1847< Xî
1848< XÎ
1849< xîx
1850< xÎx
1851< xj
1852< xSx
1853< xș
1854= xş
1855< xȘ
1856= xŞ
1857< Xș
1858= Xş
1859< XȘ
1860= XŞ
1861< xșx
1862= xşx
1863< xȘx
1864= xŞx
1865< xT
1866< xTx
1867< xț
1868= xţ
1869< xȚ
1870= xŢ
1871< Xț
1872= Xţ
1873< XȚ
1874= XŢ
1875< xțx
1876= xţx
1877< xȚx
1878= xŢx
1879< xU
1880
1881** test: DataDrivenCollationTest/testOffsets
1882# This tests cases where forwards and backwards iteration get different offsets
1883@ locale en
1884% strength=tertiary
1885* compare
1886< a\uD800\uDC00\uDC00
1887< b\uD800\uDC00\uDC00
1888* compare
1889< \u0301A\u0301\u0301
1890< \u0301B\u0301\u0301
1891* compare
1892< abcd\r\u0301
1893< abce\r\u0301
1894# TODO: test offsets in new CollationTest
1895
1896# End of test cases moved here from ICU 52's DataDrivenCollationTest.txt.
1897
1898** test: was ICU 52 cmsccoll/TestRedundantRules
1899@ rules
1900& a < b < c < d& [before 1] c < m
1901* compare
1902<1 a
1903<1 b
1904<1 m
1905<1 c
1906<1 d
1907
1908@ rules
1909& a < b <<< c << d <<< e& [before 3] e <<< x
1910* compare
1911<1 a
1912<1 b
1913<3 c
1914<2 d
1915<3 x
1916<3 e
1917
1918@ rules
1919& a < b <<< c << d <<< e <<< f < g& [before 1] g < x
1920* compare
1921<1 a
1922<1 b
1923<3 c
1924<2 d
1925<3 e
1926<3 f
1927<1 x
1928<1 g
1929
1930@ rules
1931& a <<< b << c < d& a < m
1932* compare
1933<1 a
1934<3 b
1935<2 c
1936<1 m
1937<1 d
1938
1939@ rules
1940&a<b<<b\u0301 &z<b
1941* compare
1942<1 a
1943<1 b\u0301
1944<1 z
1945<1 b
1946
1947@ rules
1948&z<m<<<q<<<m
1949* compare
1950<1 z
1951<1 q
1952<3 m
1953
1954@ rules
1955&z<<<m<q<<<m
1956* compare
1957<1 z
1958<1 q
1959<3 m
1960
1961@ rules
1962& a < b < c < d& r < c
1963* compare
1964<1 a
1965<1 b
1966<1 d
1967<1 r
1968<1 c
1969
1970@ rules
1971& a < b < c < d& c < m
1972* compare
1973<1 a
1974<1 b
1975<1 c
1976<1 m
1977<1 d
1978
1979@ rules
1980& a < b < c < d& a < m
1981* compare
1982<1 a
1983<1 m
1984<1 b
1985<1 c
1986<1 d
1987
1988** test: was ICU 52 cmsccoll/TestExpansionSyntax
1989# The following two rules should sort the particular list of strings the same.
1990@ rules
1991&AE <<< a << b <<< c &d <<< f
1992* compare
1993<1 AE
1994<3 a
1995<2 b
1996<3 c
1997<1 d
1998<3 f
1999
2000@ rules
2001&A <<< a / E << b / E <<< c /E  &d <<< f
2002* compare
2003<1 AE
2004<3 a
2005<2 b
2006<3 c
2007<1 d
2008<3 f
2009
2010# The following two rules should sort the particular list of strings the same.
2011@ rules
2012&AE <<< a <<< b << c << d < e < f <<< g
2013* compare
2014<1 AE
2015<3 a
2016<3 b
2017<2 c
2018<2 d
2019<1 e
2020<1 f
2021<3 g
2022
2023@ rules
2024&A <<< a / E <<< b / E << c / E << d / E < e < f <<< g
2025* compare
2026<1 AE
2027<3 a
2028<3 b
2029<2 c
2030<2 d
2031<1 e
2032<1 f
2033<3 g
2034
2035# The following two rules should sort the particular list of strings the same.
2036@ rules
2037&AE <<< B <<< C / D <<< F
2038* compare
2039<1 AE
2040<3 B
2041<3 F
2042<1 AED
2043<3 C
2044
2045@ rules
2046&A <<< B / E <<< C / ED <<< F / E
2047* compare
2048<1 AE
2049<3 B
2050<3 F
2051<1 AED
2052<3 C
2053
2054** test: never reorder trailing primaries
2055@ root
2056% reorder Zzzz Grek
2057* compare
2058<1 L
2059<1 字
2060<1 Ω
2061<1 \uFFFD
2062<1 \uFFFF
2063
2064** test: fall back to mappings with shorter prefixes, not immediately to ones with no prefixes
2065@ rules
2066&u=ab|cd
2067&v=b|ce
2068* compare
2069<1 abc
2070<1 abcc
2071<1 abcf
2072<1 abcd
2073=  abu
2074<1 abce
2075=  abv
2076
2077# With the following rules, there is only one prefix per composite ĉ or ç,
2078# but both prefixes apply to just c in NFD form.
2079# We would get different results for composed vs. NFD input
2080# if we fell back directly from longest-prefix mappings to no-prefix mappings.
2081@ rules
2082&x=op|ĉ
2083&y=p|ç
2084* compare
2085<1 opc
2086<2 opć
2087<1 opcz
2088<1 opd
2089<1 opĉ
2090=  opc\u0302
2091=  opx
2092<1 opç
2093=  opc\u0327
2094=  opy
2095
2096# The mapping is used which has the longest matching prefix for which
2097# there is also a suffix match, with the longest suffix match among several for that prefix.
2098@ rules
2099&❶=d
2100&❷=de
2101&❸=def
2102&①=c|d
2103&②=c|de
2104&③=c|def
2105&④=bc|d
2106&⑤=bc|de
2107&⑥=bc|def
2108&⑦=abc|d
2109&⑧=abc|de
2110&⑨=abc|def
2111* compare
2112<1 9aadzz
2113=  9aa❶zz
2114<1 9aadez
2115=  9aa❷z
2116<1 9aadef
2117=  9aa❸
2118<1 9acdzz
2119=  9ac①zz
2120<1 9acdez
2121=  9ac②z
2122<1 9acdef
2123=  9ac③
2124<1 9bcdzz
2125=  9bc④zz
2126<1 9bcdez
2127=  9bc⑤z
2128<1 9bcdef
2129=  9bc⑥
2130<1 abcdzz
2131=  abc⑦zz
2132<1 abcdez
2133=  abc⑧z
2134<1 abcdef
2135=  abc⑨
2136
2137** test: prefix + discontiguous contraction with missing prefix contraction
2138# Unfortunate terminology: The first "prefix" here is the pre-context,
2139# the second "prefix" refers to the contraction/relation string that is
2140# one shorter than the one being tested.
2141@ rules
2142&x=p|e
2143&y=p|ê
2144&z=op|ê
2145# No mapping for op|e:
2146# Discontiguous contraction matching should not match op|ê in opệ
2147# because it would have to skip the dot below and extend a match on op|e by the circumflex,
2148# but there is no match on op|e.
2149* compare
2150<1 oPe
2151<1 ope
2152=  opx
2153<1 opệ
2154=  opy\u0323  # y not z
2155<1 opê
2156=  opz
2157
2158# We cannot test for fallback by whether the contraction default CE32
2159# is for another contraction. With the following rules, there is no mapping for op|e,
2160# and the fallback to prefix p has no contractions.
2161@ rules
2162&x=p|e
2163&z=op|ê
2164* compare
2165<1 oPe
2166<1 ope
2167=  opx
2168<2 opệ
2169=  opx\u0323\u0302  # x not z
2170<1 opê
2171=  opz
2172
2173# One more variation: Fallback to the simple code point, no shorter non-empty prefix.
2174@ rules
2175&x=e
2176&z=op|ê
2177* compare
2178<1 ope
2179=  opx
2180<3 oPe
2181=  oPx
2182<2 opệ
2183=  opx\u0323\u0302  # x not z
2184<1 opê
2185=  opz
2186
2187** test: maxVariable via rules
2188@ rules
2189[maxVariable space][alternate shifted]
2190* compare
2191=  \u0020
2192=  \u000A
2193<1 .
2194<1 °  # degree sign
2195<1 $
2196<1 0
2197
2198** test: maxVariable via setting
2199@ root
2200% maxVariable=currency
2201% alternate=shifted
2202* compare
2203=  \u0020
2204=  \u000A
2205=  .
2206=  °  # degree sign
2207=  $
2208<1 0
2209
2210** test: ICU4J CollationMiscTest/TestContractionClosure (ää)
2211# This tests canonical closure, but it also tests that CollationFastLatin
2212# bails out properly for contractions with combining marks.
2213# For that we need pairs of strings that remain in the Latin fastpath
2214# long enough, hence the extra "= b" lines.
2215@ rules
2216&b=\u00e4\u00e4
2217* compare
2218<1 b
2219=  \u00e4\u00e4
2220=  b
2221=  a\u0308a\u0308
2222=  b
2223=  \u00e4a\u0308
2224=  b
2225=  a\u0308\u00e4
2226
2227** test: ICU4J CollationMiscTest/TestContractionClosure (Å)
2228@ rules
2229&b=\u00C5
2230* compare
2231<1 b
2232=  \u00C5
2233=  b
2234=  A\u030A
2235=  b
2236=  \u212B
2237
2238** test: reset-before on already-tailored characters, ICU ticket 10108
2239@ rules
2240&a<w<<x &[before 2]x<<y
2241* compare
2242<1 a
2243<1 w
2244<2 y
2245<2 x
2246
2247@ rules
2248&a<<w<<<x &[before 2]x<<y
2249* compare
2250<1 a
2251<2 y
2252<2 w
2253<3 x
2254
2255@ rules
2256&a<w<x &[before 2]x<<y
2257* compare
2258<1 a
2259<1 w
2260<1 y
2261<2 x
2262
2263@ rules
2264&a<w<<<x &[before 2]x<<y
2265* compare
2266<1 a
2267<1 y
2268<2 w
2269<3 x
2270
2271** test: numeric collation with other settings, ICU ticket 9092
2272@ root
2273% strength=identical
2274% caseFirst=upper
2275% numeric=on
2276* compare
2277<1 100\u0020a
2278<1 101
2279
2280** test: collation type fallback from unsupported type, ICU ticket 10149
2281@ locale fr-CA-u-co-phonebk
2282# Expect the same result as with fr-CA, using backwards-secondary order.
2283# That is, we should fall back from the unsupported collation type
2284# to the locale's default collation type.
2285* compare
2286<1 cote
2287<2 côte
2288<2 coté
2289<2 côté
2290
2291** test: @ is equivalent to [backwards 2], ICU ticket 9956
2292@ rules
2293&b<a @ &v<<w
2294* compare
2295<1 b
2296<1 a
2297<1 cote
2298<2 côte
2299<2 coté
2300<2 côté
2301<1 v
2302<2 w
2303<1 x
2304
2305** test: shifted+reordering, ICU ticket 9507
2306@ root
2307% reorder Grek punct space
2308% alternate=shifted
2309% strength=quaternary
2310# Which primaries are "variable" should be determined without script reordering,
2311# and then primaries should be reordered whether they are shifted to quaternary or not.
2312* compare
2313<4 (  # punctuation
2314<4 )
2315<4 \u0020  # space
2316<1 `  # symbol
2317<1 ^
2318<1 $  # currency symbol
2319<1 €
2320<1 0  # numbers
2321<1 ε  # Greek
2322<1 e  # Latin
2323<1 e(e
2324<4 e)e
2325<4 e\u0020e
2326<4 ee
2327<3 e(E
2328<4 e)E
2329<4 e\u0020E
2330<4 eE
2331
2332** test: "uppercase first" could sort a string before its prefix, ICU ticket 9351
2333@ rules
2334&\u0001<<<b<<<B
2335% caseFirst=upper
2336* compare
2337<1 aaa
2338<3 aaaB
2339
2340** test: secondary+case ignores secondary ignorables, ICU ticket 9355
2341@ rules
2342&\u0001<<<b<<<B
2343% strength=secondary
2344% caseLevel=on
2345* compare
2346<1 a
2347=  ab
2348=  aB
2349
2350** test: custom collation rules involving tail of a contraction in Malayalam, ICU ticket 6328
2351@ rules
2352&[before 2] ൌ << ൗ  # U+0D57 << U+0D4C == 0D46+0D57
2353* compare
2354<1 ൗx
2355<2 ൌx
2356<1 ൗy
2357<2 ൌy
2358
2359** test: quoted apostrophe in compact syntax, ICU ticket 8204
2360@ rules
2361&q<<*a''c
2362* compare
2363<1 d
2364<1 p
2365<1 q
2366<2 a
2367<2 \u0027
2368<2 c
2369<1 r
2370
2371# ICU ticket #8260 "Support all collation-related keywords in Collator.getInstance()"
2372** test: locale -u- with collation keywords, ICU ticket 8260
2373@ locale de-u-kv-sPace-ka-shifTed-kn-kk-falsE-kf-Upper-kc-tRue-ks-leVel4
2374* compare
2375<4 \u0020  # space is shifted, strength=quaternary
2376<1 !  # punctuation is regular
2377<1 2
2378<1 12  # numeric sorting
2379<1 B
2380<c b  # uppercase first on case level
2381<1 x\u0301\u0308
2382<2 x\u0308\u0301  # normalization off
2383
2384** test: locale @ with collation keywords, ICU ticket 8260
2385@ locale fr@colbAckwards=yes;ColStrength=Quaternary;kv=currencY;colalternate=shifted
2386* compare
2387<4 $  # currency symbols are shifted, strength=quaternary
2388<1 àla
2389<2 alà  # backwards secondary level
2390
2391** test: locale -u- with script reordering, ICU ticket 8260
2392@ locale el-u-kr-kana-SYMBOL-Grek-hani-cyrl-latn-digit-armn-deva-ethi-thai
2393* compare
2394<1 \u0020
2395<1 あ
2396<1 ☂
2397<1 Ω
2398<1 丂
2399<1 ж
2400<1 L
2401<1 4
2402<1 Ձ
2403<1 अ
2404<1 ሄ
2405<1 ฉ
2406
2407** test: locale @collation=type should be case-insensitive
2408@ locale de@coLLation=PhoneBook
2409* compare
2410<1 ae
2411<2 ä
2412<3 Ä
2413
2414** test: import root search rules plus German phonebook rules, ICU ticket 8962
2415@ locale de-u-co-search
2416* compare
2417<1 =
2418<1 ≠
2419<1 a
2420<1 ae
2421<2 ä
2422
2423# Once more, but with runtime builder.
2424@ rules
2425[import und-u-co-search][import de-u-co-phonebk]
2426* compare
2427<1 =
2428<1 ≠
2429<1 a
2430<1 ae
2431<2 ä
2432
2433# Once again, with import from "root" not "und" (as in a proper language tag).
2434@ rules
2435[import root-u-co-search][import de-u-co-phonebk]
2436* compare
2437<1 =
2438<1 ≠
2439<1 a
2440<1 ae
2441<2 ä
2442
2443** test: import rules from a language with non-Latin native script, and reset the reordering, ICU ticket 10998
2444# Greek should sort Greek first.
2445@ rules
2446[import el]
2447* compare
2448<1 4
2449<1 Ω
2450<1 L
2451
2452# Import Greek, and then reset the reordering.
2453@ rules
2454[import el][reorder Zzzz]
2455* compare
2456<1 4
2457<1 L
2458<1 Ω
2459
2460# "others" is a synonym for Zzzz.
2461@ rules
2462[import el][reorder others]
2463* compare
2464<1 4
2465<1 L
2466<1 Ω
2467
2468** test: regression test for CollationFastLatinBuilder, ICU ticket 11388
2469@ rules
2470&x<<aa<<<Aa<<<AA
2471% strength=secondary
2472* compare
2473<1 AA
2474<2 Aẩ
2475<2 aą
2476* compare
2477<1 AA
2478<2 aą
2479
2480** test: tailor tertiary-after a common tertiary where there is a lower one
2481# Assume that Hiragana small A has a below-common tertiary, and Hiragana A has a common one.
2482# See ICU ticket 11448 & CLDR ticket 7222.
2483@ rules
2484&あ<<<x<<<y<<<z
2485* compare
2486<1 ぁ
2487<3 あ
2488<3 x
2489<3 y
2490<3 z
2491<3 ァ
2492<1 い
2493
2494** test: tailor tertiary-after a below-common tertiary
2495@ rules
2496&ぁ<<<x<<<y<<<z
2497* compare
2498<1 ぁ
2499<3 x
2500<3 y
2501<3 z
2502<3 あ
2503<3 ァ
2504<1 い
2505
2506** test: tailor tertiary-before a common tertiary where there is a lower one
2507@ rules
2508&[before 3]あ<<<x<<<y<<<z
2509* compare
2510<1 ぁ
2511<3 x
2512<3 y
2513<3 z
2514<3 あ
2515<3 ァ
2516<1 い
2517
2518** test: tailor tertiary-before a below-common tertiary
2519@ rules
2520&[before 3]ぁ<<<x<<<y<<<z
2521* compare
2522<1 x
2523<3 y
2524<3 z
2525<3 ぁ
2526<3 あ
2527<3 ァ
2528<1 い
2529
2530** test: reorder single scripts not groups, ICU ticket 11449
2531@ root
2532% reorder Goth Latn
2533* compare
2534<1 4
2535<1 ��  # Gothic
2536<1 L
2537<1 Ω
2538# Before ICU 55, the following reordered together with Gothic.
2539<1 ��  # Old Italic
2540<1 ��  # Shavian
2541