# Copyright (C) 2016 and later: Unicode, Inc. and others.
# License & terms of use: http://www.unicode.org/copyright.html#License
#
# Corporation and others. All Rights Reserved.
# Copyright (c) 2012-2015 International Business Machines
# Corporation and others. All Rights Reserved.
#
# This file should be in UTF-8 with a signature byte sequence ("BOM").
#
# collationtest.txt: Collation test data.
#
# created on: 2012apr13
# created by: Markus W. Scherer

# A line with "** test: description" is used for verbose and error output.

# A collator can be set with "@ root" or "@ locale language-tag",
# for example "@ locale de-u-co-phonebk".
# An old-style locale ID can also be used, for example "@ locale de@collation=phonebook".

# A collator can be built with "@ rules".
# An "@ rules" line is followed by one or more lines with the tailoring rules.

# A collator can be modified with "% attribute=value".

# "* compare" tests the order (= or <) of the following strings.
# The relation can be "=" or "<" (the level of the difference is not specified)
# or "<1", "<2", "<c", "<3", "<4" (indicating the level of the difference).

# Test sections ("* compare") are terminated by
# definitions of new collators, changing attributes, or new test sections.

** test: simple CEs & expansions
# Many types of mappings are tested elsewhere, including via the UCA conformance tests.
# Here we mostly cover a few unusual mappings.
@ rules
&\x01 # most control codes are ignorable
<<<\u0300 # tertiary CE
&9<\x00 # NUL not ignorable
&\uA00A\uA00B=\uA002 # two long-primary CEs
&\uA00A\uA00B\u00050005=\uA003 # three CEs, require 64 bits

* compare
= \x01
= \x02
<3 \u0300
<1 9
<1 \x00
= \x01\x00\x02
<1 a
<3 a\u0300
<2 a\u0308
= ä
<1 b
<1 か # Hiragana Ka (U+304B)
<2 か\u3099 # plus voiced sound mark
= が # Hiragana Ga (U+304C)
<1 \uA00A\uA00B
= \uA002
<1 \uA00A\uA00B\u00050004
<1 \uA00A\uA00B\u00050005
= \uA003
<1 \uA00A\uA00B\u00050006

** test: contractions
# Create some interesting mappings, and map some normalization-inert characters
# (which are not subject to canonical reordering)
# to some of the same CEs to check the sequence of CEs.
@ rules

# Contractions starting with 'a' should not continue with any character < U+0300
# so that we can test a shortcut for that.
&a=ⓐ
&b<bz=ⓑ
&d<dz\u0301=ⓓ # d+z+acute
&z
<a\u0301=Ⓐ # a+acute sorts after z
<a\u0301\u0301=Ⓑ # a+acute+acute
<a\u0301\u0301\u0358=Ⓒ # a+acute+acute+dot above right
<a\u030a=Ⓓ # a+ring
<a\u0323=Ⓔ # a+dot below
<a\u0323\u0358=Ⓕ # a+dot below+dot above right
<a\u0327\u0323\u030a=Ⓖ # a+cedilla+dot below+ring
<a\u0327\u0323bz=Ⓗ # a+cedilla+dot below+b+z

&\U0001D158=⁰ # musical notehead black (has a symbol primary)
<\U0001D158\U0001D165=¼ # musical quarter note

# deliberately missing prefix contractions:
# dz
# a\u0327
# a\u0327\u0323
# a\u0327\u0323b

&\x01
<<<\U0001D165=¹ # musical stem (ccc=216)
<<<\U0001D16D=² # musical augmentation dot (ccc=226)
<<<\U0001D165\U0001D16D=³ # stem+dot (ccc=216 226)
&\u0301=❶ # acute (ccc=230)
&\u030a=❷ # ring (ccc=230)
&\u0308=❸ # diaeresis (ccc=230)
<<\u0308\u0301=❹ # diaeresis+acute (=dialytika tonos) (ccc=230 230)
&\u0327=❺ # cedilla (ccc=202)
&\u0323=❻ # dot below (ccc=220)
&\u0331=❼ # macron below (ccc=220)
<<\u0331\u0358=❽ # macron below+dot above right (ccc=220 232)
&\u0334=❾ # tilde overlay (ccc=1)
&\u0358=❿ # dot above right (ccc=232)

&\u0f71=① # tibetan vowel sign aa
&\u0f72=② # tibetan vowel sign i
# \u0f71\u0f72 # tibetan vowel sign aa + i = ii = U+0F73
&\u0f73=③ # tibetan vowel sign ii (ccc=0 but lccc=129)

** test: simple contractions

# Some strings are chosen to cause incremental contiguous contraction matching to
# go into partial matches for prefixes of contractions
# (where the prefixes are deliberately not also contractions).
# When there is no complete match, then the matching code must back out of those
# so that discontiguous contractions work as specified.

* compare
# contraction starter with no following text, or mismatch, or blocked
<1 a
= ⓐ
<1 aa
= ⓐⓐ
<1 ab
= ⓐb
<1 az
= ⓐz

* compare
<1 a
<2 a\u0308\u030a # ring blocked by diaeresis
= ⓐ❸❷
<2 a\u0327
= ⓐ❺

* compare
<2 \u0308
= ❸
<2 \u0308\u030a\u0301 # acute blocked by ring
= ❸❷❶

* compare
<1 \U0001D158
= ⁰
<1 \U0001D158\U0001D165
= ¼

# no discontiguous contraction because of missing prefix contraction d+z,
# and a starter ('z') after the 'd'
* compare
<1 dz\u0323\u0301
= dz❻❶

# contiguous contractions
* compare
<1 abz
= ⓐⓑ
<1 abzz
= ⓐⓑz

* compare
<1 a
<1 z
<1 a\u0301
= Ⓐ
<1 a\u0301\u0301
= Ⓑ
<1 a\u0301\u0301\u0358
= Ⓒ
<1 a\u030a
= Ⓓ
<1 a\u0323\u0358
= Ⓕ
<1 a\u0327\u0323\u030a # match despite missing prefix
= Ⓖ
<1 a\u0327\u0323bz
= Ⓗ

* compare
<2 \u0308\u0308\u0301 # acute blocked from first diaeresis, contracts with second
= ❸❹

* compare
<1 \U0001D158\U0001D165
= ¼

* compare
<3 \U0001D165\U0001D16D
= ³

** test: discontiguous contractions
* compare
<1 a\u0327\u030a # a+ring skips cedilla
= Ⓓ❺
<2 a\u0327\u0327\u030a # a+ring skips 2 cedillas
= Ⓓ❺❺
<2 a\u0327\u0327\u0327\u030a # a+ring skips 3 cedillas
= Ⓓ❺❺❺
<2 a\u0334\u0327\u0327\u030a # a+ring skips tilde overlay & 2 cedillas
= Ⓓ❾❺❺
<1 a\u0327\u0323 # a+dot below skips cedilla
= Ⓔ❺
<1 a\u0323\u0301\u0358 # a+dot below+dot ab.r.: 2-char match, then skips acute
= Ⓕ❶
<2 a\u0334\u0323\u0358 # a+dot below skips tilde overlay
= Ⓕ❾

* compare
<2 \u0331\u0331\u0358 # macron below+dot ab.r. skips the second macron below
= ❽❼

* compare
<1 a\u0327\u0331\u0323\u030a # a+ring skips cedilla, macron below, dot below (dot blocked by macron)
= Ⓓ❺❼❻
<1 a\u0327\u0323\U0001D16D\u030a # a+dot below skips cedilla
= Ⓔ❺²❷
<2 a\u0327\u0327\u0323\u030a # a+dot below skips 2 cedillas
= Ⓔ❺❺❷
<2 a\u0327\u0323\u0323\u030a # a+dot below skips cedilla
= Ⓔ❺❻❷
<2 a\u0334\u0327\u0323\u030a # a+dot below skips tilde overlay & cedilla
= Ⓔ❾❺❷

* compare
<1 \U0001D158\u0327\U0001D165 # quarter note skips cedilla
= ¼❺
<1 a\U0001D165\u0323 # a+dot below skips stem
= Ⓔ¹

# partial contiguous match, backs up, matches discontiguous contraction
<1 a\u0327\u0323b
= Ⓔ❺b
<1 a\u0327\u0323ba
= Ⓔ❺bⓐ

# a+acute+acute+dot above right skips cedilla, continues matching 2 same-ccc combining marks
* compare
<1 a\u0327\u0301\u0301\u0358
= Ⓒ❺

# FCD but not NFD
* compare
<1 a\u0f73\u0301 # a+acute skips tibetan ii
= Ⓐ③

# FCD but the 0f71 inside the 0f73 must be skipped
# to match the discontiguous contraction of the first 0f71 with the trailing 0f72 inside the 0f73
* compare
<1 \u0f71\u0f73 # == \u0f73\u0f71 == \u0f71\u0f71\u0f72
= ③①

** test: discontiguous contractions with nested contractions
* compare
<1 a\u0323\u0308\u0301\u0358
= Ⓕ❹
<2 a\u0323\u0308\u0301\u0308\u0301\u0358
= Ⓕ❹❹

** test: discontiguous contractions with interleaved contractions
* compare
# a+ring & cedilla & macron below+dot above right
<1 a\u0327\u0331\u030a\u0358
= Ⓓ❺❽

# a+ring & 1x..3x macron below+dot above right
<2 a\u0331\u030a\u0358
= Ⓓ❽
<2 a\u0331\u0331\u030a\u0358\u0358
= Ⓓ❽❽
# also skips acute
<2 a\u0331\u0331\u0331\u030a\u0301\u0358\u0358\u0358
= Ⓓ❽❽❽❶

# a+dot below & stem+augmentation dot, followed by contiguous d+z+acute
<1 a\U0001D165\u0323\U0001D16Ddz\u0301
= Ⓔ³ⓓ

** test: some simple string comparisons
@ root
* compare
# first string compares against ""
= \u0000
< a
<1 b
<3 B
= \u0000B\u0000

** test: compare with strength=primary
% strength=primary
* compare
<1 a
<1 b
= B

** test: compare with strength=secondary
% strength=secondary
* compare
<1 a
<1 b
= B

** test: compare with strength=tertiary
% strength=tertiary
* compare
<1 a
<1 b
<3 B

** test: compare with strength=quaternary
% strength=quaternary
* compare
<1 a
<1 b
<3 B

** test: compare with strength=identical
% strength=identical
* compare
<1 a
<1 b
<3 B

** test: côté with forwards secondary
@ root
* compare
<1 cote
<2 coté
<2 côte
<2 côté

** test: côté with forwards secondary vs. U+FFFE merge separator
# Merged sort keys: On each level, any difference in the first segment
# must trump any further difference.
* compare
<1 cote\uFFFEcôté
<2 coté\uFFFEcôte
<2 côte\uFFFEcoté
<2 côté\uFFFEcote

** test: côté with backwards secondary
% backwards=on
* compare
<1 cote
<2 côte
<2 coté
<2 côté

** test: côté with backwards secondary vs. U+FFFE merge separator
# Merged sort keys: On each level, any difference in the first segment
# must trump any further difference.
* compare
<1 cote\uFFFEcôté
<2 côte\uFFFEcoté
<2 coté\uFFFEcôte
<2 côté\uFFFEcote

** test: U+FFFE on identical level
@ root
% strength=identical
* compare
# All of these control codes are completely-ignorable, so that
# their low code points are compared with the merge separator.
# The merge separator must compare less than any other character.
<1 \uFFFE\u0001\u0002\u0003
<i \u0001\uFFFE\u0002\u0003
<i \u0001\u0002\uFFFE\u0003
<i \u0001\u0002\u0003\uFFFE

* compare
# The merge separator must even compare less than U+0000.
<1 \uFFFE\u0000\u0000
<i \u0000\uFFFE\u0000
<i \u0000\u0000\uFFFE

** test: Hani < surrogates < U+FFFD
# Note: compareUTF8() treats unpaired surrogates like U+FFFD,
# so with that the strings with surrogates will compare equal to each other
# and equal to the string with U+FFFD.
@ root
% strength=identical
* compare
<1 abz
<1 a\u4e00z
<1 a\U00020000z
<1 a\ud800z
<1 a\udbffz
<1 a\udc00z
<1 a\udfffz
<1 a\ufffdz

** test: script reordering
@ root
% reorder Hani Zzzz digit
* compare
<1 ?
<1 +
<1 丂
<1 a
<1 α
<1 5

% reorder default
* compare
<1 ?
<1 +
<1 5
<1 a
<1 α
<1 丂

** test: empty rules
@ rules
* compare
<1 a
<2 ä
<3 Ä
<1 b

** test: very simple rules
@ rules
&a=e<<<<q<<<<r<x<<<X<<y<<<Y;z,Z
% strength=quaternary
* compare
<1 a
= e
<4 q
<4 r
<1 x
<3 X
<2 y
<3 Y
<2 z
<3 Z

** test: tailoring twice before a root position: primary
@ rules
&[before 1]b<p
&[before 1]b<q
* compare
<1 a
<1 p
<1 q
<1 b

** test: tailoring twice before a root position: secondary
@ rules
&[before 2]ſ<<p
&[before 2]ſ<<q
* compare
<1 s
<2 p
<2 q
<2 ſ

# secondary-before common weight
@ rules
&[before 2]b<<p
&[before 2]b<<q
* compare
<1 a
<1 p
<2 q
<2 b

** test: tailoring twice before a root position: tertiary
@ rules
&[before 3]B<<<p
&[before 3]B<<<q
* compare
<1 b
<3 p
<3 q
<3 B

# tertiary-before common weight
@ rules
&[before 3]b<<<p
&[before 3]b<<<q
* compare
<1 a
<1 p
<3 q
<3 b

@ rules
&[before 2]b<<s
&[before 3]s<<<p
&[before 3]s<<<q
* compare
<1 a
<1 p
<3 q
<3 s
<2 b

** test: tailor after completely ignorable
@ rules
&\x00<<<x<<y
* compare
= \x00
= \x1F
<3 x
<2 y

** test: secondary tailoring gaps, ICU ticket 9362
@ rules
&[before 2]s<<'_'
&s<<r # secondary between s and ſ (long s)
&ſ<<*a-q # more than 15 between ſ and secondary CE boundary
&[before 2][first primary ignorable]<<u<<v # between secondary CE boundary & lowest secondary CE
&[last primary ignorable]<<y<<z

* compare
<2 u
<2 v
<2 \u0332 # lowest secondary CE
<2 \u0308
<2 y
<2 z
<1 s_
<2 ss
<2 sr
<2 sſ
<2 sa
<2 sb
<2 sp
<2 sq
<2 sus
<2 svs
<2 rs

** test: tertiary tailoring gaps, ICU ticket 9362
@ rules
&[before 3]t<<<'_'
&t<<<r # tertiary between t and fullwidth t
&ᵀ<<<*a-q # more than 15 between ᵀ (modifier letter T) and tertiary CE boundary
&[before 3][first secondary ignorable]<<<u<<<v # between tertiary CE boundary & lowest tertiary CE
&[last secondary ignorable]<<<y<<<z

* compare
<3 u
<3 v
# Note: The root collator currently does not map any characters to tertiary CEs.
<3 y
<3 z
<1 t_
<3 tt
<3 tr
<3 tｔ
<3 tᵀ
<3 ta
<3 tb
<3 tp
<3 tq
<3 tut
<3 tvt
<3 rt

** test: secondary & tertiary around root character
@ rules
&[before 2]m<<r
&m<<s
&[before 3]m<<<u
&m<<<v
* compare
<1 l
<1 r
<2 u
<3 m
<3 v
<2 s
<1 n

** test: secondary & tertiary around tailored item
@ rules
&m<x
&[before 2]x<<r
&x<<s
&[before 3]x<<<u
&x<<<v
* compare
<1 m
<1 r
<2 u
<3 x
<3 v
<2 s
<1 n

** test: more nesting of secondary & tertiary before
@ rules
&[before 3]m<<<u
&[before 2]m<<r
&[before 3]r<<<q
&m<<<w
&m<<t
&[before 3]w<<<v
&w<<<x
&w<<s
* compare
<1 l
<1 q
<3 r
<2 u
<3 m
<3 v
<3 w
<3 x
<2 s
<2 t
<1 n

** test: case bits
@ rules
&w<x # tailored CE getting case bits
  =uv=uV=Uv=UV # 2 chars -> 1 CE
&ae=ch=cH=Ch=CH # 2 chars -> 2 CEs
&rst=yz=yZ=Yz=YZ # 2 chars -> 3 CEs
% caseFirst=lower
* compare
<1 ae
= ch
<3 cH
<3 Ch
<3 CH
<1 rst
= yz
<3 yZ
<3 Yz
<3 YZ
<1 w
<1 x
= uv
<3 uV
= Uv # mixed case on single CE cannot distinguish variations
<3 UV

** test: tertiary CEs, tertiary, caseLevel=off, caseFirst=lower
@ rules
&\u0001<<<t<<<T # tertiary CEs
% caseFirst=lower
* compare
<1 aa
<3 aat
<3 aaT
<3 aA
<3 aAt
<3 ata
<3 aTa

** test: tertiary CEs, tertiary, caseLevel=off, caseFirst=upper
% caseFirst=upper
* compare
<1 aA
<3 aAt
<3 aa
<3 aat
<3 aaT
<3 ata
<3 aTa

** test: reset on expansion, ICU tickets 9415 & 9593
@ rules
&æ<x # tailor the last primary CE so that x sorts between ae and af
&æb=bæ # copy all reset CEs to make bæ sort the same
&각<h # copy/tailor 3 CEs to make h sort before the next Hangul syllable 갂
&⒀<<y # copy/tailor 4 CEs to make y sort with only a secondary difference
&l·=z # handle the pre-context for · when fetching reset CEs
  <<u # copy/tailor 2 CEs

* compare
<1 ae
<2 æ
<1 x
<1 af

* compare
<1 aeb
<2 æb
= bæ

* compare
<1 각
<1 h
<1 갂
<1 갃

* compare
<1 · # by itself: primary CE
<1 l
<2 l· # l+middle dot has only a secondary difference from l
= z
<2 u

* compare
<1 (13)
<3 ⒀ # DUCET sets special tertiary weights in all CEs
<2 y
<1 (13[

% alternate=shifted
* compare
<1 (13)
= 13
<3 ⒀
= y # alternate=shifted removes the tailoring difference on the last CE
<1 14

** test: contraction inside extension, ICU ticket 9378
@ rules
&а<<х/й # all letters are Cyrillic
* compare
<1 ай
<2 х

** test: no duplicate tailored CEs for different reset positions with same CEs, ICU ticket 10104
@ rules
&t<x &ᵀ<y # same primary weights
&q<u &[before 1]ꝗ<v # q and ꝗ are primary adjacent
* compare
<1 q
<1 u
<1 v
<1 ꝗ
<1 t
<3 ᵀ
<1 y
<1 x

# Principle: Each rule builds on the state of preceding rules and ignores following rules.

** test: later rule does not affect earlier reset position, ICU ticket 10105
@ rules
&a < u < v < w &ov < x &b < v
* compare
<1 oa
<1 ou
<1 x # CE(o) followed by CE between u and w
<1 ow
<1 ob
<1 ov

** test: later rule does not affect earlier extension (1), ICU ticket 10105
@ rules
&a=x/b &v=b
% strength=secondary
* compare
<1 B
<1 c
<1 v
= b
* compare
<1 AB
= x
<1 ac
<1 av
= ab

** test: later rule does not affect earlier extension (2), ICU ticket 10105
@ rules
&a <<< c / e &g <<< e / l
% strength=secondary
* compare
<1 AE
= c
<2 æ
<1 agl
= ae

** test: later rule does not affect earlier extension (3), ICU ticket 10105
@ rules
&a = b / c &d = c / e
% strength=secondary
* compare
<1 AC # C is still only tertiary different from the original c
= b
<1 ade
= ac

** test: extension contains tailored character, ICU ticket 10105
@ rules
&a=e &b=u/e
* compare
<1 a
= e
<1 ba
= be
= u

** test: add simple mappings for characters with root context
@ rules
&z=· # middle dot has a prefix mapping in the CLDR root
&n=и # и (U+0438) has contractions in the root
* compare
<1 l
<2 l· # root mapping for l|· still works
<1 z
= ·
* compare
<1 n
= и
<1 И
<1 и\u0306 # root mapping for й=и\u0306 still works
= й
<3 Й

** test: add context mappings around characters with root context
@ rules
&z=·h # middle dot has a prefix mapping in the CLDR root
&n=ә|и # и (U+0438) has contractions in the root
* compare
<1 l
<2 l· # root mapping for l|· still works
<1 z
= ·h
* compare
<1 и
<3 И
<1 и\u0306 # root mapping for й=и\u0306 still works
= й
* compare
<1 әn
= әи
<1 әo

** test: many secondary CEs at the top of their range
@ rules
&[last primary ignorable]<<*\u2801-\u28ff
* compare
<2 \u0308
<2 \u2801
<2 \u2802
<2 \u2803
<2 \u2804
<2 \u28fd
<2 \u28fe
<2 \u28ff
<1 \x20

** test: many tertiary CEs at the top of their range
@ rules
&[last secondary ignorable]<<<*a-z
* compare
<3 a
<3 b
<3 c
<3 d
# e..w
<3 x
<3 y
<3 z
<2 \u0308

** test: tailor contraction together with nearly equivalent prefix, ICU ticket 10101
@ rules
&a=p|x &b=px &c=op
* compare
<1 b
= px
<3 B
<1 c
= op
<3 C
* compare
<1 ca
= opx # first contraction op, then prefix p|x
<3 cA
<3 Ca

** test: reset position with prefix (pre-context), ICU ticket 10102
@ rules
&a=p|x &px=y
* compare
<1 pa
= px
= y
<3 pA
<1 q
<1 x

** test: prefix+contraction together (1), ICU ticket 10071
@ rules
&x=a|bc
* compare
<1 ab
<1 Abc
<1 abd
<1 ac
<1 aw
<1 ax
= abc
<3 aX
<3 Ax
<1 b
<1 bb
<1 bc
<3 bC
<3 Bc
<1 bd

** test: prefix+contraction together (2), ICU ticket 10071
@ rules
&w=bc &x=a|b
* compare
<1 w
= bc
<3 W
* compare
<1 aw
<1 ax
= ab
<3 aX
<1 axb
<1 axc
= abc # prefix match a|b takes precedence over contraction match bc
<3 abC
<1 abd
<1 ay

** test: prefix+contraction together (3), ICU ticket 10071
@ rules
&x=a|b &w=bc # reverse order of rules as previous test, order should not matter here
* compare # same "compare" sequences as previous test
<1 w
= bc
<3 W
* compare
<1 aw
<1 ax
= ab
<3 aX
<1 axb
<1 axc
= abc # prefix match a|b takes precedence over contraction match bc
<3 abC
<1 abd
<1 ay

** test: no mapping p|c, falls back to contraction ch, CLDR ticket 5962
@ rules
&d=ch &v=p|ci
* compare
<1 pc
<3 pC
<1 pcH
<1 pcI
<1 pd
= pch # no-prefix contraction ch matches
<3 pD
<1 pv
= pci # prefix+contraction p|ci matches
<3 pV

** test: tailor in & around compact ranges of root primaries
# The Ogham characters U+1681..U+169A are in simple ascending order of primary CEs
# which should be reliably encoded as one range in the root elements data.
@ rules
&[before 1]ᚁ<a
&ᚁ<b
&[before 1]ᚂ<c
&ᚂ<d
&[before 1]ᚚ<y
&ᚚ<z
&[before 2]ᚁ<<r
&ᚁ<<s
&[before 3]ᚚ<<<t
&ᚚ<<<u
* compare
<1 ᣵ # U+18F5 last Canadian Aboriginal
<1 a
<1 r
<2 ᚁ
<2 s
<1 b
<1 c
<1 ᚂ
<1 d
<1 ᚃ
<1 ᚙ
<1 y
<1 t
<3 ᚚ
<3 u
<1 z
<1 ᚠ # U+16A0 first Runic

** test: suppressContractions
@ rules
&z<ch<әж [suppressContractions [·cә]]
* compare
<1 ch
<3 cH # ch was suppressed
<1 l
<1 l· # primary difference, not secondary, because l|· was suppressed
<1 ә
<2 ә\u0308 # secondary difference, not primary, because contractions for ә were suppressed
<1 әж
<3 әЖ

** test: Hangul & Jamo
@ rules
&L=\u1100 # first Jamo L
&V=\u1161 # first Jamo V
&T=\u11A8 # first Jamo T
&\uAC01<<*\u4E00-\u4EFF # first Hangul LVT syllable & lots of secondary diffs
* compare
<1 Lv
<3 LV
= \u1100\u1161
= \uAC00
<1 LVt
<3 LVT
= \u1100\u1161\u11A8
= \uAC00\u11A8
= \uAC01
<2 LVT\u0308
<2 \u4E00
<2 \u4E01
<2 \u4E80
<2 \u4EFF
<2 LV\u0308T
<1 \uAC02

** test: adjust special reset positions according to previous rules, CLDR ticket 6070
@ rules
&[last variable]<x
[maxVariable space] # has effect only after building, no effect on following rules
&[last variable]<y
&[before 1][first regular]<z
* compare
<1 ? # some punctuation
<1 x
<1 y
<1 z
<1 $ # some symbol

@ rules
&[last primary ignorable]<<x<<<y
&[last primary ignorable]<<z
* compare
<2 \u0358
<2 x
<3 y
<2 z
<1 \x20

@ rules
&[last secondary ignorable]<<<x
&[last secondary ignorable]<<<y
* compare
<3 x
<3 y
<2 \u0358

@ rules
&[before 2][first variable]<<z
&[before 2][first variable]<<y
&[before 3][first variable]<<<x
&[before 3][first variable]<<<w
&[before 1][first variable]<v
&[before 2][first variable]<<u
&[before 3][first variable]<<<t
&[before 2]\uFDD1\xA0<<s # FractionalUCA.txt: FDD1 00A0, SPACE first primary
* compare
<2 \u0358
<1 s
<2 \uFDD1\xA0
<1 t
<3 u
<2 v
<1 w
<3 x
<3 y
<2 z
<2 \t

@ rules
&[before 2][first regular]<<z
&[before 3][first regular]<<<y
&[before 1][first regular]<x
&[before 3][first regular]<<<w
&[before 2]\uFDD1\u263A<<v # FractionalUCA.txt: FDD1 263A, SYMBOL first primary
&[before 3][first regular]<<<u
&[before 1][first regular]<p # primary before the boundary: becomes variable
&[before 3][first regular]<<<t # not affected by p
&[last variable]<q # after p!
* compare
<1 ?
<1 p
<1 q
<1 t
<3 u
<3 v
<1 w
<3 x
<1 y
<3 z
<1 $

# check that p & q are indeed variable
% alternate=shifted
* compare
= ?
= p
= q
<1 t
<3 u
<3 v
<1 w
<3 x
<1 y
<3 z
<1 $

@ rules
&[before 2][first trailing]<<z
&[before 1][first trailing]<y
&[before 3][first trailing]<<<x
* compare
<1 \u4E00 # first Han, first implicit
<1 \uFDD1\uFDD0 # FractionalUCA.txt: unassigned first primary
# Note: The root collator currently does not map any characters to the trailing first boundary primary.
<1 x
<3 y
<1 z
<2 \uFFFD # The root collator currently maps U+FFFD to the first real trailing primary.

@ rules
&[before 2][first primary ignorable]<<z
&[before 2][first primary ignorable]<<y
&[before 3][first primary ignorable]<<<x
&[before 3][first primary ignorable]<<<w
* compare
= \x01
<2 w
<3 x
<3 y
<2 z
<2 \u0301

@ rules
&[before 3][first secondary ignorable]<<<y
&[before 3][first secondary ignorable]<<<x
* compare
= \x01
<3 x
<3 y
<2 \u0301

** test: canonical closure
@ rules
&X=A &U=Â
* compare
<1 U
= Â
= A\u0302
<2 Ú # U with acute
= U\u0301
= Ấ # A with circumflex & acute
= Â\u0301
= A\u0302\u0301
<1 X
= A
<2 X\u030A # with ring above
= Å
= A\u030A
= \u212B # Angstrom sign

@ rules
&x=\u5140\u55C0
* compare
<1 x
= \u5140\u55C0
= \u5140\uFA0D
= \uFA0C\u55C0
= \uFA0C\uFA0D # CJK compatibility characters
<3 X

# canonical closure on prefix rules, ICU ticket 9444
@ rules
&x=ä|ŝ
* compare
<1 äs # not tailored
<1 äx
= äŝ
= a\u0308s\u0302
= a\u0308ŝ
= äs\u0302
<3 äX

** test: conjoining Jamo map to expansions
@ rules
&gg=\u1101 # Jamo Lead consonant GG
&nj=\u11AC # Jamo Trail consonant NJ
* compare
<1 gg\u1161nj
= \u1101\u1161\u11AC
= \uAE4C\u11AC
= \uAE51
<3 gg\u1161nJ
<1 \u1100\u1100

** test: canonical tail closure, ICU ticket 5913
@ rules
&a<â
* compare
<1 a
<1 â # tailored
= a\u0302
<2 a\u0323\u0302 # discontiguous contraction
= ạ\u0302 # equivalent
= ậ # equivalent
<1 b

@ rules
&a<ạ
* compare
<1 a
<1 ạ # tailored
= a\u0323
<2 a\u0323\u0302 # contiguous contraction plus extra diacritic
= ạ\u0302 # equivalent
= ậ # equivalent
<1 b

# Tail closure should work even if there is a prefix and/or contraction.
@ rules
&a<\u5140|câ
# In order to find discontiguous contractions for \u5140|câ
# there must exist a mapping for \u5140|ca, regardless of what it maps to.
# (This follows from the UCA spec.)
&x=\u5140|ca
* compare
<1 \u5140a
= \uFA0Ca
<1 \u5140câ # tailored
= \uFA0Ccâ
= \u5140ca\u0302
= \uFA0Cca\u0302
<2 \u5140ca\u0323\u0302 # discontiguous contraction
= \uFA0Cca\u0323\u0302
= \u5140cạ\u0302
= \uFA0Ccạ\u0302
= \u5140cậ
= \uFA0Ccậ
<1 \u5140b
= \uFA0Cb
<1 \u5140x
= \u5140ca

# Double-check that without the extra mapping there will be no discontiguous match.
@ rules
&a<\u5140|câ
* compare
<1 \u5140a
= \uFA0Ca
<1 \u5140câ # tailored
= \uFA0Ccâ
= \u5140ca\u0302
= \uFA0Cca\u0302
<1 \u5140b
= \uFA0Cb
<1 \u5140ca\u0323\u0302 # no discontiguous contraction
= \uFA0Cca\u0323\u0302
= \u5140cạ\u0302
= \uFA0Ccạ\u0302
= \u5140cậ
= \uFA0Ccậ

@ rules
&a<cạ
* compare
<1 a
<1 cạ # tailored
= ca\u0323
<2 ca\u0323\u0302 # contiguous contraction plus extra diacritic
= cạ\u0302 # equivalent
= cậ # equivalent
<1 b

# ᾢ = U+1FA2 GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA AND YPOGEGRAMMENI
# = 03C9 0313 0300 0345
# ccc = 0, 230, 230, 240
@ rules
&δ=αῳ
# In order to find discontiguous contractions for αῳ
# there must exist a mapping for αω, regardless of what it maps to.
# (This follows from the UCA spec.)
&ε=αω
* compare
<1 δ
= αῳ
= αω\u0345
<2 αω\u0313\u0300\u0345 # discontiguous contraction
= αὠ\u0300\u0345
= αὢ\u0345
= αᾢ
<2 αω\u0300\u0313\u0345
= αὼ\u0313\u0345
= αῲ\u0313 # not FCD
<1 ε
= αω

# Double-check that without the extra mapping there will be no discontiguous match.
@ rules
&δ=αῳ
* compare
<1 αω\u0313\u0300\u0345 # no discontiguous contraction
= αὠ\u0300\u0345
= αὢ\u0345
= αᾢ
<2 αω\u0300\u0313\u0345
= αὼ\u0313\u0345
= αῲ\u0313 # not FCD
<1 δ
= αῳ
= αω\u0345

# Add U+0315 COMBINING COMMA ABOVE RIGHT which has ccc=232.
# Tests code paths where the tailored string has a combining mark
# that does not occur in any composite's decomposition.
@ rules
&δ=αὼ\u0315
* compare
<1 αω\u0313\u0300\u0315 # Not tailored: The grave accent blocks the comma above.
= αὠ\u0300\u0315
= αὢ\u0315
<1 δ
= αὼ\u0315
= αω\u0300\u0315
<2 αω\u0300\u0315\u0345
= αὼ\u0315\u0345
= αῲ\u0315 # not FCD

** test: danish a+a vs. a-umlaut, ICU ticket 9319
@ rules
&z<aa
* compare
<1 z
<1 aa
<2 aa\u0308
= aä

** test: Jamo L with and in prefix
# Useful for the Korean "searchjl" tailoring (instead of contractions of pairs of Jamo L).
@ rules
# Jamo Lead consonant G after G or GG
&[last primary ignorable]<<\u1100|\u1100=\u1101|\u1100
# Jamo Lead consonant GG sorts like G+G
&\u1100\u1100=\u1101
# Note: Making G|GG and GG|GG sort the same as G|G+G
# would require the ability to reset on G|G+G,
# or we could make G-after-G equal to some secondary-CE character,
# and reset on a pair of those.
# (It does not matter much if there are at most two G in a row in real text.)
* compare
<1 \u1100
<2 \u1100\u1100 # only one primary from a sequence of G lead consonants
= \u1101
<2 \u1100\u1100\u1100
= \u1101\u1100
# but not = \u1100\u1101, see above
<1 \u1100\u1161
= \uAC00
<2 \u1100\u1100\u1161
= \u1100\uAC00 # prefix match from the L of the LV syllable
= \u1101\u1161
= \uAE4C

** test: proposed Korean "searchjl" tailoring with prefixes, CLDR ticket 6546
@ rules
# Low secondary CEs for Jamo V & T.
# Note: T should sort before V for proper syllable order.
&\u0332 # COMBINING LOW LINE (first primary ignorable)
<<\u1161<<\u1162

# Korean Jamo lead consonant search rules, part 2:
# Make modern compound L jamo primary equivalent to non-compound forms.

# Secondary CEs for Jamo L-after-L, greater than Jamo V & T.
&\u0313 # COMBINING COMMA ABOVE (second primary ignorable)
=\u1100|\u1100
=\u1103|\u1103
=\u1107|\u1107
=\u1109|\u1109
=\u110C|\u110C

# Compound L Jamo map to equivalent expansions of primary+secondary CE.
&\u1100\u0313=\u1101<<<\u3132 # HANGUL CHOSEONG SSANGKIYEOK, HANGUL LETTER SSANGKIYEOK
&\u1103\u0313=\u1104<<<\u3138 # HANGUL CHOSEONG SSANGTIKEUT, HANGUL LETTER SSANGTIKEUT
&\u1107\u0313=\u1108<<<\u3143 # HANGUL CHOSEONG SSANGPIEUP, HANGUL LETTER SSANGPIEUP
&\u1109\u0313=\u110A<<<\u3146 # HANGUL CHOSEONG SSANGSIOS, HANGUL LETTER SSANGSIOS
&\u110C\u0313=\u110D<<<\u3149 # HANGUL CHOSEONG SSANGCIEUC, HANGUL LETTER SSANGCIEUC

* compare
<1 \u1100\u1161
= \uAC00
<2 \u1100\u1162
= \uAC1C
<2 \u1100\u1100\u1161
= \u1100\uAC00
= \u1101\u1161
= \uAE4C
<3 \u3132\u1161

** test: Hangul syllables in prefix & in the interior of a contraction
@ rules
&x=\u1100\u1161|a\u1102\u1162z
* compare
<1 \u1100\u1161x
= \u1100\u1161a\u1102\u1162z
= \u1100\u1161a\uB0B4z
= \uAC00a\u1102\u1162z
= \uAC00a\uB0B4z

** test: digits are unsafe-backwards when numeric=on
@ root
% numeric=on
* compare
# If digits are not unsafe, then numeric collation sees "1"=="01" and "b">"a".
# We need to back up before the identical prefix "1" and compare the full numbers.
<1 11b
<1 101a

** test: simple locale data test
@ locale de
* compare
<1 a
<2 ä
<1 ae
<2 æ

@ locale de-u-co-phonebk
* compare
<1 a
<1 ae
<2 ä
<2 æ

# The following test cases were moved here from ICU 52's DataDrivenCollationTest.txt.

** test: DataDrivenCollationTest/TestMorePinyin
# Testing the primary strength.
@ locale zh
% strength=primary
* compare
< lā
= lĀ
= Lā
= LĀ
< lān
= lĀn
< lē
= lĒ
= Lē
= LĒ
< lēn
= lĒn

** test: DataDrivenCollationTest/TestLithuanian
# Lithuanian sort order.
@ locale lt
* compare
< cz
< č
< d
< iz
< j
< sz
< š
< t
< zz
< ž

** test: DataDrivenCollationTest/TestLatvian
# Latvian sort order.
@ locale lv
* compare
< cz
< č
< d
< gz
< ģ
< h
< iz
< j
< kz
< ķ
< l
< lz
< ļ
< m
< nz
< ņ
< o
< rz
< ŗ
< s
< sz
< š
< t
< zz
< ž

** test: DataDrivenCollationTest/TestEstonian
# Estonian sort order.
@ locale et
* compare
< sy
< š
< šy
< z
< zy
< ž
< v
< va
< w
< õ
< õy
< ä
< äy
< ö
< öy
< ü
< üy
< x

** test: DataDrivenCollationTest/TestAlbanian
# Albanian sort order.
@ locale sq
* compare
< cz
< ç
< d
< dz
< dh
< e
< ez
< ë
< f
< gz
< gj
< h
< lz
< ll
< m
< nz
< nj
< o
< rz
< rr
< s
< sz
< sh
< t
< tz
< th
< u
< xz
< xh
< y
< zz
< zh

** test: DataDrivenCollationTest/TestSimplifiedChineseOrder
# Sorted file has different order.
@ root
# normalization=on turned on & off automatically.
* compare
< \u5F20
< \u5F20\u4E00\u8E3F

** test: DataDrivenCollationTest/TestTibetanNormalizedIterativeCrash
# This pretty much crashes.
@ root
* compare
< \u0f71\u0f72\u0f80\u0f71\u0f72
< \u0f80

** test: DataDrivenCollationTest/TestThaiPartialSortKeyProblems
# These are examples of strings that caused trouble in partial sort key testing.
@ locale th-TH
* compare
< \u0E01\u0E01\u0E38\u0E18\u0E20\u0E31\u0E13\u0E11\u0E4C
< \u0E01\u0E01\u0E38\u0E2A\u0E31\u0E19\u0E42\u0E18
* compare
< \u0E01\u0E07\u0E01\u0E32\u0E23
< \u0E01\u0E07\u0E42\u0E01\u0E49
* compare
< \u0E01\u0E23\u0E19\u0E17\u0E32
< \u0E01\u0E23\u0E19\u0E19\u0E40\u0E0A\u0E49\u0E32
* compare
< \u0E01\u0E23\u0E30\u0E40\u0E08\u0E35\u0E22\u0E27
< \u0E01\u0E23\u0E30\u0E40\u0E08\u0E35\u0E4A\u0E22\u0E27
* compare
< \u0E01\u0E23\u0E23\u0E40\u0E0A\u0E2D
< \u0E01\u0E23\u0E23\u0E40\u0E0A\u0E49\u0E32

** test: DataDrivenCollationTest/TestJavaStyleRule
# java.text allows rules to start as '<<<x<<<y...'
# we emulate this by assuming a &[first tertiary ignorable] in this case.
@ rules
&\u0001=equal<<<z<<x<<<w &[first tertiary ignorable]=a &[first primary ignorable]=b
* compare
= a
= equal
< z
< x
= b # x had become the new first primary ignorable
< w

** test: DataDrivenCollationTest/TestShiftedIgnorable
# The UCA states that primary ignorables should be completely
# ignorable when following a shifted code point.
@ root
% alternate=shifted
% strength=quaternary
* compare
< a\u0020b
= a\u0020\u0300b
= a\u0020\u0301b
< a_b
= a_\u0300b
= a_\u0301b
< A\u0020b
= A\u0020\u0300b
= A\u0020\u0301b
< A_b
= A_\u0300b
= A_\u0301b
< a\u0301b
< A\u0301b
< a\u0300b
< A\u0300b

** test: DataDrivenCollationTest/TestNShiftedIgnorable
# With alternate=non-ignorable no code point is shifted, so primary ignorables
# that follow a space or punctuation character are not ignored.
@ root
% alternate=non-ignorable
% strength=tertiary
* compare
< a\u0020b
< A\u0020b
< a\u0020\u0301b
< A\u0020\u0301b
< a\u0020\u0300b
< A\u0020\u0300b
< a_b
< A_b
< a_\u0301b
< A_\u0301b
< a_\u0300b
< A_\u0300b
< a\u0301b
< A\u0301b
< a\u0300b
< A\u0300b

** test: DataDrivenCollationTest/TestSafeSurrogates
# It turned out that surrogates were not skipped properly
# when iterating backwards if they were in the middle of a
# contraction. This test ensures that this is fixed.
@ rules
&a < x\ud800\udc00b
* compare
< a
< x\ud800\udc00b

** test: DataDrivenCollationTest/da_TestPrimary
# This test goes through primary strength cases
@ locale da
% strength=primary
* compare
< Lvi
< Lwi
* compare
< L\u00e4vi
< L\u00f6wi
* compare
< L\u00fcbeck
= Lybeck

** test: DataDrivenCollationTest/da_TestTertiary
# This test goes through tertiary strength cases
@ locale da
% strength=tertiary
* compare
< Luc
< luck
* compare
< luck
< L\u00fcbeck
* compare
< lybeck
< L\u00fcbeck
* compare
< L\u00e4vi
< L\u00f6we
* compare
< L\u00f6ww
< mast

* compare
< A/S
< ANDRE
< ANDR\u00c9
< ANDREAS
< AS
< CA
< \u00c7A
< CB
< \u00c7C
< D.S.B.
< DA
< \u00d0A
< DB
< \u00d0C
< DSB
< DSC
< EKSTRA_ARBEJDE
< EKSTRABUD0
< H\u00d8ST
< HAAG
< H\u00c5NDBOG
< HAANDV\u00c6RKSBANKEN
< Karl
< karl
< NIELS\u0020J\u00d8RGEN
< NIELS-J\u00d8RGEN
< NIELSEN
< R\u00c9E,\u0020A
< REE,\u0020B
< R\u00c9E,\u0020L
< REE,\u0020V
< SCHYTT,\u0020B
< SCHYTT,\u0020H
< SCH\u00dcTT,\u0020H
< SCHYTT,\u0020L
< SCH\u00dcTT,\u0020M
< SS
< \u00df
< SSA
< STORE\u0020VILDMOSE
< STOREK\u00c6R0
< STORM\u0020PETERSEN
< STORMLY
< THORVALD
< THORVARDUR
< \u00feORVAR\u00d0UR
< THYGESEN
< VESTERG\u00c5RD,\u0020A
< VESTERGAARD,\u0020A
< VESTERG\u00c5RD,\u0020B
< \u00c6BLE
< \u00c4BLE
< \u00d8BERG
< \u00d6BERG

* compare
< andere
< chaque
< chemin
< cote
< cot\u00e9
< c\u00f4te
< c\u00f4t\u00e9
< \u010du\u010d\u0113t
< Czech
< hi\u0161a
< irdisch
< lie
< lire
< llama
< l\u00f5ug
< l\u00f2za
< lu\u010d
< luck
< L\u00fcbeck
< lye
< l\u00e4vi
< L\u00f6wen
< m\u00e0\u0161ta
< m\u00eer
< myndig
< M\u00e4nner
< m\u00f6chten
< pi\u00f1a
< pint
< pylon
< \u0161\u00e0ran
< savoir
< \u0160erb\u016bra
< Sietla
< \u015blub
< subtle
< symbol
< s\u00e4mtlich
< verkehrt
< vox
< v\u00e4ga
< waffle
< wood
< yen
< yuan
< yucca
< \u017eal
< \u017eena
< \u017den\u0113va
< zoo0
< Zviedrija
< Z\u00fcrich
< zysk0
< \u00e4ndere

** test: DataDrivenCollationTest/hi_TestNewRules
# This test goes through new rules and tests against old rules
@ locale hi
* compare
< कॐ
< कं
< कँ
< कः

** test: DataDrivenCollationTest/ro_TestNewRules
# This test goes through new rules and tests against old rules
@ locale ro
* compare
< xAx
< xă
< xĂ
< Xă
< XĂ
< xăx
< xĂx
< xâ
< xÂ
< Xâ
< XÂ
< xâx
< xÂx
< xb
< xIx
< xî
< xÎ
< Xî
< XÎ
< xîx
< xÎx
< xj
< xSx
< xș
= xş
< xȘ
= xŞ
< Xș
= Xş
< XȘ
= XŞ
< xșx
= xşx
< xȘx
= xŞx
< xT
< xTx
< xț
= xţ
< xȚ
= xŢ
< Xț
= Xţ
< XȚ
= XŢ
< xțx
= xţx
< xȚx
= xŢx
< xU

** test: DataDrivenCollationTest/testOffsets
# This tests cases where forwards and backwards iteration get different offsets
@ locale en
% strength=tertiary
* compare
< a\uD800\uDC00\uDC00
< b\uD800\uDC00\uDC00
* compare
< \u0301A\u0301\u0301
< \u0301B\u0301\u0301
* compare
< abcd\r\u0301
< abce\r\u0301
# TODO: test offsets in new CollationTest

# End of test cases moved here from ICU 52's DataDrivenCollationTest.txt.

** test: was ICU 52 cmsccoll/TestRedundantRules
@ rules
& a < b < c < d& [before 1] c < m
* compare
<1 a
<1 b
<1 m
<1 c
<1 d

@ rules
& a < b <<< c << d <<< e& [before 3] e <<< x
* compare
<1 a
<1 b
<3 c
<2 d
<3 x
<3 e

@ rules
& a < b <<< c << d <<< e <<< f < g& [before 1] g < x
* compare
<1 a
<1 b
<3 c
<2 d
<3 e
<3 f
<1 x
<1 g

@ rules
& a <<< b << c < d& a < m
* compare
<1 a
<3 b
<2 c
<1 m
<1 d

@ rules
&a<b<<b\u0301 &z<b
* compare
<1 a
<1 b\u0301
<1 z
<1 b

@ rules
&z<m<<<q<<<m
* compare
<1 z
<1 q
<3 m

@ rules
&z<<<m<q<<<m
* compare
<1 z
<1 q
<3 m

@ rules
& a < b < c < d& r < c
* compare
<1 a
<1 b
<1 d
<1 r
<1 c

@ rules
& a < b < c < d& c < m
* compare
<1 a
<1 b
<1 c
<1 m
<1 d

@ rules
& a < b < c < d& a < m
* compare
<1 a
<1 m
<1 b
<1 c
<1 d

** test: was ICU 52 cmsccoll/TestExpansionSyntax
# The following two rules should sort the particular list of strings the same.
@ rules
&AE <<< a << b <<< c &d <<< f
* compare
<1 AE
<3 a
<2 b
<3 c
<1 d
<3 f

@ rules
&A <<< a / E << b / E <<< c /E &d <<< f
* compare
<1 AE
<3 a
<2 b
<3 c
<1 d
<3 f

# The following two rules should sort the particular list of strings the same.
@ rules
&AE <<< a <<< b << c << d < e < f <<< g
* compare
<1 AE
<3 a
<3 b
<2 c
<2 d
<1 e
<1 f
<3 g

@ rules
&A <<< a / E <<< b / E << c / E << d / E < e < f <<< g
* compare
<1 AE
<3 a
<3 b
<2 c
<2 d
<1 e
<1 f
<3 g

# The following two rules should sort the particular list of strings the same.
@ rules
&AE <<< B <<< C / D <<< F
* compare
<1 AE
<3 B
<3 F
<1 AED
<3 C

@ rules
&A <<< B / E <<< C / ED <<< F / E
* compare
<1 AE
<3 B
<3 F
<1 AED
<3 C

** test: never reorder trailing primaries
@ root
% reorder Zzzz Grek
* compare
<1 L
<1 字
<1 Ω
<1 \uFFFD
<1 \uFFFF

** test: fall back to mappings with shorter prefixes, not immediately to ones with no prefixes
@ rules
&u=ab|cd
&v=b|ce
* compare
<1 abc
<1 abcc
<1 abcf
<1 abcd
= abu
<1 abce
= abv

# With the following rules, there is only one prefix per composite ĉ or ç,
# but both prefixes apply to just c in NFD form.
# We would get different results for composed vs. NFD input
# if we fell back directly from longest-prefix mappings to no-prefix mappings.
@ rules
&x=op|ĉ
&y=p|ç
* compare
<1 opc
<2 opć
<1 opcz
<1 opd
<1 opĉ
= opc\u0302
= opx
<1 opç
= opc\u0327
= opy

# The mapping is used which has the longest matching prefix for which
# there is also a suffix match, with the longest suffix match among several for that prefix.
@ rules
&❶=d
&❷=de
&❸=def
&①=c|d
&②=c|de
&③=c|def
&④=bc|d
&⑤=bc|de
&⑥=bc|def
&⑦=abc|d
&⑧=abc|de
&⑨=abc|def
* compare
<1 9aadzz
= 9aa❶zz
<1 9aadez
= 9aa❷z
<1 9aadef
= 9aa❸
<1 9acdzz
= 9ac①zz
<1 9acdez
= 9ac②z
<1 9acdef
= 9ac③
<1 9bcdzz
= 9bc④zz
<1 9bcdez
= 9bc⑤z
<1 9bcdef
= 9bc⑥
<1 abcdzz
= abc⑦zz
<1 abcdez
= abc⑧z
<1 abcdef
= abc⑨

** test: prefix + discontiguous contraction with missing prefix contraction
# Unfortunate terminology: The first "prefix" here is the pre-context,
# the second "prefix" refers to the contraction/relation string that is
# one shorter than the one being tested.
@ rules
&x=p|e
&y=p|ê
&z=op|ê
# No mapping for op|e:
# Discontiguous contraction matching should not match op|ê in opệ
# because it would have to skip the dot below and extend a match on op|e by the circumflex,
# but there is no match on op|e.
* compare
<1 oPe
<1 ope
= opx
<1 opệ
= opy\u0323 # y not z
<1 opê
= opz

# We cannot test for fallback by whether the contraction default CE32
# is for another contraction. With the following rules, there is no mapping for op|e,
# and the fallback to prefix p has no contractions.
@ rules
&x=p|e
&z=op|ê
* compare
<1 oPe
<1 ope
= opx
<2 opệ
= opx\u0323\u0302 # x not z
<1 opê
= opz

# One more variation: Fallback to the simple code point, no shorter non-empty prefix.
@ rules
&x=e
&z=op|ê
* compare
<1 ope
= opx
<3 oPe
= oPx
<2 opệ
= opx\u0323\u0302 # x not z
<1 opê
= opz

** test: maxVariable via rules
@ rules
[maxVariable space][alternate shifted]
* compare
= \u0020
= \u000A
<1 .
<1 ° # degree sign
<1 $
<1 0

** test: maxVariable via setting
@ root
% maxVariable=currency
% alternate=shifted
* compare
= \u0020
= \u000A
= .
= ° # degree sign
= $
<1 0

** test: ICU4J CollationMiscTest/TestContractionClosure (ää)
# This tests canonical closure, but it also tests that CollationFastLatin
# bails out properly for contractions with combining marks.
# For that we need pairs of strings that remain in the Latin fastpath
# long enough, hence the extra "= b" lines.
@ rules
&b=\u00e4\u00e4
* compare
<1 b
= \u00e4\u00e4
= b
= a\u0308a\u0308
= b
= \u00e4a\u0308
= b
= a\u0308\u00e4

** test: ICU4J CollationMiscTest/TestContractionClosure (Å)
@ rules
&b=\u00C5
* compare
<1 b
= \u00C5
= b
= A\u030A
= b
= \u212B

** test: reset-before on already-tailored characters, ICU ticket 10108
@ rules
&a<w<<x &[before 2]x<<y
* compare
<1 a
<1 w
<2 y
<2 x

@ rules
&a<<w<<<x &[before 2]x<<y
* compare
<1 a
<2 y
<2 w
<3 x

@ rules
&a<w<x &[before 2]x<<y
* compare
<1 a
<1 w
<1 y
<2 x

@ rules
&a<w<<<x &[before 2]x<<y
* compare
<1 a
<1 y
<2 w
<3 x

** test: numeric collation with other settings, ICU ticket 9092
@ root
% strength=identical
% caseFirst=upper
% numeric=on
* compare
<1 100\u0020a
<1 101

** test: collation type fallback from unsupported type, ICU ticket 10149
@ locale fr-CA-u-co-phonebk
# Expect the same result as with fr-CA, using backwards-secondary order.
# That is, we should fall back from the unsupported collation type
# to the locale's default collation type.
* compare
<1 cote
<2 côte
<2 coté
<2 côté

** test: @ is equivalent to [backwards 2], ICU ticket 9956
@ rules
&b<a @ &v<<w
* compare
<1 b
<1 a
<1 cote
<2 côte
<2 coté
<2 côté
<1 v
<2 w
<1 x

** test: shifted+reordering, ICU ticket 9507
@ root
% reorder Grek punct space
% alternate=shifted
% strength=quaternary
# Which primaries are "variable" should be determined without script reordering,
# and then primaries should be reordered whether they are shifted to quaternary or not.
* compare
<4 ( # punctuation
<4 )
<4 \u0020 # space
<1 ` # symbol
<1 ^
<1 $ # currency symbol
<1 €
<1 0 # numbers
<1 ε # Greek
<1 e # Latin
<1 e(e
<4 e)e
<4 e\u0020e
<4 ee
<3 e(E
<4 e)E
<4 e\u0020E
<4 eE

** test: "uppercase first" could sort a string before its prefix, ICU ticket 9351
@ rules
&\u0001<<<b<<<B
% caseFirst=upper
* compare
<1 aaa
<3 aaaB

** test: secondary+case ignores secondary ignorables, ICU ticket 9355
@ rules
&\u0001<<<b<<<B
% strength=secondary
% caseLevel=on
* compare
<1 a
= ab
= aB

** test: custom collation rules involving tail of a contraction in Malayalam, ICU ticket 6328
@ rules
&[before 2] ൌ << ൗ # U+0D57 << U+0D4C == 0D46+0D57
* compare
<1 ൗx
<2 ൌx
<1 ൗy
<2 ൌy

** test: quoted apostrophe in compact syntax, ICU ticket 8204
@ rules
&q<<*a''c
* compare
<1 d
<1 p
<1 q
<2 a
<2 \u0027
<2 c
<1 r

# ICU ticket #8260 "Support all collation-related keywords in Collator.getInstance()"
** test: locale -u- with collation keywords, ICU ticket 8260
@ locale de-u-kv-sPace-ka-shifTed-kn-kk-falsE-kf-Upper-kc-tRue-ks-leVel4
* compare
<4 \u0020 # space is shifted, strength=quaternary
<1 ! # punctuation is regular
<1 2
<1 12 # numeric sorting
<1 B
<c b # uppercase first on case level
<1 x\u0301\u0308
<2 x\u0308\u0301 # normalization off

** test: locale @ with collation keywords, ICU ticket 8260
@ locale fr@colbAckwards=yes;ColStrength=Quaternary;kv=currencY;colalternate=shifted
* compare
<4 $ # currency symbols are shifted, strength=quaternary
<1 àla
<2 alà # backwards secondary level

** test: locale -u- with script reordering, ICU ticket 8260
@ locale el-u-kr-kana-SYMBOL-Grek-hani-cyrl-latn-digit-armn-deva-ethi-thai
* compare
<1 \u0020
<1 あ
<1 ☂
<1 Ω
<1 丂
<1 ж
<1 L
<1 4
<1 Ձ
<1 अ
<1 ሄ
<1 ฉ

** test: locale @collation=type should be case-insensitive
@ locale de@coLLation=PhoneBook
* compare
<1 ae
<2 ä
<3 Ä

** test: import root search rules plus German phonebook rules, ICU ticket 8962
@ locale de-u-co-search
* compare
<1 =
<1 ≠
<1 a
<1 ae
<2 ä

# Once more, but with runtime builder.
@ rules
[import und-u-co-search][import de-u-co-phonebk]
* compare
<1 =
<1 ≠
<1 a
<1 ae
<2 ä

# Once again, with import from "root" not "und" (as in a proper language tag).
@ rules
[import root-u-co-search][import de-u-co-phonebk]
* compare
<1 =
<1 ≠
<1 a
<1 ae
<2 ä

** test: import rules from a language with non-Latin native script, and reset the reordering, ICU ticket 10998
# Greek should sort Greek first.
@ rules
[import el]
* compare
<1 4
<1 Ω
<1 L

# Import Greek, and then reset the reordering.
@ rules
[import el][reorder Zzzz]
* compare
<1 4
<1 L
<1 Ω

# "others" is a synonym for Zzzz.
@ rules
[import el][reorder others]
* compare
<1 4
<1 L
<1 Ω

** test: regression test for CollationFastLatinBuilder, ICU ticket 11388
@ rules
&x<<aa<<<Aa<<<AA
% strength=secondary
* compare
<1 AA
<2 Aẩ
<2 aą
* compare
<1 AA
<2 aą

** test: tailor tertiary-after a common tertiary where there is a lower one
# Assume that Hiragana small A has a below-common tertiary, and Hiragana A has a common one.
# See ICU ticket 11448 & CLDR ticket 7222.
@ rules
&あ<<<x<<<y<<<z
* compare
<1 ぁ
<3 あ
<3 x
<3 y
<3 z
<3 ァ
<1 い

** test: tailor tertiary-after a below-common tertiary
@ rules
&ぁ<<<x<<<y<<<z
* compare
<1 ぁ
<3 x
<3 y
<3 z
<3 あ
<3 ァ
<1 い

** test: tailor tertiary-before a common tertiary where there is a lower one
@ rules
&[before 3]あ<<<x<<<y<<<z
* compare
<1 ぁ
<3 x
<3 y
<3 z
<3 あ
<3 ァ
<1 い

** test: tailor tertiary-before a below-common tertiary
@ rules
&[before 3]ぁ<<<x<<<y<<<z
* compare
<1 x
<3 y
<3 z
<3 ぁ
<3 あ
<3 ァ
<1 い

** test: reorder single scripts not groups, ICU ticket 11449
@ root
% reorder Goth Latn
* compare
<1 4
<1 # Gothic
<1 L
<1 Ω
# Before ICU 55, the following reordered together with Gothic.
<1 # Old Italic
<1 # Shavian