code point order | , | Z | a | y | ü | ☹️ | ✈️️ | 글 | 😀 |
en | , | ☹️ | ✈️️ | 😀 | a | ü | y | Z | 글 |
en-u-co-emoji | , | 😀 | ☹️ | ✈️️ | a | ü | y | Z | 글 |
da | , | ☹️ | ✈️️ | 😀 | a | y | ü | Z | 글 |
da-u-co-emoji | , | 😀 | ☹️ | ✈️️ | a | ü | y | Z | 글 |
combined rules | , | 😀 | ☹️ | ✈️️ | a | y | ü | Z | 글 |
| BCP47 Key | BCP47 Value | Rule Syntax | Description |
|---|---|---|---|
| ks | level1 | [strength 1] (primary) | Sets the default strength for comparison, as described in the [UCA]. Note that a strength setting of greater than 4 may have the same effect as identical, depending on the locale and implementation. |
| | level2 | [strength 2] (secondary) | |
| | level3 | [strength 3] (tertiary) | |
| | level4 | [strength 4] (quaternary) | |
| | identic | [strength I] (identical) | |
| ka | noignore | [alternate non-ignorable] | Sets alternate handling for variable weights, as described in [UCA], where "shifted" causes certain characters to be ignored in comparison. The LDML default differs from the UCA default: in LDML, the default for alternate handling is non-ignorable, while in UCA it is shifted. In addition, in LDML only whitespace and punctuation are variable by default. |
| | shifted | [alternate shifted] (UCA default) | |
| | n/a | n/a (blanked) | |
| kb | true | [backwards 2] | Sets the comparison for the second level to be backwards, as described in [UCA]. |
| | false | n/a | |
| kk | true | [normalization on] (UCA default) | If on, then the normal [UCA] algorithm is used. If off, then most strings should still sort correctly despite not normalizing to NFD first. Note that the default for CLDR locales may be different from the UCA default. The rules for particular locales have it set to on: those locales whose exemplar characters (in forms commonly interchanged) would be affected by normalization. |
| | false | [normalization off] | |
| kc | true | [caseLevel on] | If set to on, a level consisting only of case characteristics will be inserted in front of the tertiary level, as a "Level 2.5". To ignore accents but take case into account, set strength to primary and case level to on. For details, see Section 3.14, Case Parameters. |
| | false | [caseLevel off] | |
| kf | upper | [caseFirst upper] | If set to upper, causes upper case to sort before lower case. If set to lower, causes lower case to sort before upper case. Useful for locales that already have a supported ordering but require a different order of cases. Affects case and tertiary levels. For details, see Section 3.14, Case Parameters. |
| | lower | [caseFirst lower] | |
| | false | [caseFirst off] | |
| kh | true Deprecated: Use rules with quaternary relations instead. | [hiraganaQ on] | Controls special treatment of Hiragana code points on the quaternary level. If turned on, Hiragana code points will get lower values than all the other non-variable code points in shifted. That is, the normal Level 4 value for a regular collation element is FFFF, as described in [UCA], Section 3.6, Variable Weighting. This is changed to FFFE for [:script=Hiragana:] characters. The strength must be quaternary or greater for this attribute to have any effect. |
| | false | [hiraganaQ off] | |
| kn | true | [numericOrdering on] | If set to on, any sequence of Decimal Digits (General_Category = Nd in [UAX44]) is sorted at a primary level with its numeric value. For example, "A-21" < "A-123". The computed primary weights are all at the start of the digit reordering group. Thus with an untailored UCA table, "a$" < "a0" < "a2" < "a12" < "a⓪" < "aa". |
| | false | [numericOrdering off] | |
| kr | a sequence of one or more reorder codes: space, punct, symbol, currency, digit, or any BCP47 script ID | [reorder Grek digit] | Specifies a reordering of scripts or other significant blocks of characters such as symbols, punctuation, and digits. For the precise meaning and usage of the reorder codes, see Section 3.13, Collation Reordering. |
| kv | space | [maxVariable space] | Sets the variable top to the top of the specified reordering group. All code points with primary weights less than or equal to the variable top will be considered variable, and thus affected by the alternate handling. Variables are ignorable by default in [UCA], but not in CLDR. |
| | punct | [maxVariable punct] | |
| | symbol | [maxVariable symbol] (UCA default) | |
| | currency | [maxVariable currency] | |
| vt | See Part 1 Section 3.6.4, U Extension Data Files. Deprecated: Use maxVariable instead. | &\u00XX\uYYYY < [variable top] (the default is set to the highest punctuation, thus including spaces and punctuation, but not symbols) | The BCP47 value is described in Appendix Q: Locale Extension Keys and Types. Sets the string value for the variable top. All the code points with primary weights less than or equal to the variable top will be considered variable, and thus affected by the alternate handling. An implementation that supports the variableTop setting should also support the maxVariable setting, and it should "pin" ("round up") the variableTop to the top of the containing reordering group. Variables are ignorable by default in [UCA], but not in CLDR. See below for more information. |
| n/a | n/a | n/a | match-boundaries: none \| whole-character \| whole-word. Defined by Section 8, Searching and Matching of [UCA]. |
| n/a | n/a | n/a | match-style: minimal \| medial \| maximal. Defined by Section 8, Searching and Matching of [UCA]. |
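
The numericOrdering (kn) setting in the table above compares runs of decimal digits by their numeric value. The following Python sketch illustrates only that one idea and is not an implementation of the UCA. The name `numeric_sort_key` is hypothetical, non-digit text is compared by code point rather than by collation weights, and digit runs are simply placed before other text, which does not reproduce the digit reordering group.

```python
import unicodedata
from itertools import groupby


def is_decimal_digit(ch):
    """True for characters with General_Category = Nd."""
    return unicodedata.category(ch) == "Nd"


def numeric_sort_key(text):
    """Build a sort key in which each run of Nd digits compares by numeric value.

    Illustrative assumption: key items are (kind, number, text) tuples, where
    kind 0 marks a digit run and kind 1 marks any other run of characters.
    """
    key = []
    for is_digits, run in groupby(text, key=is_decimal_digit):
        chunk = "".join(run)
        if is_digits:
            value = 0
            for ch in chunk:
                # unicodedata.decimal gives the digit value for any Nd character.
                value = value * 10 + unicodedata.decimal(ch)
            key.append((0, value, ""))
        else:
            key.append((1, 0, chunk))
    return key


# A plain code point sort would place "a12" before "a2".
print(sorted(["a12", "a2", "a0"], key=numeric_sort_key))  # ['a0', 'a2', 'a12']
```

With this key, "A-21" sorts before "A-123" because 21 < 123, matching the example in the table. A character such as "⓪" is not General_Category Nd, so it is treated as ordinary text.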
| Case Level | Strength | Original CE | Modified CE | Comment |
|---|---|---|---|---|
| on | primary | 0.S.t | 0.0 | ignore case level weights of primary-ignorable CEs |
| | | p.s.t | p.c | |
| | secondary or higher | 0.0.T | 0.0.0.T | ignore case level weights of secondary-ignorable CEs |
| | | 0.S.t | 0.S.c.t | |
| | | p.s.t | p.s.c.t | |
| off | any | 0.0.0 | 0.0.00 | ignore case level weights of tertiary-ignorable CEs |
| | | 0.0.T | 0.0.3T | |
| | | 0.S.t | 0.S.ct | |
| | | p.s.t | p.s.ct | |
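
The rows of the table above can be read as a small mapping from an original collation element to a modified one. The Python sketch below is a literal transcription of those rows under stated assumptions, not a description of how any implementation stores weights: the function `modify_ce` is hypothetical, a CE is modeled as a tuple `(p, s, c, t)` of primary, secondary, case, and tertiary weights, strength 1 means primary, and the combined "ct" weight of the "off" rows is modeled as a `(case, tertiary)` pair.

```python
def modify_ce(ce, case_level_on, strength):
    """Return the modified CE for the given settings, following the table rows.

    ce is a hypothetical (p, s, c, t) tuple; a weight of 0 means "ignorable".
    """
    p, s, c, t = ce
    if case_level_on:
        if strength == 1:
            # 0.S.t -> 0.0 and p.s.t -> p.c:
            # primary-ignorable CEs get no case level weight.
            return (p, c if p != 0 else 0)
        # Secondary strength or higher.
        if p == 0 and s == 0:
            # 0.0.T -> 0.0.0.T:
            # secondary-ignorable CEs get no case level weight.
            return (0, 0, 0, t)
        # 0.S.t -> 0.S.c.t and p.s.t -> p.s.c.t
        return (p, s, c, t)
    # Case level off: the case weight is carried together with the tertiary weight.
    if p == 0 and s == 0:
        # 0.0.0 -> 0.0.00 and 0.0.T -> 0.0.3T
        return (0, 0, (0 if t == 0 else 3, t))
    # 0.S.t -> 0.S.ct and p.s.t -> p.s.ct
    return (p, s, (c, t))
```

For instance, under these assumptions `modify_ce((5, 2, 1, 3), True, 1)` yields `(5, 1)`, which corresponds to the p.s.t to p.c row.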
A
B
```

These indicate the boundaries of "buckets" that can be used for indexing. They are always two characters starting with the noncharacter U+FDD0, and thus will not occur in normal text. For pinyin the second character is A-Z; for unihan it is one of the radicals; and for stroke it is a character after U+2800 indicating the number of strokes, such as ⠁. For zhuyin the second character is one of the standard Bopomofo characters in the range U+3105 through U+3129.

The corresponding bucket label strings are the boundary strings with the leading U+FDD0 removed. For example, the Pinyin boundary string "\\uFDD0A" yields the label string "A". However, for stroke order, the label string is the stroke count (second character minus U+2800) as a decimal-digit number followed by 劃 (U+5283). For example, the stroke order boundary string "\\uFDD0\\u2805" yields the label string "5劃".

* * *

Copyright © 2001–2022 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode [Terms of Use](https://www.unicode.org/copyright.html) apply. Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.
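
The derivation of bucket labels from index boundary strings described earlier can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `bucket_label` and the `collation_type` argument values are hypothetical, and the input is assumed to be a two-character boundary string as described in the text.

```python
BOUNDARY_PREFIX = "\uFDD0"  # noncharacter that starts every boundary string
STROKE_BASE = 0x2800        # stroke counts are encoded as U+2800 + count
STROKE_SUFFIX = "\u5283"    # 劃


def bucket_label(boundary, collation_type):
    """Derive the bucket label from a two-character index boundary string."""
    if len(boundary) != 2 or boundary[0] != BOUNDARY_PREFIX:
        raise ValueError("expected U+FDD0 followed by exactly one character")
    second = boundary[1]
    if collation_type == "stroke":
        # Stroke order: the label is the stroke count in decimal digits plus 劃.
        return f"{ord(second) - STROKE_BASE}{STROKE_SUFFIX}"
    # Other types (for example pinyin, unihan, zhuyin): drop the leading U+FDD0.
    return second


print(bucket_label("\uFDD0A", "pinyin"))       # A
print(bucket_label("\uFDD0\u2805", "stroke"))  # 5劃
```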