
Lines Matching +full:alert +full:- +full:comment +full:- +full:cc +full:- +full:users

6 |-------|----------|
8 |Date|2024-04-16|
9 … href="https://www.unicode.org/reports/tr35/tr35-72/tr35.html">https://www.unicode.org/reports/tr3…
10 … href="https://www.unicode.org/reports/tr35/tr35-71/tr35.html">https://www.unicode.org/reports/tr3…
23 Some links may lead to in-development or older
25 See <https://cldr.unicode.org> for up-to-date CLDR release data.
29 <!-- _This is a draft document which may be updated, replaced, or superseded by other documents at …
31 …a stable document; it is inappropriate to cite this document as other than a work in progress._ -->
38 … with the CLDR bug reporting form [[Bugs](https://cldr.unicode.org/index/bug-reports)]. Related in…
47 * Part 2: [General](tr35-general.md#Contents) (display names & transforms, etc.)
48 * Part 3: [Numbers](tr35-numbers.md#Contents) (number & currency formatting)
49 * Part 4: [Dates](tr35-dates.md#Contents) (date, time, time zone formatting)
50 * Part 5: [Collation](tr35-collation.md#Contents) (sorting, searching, grouping)
51 * Part 6: [Supplemental](tr35-info.md#Contents) (supplemental data)
52 * Part 7: [Keyboards](tr35-keyboards.md#Contents) (keyboard mappings)
53 * Part 8: [Person Names](tr35-personNames.md#Contents) (person names)
54 * Part 9: [MessageFormat](tr35-messageFormat.md#Contents) (message format)
82 * [Special Script Codes](#special-script-codes)
108 * Table: [Lookup Differences](#Lookup-Differences)
113 * [Region-Priority Inheritance](#Region_Priority_Inheritance)
141 * [UnicodeSet syntax](#unicodeset-syntax)
142 * [Syntax Special Case Examples](#syntax-special-case-examples)
178 * [A.5.2 Attribute draft in non-leaf elements](#Attribute_draft_nonLeaf)
192 …* Table: [Part 2 Links](#Part_2_Links): [General](tr35-general.md) (display names & transforms, et…
193 * Table: [Part 3 Links](#Part_3_Links): [Numbers](tr35-numbers.md) (number & currency formatting)
194 * Table: [Part 4 Links](#Part_4_Links): [Dates](tr35-dates.md) (date, time, time zone formatting)
195 …* Table: [Part 5 Links](#Part_5_Links): [Collation](tr35-collation.md) (sorting, searching, groupi…
196 * Table: [Part 6 Links](#Part_6_Links): [Supplemental](tr35-info.md) (supplemental data)
197 * Table: [Part 7 Links](#Part_7_Links): [Keyboards](tr35-keyboards.md) (keyboard mappings)
200 * [1. Multimap interpretation](#1.-multimap-interpretation)
201 * [2. Alias elements](#2.-alias-elements)
202 * [3. Matches](#3.-matches)
203 * [4. Replacement](#4.-replacement)
204 * [Territory Exception](#territory-exception)
205 * [5. Canonicalizing Syntax](#5.-canonicalizing-syntax)
207 * [Processing LanguageIds](#processing-languageids)
208 * [Processing LocaleIds](#processing-localeids)
218-neutral data, and format that data for the client. This formatting can take place on any of a num…
220-sensitive data used for such formatting, parsing, and analysis. Many of those differences are sim…
222 …ake it as consistent as possible with existing locale data, and acceptable to users in that locale.
224 …ionalization libraries. It also provides a standard format that can allow users to customize the b…
228 …and simplicity of transformation into other formats, above efficiency of run-time lookup and use. …
234 _**UAX35-C1.**_ An implementation that claims conformance to this specification shall:
239 … the characters in such patterns according to [Date Field Symbol Table](tr35-dates.md#Date_Field_S…
243 _**UAX35-C2.**_ An implementation that claims conformance to Unicode locale or language identifiers…
253 …a variant of the Extended Backus-Naur Form (EBNF) notation used in [W3C XML Notation](https://www.…
257 * eg., `[A-Z a-z]` is the same as `[A-Za-z]`
258 3. A backslash may be used to escape a following "x"-prefixed hexadecimal code point or the immedia…
259 * eg., `\x20` is the same as `#x20` and `[\&\-]` is the same as `[#x26#x2D]`
260 4. Constraints (well-formedness or validity) may use separate notes, and/or the W3C notations:
264 In the text, this is sometimes referred to as "EBNF (Perl-based)".
270 …ng of dates, times, numbers, and currencies; for measurement units, for sort-order (collation), pl…
272 …he user's time zone, preferred currency, preferred character set, smoker/non-smoker preference, me…
276-checking, and so on). The format in this document does not attempt to represent all the data that…
286 The BCP 47 extensions (-u- and -t-) are described in _[Unicode BCP 47 U Extension](#u_Extension)_ a…
290 … the following structure (provided in EBNF (Perl-based)). The following table defines syntacticall…
309 …<a href="https://github.com/unicode-org/cldr/blob/maint/maint-41/common/validity/language.xml">lat…
315 …<a href="https://github.com/unicode-org/cldr/blob/maint/maint-41/common/validity/script.xml">lates…
321 …<a href="https://github.com/unicode-org/cldr/blob/maint/maint-41/common/validity/region.xml">lates…
327 …<a href="https://github.com/unicode-org/cldr/blob/maint/maint-41/common/validity/variant.xml">late…
329 <tr><td><code>sep</code></td> <td><pre>= [-_] ;</pre></td></tr>
330 <tr><td><code>digit</code></td> <td><pre>= [0-9] ;</pre></td></tr>
331 <tr><td><code>alpha</code></td> <td><pre>= [A-Z a-z] ;</pre></td></tr>
332 <tr><td><code>alphanum</code></td><td><pre>= [0-9 A-Z a-z] ;</pre></td></tr>
335 The following is an additional well-formedness constraint:
336 …f variant subtags must not have any duplicates (eg, de-1996-fonipa-1996 is not syntactically well-
340 For example, "en-US" (American English), "en_GB" (British English), "es-419" (Latin American Spanis…
344 …d private use extensions are supported for pass-through. The following table defines syntactically…
347--------------------------------------------------------------------------------------------------…
353 …nsions">`other_extensions`</a> | `= sep [alphanum-[tTuUxX]]`<br/>` (se…
355 …validity`](#Key_Type_Definitions)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/mai…
356 …validity`](#Key_Type_Definitions)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/mai…
358 …code_subdivision_subtag_validity)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/mai…
360 …;` | [`validity`](#Validity_Data)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/mai…
362 … [`validity`](#BCP47_T_Extension)<br/>[`latest-data`](https://github.com/unicode-org/cldr/blob/mai…
366 The following are additional well-formedness constraints:
367 …han one extension with the same singleton. For example, en-u-ca-buddhist-u-cf-standard is ill-form…
368 … There cannot be more than one ukey or tkey. For example, en-u-ca-buddhist-ca-islamic is ill-forme…
370 3. [ wfc: The private use extension (-x-) must come after all other extensions. ]
372 …ctions (with few exceptions) as a language identifier, and accesses language-based data. Except wh…
376 … base language code for "en-US" (American English) is "en" (English). The _type_ may also be refer…
378 …in case and in the separator characters. The "-" and "\_" separators are treated as equivalent, al…
380 All identifier field values are case-insensitive. Although case distinctions do not carry any speci…
392 * Any variants are in alphabetical order (eg, en-fonipa-scouse, not en-scouse-fonipa)
393 …ny extensions are in alphabetical order by their singleton (eg, en-t-xxx-u-yyy, not en-u-yyy-t-xxx)
398 For example, the canonical form of "en-u-foo-bar-nu-thai-ca-buddhist-kk-true" is "en-u-bar-foo-ca-b…
400 … rather than the ordering in [Section 4.1](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.1…
402 …* Moreover, [Section 4.5](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.5) states that “If…
406 …e mechanism for determining “importance” is not specified: ca-valencia-fonipa and ca-fonipa-valenc…
409 **Note:** The current version of CLDR data uses some non-preferred _syntax_ for backward compatibil…
412 * It uses "\_" as the separator, while the preferred form of the separator is "-".
419 > _Example:_ the maximal form of ja-Kana-t-it is ja-Kana-JP-t-it-latn-it
425 > _Example:_ "IW-HEBR-u-ms-imperial" ~ "he-u-ms-uksystem"
435 ….5](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.5) and [Section 3.1.7](https://www.rfc-e…
437 …* A tag must not start with the subtag "x": thus a _privateuse_ (eg x-abc) can only be after a lan…
439 * Certain codes that are private-use in BCP 47 and ISO are given semantics by LDML
440 …(as allowed by BCP 47, see [Section 4.1.2](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.1…
441 * It allows certain syntax for backwards compatibility (not BCP 47-compatible):
442 …eld separator characters, as well as the "-" used in [[BCP47](#BCP47)] (however, the canonical for…
462 | ------------------- | -------------------------------- | -------- |
463 | `en-US` | `en-US` | no changes |
464 | `iw-FX` | `he-FR` | BCP 47 canonicalization |
465 | `cmn-TW` | `zh-TW` | language alias |
466 | `zh-cmn-TW` | `zh-TW` | BCP 47 canonicalization, then language a…
467 | `sr-CS` | `sr-RS` | territory alias |
468 | `sh` | `sr-Latn` | multiple replacement subtags |
469 | `sh-Cyrl` | `sr-Cyrl` | no replacement with multiple replacement…
470 | `hy-SU` | `hy-AM` | multiple territory values <br/>`<territo…
471 | `i-enochian` | `und-x-i-enochian` | prefix any legacy language tags (marked …
472 | `x-abc` | `und-x-abc` | prefix with "und-", so that there is alw…
478 1. Replace the "\_" separators with "-"
485 | ------------------------------ | -------------------- | ---------------------- |
486 | `en_US` | `en-US` | change separator |
487 | `de_DE_u_co_phonebk` | `de-DE-u-co-phonebk` | change separator |
489 | `root_u_cu_usd` | `und-u-cu-usd` | change to "und" |
490 | `Latn_DE` | `und-Latn-DE` | add "und" |
502 | ------------------- | ------------------------------ | -------- |
503 | `en-US` | `en_US` | changes separator |
505 | `und-US` | `und_US` | no change to "und", because a region subta…
506 | `und-u-cu-USD` | `root_u_cu_usd` | changes to "root", because no script, regi…
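
The two conversion tables excerpted above can be approximated in a few lines. The following is a minimal Python sketch, assuming only the rules visible in these rows: swap the separator, map `root` to and from `und`, add `und` before a leading script subtag, and lowercase extension subtags. The function names `cldr_to_bcp47` and `bcp47_to_cldr` are hypothetical, and alias replacement and other canonicalization steps are not attempted.

```python
import re


def cldr_to_bcp47(cldr_id: str) -> str:
    """Convert a CLDR-style identifier (e.g. 'root_u_cu_usd') to BCP 47 style."""
    subtags = cldr_id.replace("_", "-").split("-")
    if subtags[0].lower() == "root":
        subtags[0] = "und"            # 'root' is written 'und' in BCP 47 style
    elif re.fullmatch(r"[A-Za-z]{4}", subtags[0]):
        subtags.insert(0, "und")      # leading 4-letter script subtag: add 'und'
    return "-".join(subtags)


def bcp47_to_cldr(tag: str) -> str:
    """Convert a BCP 47 style tag (e.g. 'und-u-cu-USD') to CLDR style."""
    subtags = tag.split("-")
    # Index of the first singleton (start of the extensions), if any.
    ext = next((i for i, s in enumerate(subtags) if len(s) == 1), len(subtags))
    # Assumption: extension subtags are lowercased, as in the 'usd' table row.
    subtags[ext:] = [s.lower() for s in subtags[ext:]]
    # 'und' with no subtags between it and the extensions becomes 'root'.
    if subtags[0].lower() == "und" and ext == 1:
        subtags[0] = "root"
    return "_".join(subtags)
```

With these definitions, `cldr_to_bcp47("Latn_DE")` yields `und-Latn-DE`, and `bcp47_to_cldr("und-US")` leaves `und` in place because a region subtag follows it.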
510 …at least 35 characters, in [Section 4.4.1](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.4…
517-use BCP 47 field values are given specific meanings in CLDR. While field values are based on [[BC…
523 ISO 639-3 introduces the notion of "macrolanguages", where certain ISO 639-1 or ISO 639-2 codes are…
526 | --------------------------- | ----- | ----- |
535 …uage identifiers use "ar-EG" for Standard Arabic (Egypt), not "arb-EG"; they use "zh-TW" for Manda…
539 … overlong codes like "eng-840" or "eng-USA" to the correct code "en-US"; see the **[Aliases](https…
543 | | Name | Comment |
544 | ----- | --------------------- | ------- |
556 | --------- | ----------- |
560 | `zh_Hans` | Chinese, in simplified script (=zh, zh-Hans, zh-CN, zh-Hans-CN) |
565 <!-- HTML: rowspan, colspan -->
569 …<td colspan="2">Qaag is a special script code for identifying the non-standard use of Myanmar char…
588 <tr><td>uz-Latn <i>or</i> uz-Arab</td><td>written-only content (particular script)</td></tr>
589 <tr><td>uz-Zyyy</td><td>written-only content (unspecified script)</td></tr>
590 <tr><td>uz-Zxxx</td><td>spoken-only content</td></tr>
591 <tr><td>uz-Latn, uz-Zxxx</td><td>both specific written and spoken content (using a <i>language list…
610 | alpha2 | alpha3 | num | Name | Comment | ISO 3166-1 status |
611 | ------ | ------ | --- | ---------------------------- | ------- | ----------------- |
612 …in Oceania [009] that do not have a [subcontinent](https://unicode-org.github.io/cldr-staging/char…
614 | `UK` | - | - | United Kingdom | **deprecated**: the _canonicalized_ form i…
615 | `XA` | `XAA` | 973 | Pseudo-Accents | special code indicating derived testing lo…
616 | `XB` | `XBB` | 974 | Pseudo-Bidi | special code indicating derived testing lo…
623 …egion codes, including mapping overlong codes like "eng-840" or "eng-USA" to the correct code "en-
628 * The territory code '001' (the World) is used to indicate a standardized form, such as "ar-001" fo…
632 …t tags must not have any duplicates: thus de-1996-fonipa-1996 is invalid, while de-1996-fonipa and…
641 zh-Hant-HK
652 …ilable, or that at some point an input identifier value was determined to be invalid or ill-formed.
655 | ----------- | ------ | ----------------------------------- |
667-letter codes and numeric codes. However, this does not extend to the private use codes, which are…
669 | Region | UN/ISO Numeric | ISO 3-Letter |
670 | -------- | -------------- | ------------ |
679 | ------------ | ---------- |
693 | ------------- | -------- | ----- |
704 | | reserved | bcp47: all non-5 letter codes not starting with x |
705 | | excluded | bcp47: all non-5 letter codes starting with x |
711 These are the codes in [Script Codes](https://www.unicode.org/iso15924/iso15924-codes.html) with th…
719 | --- | --- | --- |
731 | Subsets (approximate) | Jamo | ≃ Hang - LVT - LV |
732 | | Hans | ≃ Hani - Traditional-only |
733 | | Hant | ≃ Hani - Simplified-only |
735 …al codes most frequently used are in the locale identifiers zh-Hans, zh-Hant, ja-Jpan, and ko-Kore.
747 … (that is the canonical casing for these subtags), however, subtags are case-insensitive and casin…
749-u- Extension.** The syntax of 'u' extension subtags is defined by the rule `unicode_locale_extens…
753 _See also [Unicode Extensions for BCP 47](https://cldr.unicode.org/index/bcp47-extension) on the CL…
763 …ot change invalid locales to valid locales. For example, und-u-ka canonicalizes to und-u-ka-true, …
765 1. "und-u-ka-true" — is invalid, since ‘true’ is not a valid value for ka
766 2. "und-u-ka" — is invalid, since the value “true” is assumed whenever there is no value, and ‘true…
772 <!-- HTML: rowspan, colspan -->
778 …in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/calendar.xml" target=…
779 …This selects calendar-specific data within a locale used for formatting and parsing, such as date/…
781 …ct the computation of the first day of the week: see <a href='tr35-dates.md#first-day-overrides'>F…
795 <tr><td>"islamic-civil"</td>
796 …<td>Islamic calendar, tabular (intercalary years [2,5,7,10,13,16,18,21,24,26,29] - civil epoch)</t…
797 <tr><td>"islamic-umalqura"</td>
798 <td>Islamic calendar, Umm al-Qura</td></tr>
800 …formatting data for all Islamic calendar types, including "islamic-civil" and "islamic-umalqura".<…
804 …bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/currency.xml" target="_b…
812 …ues in the <i>type</i> elements of bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/co…
813 … collation setting parameter, from <b>ka</b> to <b>vt</b>, see <a href="tr35-collation.md#Setting_…
817 …">DUCET</a>] (Default Unicode Collation Element Table): see <i><a href="tr35-collation.md#Root_Col…
819 …nguages (and which are different than the root collation behavior); language-specific search colla…
825 …K characters; that is, an ordering for CJK characters based on a character-by-character transliter…
830 …e</i> elements of key name="cu" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/co…
834 …me periods associated with each currency value is available in <a href="tr35-numbers.md#Supplement…
837 …ry Break Exclusion Identifier</a> specifies scripts to be excluded from dictionary-based text break
839 …key name="dx" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/segment…
846 …n. The compound script values are expanded when interpreted, eg, -dx-jpan = -dx-hani-hira-kata</li>
847 … be in any order, eg, -dx-thai-hani = -dx-hani-thai. However, the canonical order for the bcp47 sub…
848 …<li>Dictionary-based break iterators will ignore each character whose Script_Extension value set i…
852-Latn-u-em-emoji"&gt;</code>. The valid values are those <i>name</i> attribute values in the <i>ty…
863 … week data for the region (see Part 4 Dates, <a href="tr35-dates.md#Week_Data">Week Data</a>).
865 …of key name="fw" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/cale…
866 …ct the computation of the first day of the week: see <a href='tr35-dates.md#first-day-overrides'>F…
880 …(see Part 4 Dates, <a href="tr35-dates.md#Time_Data">Time Data</a>). The valid values are those <i…
881 …key name="hc" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/calenda…
895 …ing to the CSS level 3 <a href="https://drafts.csswg.org/css-text/#line-break-property">line-break…
897 …e</i> elements of key name="lb" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/co…
902 <td>CSS level 3 line-break=strict, e.g. treat CJ as NS</td></tr>
904 … <td>CSS level 3 line-break=normal, e.g. treat CJ as ID, break before hyphens for ja,zh</td></tr>
906 <td>CSS level 3 line-break=loose</td></tr>
909 …ing to the CSS level 3 <a href="https://drafts.csswg.org/css-text/#word-break-property">word-break…
911 …e</i> elements of key name="lw" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/co…
916 … <td>CSS level 3 word-break=normal, normal script/language behavior for midword breaks</td></tr>
918 …<td>CSS level 3 word-break=break-all, allow midword breaks unless forbidden by lb setting</td></tr>
920 …<td>CSS level 3 word-break=keep-all, prohibit midword breaks except for dictionary breaks</td></tr>
926 …(see Part 2 General, <a href="tr35-general.md#Measurement_System_Data">Measurement System Data</a>…
927 …<i>type</i> elements of key name="ms" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/m…
929 …units and unit conversion, see <a href="tr35-info.md#Unit_Conversion">Unit Conversion</a> and <a h…
942 …bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/common/bcp47/measure.xml" target="_bl…
943 …units and unit conversion, see <a href="tr35-info.md#Unit_Conversion">Unit Conversion</a> and <a h…
954 …ues in the <i>type</i> elements of bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/co…
958-letter types indicating the primary numbering system for the corresponding script represented in …
959 …<p class="note">For more information, see <a href="tr35-numbers.md#Numbering_Systems">Numbering Sy…
961 <td>Extended Arabic-Indic digits ("arab" means the base Arabic-Indic digits)</td></tr>
973 …certain region-specific default values (those specified by the <a href="tr35-info.md#rgScope">&lt;…
978 …ovide more specificity. For example, “en-GB-u-rg-uszzzz” represents a locale for British English b…
980 …ct the computation of the first day of the week: see <a href='tr35-dates.md#first-day-overrides'>F…
981 …units and unit conversion, see <a href="tr35-info.md#Unit_Conversion">Unit Conversion</a> and <a h…
989 …otland). Thus “en-GB-u-sd-gbsct” represents the language variant “English as used in Scotland”. An…
990 …ct the computation of the first day of the week: see <a href='tr35-dates.md#first-day-overrides'>F…
994 …e</i> elements of key name="ss" in bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/co…
1002 …ues in the <i>type</i> elements of bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/co…
1010 …ues in the <i>type</i> elements of bcp47/<a href="https://github.com/unicode-org/cldr/blob/main/co…
1024 …latest version of the data see [bcp47/number.xml](https://github.com/unicode-org/cldr/blob/main/co…
1026 … the data see [supplemental/numberingSystems.xml](https://github.com/unicode-org/cldr/blob/main/co…
1041 Although the first two letters of a short identifier may match an ISO 3166 two-letter country code,…
1049 …cial rule applied to the `alias` attribute in the `<type>` element for "tz" - the first "long" ID …
1074 …domain. [common/bcp47/collation.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/commo…
1076 …ored in [common/bcp47/transform.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/commo…
1123 > | ---------- | ---- |
1125 > | codepoint | `= [0-9 A-F a-f]{4,6}` |
1127 > In addition, no codepoint may exceed 10FFFF. For example, "00A0", "300b", "10D40C" and "00C1-00E1…
1129 …61-0061 ("aa") is a Valid type value for "vt", since the sequence may be a collating element. Orde…
1133 > > **en-u-vt-00A4** : this indicates English, with any characters sorting at or below " ¤" (at a p…
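
The code point syntax quoted above (`[0-9 A-F a-f]{4,6}`, nothing above 10FFFF, sequences joined by "-") is easy to check mechanically. Below is a small Python sketch under those assumptions only. The name `is_valid_codepoints_type` is hypothetical, and any further restrictions the specification may place on these values are not modeled.

```python
import re

# One code point: 4 to 6 hexadecimal digits, per the excerpted rule.
_CODEPOINT = re.compile(r"[0-9A-Fa-f]{4,6}")


def is_valid_codepoints_type(value: str) -> bool:
    """Check a '-'-separated sequence of hex code points, each <= 10FFFF."""
    return all(
        _CODEPOINT.fullmatch(part) is not None and int(part, 16) <= 0x10FFFF
        for part in value.split("-")
    )
```

For instance, `is_valid_codepoints_type("0061-0061")` accepts the "aa" sequence mentioned above, while `"110000"` is rejected because it exceeds 10FFFF.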
1135 …le by default. For more information, see [Collation: _Setting Options_](tr35-collation.md#Setting_…
1139-collation.md#Root_Collation)_. The type "REORDER_CODE" is used for locale extension key "kr" (col…
1147 …y "dx". The value of type for "dx" is represented by one or more SCRIPT_CODEs, such as "thai-laoo".
1162 > | ------------- | ----------- |
1164-umalqura" description="Islamic calendar, Umm al-Qura"/>`<br/>Thus _ca-islamic-umalqura_ is valid.…
1208 …<type name="noignore" alias="non-ignorable" description="Variable collation elements are not reset…
1222 * type "pinyin" is valid for key "co", thus "u-co-pinyin" is a valid Unicode locale extension.
1223 * type "pinyin" is not valid for key "ka", thus "u-ka-pinyin" is not a valid Unicode locale extensi…
1229-u-ca-islamicc" would be equivalent to "ar-u-ca-islamic-civil" on input, but the latter should be …
1240 …d States, or a _province_ in Canada. The codes in CLDR are based on ISO 3166-2 subdivision codes. …
1244 * en-u-sd-**usca**
1245 * en-US-u-sd-**usca**
1247 CLDR has additional subdivision codes. These may start with a 3-digit region code or use a suffix o…
1249-2 (nor have the ISO 3166-2 codes been stable in the past). If an ISO 3166-2 code is removed, it r…
1260 …bdivision_id) starts with the [unicode_region_subtag](#unicode_region_subtag) (case-insensitively).
1266 * en-**US**-u-sd-**us**ca is valid — the region "US" matches the first part of "usca"
1267 * en-u-sd-**us**ca is valid — it still works after adding likely subtags.
1268 * en-**CA**-u-sd-**gb**sct is invalid — the region "CA" does not match the first part of "gbsct". A…
1269-u-sd-**gb**sct is valid but not recommended — an implementation that ignores the [unicode_subdivi…
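
The prefix rule illustrated by these examples can be sketched as follows. This is a minimal Python illustration, not a full locale parser: the helper names are hypothetical, the tag is assumed to be well-formed and "-"-separated, and the case with no explicit region (which depends on likely-subtags data) is reported as `None` rather than decided.

```python
import re
from typing import Optional


def explicit_region(tag: str) -> Optional[str]:
    """Return the region subtag written in the tag, or None if absent."""
    for subtag in tag.split("-")[1:]:
        if len(subtag) == 1:          # singleton: extensions start here
            break
        if re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", subtag):
            return subtag
    return None


def sd_value(tag: str) -> Optional[str]:
    """Return the value of the 'sd' key in the -u- extension, if any."""
    subtags = tag.lower().split("-")
    try:
        start = subtags.index("u", 1) + 1
    except ValueError:
        return None
    for i in range(start, len(subtags) - 1):
        if len(subtags[i]) == 1:      # next extension begins
            break
        if subtags[i] == "sd":
            return subtags[i + 1]
    return None


def sd_consistent_with_region(tag: str) -> Optional[bool]:
    """True/False when both parts are explicit; None when undecidable here."""
    region, sd = explicit_region(tag), sd_value(tag)
    if region is None or sd is None:
        return None                   # would need likely-subtags data
    return sd.startswith(region.lower())
```

Applied to the examples above, `en-US-u-sd-usca` passes the check and `en-CA-u-sd-gbsct` fails it.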
1278-t- Extension.** The syntax of 't' extension subtags is defined by the rule `transformed_extension…
1280 … (that is the canonical casing for these subtags), however, subtags are case-insensitive and casin…
1282 The following keys are defined for the -t- extension:
1285 | ------ | ----------- | ------------------------ |
1286 …of transformation | [​transform.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/commo…
1287-languages/scripts, such as fullwidth-halfwidth conversion. | [​transform-destination.xml](https:/…
1288-side input method. The first subfield in a sequence would typically be a 'platform' or vendor des…
1289-side virtual keyboard. The first subfield in a sequence would typically be a 'platform' designati…
1290 …r designation. | [​transform_mt.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/commo…
1291-t- value is a language that is mixed into the main language tag to form a hybrid. For more inform…
1292 …orm** | [​transform_private_use.xml](https://github.com/unicode-org/cldr/blob/maint/maint-41/commo…
1316 > The name of the mechanism, limited to 3-8 characters (or sequences of them). Any indirect type na…
1341 | ----------------------------- | ---- |
1346-formed old locale identifier, a special key name _attribute_ with the value of entire `attribute`…
1353 | ------------------------------------------ | ---------------------------- |
1361 …syntax also take the new syntax, interpreted correctly. For example, "zh-TW-u-co-pinyin" and "zh_T…
1362 …aliases for keywords and types. For example, "ar-u-ca-islamicc" would be equivalent to "ar-u-ca-is…
1363 …* The one exception is where an alias would only be well-formed with the old syntax, such as "greg…
1366 1. **well-formed** - syntactically correct
1367 …2. **valid** - well-formed and only uses registered language subtags, extensions, keywords, types.…
1368 3. **canonical** - valid and no deprecated codes or structure.
1377 | ------------ | ----------- |
1381 | `POSIX` | POSIX variation of locale data. Use Unicode locale extension `-u-va-posix` to indi…
1385 When converting to old syntax, the Unicode locale extension "`-u-va-posix`" should be converted to …
1389 * `en_US_POSIX` ↔ `en-US-u-va-posix`
1390 * `en_US_POSIX@colNumeric=yes` ↔ `en-US-u-kn-va-posix`
1391 * `en-US-POSIX-u-kn-true` → `en-US-u-kn-va-posix`
1392 * `en-US-POSIX-u-kn-va-posix` → `en-US-u-kn-va-posix`
1394 > Note that the mapping between `en_US_POSIX` and `en-US-u-va-posix` is a conversion process, no…
1400 … a representation of all Unicode characters. The repository is stored in UTF-8, although that can …
1407-demand software components, with arbitrary connections between those components, it is important …
1409 1. Store and transmit _neutral-format_ data wherever possible.
1410 …* Neutral-format data is data that is kept in a standard format, no matter what the local user's e…
1413 2. Localize that data as "_close_" to the end-user as possible.
1415 …e the entire system is. On a practical level, if transmitted data is neutral-format, then it is mu…
1419 Moreover, the closer we are to end-user, the more we know about that user's preferred formats. If w…
1421 Even though localization should be done as close to the end-user as possible, there will be cases w…
1425-us/windows/win32/api/winbase/nf-winbase-formatmessage), [String.Format](https://learn.microsoft.c…
1427-neutral code, such as a numeric error code or mnemonic string key, that is understood outside of …
1429 …within the component; ideally, any exceptions should bundle up some language-neutral message ID, p…
1431-user at all. By avoiding the localization at the throw site, it avoids the cost of doing formatting, whe…
1437-" versus "\_" (for example, _zh-TW_ for language code, _zh_TW_ for locale code), but in practice …
1445-based resource bundles, not in the territory-based resource bundles. Thus, the resource bundle _e…
1449 Criteria for what makes a written language should be purely pragmatic; _what would copy-editors say…
1466 <!-- HTML: no header -->
1472 …ument, and a Spanish document that has some passages quoted in English. Fine-grained tagging doesn…
1474-GB, es-419, etc.). To allow an application to support Spanglish or Hinglish locale selection, [un…
1476users typically expect their language in non-default script to contain a significant amount of t…
1477 This tends to work better in implementations that don't yet handle the -t- extension.
1482 |-------------------------------|---------------|---------------|----------------------------------…
1483 |hi-t-***en-h0-hybrid*** | Deva | Hinglish | Hindi-English hybrid where the script is Devanagari\*…
1484 |hi-Latn-t-***en-h0-hybrid*** | Latin | Hinglish | Hindi-English hybrid where the script is Latin\…
1485 |hi-Latn | Latin | Hinglish | Hindi written in Latin script; in practice usually a hybrid with E…
1486 |ta-t-***en-h0-hybrid*** | Tamil | Tanglish | Tamil-English hybrid where the script is Tamil\* …
1488 |en-t-***hi-h0-hybrid*** | Latin | Hinglish | English-Hindi hybrid where the script is Latin\* |
1489 |en-t-***zh-h0-hybrid*** | Latin | Chinglish | English-Chinese hybrid where the script is Latin\* …
1492 …ertain passages or phrases that are in the other script but are not tagged on a fine-grained level.
1494-t-en-h0-hybrid (when written in Devanagari script) or hi-Latn-t-en-h0-hybrid (when written in Lat…
1496-t- is a full _[unicode_language_id](#unicode_language_id)_, and can contain a subtag for the regi…
1498 …d subtags. Thus the default script for 'ru' is “Cyrl”, no matter what the source is in the -t- tag.
1501 |-------------------------------|---------------|---------------|----------------------------------…
1502 |ru-t-***en***-h0-hybrid | Cyrillic | Runglish | Russian with an admixture of ***American English**…
1503 |ru-t-***en-gb***-h0-hybrid | Cyrillic | Runglish | Russian with an admixture of ***British English…
1504 |ru-***Latn***-t-en-gb-h0-hybrid| Latin | Runglish | Russian with an admixture of British English …
1505 |en-t-***zh-h0-hybrid*** | Latin | Chinglish | American English with an admixture of ***Chinese (S…
1506 |en-t-***zh-hant-h0-hybrid*** | Latin | Chinglish | American English with an admixture of ***Chine…
1519 …tory [common/validity](https://github.com/unicode-org/cldr/blob/main/common/validity/) contains ma…
1525 …* Note that some two-letter region codes are macroregions, and (in the future) some three-digit co…
1528 * **private_use** — codes that, for CLDR, are considered private use. Note that some private-use co…
1531 The list of subtags for each idStatus use a compact format as a space-delimited list of StringRange…
1533 Each measure unit is a sequence of subtags, such as “angle-arc-minute”. The first subtag provides a…
1541 …er in the root is based on the [[DUCET](#DUCET)] (see _[Root Collation](tr35-collation.md#Root_Col…
1571 …s per [5. Canonicalizing Syntax](#5-canonicalizing-syntax) in [Annex C. LocaleId Canonicalization]…
1575-prone) maintenance. At the script or region level, the "primary" child locale will be empty, sinc…
1579 * The currency for the specified region (see [Supplemental Currency Data](tr35-numbers.md#Supplemen…
1580 * The measurement system for the specified region (see [Measurement System Data](tr35-general.md#Me…
1581 * The week conventions for the specified region (see [Week Data](tr35-dates.md#Week_Data))
1583-Based Preferences](tr35-info.md#Territory_Based_Preferences).) These items will be correct for th…
1602 …t, processes. They are illustrated in the table [Lookup Differences](#Lookup-Differences), where "…
1604 The table [Lookup Differences](#Lookup-Differences) uses the naïve resource bundle lookup for illus…
1606 …ders identical are treated as such. Thus eng-Latn-GB should be mapped to en-GB, and cmn-TW mapped …
1608 …zh-Hant-TW` should start lookup at `zh-TW` (since `zh-TW` implies `Hant`), and `de-Latn-LI` should…
1610 …a `<unit>` resource bundle would be empty. However, for purposes of resource-bundle lookup the res…
1612 ###### Table: <a name="Lookup-Differences" href="#Lookup-Differences">Lookup Differences</a>
1615 <!-- HTML: readability -->
1626 se-FI → <br/>
1631 …<td><p>* The default-locale may have its own inheritance change; for example, it may be "en-GB → e…
1633 se-FI → <br/>
1636 <i>en-GB →</i> <br/>
1644 se-FI+key → <br/>
1649 …months. In that case, the root alias would change the key, and retry from se-FI downward. This can…
1651 se-FI+key → <br/>
1654 <i>se-FI+key2 →</i> <br/>
1665-locale were used in the resource-item lookup, then strange results will occur. For example, suppo…
1673 …rplay of source and target locales: see _Part 2 General, [Inheritance.](tr35-general.md#Inheritanc…
1679 … output bundle are more tolerant, when they represent overall improvements for users. For more informat…
1693 <relative type="-2">∅∅∅</relative>
1702 | ---------------- | ------ | ------- |
1714 | ---------- | -------------------------------------- | --------------------------- |
1721 …ded in the future if necessary. See also [Part 2, Grammatical Features](tr35-general.md#Grammatica…
1770 | ------ | ---- |
1771 | fr-CA | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="x"]…
1772 | fr-CA | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="oth…
1773 | fr | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="x"]…
1774 | fr | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="oth…
1775 | root | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="x"]…
1776 | root | `//ldml/units/unitLength[@type="narrow"]/unit[@type="mass-gram"]/unitPattern[@count="oth…
1789 | ------ | ---- |
1790 | fr-CA | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName[@count="x"]` |
1791 | fr-CA | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName[@count="other"]` |
1792 | fr-CA | `//ldml/numbers/currencies/currency[@type="CAD"]/displayName` |
1877 Note: When components were first introduced, the component-specific parent locales were merged w…
1878 This was determined to be an error, and the component-specific parent locales are now not merged,
1879 but instead are treated as stand-alone.
1882 the parentLocale information is contained in CLDR’s [supplemental data.](tr35-info.md)
1890 There may be specific exceptions to these for certain closely-related languages or language-script …
1900 #### <a name="Region_Priority_Inheritance" href="#Region_Priority_Inheritance">Region-Priority Inhe…
1902 …, and measurement system. All resources matched by an entry in <a href="tr35-info.md#rgScope">&lt;…
1904 The default search chain for region-priority inheritance removes the language subtag before the reg…
1913 Equivalently as BCP-47:
1916 en-US-variant
1917 en-US
1918 und-US
1922 Before running region-priority inheritance, the locale should be normalized as follows:
1924-u-rg` Unicode BCP-47 locale extension, the region subtag should be set to the `-u-rg` region. For…
1925 …n from likely subtags data. For example, `en` should become `en-US` before running region-priority…
1927 Note that region-priority inheritance does not currently make use of parent locales or territory co…
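The search chain above can be sketched in a few lines of Python. This is a minimal illustration, not the CLDR implementation: the function name `region_priority_chain` is hypothetical, script subtags are ignored, and the final fall-through to `und` is an assumption, since only the first three steps are visible in the lines above.

```python
def region_priority_chain(language, region, variants=()):
    """Return a region-priority search chain as BCP-47 tags.

    Only the first three steps are taken from the text above; the
    final step to "und" is an assumption of this sketch.
    """
    chain = []
    if variants:
        chain.append("-".join([language, region, *variants]))
    chain.append(f"{language}-{region}")
    # The language subtag is removed before the region subtag.
    chain.append(f"und-{region}")
    chain.append("und")  # assumed final step, not shown above
    return chain


print(region_priority_chain("en", "US", ["variant"]))
```

The locale passed in is expected to be normalized first, as described in the lines that follow the chain.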
1946 Non-distinguishing attributes are identified by [DTD Annotations](#DTD_Annotations) such as `@VALUE…
1948 …er. So in, say, [https://github.com/unicode-org/cldr/blob/main/common/main/el.xml](https://github.…
1972 …lement chain, data>, where the element chains are all the chains for the end-nodes. (This works be…
2085 <!-- HTML: rowspan -->
2087 … <th>Lookup in Locale</th> <th>For</th> <th>Comment</th></tr>
2107 Examples of "search" collator lookup; 'de' has a language-specific version, but 'en' does not:
2109 <!-- HTML: rowspan -->
2111 … <th>Lookup in Locale</th> <th>For</th> <th>Comment</th></tr>
2122 * All of the Chinese-specific collation types are provided in the 'zh' locale
2125 <!-- HTML: rowspan -->
2127 … <th>Lookup in Locale</th> <th>For</th> <th>Comment</th></tr>
2147 <!-- no 'a' -->
2157 <!-- HTML: rowspan, colspan, col th -->
2173 <td>addLikelySubtags(sr-ME) ⇒ sr-Latn-ME, minimize(de-Latn-DE) ⇒ de</td></tr>
2177 …><td><b>Part 6: Section 9.3&nbsp;<a href="tr35-info.md#Default_Content">Default Content</a></b></t…
2182 <td>addLikelySubtags(zh) ⇒ zh-Hans-CN<br/>
2183 addLikelySubtags(zh-TW) ⇒ zh-Hant-TW<br/>
2184 addLikelySubtags(zh-Hant) ⇒ zh-Hant-TW<br/>
2185 minimize(zh-Hans-CN, favorRegion|favorScript) ⇒ zh<br/>
2186 minimize(zh-Hant-TW, favorRegion) ⇒ zh-TW<br/>
2187 minimize(zh-Hant-TW, favorScript) ⇒ zh-Hant
2197 <td>bestLocale(userLangs=&lt;en, fr&gt;, appLangs=&lt;fr-CA, ru&gt;) ⇒ fr-CA</td></tr>
2217 … is based on the default content data, the population data, and the suppress-script data in [[BCP4…
2256 …5. Get the components of the cleaned-up source tag _(language<sub>s</sub>, script<sub>s</sub>,_ an…
2276 One by-product of this algorithm is that an element such as `<likelySubtag from="fr_IR" to="en_Arab…
2283 * Input is ZH-ZZZZ-SG.
2290 … goal of the algorithm is that any non-empty field present in the 'from' field is also present in the …
2325 …is variant much less commonly used, only when the script relationship is more significant to users.
2356-JP, de, zh-TW}. If the user understands written American English, German, French, Swiss German, a…
2358 The standard truncation-fallback algorithm does not work well when faced with the complexities of n…
2361 sr-Cyrl-RS
2362 sr-Cyrl
2363 sr-Latn-RS
2364 sr-Latn
2366 hr-Latn
2370 …r's list of preferred languages in the OS Settings, or from a browser Accept-Language list. For ex…
2376 …rces _within the locale-parent chain_. For example, suppose that we are looking for the value for …
2378 > **nb-NO** → **nb** → **root**
2380-NO**. If there is no value for **P** there, then we look in **nb**. If there is no value for **P*…
2382 However, suppose that **nb-NO** has the fallback values **[nn da sv en]**, derived from language ma…
2385 value = lookup(P, nb-NO); if (locationFound != root) return value;
2386 value = lookup(P, nn-NO); if (locationFound != root) return value;
2387 value = lookup(P, da-NO); if (locationFound != root) return value;
2388 value = lookup(P, sv-NO); if (locationFound != root) return value;
2389 value = lookup(P, en-NO); return value;
2392 …k list are not used recursively. For example, for the lookup of a path in nb-NO, if **fr** were a …
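The pseudocode above can be made concrete with a small Python sketch. The bundle contents, the `BUNDLES` table, and the helper names are all hypothetical; the sketch only illustrates the control flow: each candidate is resolved along its own locale-parent chain, and a result found in root means "try the next fallback language". The fallback list is used once and never recursively.

```python
ROOT = "root"

# Hypothetical resource bundles: locale -> {path: value}.
BUNDLES = {
    "root": {"P": "root value"},
    "nb": {},
    "nb-NO": {},
    "nn": {},
    "da": {"P": "Danish value"},
}


def parent_chain(locale):
    """Yield the locale, its truncation parents, and finally root."""
    parts = locale.split("-")
    while parts:
        yield "-".join(parts)
        parts.pop()
    yield ROOT


def lookup(path, locale):
    """Return (value, location_found) along the locale-parent chain."""
    for location in parent_chain(locale):
        value = BUNDLES.get(location, {}).get(path)
        if value is not None:
            return value, location
    return None, ROOT


def lookup_with_fallbacks(path, locale, fallback_languages):
    """Try the locale, then each fallback language with the same region."""
    _, _, rest = locale.partition("-")
    candidates = [locale] + [
        f"{lang}-{rest}" if rest else lang for lang in fallback_languages
    ]
    value = None
    for candidate in candidates:
        value, found_in = lookup(path, candidate)
        if found_in != ROOT:
            return value
    # Every chain ended in root; the value of the last lookup is returned.
    return value


print(lookup_with_fallbacks("P", "nb-NO", ["nn", "da", "sv", "en"]))
```

With the sample data, the value comes from **da**, because neither the **nb-NO** chain nor the **nn-NO** chain has a value for **P** outside root.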
2396-GB can be greater than those between en-GB and en-IE. In some cases, language and/or script diffe…
2416 …* The threshold is implementation-defined, typically set to greater than a default region differen…
2423 2. Set the match-distance MD to 0
2431 * This used to be a `percent` attribute value, which was 100 - the `distance` attribute value.
2439 _User's desired languages:_ "de-AT, fr"
2441-off between the user's languages is substantially greater than regional variants. But unless F is…
2453-DE and nb-FR are being compared. They are first maximized to nn-Latn-DE and nb-Latn-FR, respectiv…
2460 <languageMatch desired="es-*-ES" supported="es-*-ES" percent="100" />
2461 <!-- Latin American Spanishes are closer to each other. Approximate by having es-ES be further from…
2463 <languageMatch desired="es-*-ES" supported="es-*-*" percent="93" />
2466 <!-- [Default value - must be at end!] Normally there is no comprehension of different languages. -…
2468 <languageMatch desired="*-*" supported="*-*" percent="20" />
2469 <!-- [Default value - must be at end!] Normally there is little comprehension of different scripts.…
2471 <languageMatch desired="*-*-*" supported="*-*-*" percent="96" />
2472 <!-- [Default value - must be at end!] Normally there are small differences across regions. -->
2475users. Looking for en-SK, for example, should fall back to something within Europe (eg en-GB) in p…
2485 <paradigmLocales locales="en en-GB es es-419 pt-BR pt-PT" />
2490 <languageMatch desired="no" supported="nb" distance="1" /><!-- no ⇒ nb -->
2493 <!-- ar; *; $maghreb ⇒ ar; *; $maghreb -->
2495 <!-- ar; *; $!maghreb ⇒ ar; *; $!maghreb -->
2499- for _set difference_, but no precedence. So A+B-A+D is interpreted as (((A+B)-A)+D), not as (A+B…
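The left-to-right rule can be checked with ordinary Python sets. This is only a sketch: the `evaluate` helper and the sample region sets are hypothetical, and real match variables use names such as `$maghreb` rather than single letters.

```python
import re


def evaluate(expression, variables):
    """Evaluate an expression such as 'A+B-A+D' strictly left to right."""
    tokens = re.findall(r"[+-]|[^+-]+", expression)
    result = set(variables[tokens[0].strip()])
    for op, name in zip(tokens[1::2], tokens[2::2]):
        operand = variables[name.strip()]
        # '+' is union and '-' is set difference, with no precedence.
        result = result | operand if op == "+" else result - operand
    return result


# Hypothetical region sets, for illustration only.
regions = {"A": {"US", "CA"}, "B": {"MX"}, "D": {"US"}}

left_to_right = evaluate("A+B-A+D", regions)
grouped = (regions["A"] | regions["B"]) - (regions["A"] | regions["D"])
print(left_to_right, grouped)
```

With these sets the two readings differ: the left-to-right result still contains `US`, while the grouped reading removes it.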
2501 …r all of the Americas in the variables above, because en-US should be in the same cluster as es-41…
2507-SA] against [en-GU en en-IN en-GB], the value en-GB is returned. Both of \{en-GU en} are in a dif…
2509-419] should match to \{es-MX} more closely than to \{es}, and vice versa: \{es-MX} should match m…
2515 There are two kinds of data that can be expressed in LDML: language-dependent data and supplementar…
2517 For example, the language-dependent data for Japanese in CLDR is present in the following files:
2546 * common/transforms/Hiragana-Katakana.xml
2547 * common/transforms/Hiragana-Latin.xml
2551 * /keyboards/chromeos/ja-t-k0-chromeos.xml
2554 …rresponds to the file name, such as `<keyboard locale="af-t-k0-android">` for the file af-t-k0-and…
2556 The following sections describe the structure of the XML format for language-dependent data. The mo…
2568 …rmat is in element contents, while attributes are reserved for types and non-translated informatio…
2574 1. There is no ["mixed" content](https://www.w3.org/TR/xml/#sec-mixed-content): if an element has …
2581 <!-- Not correct LDML -->
2582 <unit type="duration-day"
2583 displayName="days"> <!-- #3: @VALUE attribute AND children -->
2584 {0} per day <!-- #1: Mixed content -->
2585 <unitPattern>{0} day</unitPattern> <!-- #2 same XPath /unit[@type="duration-day"]/unitPattern -->
2586 <unitPattern>{0} days</unitPattern> <!-- #2 same XPath /unit[@type="duration-day"]/unitPattern -->
2593 <unit type="duration-day"> <!-- OK: "type" is distinguishing -->
2595 <unitPattern count="one">{0} day</unitPattern> <!-- "count" is distinguishing -->
2597 <perUnitPattern>{0} per day</perUnitPattern> <!-- mixed content in an element -->
2601 …idity)_. For more technical details, see [Updating-DTDs](https://cldr.unicode.org/development/upda…
2617 Lists, such as singleCountries, are space-delimited. That means that they are separated by one or mo…
2629 …ta that is product-specific. It has one required attribute `xmlns`, which specifies the XML [names…
2639 <!-- old abbreviations for pre-GUI days -->
2652 …e 1.0 specification. Instead, they are special elements used for application-specific data to be s…
2662 <?xml version="1.0" encoding="UTF-8" ?>
2673 <?xml version="1.0" encoding="UTF-8" ?>
2679-std.org/jtc1/sc22/wg20/docs/n897-14652w25.pdf) compatibility data. That element has been withdraw…
2734 <!-- HTML: multiline, readability -->
2811 …For example, looking up Thai buddhist abbreviated months for the locale **xx-YY** may result in th…
2815 > xx-YY → xx → root // finds alias that changes path to:
2819 > xx-YY → xx → root // finds alias that changes path to:
2823 > xx-YY → xx // finds value here
2830 Many elements can have a display name. This is a translated name that can be presented to users whe…
2867 …raft` attribute). This does not mean that the data is guaranteed to be error-free—this is the best…
2882 * `variantname-proposed`, optionally followed by a number, indicating that the value is a proposed …
2921 …splay names whose capitalization differed from what was indicated by the now-deprecated `<inText>`…
2929 …`year`, `month`, `day`, `hour`, `minute`, and `second` in that order, with "-" used as a separator…
2939 <usesMetazone from="1991-10-27" to="2006-04-02" .../>
2940 <usesMetazone from="1991-10-27 00:00:00" to="2006-04-02 24:00:00" .../>
2941 <usesMetazone from="1991-10-26 24:00:00" to="2006-04-03 00:00:00" .../>
2950 … of several sub-elements with an inherent order (for example, the year, month, and day for dates).…
2952 … or end of the element content. In such a case, the overall order of the sub-elements may change d…
2954 …uld include an explicit direction mark, such as U+200E LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT …
2965 … the [online UnicodeSet utilities](https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp…
2973 | -------------- | -------------------------------------------------------------- | ---------------…
2974 | `unicodeSet` | <pre>= prop<br/>\| '\[' '^'? s '-'? s seq\* \[\\$ \\-\]? s '\]' <br/>\| va…
2975 …eq` | <pre>= unicodeSet \(s \[\\&\\\-\] s unicodeSet\)\* s<br/>\| range s</pre> | …
2976 | `range` | <pre>= element \('\-' element\)? | a, a\-c, \{abc\}, a\-\{z\} <br/> _note…
2979 | `propName` | <pre>= s \[A\-Za\-z0\-9\] \[A\-Za\-z0\-9\_\\x20\]\* s</pre> | Gen…
2983 | `char` | <pre>= \[^ \\^ \\& \\\- \\\[ \\\] \\\\ \\\{ \\$ \[:Pat_WS:\]\]<br/>\| '\\' quote…
2985 …\} \| '10' hex\{4\}\)<br/>\| 'N\{' charName '\}'<br/>\| \[\[\\u0000\-\\U00010FFFF\]\-\[uxUN\]\]</p…
2986 | `charName` | <pre>= s \[A\-Za\-z0\-9\] \[\-A\-Za\-z0\-9\_\\x20\]\* s</pre> | T…
2989 | `hex` | <pre>= \[0\-9A\-Fa\-f\]</pre> | …
2994 The following are additional well-formedness and validity constraints:
2995-**Y**) are only well-formed in the case that elements **X** and **Y** resolve to single code poin…
3000 …]. Thus **\[:di:\]** is a valid property expression, **\[di:\]** is a 3 code-point set, and **\[:d…
3006 | ---- | ------ | -------------------- | ------------------------------------------ |
3009 | - | U+002D | HYPHEN-MINUS | Ranges of characters; also set difference. |
3010 | : | U+003A | COLON | POSIX-style property syntax |
3020 …tespace have a special meaning inside strings (**\[\{\[a-z\]\}\]** is the set of the string '\[a-z…
3021 …ng at the very end of a set with or without trailing whitespace (**[a-z\$]**, **[a-z\$ ]**), and u…
3022- is equivalent to the literal character \\- when occurring at the very beginning of a set, after a…
3029 | - | - | - |
3035 | **\[-\]** | '-'. | |
3036 | **\[ - \]** | '-' | |
3037 | **\[a-\]**, **\[-a\]** | 'a' and '-' | |
3038 | **\[a -b\]** | All code points between 'a' and 'b' (inclusive) | |
3039 | **\[\[a-b\] -\[b\]\]**, **\[\[a\]-\[b\]-\[c\]\]** | 'a' | **\[a-b-c\]** |
3040 | **\[^ - \]** | All Unicode code points except '-' | **\[ ^ - \]** |
3050 | **\[\{\[a-z\}\]**, **\[\{ \[ a - z\}\]** | the string '\[a-z' | |
3052 | **\[\\x\{61\}-d\]** | 'a', 'b', 'c', and 'd' | **\[\\x\{61 63\}-d\]**, **\[\\x\{61 63\}-\\x\{62 …
3062-' between two code points, as in "a-z". The sequence _start-end_ specifies the range of all code …
3064-z \{ch}]**. It can be used with the range notation, with the restriction that each string contain…
3070 | --------------- | ------------------------------------ |
3071 | \\x\{h...h}<br/>\\u\{h...h} | list of 1-6 hex digits ([0-9A-Fa-f]), separated by spaces |
3075 | \\a | U+0007 (BEL / ALERT) |
3094 …y the addition of `"=<value>"`. For example, you can match letters by using the POSIX-style syntax:
3098 or by using the Perl-style syntax
3102-insensitive, and whitespace, "-", and "\_" are ignored. The property name can be omitted for the …
3107 | ------------------ | ---------------- | ----------------- |
3108 | POSIX-style Syntax | [:type=value:] | [:^type=value:] |
3109 | Perl-style Syntax | \\p\{type=value} | \\P\{type=value} |
3113 The low-level lists or properties then can be freely combined with the normal set operations (union…
3116 * To intersect two sets, use the '&' operator. For example, **[[:letter:] & [a-z]]**
3117 * To take the set-difference of two sets, use the '-' operator. For example, **[[:letter:] - [a-z]]…
3118 …^a-z]**. In any other location, the '\^' does not have a special meaning. The inversion [\^X] is e…
3121-', and the implicit union have equal precedence and bind left-to-right. Thus **[[:letter:]-[a-z]-
3123-' operators operate between sets. That is, they must be immediately preceded and immediately foll…
3128 They are used in certain contexts such as in [Transforms](tr35-general.md#Transforms).
3131 …characters. If variable support is enabled, variables must be defined (out-of-scope for UnicodeSet…
3134 For instance, consider **[$a-$b]**; this may be a range of characters if both **$a** and **$b** are…
3138 but also to parse the UnicodeSet syntax: if **$a** and **$b** were unknown, the parsing of **[$a-$b…
3140 …is, **[a \$minus z]** with a variable map `{ minus => '-' }` is equivalent to **[-az]**, not **[a-
3143 The variable syntax implements UAX31-R1-2 with XID_Start and XID_Continue. For more information, se…
3144 … identifiers with Normalization Form C, implementing UAX31-R4. Furthermore, variables are case-sen…
3149 Thus \[\$a\-\$b\] can resolve whether \$a and \$b are chars/strings (eg, \$a=δ, \$b=θ) or full Unic…
3158 …$ character may only appear by itself at the end of a UnicodeSet, e.g., **[a-z\$]**, where it keep…
3167 | -------------------- | ----------- |
3169 | [a-z] | The set containing 'a' through 'z' and all letters in between, in Unicode …
3170 | [^a-z] | The set containing all code points but 'a' through 'z'.<br/>Thus it is the…
3173 | [[pat1]-[pat2]] | The asymmetric difference of sets specified by pat1 and pat2 |
3174 | [a \{ab} \{ac}] | The code point 'a' and the multi-code point strings "ab" and "ac" |
3190 * for UnicodeSet the separator is -, and any multi-codepoint string is enclosed in {…}.
3196 > _There may be additional, domain-specific requirements for validity of the expansion of the strin…
3207 <!-- HTML: no th -->
3209 <tr><td>ab-ad</td><td>→</td><td>ab ac ad</td></tr>
3210 <tr><td>ab-d</td><td>→</td><td>ab ac ad</td></tr>
3211 <tr><td>ab-cd</td><td>→</td><td>ab ac ad bb bc bd cb cc cd</td></tr>
3212 <tr><td>👦🏻-👦🏿</td><td>→</td><td>👦🏻 👦🏼 👦🏽 👦🏾 👦🏿</td></tr>
3213 <tr><td>👦🏻-🏿</td><td>→</td><td>👦🏻 👦🏼 👦🏽 👦🏾 👦🏿</td></tr>
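The expansions in the rows above are consistent with the following Python sketch. It is inferred from the examples rather than taken from the specification text: when the second string is shorter, the leading code points of the first string are assumed to be carried over as a prefix, and each remaining position ranges independently. The function name `expand_string_range` is hypothetical.

```python
from itertools import product


def expand_string_range(first, last):
    """Expand a string range such as 'ab-cd' into its member strings."""
    if len(last) > len(first):
        raise ValueError("the second string must not be longer than the first")
    # Code points of `first` not covered by `last` are kept as a prefix.
    split = len(first) - len(last)
    prefix, tail = first[:split], first[split:]
    per_position = []
    for low, high in zip(tail, last):
        if ord(low) > ord(high):
            raise ValueError("each start code point must not exceed its end")
        per_position.append([chr(cp) for cp in range(ord(low), ord(high) + 1)])
    return [prefix + "".join(combo) for combo in product(*per_position)]


print(" ".join(expand_string_range("ab", "cd")))
print(" ".join(expand_string_range("ab", "d")))
```

Python strings are sequences of code points, so the same function handles supplementary characters such as emoji modifiers without special casing.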
3300 [XML](https://www.w3.org/TR/REC-xml/) files can have a wide variation in textual form, while repres…
3348 … and then if the element names are identical, by the sorted set of attribute-value pairs. For the …
3356 1. Comments are of the form `<!-- stuff -->`.
3361 4. Final comment, after `</ldml>`
3362 3. Multiline comments (except the final comment) have each line after the first indented to one dee…
3368 <era type="0">BC</era> <!-- might add alternate BDE in the future -->
3371 <!-- Note: zones that do not use daylight time need further work -->
3374 <!-- Note: the following is known to be sparse,
3375 and needs to be improved in the future -->
3384 <!--@...-->
3390 | ---------------------| ----------- |
3391 | `<!--@VALUE-->` | The attribute is not distinguishing, and is treated like an element value |
3392 | `<!--@METADATA-->` | The attribute is a “comment” on the data, like the draft status. It is not…
3393 | `<!--@ALLOWS_UESC-->` | The attribute value can be escaped using the `\u` notation. Does not re…
3394 | `<!--@ORDERED-->` | The element's children are ordered, and do not inherit. |
3395 | `<!--@DEPRECATED-->` | The element or attribute is deprecated, and should not be used. |
3396 | `<!--@DEPRECATED: attribute-value1, attribute-value2-->` | The attribute values are deprecated, a…
3397 | `<!--@MATCH:{attribute value constraint}-->` | Requires the attribute value to match the constrai…
3398 | `<!--@TECHPREVIEW-->` | The element is a technical preview of a feature and may be changed or rem…
3413 | ------------------------- | -------- |
3423 | time/\{time or date or date-time pattern} | eg HH:mm |
3425 | validity/\{field} | currency, language, locale, region, script, subdivision, short-unit, …
3434 Some data in CLDR does not use an XML format, but rather a semicolon-delimited format derived from …
3448 This file was used to define the ExtendedPictographic data used for “future-proofing” emoji behavio…
3454 …s of labels to characters that may be useful to implementations of character-picking applications.…
3483 * Map all characters in [:Dash:] to U+002D HYPHEN-MINUS
3484 …a in the `<character-fallback>` element to map equivalent characters (for example, curly to straig…
3489 * Apply case folding (possibly including language-specific mappings such as Turkish i)
3490 * Normalize to NFKC; thus _no-break space_ will map to _space_; half-width _katakana_ will map to f…
3492 …tics below. For example, if the input number text is " - NA f. 1,000.00", then it is mapped to "-n…
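A rough Python sketch of three of the mapping steps listed above follows. It is an approximation under stated assumptions: the standard library does not expose the `[:Dash:]` property, so general category `Pd` stands in for it (characters such as U+2212 MINUS SIGN are therefore not covered), the `<character-fallback>` data is not applied, and language-specific case folding such as Turkish i is not handled. The function name `loosen` is hypothetical.

```python
import unicodedata


def loosen(text):
    """Apply dash mapping, case folding, and NFKC normalization."""
    # Approximation: category Pd is used in place of the [:Dash:] property.
    dashed = "".join(
        "-" if unicodedata.category(ch) == "Pd" else ch for ch in text
    )
    folded = dashed.casefold()
    return unicodedata.normalize("NFKC", folded)


print(loosen("\u2013 NA\u00A0f. 1,000.00"))
```

The result of such a mapping is only an intermediate form; parsing of the mapped text still has to follow the rules described in this section.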
3498 * For a field using a currently-invalid length for a valid pattern character:
3501 …For a pattern that contains a currently-invalid pattern character (applies only to date patterns, …
3521 [Supplemental Territory Information](tr35-info.md#Supplemental_Territory_Information),
3523 (For a human-readable chart, see [Territory-Language Information](https://unicode-org.github.io/cld…
3530 …ocales at Basic coverage in [Unicode CLDR - Coverage Levels](https://cldr.unicode.org/index/cldr-s…
3539 Locales may also have data on a field-by-field basis that is reasonable to filter out.
3545 The easiest way to do that is to use the CLDR Java tooling (the `cldr-code` package) to filter the …
3569 The CLDR [DTD Deltas](https://unicode-org.github.io/cldr-staging/charts/latest/supplemental/dtd_del…
3590 …endar type for a locale is now specified by _[Calendar Preference Data](tr35-dates.md#Calendar_Pre…
3598 …ibute_draft_nonLeaf" href="#Attribute_draft_nonLeaf">A.5.2 Attribute draft in non-leaf elements</a>
3608 Instead use the basic collation syntax with the [`<cr>` element](tr35-collation.md#Rules).
3621 …should be located just under a `<dates>` element. See [Calendar Fields](tr35-dates.md#Calendar_Fie…
3626 * `<singleCountries>`: use [Primary Zones](tr35-dates.md#Primary_Zones)
3635 …ute are documented in [`<contextTransformUsage>` type attribute values](tr35-general.md#contextTra…
3639 * `displayName-count` was renamed to `currencyName-count`
3651 | ---------- | -------------- |
3674 ###### Table: <a name="Part_2_Links" href="#Part_2_Links">Part 2 Links</a>: [General](tr35-general.…
3677 | -------------------------------------------------------------------------------------------------…
3678 …">Display Name Elements</a> | 1 [Display Name Elements](tr35-general.md#Display_Na…
3679 …ut Elements</a> | 2 [Layout Elements](tr35-general.md#Layout_Ele…
3680 …haracter Elements</a> | 3 [Character Elements](tr35-general.md#Character_…
3681 …ar Syntax</a> | 3.1 [Exemplar Syntax](tr35-general.md#ExemplarSy…
3682 … | 3.1 [Exemplar Syntax](tr35-general.md#ExemplarSy…
3683 … | 3.2 [Mapping](tr35-general.md#Character_…
3684 …els</a> | 3.3 [Index Labels](tr35-general.md#IndexLabel…
3685 … | 3.4 [Ellipsis](tr35-general.md#Ellipsis) |
3686 … | 3.5 [More Information](tr35-general.md#Character_…
3687 …elimiter Elements</a> | 4 [Delimiter Elements](tr35-general.md#Delimiter_…
3688 …ta">Measurement System Data</a> | 5 [Measurement System Data](tr35-general.md#Measuremen…
3689 …ements (deprecated)</a> | 5.1 [Measurement Elements (deprecated)](tr35-general.md#Measuremen…
3690 …Elements</a> | 6 [Unit Elements](tr35-general.md#Unit_Eleme…
3691 …X Elements</a> | 7 [POSIX Elements](tr35-general.md#POSIX_Elem…
3692 …>Reference Element</a> | 8 [Reference Element](tr35-general.md#Reference_…
3693 …ntations</a> | 9 [Segmentations](tr35-general.md#Segmentati…
3694 …ance">Segmentation Inheritance</a> | 9.1 [Segmentation Inheritance](tr35-general.md#Segmentati…
3695 …s</a> | 10 [Transforms](tr35-general.md#Transforms…
3696 …/a> | 10.3 [Transform Rules Syntax](tr35-general.md#Transform_…
3697 …terns</a> | 11 [List Patterns](tr35-general.md#ListPatter…
3698 …s</a> | 11.1 [Gender of Lists](tr35-general.md#List_Gende…
3699 …ements">ContextTransform Elements</a> | 12 [ContextTransform Elements](tr35-general.md#Context_Tr…
3701 ###### Table: <a name="Part_3_Links" href="#Part_3_Links">Part 3 Links</a>: [Numbers](tr35-numbers.…
3704 | -------------------------------------------------------------------------------------------------…
3705 …ng Systems</a> | 1 [Numbering Systems](tr35-numbers.md#Numbering_…
3706 …ements</a> | 2 [Number Elements](tr35-numbers.md#Number_Ele…
3707 …bols</a> | 2.3 [Number Symbols](tr35-numbers.md#Number_Sym…
3708 …r Format Patterns</a> | 3 [Number Format Patterns](tr35-numbers.md#Number_For…
3709 …a> | 4 [Currencies](tr35-numbers.md#Currencies…
3710 …upplemental Currency Data</a> | 4.1 [Supplemental Currency Data](tr35-numbers.md#Supplement…
3711 …guage Plural Rules</a> | 5 [Language Plural Rules](tr35-numbers.md#Language_P…
3712-Based_Number_Formatting" href="#Rule-Based_Number_Formatting">Rule-Based Number Formatting</a> | …
3714 ###### Table: <a name="Part_4_Links" href="#Part_4_Links">Part 4 Links</a>: [Dates](tr35-dates.md) …
3717 | -------------------------------------------------------------------------------------------------…
3718 …1 [Overview: Dates Element, Supplemental Date and Calendar Information](tr35-dates.md#Overview_Dat…
3719 …/a> | 2 [Calendar Elements](tr35-dates.md#Calendar_Ele…
3720 …s, eras</a> | 2.1 [Elements months, days, quarters, eras](tr35-dates.md#months_days_…
3721 …yclicNameSets</a> | 2.2 [Elements monthPatterns, cyclicNameSets](tr35-dates.md#monthPattern…
3722 … | 2.3 [Element dayPeriods](tr35-dates.md#dayPeriods) |
3723 … | 2.4 [Element dateFormats](tr35-dates.md#dateFormats)…
3724 … | 2.5 [Element timeFormats](tr35-dates.md#timeFormats)…
3725 … | 2.6 [Element dateTimeFormats](tr35-dates.md#dateTimeForm…
3726 … | 3 [Calendar Fields](tr35-dates.md#Calendar_Fie…
3727 … | 5 [Time Zone Names](tr35-dates.md#Time_Zone_Na…
3728 …l Calendar Data</a> | 4 [Supplemental Calendar Data](tr35-dates.md#Supplemental…
3729 … Time Zone Data</a> | 6 [Supplemental Time Zone Data](tr35-dates.md#Supplemental…
3730 …rence Data</a> | 4.2 [Calendar Preference Data](tr35-dates.md#Calendar_Pre…
3731 … | 4.5 [Day Period Rules](tr35-dates.md#Day_Period_R…
3732 …at Patterns</a> | 8 [Date Format Patterns](tr35-dates.md#Date_Format_…
3733 …l Table</a> | [Date Field Symbol Table](tr35-dates.md#Date_Field_S…
3734 …ters (deprecated)</a> | 8.1 [Localized Pattern Characters (deprecated)](tr35-dates.md#Localized_Pa…
3735 …lay Names</a> | 7 [Using Time Zone Names](tr35-dates.md#Using_Time_Z…
3736 … | [**fallbackFormat**:](tr35-dates.md#fallbackForm…
3737 … | 9 [Parsing Dates and Times](tr35-dates.md#Parsing_Date…
3739 ###### Table: <a name="Part_5_Links" href="#Part_5_Links">Part 5 Links</a>: [Collation](tr35-collat…
3742 | -------------------------------------------------------------------------------------------------…
3743 … | 3 [Collation Tailorings](tr35-collation.md#Collatio…
3744 … | 3.1 [Version](tr35-collation.md#Collatio…
3745 … | 3.2 [Collation Element](tr35-collation.md#Collatio…
3746 … | 3.3 [Setting Options](tr35-collation.md#Setting_…
3747 … | Table [Collation Settings](tr35-collation.md#Collatio…
3748 … | 3.4 [Collation Rule Syntax](tr35-collation.md#Rules) |
3749 … | 3.5 [Orderings](tr35-collation.md#Ordering…
3750 … | 3.6 [Contractions](tr35-collation.md#Contract…
3751 … | 3.7 [Expansions](tr35-collation.md#Expansio…
3752 … | 3.8 [Context Before](tr35-collation.md#Context_…
3753 …g Characters Before Others</a> | 3.9 [Placing Characters Before Others](tr35-collation.md#Placing_…
3754 …ositions</a> | 3.10 [Logical Reset Positions](tr35-collation.md#Logical_…
3755 …#Special_Purpose_Commands">Special-Purpose Commands</a> | 3.11 [Special-Pur…
3756 … | 3.12 [Collation Reordering](tr35-collation.md#Script_R…
3757 … | 3.13 [Case Parameters](tr35-collation.md#Case_Par…
3758 … | removed: see 3.13 [Case Parameters](tr35-collation.md#Case_Par…
3759 … | removed: see 3.13 [Case Parameters](tr35-collation.md#Case_Par…
3760 … | removed: see 3.13 [Case Parameters](tr35-collation.md#Case_Par…
3761 … | 3.14 [Visibility](tr35-collation.md#Visibili…
3763 ###### Table: <a name="Part_6_Links" href="#Part_6_Links">Part 6 Links</a>: [Supplemental](tr35-inf…
3766--------------------------------------------------------------------------------------------------…
3767 … | Introduction [Supplemental Data](tr35-info.md#Supplemental_…
3768 …ritory Containment</a> | 1.1 [Supplemental Territory Containment](tr35-info.md#Supplemental_…
3769 …ritory Information</a> | 1.2 [Supplemental Territory Information](tr35-info.md#Supplemental_…
3770 …Data</a> | 2 [Supplemental Language Data](tr35-info.md#Supplemental_…
3771 …ng</a> | 4 [Supplemental Code Mapping](tr35-info.md#Supplemental_…
3772 … | 5 [Telephone Code Data](tr35-info.md#Telephone_Cod…
3773 …> | 6 [Postal Code Validation](tr35-info.md#Postal_Code_V…
3774 … Character Fallback Data</a> | 7 [Supplemental Character Fallback Data](tr35-info.md#Supplemental_…
3775 … | 8 [Coverage Levels](tr35-info.md#Coverage_Leve…
3776 …tr35-info.md#Metadata_Elements) …
3777 …](tr35-info.md#Appendix_Supplemental_Metadata) …
3778 …](tr35-info.md#Supplemental_Alias_Information) …
3779 …](tr35-info.md#Supplemental_Deprecated_Information) |…
3780 …](tr35-info.md#Default_Content) …
3782 ###### Table: <a name="Part_7_Links" href="#Part_7_Links">Part 7 Links</a>: [Keyboards](tr35-keyboa…
3784 [Part 7](tr35-keyboards.md) has been extensively rewritten. The prior link anchors within this file…
3792 > Note: in the following discussion, the separator '-' is used. That is also used in examples of XM…
3802 #### <a name="1.-multimap-interpretation" href="#1.-multimap-interpretation">1. Multimap interpreta…
3809 |---------------------------|----------|--------|--------|-------------------|
3810 | en-GB | {en} | {} | {GB} | {} |
3811 | und-GB | {} | {} | {GB} | {} |
3812 | ja-Latn-YU-hepburn-heploc | {ja} | {Latn} | {YU} | {hepburn, heploc} |
3818 #### <a name="2.-alias-elements" href="#2.-alias-elements">2. Alias elements</a>
3822 …ript-, territory- (aka region), and variant- Alias elements, the type and replacements are interpr…
3831 <territoryAlias type="und-AN" replacement="und-CW und-SX und-BQ" reason="deprecated" />
3834 …ment values separated by spaces in the text (such as replacement="und-CW und-SX und-BQ"); other ru…
3836 #### <a name="3.-matches" href="#3.-matches">3. Matches</a>
3842 `source="ja-heploc-hepburn"` and `type="und-hepburn"`
3851 `source="ja-hepburn"` and `type="und-hepburn-heploc"`
3860 #### <a name="4.-replacement" href="#4.-replacement">4. Replacement</a>
3865 * source.field = (source.field - type.field) ∪ replacement.field
3871 > source="ja-Latn-fonipa-hepburn-heploc"
3873 > rule =`<languageAlias type="und-hepburn-heploc" replacement="und-alalc97">`
3875 > result="ja-Latn-alalc97-fonipa"
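The multimap interpretation and the replacement formula can be sketched in Python as follows. This is an illustration under assumptions, not the normative algorithm: the helper names are hypothetical, extensions are not parsed, a rule is assumed to match when each of its type fields is a subset of the corresponding source field, and the formula is applied to every field exactly as written above, without any additional conditions the full text may impose.

```python
FIELDS = ("language", "script", "region", "variants")


def parse(tag):
    """Split a simple language tag into per-field sets (no extensions)."""
    fields = {name: set() for name in FIELDS}
    subtags = tag.split("-")
    if subtags[0].lower() != "und":
        fields["language"].add(subtags[0].lower())
    for subtag in subtags[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            fields["script"].add(subtag.title())
        elif (len(subtag) == 2 and subtag.isalpha()) or (
            len(subtag) == 3 and subtag.isdigit()
        ):
            fields["region"].add(subtag.upper())
        else:
            fields["variants"].add(subtag.lower())
    return fields


def matches(source, rule_type):
    """Assumed matching rule: every type field is a subset of the source field."""
    return all(rule_type[name] <= source[name] for name in FIELDS)


def replace(source, rule_type, replacement):
    """source.field = (source.field - type.field) ∪ replacement.field"""
    return {
        name: (source[name] - rule_type[name]) | replacement[name]
        for name in FIELDS
    }


def format_tag(fields):
    """Join the fields back into a tag, with variants in alphabetical order."""
    language = next(iter(fields["language"]), "und")
    return "-".join(
        [
            language,
            *sorted(fields["script"]),
            *sorted(fields["region"]),
            *sorted(fields["variants"]),
        ]
    )


source = parse("ja-Latn-fonipa-hepburn-heploc")
rule_type = parse("und-hepburn-heploc")
replacement = parse("und-alalc97")
if matches(source, rule_type):
    print(format_tag(replace(source, rule_type, replacement)))
```

Using sets for each field keeps the subset test and the set difference direct, and sorting the variants on output mirrors the alphabetical ordering used in the example result.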
3883 #### <a name="5.-canonicalizing-syntax" href="#5.-canonicalizing-syntax">5. Canonicalizing Syntax</…
3888 * If the first subtag has 4 letters, prepend the source with "und-"
3895 * Put any variants into alphabetical order (eg, en-fonipa-scouse, not en-scouse-fonipa)
3896 … any extensions into alphabetical order by their singleton (eg, en-t-xxx-u-yyy, not en-u-yyy-t-xxx)
3901 * Replace '\_' by '-'
3907 1. Load the rules from supplementalMetadata.xml, replacing '\_' by '-', and adding “und-” as descri…
3909 1. `<languageAlias type="i-mingo" replacement="see-x-i-mingo" reason="legacy" />`
3911 1. `<languageAlias type="i-mingo" replacement="see-x-i-mingo" reason="legacy" />`
3912 2. `<territoryAlias type="und-AAA" replacement="und-AA" reason="overlong" />`
3923 If one rule has a non-empty value set for that field and the other rule does not,
3924 …then order the rule with the non-empty value set for that field before the other rule and disregar…
3928 … * {L={en}, S={Latn}, R={GB}} (because both have non-empty sets for L, S, and R but not for V),
3936 … * {V={hepburn,heploc,simple}} (because both have non-empty sets for V but not for L, S, or R).
3940 …then consider the first position of difference in the two lists and order the rules by code-point …
3957 | languageId | 5.1 total field value set item count | 5.2 non-empty field value set | 5.3 field val…
3958 | --- | --- | --- | --- |
3973 ### <a name="processing-languageids" href="#processing-languageids">Processing LanguageIds</a>
3980 … then apply Step 3 of BCP 47 [Section 4.5](https://www.rfc-editor.org/rfc/rfc5646.html#section-4.5…
3982 3. Else if the first subtag is "x", prefix by "und-".
3983 …4. **Note:** there are currently no valid 4-letter primary language subtags. While it is extremely…
3989 ### <a name="processing-localeids" href="#processing-localeids">Processing LocaleIds</a>
3998 `en-u-ms-imperial ⇒ en-u-ms-uksystem`
3999 …es for the 'sd' and 'rg' keys. However, where the replacement value is a two-letter region code, a…
4003 `en-u-rg-fi01 ⇒ en-u-rg-axzzzz`
4007 …s, but would obviously not be directly suited to production code. Production-level code can use ma…
4014 | -------------------------------------------------------- | --- |
4015 …Reporting form<br/>[https://cldr.unicode.org/index/bug-reports](https://cldr.unicode.org/index/bug
4017 … | The Default Unicode Collation Element Table (DUCET)<br/>For the base-level collation, of w…
4022 …aylight savings information.<br/>[https://www.iana.org/time-zones](https://www.iana.org/time-zones…
4024 …13.0.0_<br/>(Mountain View, CA: The Unicode Consortium, 2020. ISBN 978-1-936213-26-9)<br/>[https:/…
4028-editor.org/rfc/bcp/bcp47.txt](https://www.rfc-editor.org/rfc/bcp/bcp47.txt)<br/>The Registry<br/>…
4029-2/](https://www.loc.gov/standards/iso639-2/)<br/>Actual List<br/>[https://www.loc.gov/standards/i…
4031 …ISO Region Codes<br/>[https://www.iso.org/iso-3166-country-codes.html](https://www.iso.org/iso-316…
4032 … Currency Codes<br/>[https://www.iso.org/iso-4217-currency-codes.html](https://www.iso.org/iso-421…
4033 …me Format<br/>[https://www.iso.org/iso-8601-date-and-time-format.html](https://www.iso.org/iso-860…
4035 …nload at: [https://unece.org/trade/cefact/UNLOCODE-Download](https://unece.org/trade/cefact/UNLOC…
4037 | [<a name="RFC6497" href="#RFC6497">RFC6497</a>] | BCP 47 Extension T - Transformed Conte…
4038 …r/>Composition of macro geographical (continental) regions, geographical sub-regions, and selected…
4042 …ward M. Reingold, Nachum Dershowitz; Cambridge University Press; Book and CD-ROM edition (July 1, …
4043 …sources<br/>[https://unicode-org.github.io/cldr-staging/charts/latest/by_type/index.html](https://…
4044 …E Currency Data<br/>[https://www.iso.org/iso-4217-currency-codes.html](https://www.iso.org/iso-421…
4046 …e<br/>[https://www.unicode.org/cldr/dtd/1.1/ldml-example.xml](https://www.unicode.org/cldr/dtd/1.1…
4047 … | ICU rule syntax<br/>[https://unicode-org.github.io/icu/userguide/collation/customization/](ht…
4048-org.github.io/icu/userguide/transforms/](https://unicode-org.github.io/icu/userguide/transforms/)…
4049-org.github.io/icu/userguide/strings/unicodeset.html<br/>](https://unicode-org.github.io/icu/userg…
4050 …itu.int/opb/publications.aspx?parent=T-SP&view=T-SP2](https://www.itu.int/opb/publications.aspx?pa…
4051 … | ICU Locale Explorer<br/>[https://icu4c-demos.unicode.org/icu-bin/locexp](https://icu4c-demos.un…
4053 …N Locale Naming Guideline<br/>formerly at https://www.openi18n.org/docs/text/LocNameGuide-V10.txt |
4054-Based Number Format<br/>[https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_…
4055 … | Rule-Based Break Iterator<br/>[https://unicode-org.github.io/icu/userguide/boun…
4057-and-frequency-division<br/>](https://www.nist.gov/pml/time-and-frequency-division)U.S. Naval Obse…
4058-style codes to LCIDs)<br/>[https://learn.microsoft.com/en-us/dotnet/api/system.globalization.cult…
4086 * Robin Leroy for his work on compact plurals: Part 3, [Language Plural Rules](tr35-numbers.md#Lang…
4101 that avoids needing a long (and fragile) list of language-script codes
4104 * In [Special Script Codes](#special-script-codes), added a description of special script codes,
4116 * Part 3: [Numbers](tr35-numbers.md#Contents)
4117 …* In [Supplemental Currency Data](tr35-numbers.md#Supplemental_Currency_Data), for the `currency` …
4118 added attributes `tz` and `to-tz` to clarify the `from` and `to` dates.
4120 * Part 4: [Dates](tr35-dates.md#Contents)
4121 …* In [Date Format Patterns](tr35-dates.md#Date_Format_Patterns), reserved date Pattern field lengt…
4124 * Part 6: [Supplemental](tr35-info.md#Contents)
4125 …* In [Mixed Units](tr35-info.md#mixed-units), clarified many aspects of mixed units (such as foot-
4127 * In [Testing](tr35-info.md#testing), listed the additional test files.
4128 …* In [Unit Preferences Overrides](tr35-info.md#Unit_Preferences_Overrides), made substantial chang…
4133 …* In [Conversion Data](tr35-info.md#conversion-data), added the `special` attribute for `convertUn…
4134 * In [Unit Prefixes](tr35-info.md#unit-prefixes), added the SI unit prefixes and the power of 10
4137 * Part 7: [Keyboards](tr35-keyboards.md#Contents)
4142 * Part 9: [MessageFormat](tr35-messageFormat.md#Contents)