<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"https://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta name="generator" content=
  "HTML Tidy for HTML5 for Apple macOS version 5.6.0">
  <meta http-equiv="Content-Type" content=
  "text/html; charset=utf-8">
  <meta http-equiv="Content-Language" content="en-us">
  <link rel="stylesheet" href=
  "../reports.css" type="text/css">
  <title>UTS #35: Unicode Locale Data Markup Language</title>
  <style type="text/css">
  <!--
  .dtd {
        font-family: monospace;
        font-size: 90%;
        background-color: #CCCCFF;
        border-style: dotted;
        border-width: 1px;
  }

  .xmlExample {
        font-family: monospace;
        font-size: 80%
  }

  .blockedInherited {
        font-style: italic;
        font-weight: bold;
        border-style: dashed;
        border-width: 1px;
        background-color: #FF0000
  }

  .inherited {
        font-weight: bold;
        border-style: dashed;
        border-width: 1px;
        background-color: #00FF00
  }

  .element {
        font-weight: bold;
        color: red;
  }

  .attribute {
        font-weight: bold;
        color: maroon;
  }

  .attributeValue {
        font-weight: bold;
        color: blue;
  }

  li, p {
        margin-top: 0.5em;
        margin-bottom: 0.5em
  }

  h2, h3, h4, h5, table {
        margin-top: 1.5em;
        margin-bottom: 0.5em;
  }

  h5 {
        font-size: medium;
        font-style: italic
  }
  -->
  </style>
</head>
<body>
  <table class="header" width="100%">
    <tr>
      <td class="icon"><a href="https://unicode.org"><img alt=
      "[Unicode]" src="../logo60s2.gif"
      width="34" height="33" style=
      "vertical-align: middle; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px; border-top-width: 0px;"></a>&nbsp;
      <a class="bar" href=
      "https://www.unicode.org/reports/">Technical Reports</a></td>
    </tr>
    <tr>
      <td class="gray">&nbsp;</td>
    </tr>
  </table>
  <div class="body">
    <h2 style="text-align: center">Unicode Technical Standard #35</h2>
    <h1>Unicode Locale Data Markup Language (LDML)</h1>
    <!-- At least the first row of this header table should be identical across the parts of this UTS. -->
    <table border="1" cellpadding="2" cellspacing="0" class="wide">
      <tr>
        <td>Version</td>
        <td>38</td>
      </tr>
      <tr>
        <td>Editors</td>
        <td>Mark Davis (<a href="mailto:markdavis@google.com">markdavis@google.com</a>) and
        <a href="tr35.html#Acknowledgments">other CLDR committee members</a></td>
      </tr>
      <tr>
        <td>Date</td>
        <td>2020-10-23</td>
      </tr>
      <tr>
        <!-- This link must be made live when posting the final version but is disabled during proposed update stage. -->
        <td>This Version</td>
        <td>
		<a href="https://www.unicode.org/reports/tr35/tr35-61/tr35.html">
		https://www.unicode.org/reports/tr35/tr35-61/tr35.html</a></td>
      </tr>
      <tr>
        <td>Previous Version</td>
        <td>
		<a href="https://www.unicode.org/reports/tr35/tr35-60/tr35.html">
		https://www.unicode.org/reports/tr35/tr35-60/tr35.html</a></td>
      </tr>
      <tr>
        <td>Latest Version</td>
        <td><a href=
        "https://www.unicode.org/reports/tr35/">https://www.unicode.org/reports/tr35/</a></td>
      </tr>
      <tr>
        <td>Corrigenda</td>
        <td><a href=
        "http://unicode.org/cldr/corrigenda.html">http://unicode.org/cldr/corrigenda.html</a></td>
      </tr>
      <tr>
        <td>Latest Proposed Update</td>
        <td><a href=
        "https://www.unicode.org/reports/tr35/proposed.html">https://www.unicode.org/reports/tr35/proposed.html</a></td>
      </tr>
      <tr>
        <td>Namespace</td>
        <td><a href=
        "https://unicode.org/cldr/">https://unicode.org/cldr/</a></td>
      </tr>
      <tr>
        <td>DTDs</td>
        <td><a href="https://github.com/unicode-org/cldr/tree/maint/maint-38/common/dtd">
		http://unicode.org/cldr/dtd/38/</a></td>
      </tr>
      <tr>
        <td>Revision</td>
        <td><a href="#Modifications">61</a></td>
      </tr>
    </table>
    <h3><i>Summary</i></h3>
    <p>This document describes an XML format (<i>vocabulary</i>)
    for the exchange of structured locale data. This format is used
    in the <a href="https://unicode.org/cldr/">Unicode Common Locale
    Data Repository</a>.</p>
    <h3><i>Status</i></h3>

    <!-- NOT YET APPROVED
                <p>
                                <i class="changed">This is a<b><font color="#ff3333">
                                draft </font></b>document which may be updated, replaced, or superseded by
                                other documents at any time. Publication does not imply endorsement
                                by the Unicode Consortium. This is not a stable document; it is
                                inappropriate to cite this document as other than a work in
                                progress.
                        </i>
                </p>
     END NOT YET APPROVED -->
    <!-- APPROVED -->
    <p><i>This document has been reviewed by Unicode members and
    other interested parties, and has been approved for publication
    by the Unicode Consortium. This is a stable document and may be
    used as reference material or cited as a normative reference by
    other specifications.</i></p>
    <!-- END APPROVED -->

    <blockquote>
      <p><i><b>A Unicode Technical Standard (UTS)</b> is an
      independent specification. Conformance to the Unicode
      Standard does not imply conformance to any UTS.</i></p>
    </blockquote>
    <p><i>Please submit corrigenda and other comments with the CLDR
    bug reporting form [<a href=
    "http://cldr.unicode.org/index/bug-reports">Bugs</a>]. Related
    information that is useful in understanding this document is
    found in the <a href="#References">References</a>. For the
    latest version of the Unicode Standard, see [<a href=
    "https://www.unicode.org/versions/latest/">Unicode</a>]. For a
    list of current Unicode Technical Reports, see [<a href=
    "https://www.unicode.org/reports/">Reports</a>]. For more
    information about versions of the Unicode Standard, see
    [<a href=
    "https://www.unicode.org/versions/">Versions</a>].</i></p><!-- This section of Parts should be identical in all of the parts of this UTS. -->
    <h2><a name="Parts" href="#Parts" id="Parts">Parts</a></h2>
    <p>The LDML specification is divided into the following
    parts:</p>
    <ul class="toc">
      <li>Part 1: <a href="tr35.html#Contents">Core</a> (languages,
      locales, basic structure)</li>
      <li>Part 2: <a href="tr35-general.html#Contents">General</a>
      (display names &amp; transforms, etc.)</li>
      <li>Part 3: <a href="tr35-numbers.html#Contents">Numbers</a>
      (number &amp; currency formatting)</li>
      <li>Part 4: <a href="tr35-dates.html#Contents">Dates</a>
      (date, time, time zone formatting)</li>
      <li>Part 5: <a href=
      "tr35-collation.html#Contents">Collation</a> (sorting,
      searching, grouping)</li>
      <li>Part 6: <a href=
      "tr35-info.html#Contents">Supplemental</a> (supplemental
      data)</li>
      <li>Part 7: <a href=
      "tr35-keyboards.html#Contents">Keyboards</a> (keyboard
      mappings)</li>
    </ul>
    <h2><a name="Contents" href="#Contents" id="Contents">Contents
    of Part 1, Core</a></h2>
    <!-- START Generated TOC: CheckHtmlFiles -->
    <ul class="toc">
      <li>1 <a href="#Introduction">Introduction</a>
        <ul class="toc">
          <li>1.1 <a href="#Conformance">Conformance</a></li>
        </ul>
      </li>
      <li>2 <a href="#Locale">What is a Locale?</a></li>
      <li>3 <a href="#Identifiers">Unicode Language and Locale
      Identifiers</a>
        <ul class="toc">
          <li>3.1 <a href="#Unicode_language_identifier">Unicode
          Language Identifier</a></li>
          <li>3.2 <a href="#Unicode_locale_identifier">Unicode
          Locale Identifier</a>
            <ul class='toc'>
              <li>3.2.1 <a href="#Canonical_Unicode_Locale_Identifiers">Canonical Unicode Locale Identifiers</a></li>
            </ul>
          </li>
          <li>3.3 <a href="#BCP_47_Conformance">BCP 47
          Conformance</a>
            <ul class="toc">
              <li>3.3.1 <a href=
              "#BCP_47_Language_Tag_Conversion">BCP 47 Language Tag
              Conversion</a></li>
            </ul>
          </li>
          <li>3.4 <a href="#Field_Definitions">Language Identifier
          Field Definitions</a>
            <ul class="toc">
              <li>Table: <a href=
              "#Language_Locale_Field_Definitions">Language
              Identifier Field Definitions</a></li>
            </ul>
          </li>
          <li>3.5 <a href="#Special_Codes">Special Codes</a>
            <ul class="toc">
              <li>3.5.1 <a href=
              "#Unknown_or_Invalid_Identifiers">Unknown or Invalid
              Identifiers</a></li>
              <li>3.5.2 <a href="#Numeric_Codes">Numeric
              Codes</a></li>
              <li>3.5.3 <a href="#Private_Use_Codes">Private Use
              Codes</a>
                <ul class="toc">
                  <li>Table: <a href="#Private_Use_CLDR">Private
                  Use Codes in CLDR</a></li>
                </ul>
              </li>
            </ul>
          </li>
          <li>3.6 <a href=
          "#Locale_Extension_Key_and_Type_Data">Unicode BCP 47 U
          Extension</a>
            <ul class="toc">
              <li>3.6.1 <a href="#Key_And_Type_Definitions_">Key
              And Type Definitions</a>
                <ul class="toc">
                  <li>Table: <a href=
                  "#Key_Type_Definitions">Key/Type
                  Definitions</a></li>
                </ul>
              </li>
              <li>3.6.2 <a href=
              "#Numbering%20System%20Data">Numbering System
              Data</a></li>
              <li>3.6.3 <a href="#Time_Zone_Identifiers">Time Zone
              Identifiers</a></li>
              <li>3.6.4 <a href=
              "#Unicode_Locale_Extension_Data_Files">U Extension
              Data Files</a></li>
              <li>3.6.5 <a href=
              "#Unicode_Subdivision_Codes">Subdivision Codes</a>
                <ul class="toc">
                  <li>3.6.5.1 <a href="#Validity">Validity</a></li>
                </ul>
              </li>
            </ul>
          </li>
          <li>3.7 <a href="#t_Extension">Unicode BCP 47 T
          Extension</a>
            <ul class="toc">
              <li>3.7.1 <a href="#Transformed_Content_Data_File">T
              Extension Data Files</a></li>
            </ul>
          </li>
          <li>3.8 <a href="#Compatibility_with_Older_Identifiers">
            Compatibility with Older Identifiers</a>
            <ul class="toc">
              <li>3.8.1 <a href="#Old_Locale_Extension_Syntax">Old
              Locale Extension Syntax</a>
                <ul class="toc">
                  <li>Table: <a href=
                  "#Locale_Extension_Mappings">Locale Extension
                  Mappings</a></li>
                </ul>
              </li>
              <li>3.8.2 <a href="#Legacy_Variants">Legacy
              Variants</a>
                <ul class="toc">
                  <li>Table: <a href=
                  "#Legacy_Variant_Mappings">Legacy Variant
                  Mappings</a></li>
                </ul>
              </li>
              <li>3.8.3 <a href="#Relation_to_OpenI18n">Relation to
              OpenI18n</a></li>
            </ul>
          </li>
          <li>3.9 <a href=
          "#Transmitting_Locale_Information">Transmitting Locale
          Information</a>
            <ul class="toc">
              <li>3.9.1 <a href=
              "#Message_Formatting_and_Exceptions">Message
              Formatting and Exceptions</a></li>
            </ul>
          </li>
          <li>3.10 <a href="#Language_and_Locale_IDs">Unicode
          Language and Locale IDs</a>
            <ul class="toc">
              <li>3.10.1 <a href="#Written_Language">Written
              Language</a></li>
              <li>3.10.2 <a href="#Hybrid_Locale">Hybrid Locale
              Identifiers</a></li>
            </ul>
          </li>
          <li>3.11 <a href="#Validity_Data">Validity Data</a></li>
        </ul>
      </li>
      <li>4 <a href="#Locale_Inheritance">Locale Inheritance and
      Matching</a>
        <ul class="toc">
          <li>4.1 <a href="#Lookup">Lookup</a>
            <ul class="toc">
              <li>4.1.1 <a href="#Bundle_vs_Item_Lookup">Bundle vs
              Item Lookup</a>
                <ul class="toc">
                  <li>Table: <a href="#Lookup-Differences">Lookup
                  Differences</a></li>
                </ul>
              </li>
              <li>4.1.2 <a href="#Multiple_Inheritance">Lateral
              Inheritance</a>
                <ul class="toc">
                  <li>Table: <a href="#Count_Fallback_normal">Count
                  Fallback: normal</a></li>
                  <li>Table: <a href=
                  "#Count_Fallback_currency">Count Fallback:
                  currency</a></li>
                </ul>
              </li>
              <li>4.1.3 <a href="#Parent_Locales">Parent
              Locales</a></li>
            </ul>
          </li>
          <li>4.2 <a href="#Inheritance_and_Validity">Inheritance
          and Validity</a>
            <ul class="toc">
              <li>4.2.1 <a href="#Definitions">Definitions</a></li>
              <li>4.2.2 <a href="#Resolved_Data_File">Resolved Data
              File</a></li>
              <li>4.2.3 <a href="#Valid_Data">Valid Data</a></li>
              <li>4.2.4 <a href=
              "#Checking_for_Draft_Status">Checking for Draft
              Status</a></li>
              <li>4.2.5 <a href=
              "#Keyword_and_Default_Resolution">Keyword and Default
              Resolution</a></li>
              <li>4.2.6 <a href=
              "#Inheritance_vs_Related">Inheritance vs Related
              Information</a></li>
            </ul>
          </li>
          <li>4.3 <a href="#Likely_Subtags">Likely Subtags</a></li>
          <li>4.4 <a href="#LanguageMatching">Language Matching</a>
            <ul class='toc'>
              <li>4.4.1 <a href=
              "#EnhancedLanguageMatching">Enhanced Language
              Matching</a></li>
            </ul>
          </li>
        </ul>
      </li>
      <li>5 <a href="#XML_Format">XML Format</a>
        <ul class="toc">
          <li>5.1 <a href="#Common_Elements">Common Elements</a>
            <ul class="toc">
              <li>5.1.1 <a href="#special">Element special</a>
                <ul class="toc">
                  <li>5.1.1.1 <a href=
                  "#Sample_Special_Elements">Sample Special
                  Elements</a></li>
                </ul>
              </li>
              <li>5.1.2 <a href="#Alias_Elements">Element alias</a>
                <ul class="toc">
                  <li>Table: <a href=
                  "#Inheritance_with_source_locale_">Inheritance
                  with source="locale"</a></li>
                </ul>
              </li>
              <li>5.1.3 <a href="#Element_displayName">Element
              displayName</a></li>
              <li>5.1.4 <a href="#Escaping_Characters">Escaping
              Characters</a></li>
            </ul>
          </li>
          <li>5.2 <a href="#Common_Attributes">Common
          Attributes</a>
            <ul class="toc">
              <li>5.2.1 <a href="#Attribute_type">Attribute
              type</a></li>
              <li>5.2.2 <a href="#Attribute_draft">Attribute
              draft</a></li>
              <li>5.2.3 <a href="#alt_attribute">Attribute
              alt</a></li>
            </ul>
          </li>
          <li>5.3 <a href="#Common_Structures">Common
          Structures</a>
            <ul class="toc">
              <li>5.3.1 <a href="#Date_Ranges">Date and Date
              Ranges</a></li>
              <li>5.3.2 <a href="#Text_Directionality">Text
              Directionality</a></li>
              <li>5.3.3 <a href="#Unicode_Sets">Unicode Sets</a>
                <ul class="toc">
                  <li>5.3.3.1 <a href="#Lists_of_Code_Points">Lists
                  of Code Points</a></li>
                  <li>5.3.3.2 <a href="#Unicode_Properties">Unicode
                  Properties</a></li>
                  <li>5.3.3.3 <a href="#Boolean_Operations">Boolean
                  Operations</a></li>
                  <li>5.3.3.4 <a href=
                  "#UnicodeSet_Examples">UnicodeSet
                  Examples</a></li>
                </ul>
              </li>
              <li>5.3.4 <a href="#String_Range">String
              Range</a></li>
            </ul>
          </li>
          <li>5.4 <a href="#Identity_Elements">Identity
          Elements</a></li>
          <li>5.5 <a href="#Valid_Attribute_Values">Valid Attribute
          Values</a></li>
          <li>5.6 <a href="#Canonical_Form">Canonical Form</a>
            <ul class="toc">
              <li>5.6.1 <a href="#Content">Content</a></li>
              <li>5.6.2 <a href="#Ordering">Ordering</a></li>
              <li>5.6.3 <a href="#Comments">Comments</a></li>
            </ul>
          </li>
          <li>5.7 <a href="#DTD_Annotations">DTD
          Annotations</a>
            <ul class='toc'>
              <li>5.7.1 <a href="#match_expressions">Attribute Value Constraints</a></li>
            </ul>
          </li>
        </ul>
      </li>
      <li>6 <a href="#Property_Data">Property Data</a>
        <ul class="toc">
          <li>6.1 <a href="#Script_Metadata">Script
          Metadata</a></li>
          <li>6.2 <a href="#Extended_Pictographic">Extended
          Pictographic</a></li>
          <li>6.3 <a href="#Labels.txt">Labels.txt</a></li>
          <li>6.4 <a href="#Segmentation_Tests">Segmentation Tests</a></li>
        </ul>
      </li>
      <li>7 <a href="#Format_Parse_Issues">Issues in Formatting and
      Parsing</a>
        <ul class="toc">
          <li>7.1 <a href="#Lenient_Parsing">Lenient Parsing</a>
            <ul class="toc">
              <li>7.1.1 <a href="#Motivation">Motivation</a></li>
              <li>7.1.2 <a href="#Loose_Matching">Loose
              Matching</a></li>
            </ul>
          </li>
          <li>7.2 <a href="#Invalid_Patterns">Handling Invalid
          Patterns</a></li>
        </ul>
      </li>
      <li>Annex A <a href="#Deprecated_Structure">Deprecated
      Structure</a>
        <ul class="toc">
          <li>A.1 <a href="#Fallback_Elements">Element
          fallback</a></li>
          <li>A.2 <a href="#BCP47_Keyword_Mapping">BCP 47 Keyword
          Mapping</a></li>
          <li>A.3 <a href="#Choice_Patterns">Choice
          Patterns</a></li>
          <li>A.4 <a href="#Element_default">Element
          default</a></li>
          <li>A.5 <a href=
          "#Deprecated_Common_Attributes">Deprecated Common
          Attributes</a>
            <ul>
              <li>A.5.1 <a href="#Attribute_standard">Attribute
              standard</a></li>
              <li>A.5.2 <a href=
              "#Attribute_draft_nonLeaf">Attribute draft in
              non-leaf elements</a></li>
            </ul>
          </li>
          <li>A.6 <a href="#Element_base">Element base</a></li>
          <li>A.7 <a href="#Element_rules">Element rules</a></li>
          <li>A.8 <a href=
          "#Deprecated_subelements_of_dates">Deprecated subelements
          of &lt;dates&gt;</a></li>
          <li>A.9 <a href=
          "#Deprecated_subelements_of_calendars">Deprecated
          subelements of &lt;calendars&gt;</a></li>
          <li>A.10 <a href=
          "#Deprecated_subelements_of_timeZoneNames">Deprecated
          subelements of &lt;timeZoneNames&gt;</a></li>
          <li>A.11 <a href=
          "#Deprecated_subelements_of_zone_metazone">Deprecated
          subelements of &lt;zone&gt; and &lt;metazone&gt;</a></li>
          <li>A.12 <a href=
          "#Renamed_attribute_values_for_contextTransformUsage">Renamed
          attribute values for &lt;contextTransformUsage&gt;
          element</a></li>
          <li>A.13 <a href=
          "#Deprecated_subelements_of_segmentations">Deprecated
          subelements of &lt;segmentations&gt;</a></li>
          <li>A.14 <a href="#Element_cp">Element cp</a></li>
          <li>A.15 <a href="#validSubLocales">Attribute
          validSubLocales</a></li>
          <li>A.16 <a href="#postCodeElements">Elements
          postalCodeData, postCodeRegex</a></li>
          <li>A.17 <a href="#telephoneCodeData">Element
          telephoneCodeData</a></li>
        </ul>
      </li>
      <li>Annex B <a href="#Links_to_Other_Parts">Links to Other
      Parts</a>
        <ul class="toc">
          <li>Table: <a href="#Part_2_Links">Part 2 Links: General
          (display names &amp; transforms, etc.)</a></li>
          <li>Table: <a href="#Part_3_Links">Part 3 Links: Numbers
          (number &amp; currency formatting)</a></li>
          <li>Table: <a href="#Part_4_Links">Part 4 Links: Dates
          (date, time, time zone formatting)</a></li>
          <li>Table: <a href="#Part_5_Links">Part 5 Links:
          Collation (sorting, searching, grouping)</a></li>
          <li>Table: <a href="#Part_6_Links">Part 6 Links:
          Supplemental (supplemental data)</a></li>
          <li>Table: <a href="#Part_7_Links">Part 7 Links:
          Keyboards (keyboard mappings)</a></li>
        </ul>
      </li>
      <li>Annex C <a href="#LocaleId_Canonicalization">LocaleId Canonicalization</a></li>
      <li><a href="#References">References</a></li>
      <li><a href="#Acknowledgments">Acknowledgments</a></li>
      <li><a href="#Modifications">Modifications</a></li>
    </ul><!-- END Generated TOC: CheckHtmlFiles -->
    <h2><a name="Introduction" href="#Introduction" id=
    "Introduction">1 Introduction</a></h2>
    <p>Not long ago, computer systems were like separate worlds,
    isolated from one another. The internet and related developments
    have changed all that. A single system can now be built from many
    different components, hardware and software, all needing to
    work together. Many different technologies have been important
    in bridging the gaps; in the internationalization arena,
    Unicode has provided a lingua franca for communicating textual
    data. However, there remain differences in the locale data used
    by different systems.</p>
    <p>The best practice for internationalization is to store and
    communicate language-neutral data, and to format that data for
    the client. This formatting can take place in any of the
    components of a system: a server might format data based on
    the user's locale, or a client machine might do the
    formatting. The same applies to parsing data and to
    locale-sensitive analysis of data.</p>
    <p>But there remain significant differences across systems and
    applications in the locale-sensitive data used for such
    formatting, parsing, and analysis. Many of those differences
    are simply gratuitous: each result falls within limits that
    human readers find acceptable, yet the results differ. In many
    other cases there are outright errors. Whatever the cause, the
    differences can cause discrepancies to creep into a
    heterogeneous system. This is especially serious in the case of
    collation (sort order), where differences in collation cause
    not only ordering differences, but also different query results.
    For example, given a query for customers with names between
    "Abbot, Cosmo" and "Arnold, James", systems with different sort
    orders can return different lists. (For comparisons
    across systems formatted as HTML tables, see [<a href=
    "#Comparisons">Comparisons</a>].)</p>
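<p>The range-query discrepancy can be reproduced in a few lines. The following Python sketch is illustrative only: the customer names are invented, plain code point order stands in for one system, and case-insensitive comparison via Python's <code>str.casefold</code> stands in for another. Neither ordering is a locale-aware collation of the kind described in Part 5, Collation.</p>

```python
# Invented customer names; the range bounds come from the example in the text.
names = ["Abbot, Cosmo", "Adams, Ann", "al-Farsi, Omar", "Arnold, James", "Baker, Bo"]


def range_query(rows, low, high, key):
    """Return the rows r with key(low) <= key(r) <= key(high)."""
    lo, hi = key(low), key(high)
    return [r for r in rows if lo <= key(r) <= hi]


# System 1: plain code point order (lowercase "a" sorts after every uppercase letter).
binary = range_query(names, "Abbot, Cosmo", "Arnold, James", key=lambda s: s)
# System 2: case-insensitive order, a crude stand-in for a different collation.
folded = range_query(names, "Abbot, Cosmo", "Arnold, James", key=str.casefold)

print(binary)  # ['Abbot, Cosmo', 'Adams, Ann', 'Arnold, James']
print(folded)  # ['Abbot, Cosmo', 'Adams, Ann', 'al-Farsi, Omar', 'Arnold, James']
```

<p>The same query over the same data returns "al-Farsi, Omar" on one system but not on the other, purely because the two systems order strings differently.</p>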
    <blockquote>
      <p class="note"><b>Note:</b> There are many different, equally
      valid ways in which data can be judged to be "correct" for a
      particular locale. The goal for the common locale data is to
      make it as consistent as possible with existing locale data,
      and acceptable to users in that locale.</p>
    </blockquote>
    <p>This document specifies an XML format for the communication
    of locale data: the Unicode Locale Data Markup Language (LDML).
    This provides a common format for systems to interchange locale
    data so that they can get the same results in the services
    provided by internationalization libraries. It also provides a
    standard format that can allow users to customize the behavior
    of a system. For example, two implementations can use it to
    exchange a specification of tailored collation (sorting) rules;
    using the same specification, the two implementations will
    achieve the same results when comparing strings. Unicode LDML
    can also be used to let a user encapsulate specialized sorting
    behavior for a specific domain, or to create a customized locale
    for a minority language. Unicode LDML is also used in the Unicode
    Common Locale Data Repository (CLDR). CLDR uses an open process
    for reconciling differences between the locale data used on
    different systems and validating the data, to produce a
    useful, common, consistent base of locale data.</p>
    <p>For more information, see the Common Locale Data Repository
    project page [<a href="#localeProject">LocaleProject</a>].</p>
    <p>As LDML is an interchange format, it was designed for ease
    of maintenance and simplicity of transformation into other
    formats, rather than for efficiency of run-time lookup and use.
    Implementations should consider converting LDML data into a
    more compact format prior to use.</p>
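<p>As a sketch of one such conversion, the Python code below parses a small, hand-written LDML fragment with the standard library's <code>xml.etree.ElementTree</code> and flattens it into a dictionary keyed by XPath-like strings. The fragment, the key syntax, and the function name <code>flatten</code> are assumptions for illustration; this specification does not define a compact run-time format, and only the <code>type</code> attribute is used here to distinguish sibling elements.</p>

```python
import xml.etree.ElementTree as ET

# A tiny, hand-written LDML-style fragment (French display names for two languages).
LDML = """<ldml>
  <identity><language type="fr"/></identity>
  <localeDisplayNames>
    <languages>
      <language type="de">allemand</language>
      <language type="en">anglais</language>
    </languages>
  </localeDisplayNames>
</ldml>"""


def flatten(elem, prefix=""):
    """Yield (path, text) for every element that has non-blank text.

    Only the 'type' attribute is used to distinguish sibling elements.
    """
    step = elem.tag
    if "type" in elem.attrib:
        step += '[@type="%s"]' % elem.attrib["type"]
    path = prefix + "/" + step
    text = (elem.text or "").strip()
    if text:
        yield path, text
    for child in elem:
        yield from flatten(child, path)


compact = dict(flatten(ET.fromstring(LDML), prefix="/"))
for key, value in compact.items():
    print(key, "=", value)
```

<p>The resulting flat mapping can be looked up in a single step, or serialized into whatever binary or indexed format an implementation prefers.</p>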
    <h3><a name="Conformance" href="#Conformance" id=
    "Conformance">1.1 Conformance</a></h3>
    <p>There are many ways to use the Unicode LDML format and the
    data in CLDR, and the Unicode Consortium does not restrict the
    ways in which the format or data are used. However, an
    implementation may also claim conformance to LDML or to CLDR,
    as follows:</p>
    <p>&nbsp;</p>
    <p><i><b>UAX35-C1.</b></i> An implementation that claims
    conformance to this specification shall:</p>
    <ol>
      <li>Identify the sections of the specification that it
      conforms to.
        <ul>
          <li>For example, an implementation might claim
          conformance to all LDML features except for
          <i>transforms</i> and <i>segments</i>.</li>
        </ul>
      </li>
      <li>Interpret the relevant elements and attributes of LDML
      documents in accordance with the descriptions in those
      sections.
        <ul>
          <li>For example, an implementation that claims
          conformance to the date format patterns must interpret
          the characters in such patterns according to <a href=
          "tr35-dates.html#Date_Field_Symbol_Table">Date Field
          Symbol Table</a>.</li>
        </ul>
      </li>
      <li>Declare which types of CLDR data it uses.
        <ul>
          <li>For example, an implementation might declare that it
          only uses language names, and only those with a <i>draft</i>
          status of <i>contributed</i> or <i>approved</i>.</li>
        </ul>
      </li>
    </ol>
    <p><i><b>UAX35-C2.</b></i> An implementation that claims
    conformance to Unicode locale or language identifiers
    shall:</p>
    <ol>
      <li>Specify whether Unicode locale extensions are
      allowed.</li>
      <li>Specify the canonical form used for identifiers in terms
      of casing and field separator characters.</li>
    </ol>
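<p>UAX35-C2 leaves the choice of canonical form to the implementation. As one possible choice, the following Python sketch accepts either <code>-</code> or <code>_</code> as a field separator on input, emits a single chosen separator, and applies the casing conventions recommended in BCP 47 (RFC 5646, section 2.1.1): subtags are lowercase, except that two-letter subtags are uppercase and four-letter subtags are titlecase when they are neither the first subtag nor after a singleton. The function name <code>canonical_form</code> is hypothetical, and the sketch performs no other canonicalization (such as alias replacement or reordering of extensions).</p>

```python
def canonical_form(tag: str, sep: str = "-") -> str:
    """Normalize the field separators and casing of a language/locale identifier.

    Casing follows the RFC 5646 (BCP 47) section 2.1.1 conventions; nothing
    else about the identifier is validated or changed.
    """
    subtags = tag.replace("_", "-").split("-")
    result = [subtags[0].lower()]          # first subtag (language): lowercase
    after_singleton = False
    for sub in subtags[1:]:
        if len(sub) == 1:                  # a singleton starts an extension or private use
            after_singleton = True
        if after_singleton:
            result.append(sub.lower())
        elif len(sub) == 4 and sub.isalpha():
            result.append(sub.title())     # script, e.g. "Hant"
        elif len(sub) == 2 and sub.isalpha():
            result.append(sub.upper())     # region, e.g. "TW"
        else:
            result.append(sub.lower())     # numeric regions, variants, etc.
    return sep.join(result)


print(canonical_form("ZH_hant_tw"))          # zh-Hant-TW
print(canonical_form("en-us-u-CA-Gregory"))  # en-US-u-ca-gregory
print(canonical_form("de-de", sep="_"))      # de_DE
```

<p>An implementation that prefers <code>_</code> as its separator would simply pass <code>sep="_"</code>; the important point for conformance is that the choice is stated.</p>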
    <p>External specifications may also reference particular
    components of Unicode locale or language identifiers, such
    as:</p>
    <blockquote>
      <p><i>Field X can contain any Unicode region subtag values as
      given in Unicode Technical Standard #35: Unicode Locale Data
      Markup Language (LDML), excluding grouping codes.</i></p>
    </blockquote>
    <h2><a name="Locale" href="#Locale" id="Locale">2 What is a
    Locale?</a></h2>
    <p>Before diving into the XML structure, it is helpful to
    describe the model behind the structure. People do not have to
    subscribe to this model to use data in LDML, but they do need
    to understand it so that the data can be correctly translated
    into whatever model their implementation uses.</p>
    <p>The first issue is basic: <i>what is a locale?</i> In this
    model, a locale is an identifier (id) that refers to a set of
    user preferences that tend to be shared across significant
    swaths of the world. Traditionally, the data associated with
    this id provides support for formatting and parsing of dates,
    times, numbers, and currencies; for measurement units; for
    sort order (collation); and for translated names of time zones,
    languages, countries, and scripts. The data can also include
    support for text boundaries (character, word, line, and
    sentence), text transformations (including transliterations),
    and other services.</p>
    <p>Locale data is not cast in stone: the data used on someone's
    machine may generally reflect the US format, for example, but
    preferences can typically be set to override particular items,
    such as setting the date format to 2002.03.15, or using metric
    or Imperial measurement units. In the abstract, locales are
    simply one of many sets of preferences that, say, a website may
    want to remember for a particular user. Depending on the
    application, it may also want to remember the user's time zone,
    preferred currency, preferred character set, smoker/non-smoker
    preference, meal preference (vegetarian, kosher, and so on),
    music preference, religion, party affiliation, favorite
    charity, and so on.</p>
    <p>Locale data in a system may also change over time: country
    boundaries change; governments (and currencies) come and go;
729    committees impose new standards; bugs are found and fixed in
730    the source data; and so on. Thus the data needs to be versioned
731    for stability over time.</p>
732    <p>In general terms, the locale id is a parameter that is
733    supplied to a particular service (date formatting, sorting,
734    spell-checking, and so on). The format in this document does
735    not attempt to represent all the data that could conceivably be
736    used by all possible services. Instead, it collects together
737    data that is in common use in systems and internationalization
    libraries for basic services. Locales differ mainly by
    language, though there may also be some differences by
    country or region. However, the line
    between <i>locales</i> and <i>languages</i>, as commonly used
    in the industry, is rather fuzzy. Note also that the vast
    majority of the locale data in CLDR is in fact language data;
    all non-linguistic data is kept in a separate tree.
745    For more information, see <i><a href=
746    "#Language_and_Locale_IDs">Section 3.10 Language and Locale
747    IDs</a></i>.</p>
748    <p>We will speak of data as being "in locale X". That does not
749    imply that a locale <i>is</i> a collection of data; it is
750    simply shorthand for "the set of data associated with the
751    locale id X". Each individual piece of data is called a
752    <i>resource</i> or <i>field</i>, and a tag indicating the key
753    of the resource is called a <i>resource tag.</i></p>
754    <h2><a name="Identifiers" href="#Identifiers" id=
755    "Identifiers"></a> <a name=
756    "Unicode_Language_and_Locale_Identifiers" href=
757    "#Unicode_Language_and_Locale_Identifiers" id=
758    "Unicode_Language_and_Locale_Identifiers">3 Unicode Language
759    and Locale Identifiers</a></h2>
760    <p>Unicode LDML uses stable identifiers based on [<a href=
761    "#BCP47">BCP47</a>] for distinguishing among languages,
762    locales, regions, currencies, time zones, transforms, and so
763    on. There are many systems for identifiers for these entities.
764    The Unicode LDML identifiers may not match the identifiers used
765    on a particular target system. If so, some process of
766    identifier translation may be required when using LDML
767    data.</p>
768    <p>The BCP 47 extensions (-u- and -t-) are described in
769    <em>Section 3.6 <a href="#u_Extension">Unicode BCP 47 U
770    Extension</a></em> and <em>Section 3.7 <a href=
771    "#BCP47_T_Extension">Unicode BCP 47 T Extension</a></em>.</p>
772    <h3><i><a name="Unicode_language_identifier" href=
773    "#Unicode_language_identifier" id=
774    "Unicode_language_identifier">3.1 Unicode Language
775    Identifier</a></i></h3>
    <p>A <i>Unicode language identifier</i> has the following
    structure, given in EBNF (Perl-based). The following table defines
778    syntactically well-formed identifiers: they are not necessarily
779    valid identifiers. For additional validity criteria, see the
780    links on the right.</p>
781    <table>
782      <tr>
783        <th>&nbsp;</th>
784        <th>
785          <div align="center">
786            EBNF
787          </div>
788        </th>
789        <th>
790          <div align="center">
791            Validity / Comments
792          </div>
793        </th>
794      </tr>
795      <tr>
796        <td><code><a href="#unicode_language_id" name=
797        "unicode_language_id" id=
798        "unicode_language_id">unicode_language_id</a></code></td>
799        <td><code>= "root"<br>
800        | (unicode_language_subtag<br>
801        &nbsp; &nbsp; (sep unicode_script_subtag)?<br>
802        &nbsp; | unicode_script_subtag)<br>
803        &nbsp; (sep unicode_region_subtag)?<br>
804        &nbsp; (sep unicode_variant_subtag)* ;</code></td>
805        <td>"root" is treated as a special
806        <code>unicode_language_subtag</code></td>
807      </tr>
808      <tr>
809        <td><code><a href="#unicode_language_subtag" name=
810        "unicode_language_subtag" id=
811        "unicode_language_subtag">unicode_language_subtag</a></code></td>
812        <td><code>= alpha{2,3} | alpha{5,8};</code></td>
813        <td><code><a href=
814        '#unicode_language_subtag_validity'>validity</a><br>
815        <a href=
816        'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/language.xml'>
817        latest-data</a></code></td>
818      </tr>
819      <tr>
820        <td><code><a href="#unicode_script_subtag" name=
821        "unicode_script_subtag" id=
822        "unicode_script_subtag">unicode_script_subtag</a></code></td>
823        <td><code>= alpha{4} ;</code></td>
824        <td><code><a href=
825        '#unicode_script_subtag_validity'>validity</a><br>
826        <a href=
827        'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/script.xml'>
828        latest-data</a></code></td>
829      </tr>
830      <tr>
831        <td><code><a href="#unicode_region_subtag" name=
832        "unicode_region_subtag" id=
833        "unicode_region_subtag">unicode_region_subtag</a></code></td>
834        <td><code>= (alpha{2} | digit{3}) ;</code></td>
835        <td><code><a href=
836        '#unicode_language_subtag_validity'>validity</a><br>
837        <a href=
838        'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/region.xml'>
839        latest-data</a></code></td>
840      </tr>
841      <tr>
842        <td><code><a href="#unicode_variant_subtag" name=
843        "unicode_variant_subtag" id=
844        "unicode_variant_subtag">unicode_variant_subtag</a></code></td>
845        <td><code>= (alphanum{5,8}<br>
846        | digit alphanum{3}) ;</code></td>
847        <td><code><a href=
848        '#unicode_language_subtag_validity'>validity</a><br>
849        <a href=
850        'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/variant.xml'>
851        latest-data</a></code></td>
852      </tr>
853      <tr>
854        <td><code>sep</code></td>
855        <td><code>= [-_] ;</code></td>
856      </tr>
857      <tr>
858        <td><code>digit</code></td>
859        <td><code>= [0-9] ;</code></td>
860      </tr>
861      <tr>
862        <td><code>alpha</code></td>
863        <td><code>= [A-Z a-z] ;</code></td>
864      </tr>
865      <tr>
866        <td><code>alphanum</code></td>
867        <td><code>= [0-9 A-Z a-z] ;</code></td>
868      </tr>
869    </table>
    <p>The semantics of the various subtags are explained in
    <em>Section 3.4 <a href="#Field_Definitions">Language
    Identifier Field Definitions</a></em>; there are also direct
    links from <code><a href=
    "#unicode_language_subtag">unicode_language_subtag</a></code>,
    etc. While theoretically the <code><a href=
    "#unicode_language_subtag">unicode_language_subtag</a></code>
    may have more than 3 letters through the IANA registration
    process, in practice that has not occurred. The <code><a href=
    "#unicode_language_subtag">unicode_language_subtag</a></code>
    "und" may be omitted when there is a <code><a href=
    "#unicode_script_subtag">unicode_script_subtag</a></code>; for
    that reason, <code><a href=
    "#unicode_language_subtag">unicode_language_subtag</a></code>
    values with 4 letters are not permitted. However, such
    script-initial <code><a href=
    "#unicode_language_id">unicode_language_id</a></code> values
    are not intended for general interchange, because they are not
    valid BCP 47 tags. Instead, they are intended for certain
    protocols such as the identification of transliterators or font
    ScriptLangTag values. For more information on language subtags with 4 letters, see <a href=
    "#Language_Tag_to_Locale_Identifier">BCP 47 Language Tag to
    Unicode BCP 47 Locale Identifier</a>.</p>
893    <p>For example, "en-US" (American English), "en_GB" (British
894    English), "es-419" (Latin American Spanish), and "uz-Cyrl"
895    (Uzbek in Cyrillic) are all valid Unicode language
896    identifiers.</p>
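    <p>As an illustration only, the EBNF rows above can be transcribed into a Python regular expression. The names <code>UNICODE_LANGUAGE_ID</code> and <code>is_well_formed_language_id</code> are hypothetical, and the sketch checks well-formedness only: a matching string may still contain subtags that are absent from the validity data.</p>

```python
import re

# Building blocks transcribed from the EBNF table (ASCII letters and digits only).
ALPHA = "[A-Za-z]"
ALNUM = "[0-9A-Za-z]"
SEP = "[-_]"
LANGUAGE = f"(?:{ALPHA}{{2,3}}|{ALPHA}{{5,8}})"
SCRIPT = f"{ALPHA}{{4}}"
REGION = f"(?:{ALPHA}{{2}}|[0-9]{{3}})"
VARIANT = f"(?:{ALNUM}{{5,8}}|[0-9]{ALNUM}{{3}})"

# unicode_language_id = "root" | (language (sep script)? | script)
#                       (sep region)? (sep variant)*
UNICODE_LANGUAGE_ID = re.compile(
    f"root|(?:{LANGUAGE}(?:{SEP}{SCRIPT})?|{SCRIPT})"
    f"(?:{SEP}{REGION})?(?:{SEP}{VARIANT})*",
)


def is_well_formed_language_id(s):
    """True if s matches the unicode_language_id EBNF; says nothing about validity."""
    return UNICODE_LANGUAGE_ID.fullmatch(s) is not None
```

    <p>All four identifiers in the example above match. A string with an extension, such as "en-u-ca-buddhist", does not, because extensions belong to the <code>unicode_locale_id</code> syntax rather than to <code>unicode_language_id</code>.</p>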
897    <h3><i><a name="Unicode_locale_identifier" href=
898    "#Unicode_locale_identifier" id="Unicode_locale_identifier">3.2
899    Unicode Locale Identifier</a></i></h3>
900    <p>A <i>Unicode locale identifier</i> is composed of a Unicode
901    language identifier plus (optional) locale extensions. It has
902    the following structure. The semantics of the U and T
903    extensions are explained in <em>Section 3.6 <a href=
904    "#u_Extension">Unicode BCP 47 U Extension</a></em> and
905    <em>Section 3.7 <a href="#BCP47_T_Extension">Unicode BCP 47 T
906    Extension</a></em>. Other extensions and private use extensions
907    are supported for pass-through. The following table defines
908    syntactically <em>well-formed</em> identifiers: they are not
909    necessarily <em>valid</em> identifiers. For additional validity
910    criteria, see the links on the right. </p>
    <p>As is often the case, the complete syntactic constraints are not easily captured by EBNF, so there is a further condition: there cannot be more than one extension with the
    same singleton (-a-, …, -t-, -u-, …). Note that the private use extension (-x-) must
    come after all other extensions.</p>
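    <p>The extra condition on singletons can be checked separately from the EBNF. The following Python function is a hypothetical sketch that assumes its input already matches the grammar in the table below. It stops scanning at -x-, because every subtag after that singleton belongs to the private use extension, even a single character.</p>

```python
def has_unique_singletons(locale_id):
    """Check the extra rule on extensions: no singleton may appear twice.

    Before -x-, the only one-character subtags in a well-formed identifier
    are extension singletons; after -x-, everything is private use.
    """
    subtags = locale_id.lower().replace("_", "-").split("-")
    seen = set()
    for sub in subtags[1:]:
        if len(sub) != 1:
            continue          # not a singleton
        if sub == "x":
            break             # the rest is the private use extension
        if sub in seen:
            return False      # same singleton used for two extensions
        seen.add(sub)
    return True
```
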
914    <table border="0">
915      <tr>
916        <th>&nbsp;</th>
917        <th>
918          <div align="center">
919            EBNF
920          </div>
921        </th>
922        <th>
923          <div align="center">
924            Validity
925          </div>
926        </th>
927      </tr>
928      <tr>
929        <td><code><a href="#unicode_locale_id" name=
930        "unicode_locale_id" id=
931        "unicode_locale_id">unicode_locale_id</a></code></td>
932        <td><code>= unicode_language_id<br>
933        &nbsp; extensions*<br>
934        &nbsp; pu_extensions? ;</code></td>
935      </tr>
936      <tr>
937        <td><code><a href="#extensions" name="extensions" id=
938        "extensions">extensions</a></code></td>
939        <td><code>= unicode_locale_extensions<br>
940        | transformed_extensions<br>
941        | other_extensions ;</code></td>
942      </tr>
943      <tr>
944        <td><code><a href="#unicode_locale_extensions" name=
945        "unicode_locale_extensions" id=
946        "unicode_locale_extensions">unicode_locale_extensions</a></code></td>
947        <td><code>= sep [uU]<br>
948        &nbsp; ((sep keyword)+<br>
949        &nbsp; |(sep attribute)+ (sep keyword)*) ;</code></td>
950      </tr>
951      <tr>
952        <td><code><a href="#transformed_extensions" name=
953        "transformed_extensions" id=
954        "transformed_extensions">transformed_extensions</a></code></td>
955        <td><code>= sep [tT]<br>
956        &nbsp; ((sep tlang (sep tfield)*)<br>
957        &nbsp; | (sep tfield)+) ;</code></td>
958      </tr>
959      <tr>
960        <td><code><a href="#pu_extensions" name="pu_extensions" id=
961        "pu_extensions">pu_extensions</a></code></td>
962        <td><code>= sep [xX]<br>
963		&nbsp; (sep alphanum{1,8})+ ;</code></td>
964      </tr>
965      <tr>
966        <td><code><a href="#other_extensions" name=
967        "other_extensions" id=
968        "other_extensions">other_extensions</a></code></td>
969        <td><code>= sep [alphanum-[tTuUxX]]<br>
970        &nbsp; (sep alphanum{2,8})+ ;</code></td>
971      </tr>
972      <tr>
973        <td><code>keyword</code><br>
		(Also known as <code>ufield</code>)</td>
975        <td><code>= key (sep type)? ;</code></td>
976      </tr>
977      <tr>
978        <td><code>key</code><br>
979			(Also known as <code>ukey</code>)</td>
980        <td><code>= alphanum alpha ;</code><br>
981          (Note that this is narrower than in [<a href="https://www.ietf.org/rfc/rfc6067.txt" title="https://www.ietf.org/rfc/rfc6067.txt">RFC6067</a>], so that it is disjoint with tkey.)</td>
982        <td><code><a href="#Key_Type_Definitions">validity</a><br>
983        <a href=
984        'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/bcp47'>latest-data</a></code></td>
985      </tr>
986      <tr>
987        <td><code>type</code><br>
988			(Also known as <code>uvalue</code>)</td>
989        <td><code>= alphanum{3,8}<br>
990        &nbsp; (sep alphanum{3,8})* ;</code></td>
991        <td><code><a href="#Key_Type_Definitions">validity</a><br>
992        <a href=
993        'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/bcp47'>latest-data</a></code></td>
994      </tr>
995      <tr>
996        <td><code>attribute</code></td>
997        <td><code>= alphanum{3,8} ;</code></td>
998      </tr>
999      <tr>
1000        <td><code><a name="unicode_subdivision_id" href=
1001        "#unicode_subdivision_id" id=
1002        "unicode_subdivision_id">unicode_subdivision_id</a><a name=
1003        "unicode_subdivision_subtag" id=
1004        "unicode_subdivision_subtag"></a><a name=
1005        "subdivision_attribute" id=
1006        "subdivision_attribute"></a></code></td>
1007        <td><code>= <a href=
1008        "#unicode_region_subtag">unicode_region_subtag</a>
1009        unicode_subdivision_suffix ;</code></td>
1010        <td><code><a href=
1011        '#unicode_subdivision_subtag_validity'>validity</a><br>
1012        <a href=
1013        'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/subdivision.xml'>
1014        latest-data</a></code></td>
1015      </tr>
1016      <tr>
1017        <td><code>unicode_subdivision_suffix</code></td>
1018        <td><code>= alphanum{1,4} ;</code></td>
1019      </tr>
1020      <tr>
1021        <td><code><a name="unicode_measure_unit" href=
1022        "#unicode_measure_unit" id=
1023        "unicode_measure_unit">unicode_measure_unit</a></code></td>
1024        <td><code>= alphanum{3,8}<br>
1025        &nbsp; (sep alphanum{3,8})* ;</code></td>
1026        <td><code><a href='#Validity_Data'>validity</a><br>
1027        <a href=
1028        'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/unit.xml'>latest-data</a></code></td>
1029      </tr>
1030      <tr>
1031        <td><code>tlang</code></td>
1032        <td><code>= unicode_language_subtag<br>
1033        &nbsp; (sep unicode_script_subtag)?<br>
1034        &nbsp; (sep unicode_region_subtag)?<br>
1035        &nbsp; (sep unicode_variant_subtag)* ;</code></td>
1036      </tr>
1037      <tr>
1038        <td><code>tfield</code></td>
1039        <td><code>= tkey tvalue;</code></td>
1040        <td><code><a href="#BCP47_T_Extension">validity</a><br>
1041        <a href=
1042        'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/bcp47'>latest-data</a></code></td>
1043      </tr>
1044      <tr>
1045        <td><code>tkey</code></td>
1046        <td><code>= alpha digit ;</code></td>
1047      </tr>
1048      <tr>
1049        <td><code>tvalue</code></td>
1050        <td><code>= (sep alphanum{3,8})+ ;</code></td>
1051      </tr>
1052    </table>
    <p>For historical reasons, this is called a Unicode locale
    identifier. However, it really functions (with few exceptions)
    as a language identifier, and accesses
    language-based data. Except where it
    would be unclear, this document uses the term "locale data"
    loosely to encompass both types of data: for more information,
    see <i><a href="#Language_and_Locale_IDs">Section 3.10 Language
    and Locale IDs</a></i>.</p>
1061    <p>As of the release of this specification, there were no
1062    other_extensions defined. The other_extensions are present in
1063    the syntax to allow implementations to preserve that
1064    information.</p>
1065    <p>As for terminology, the term <i>code</i> may also be used
1066    instead of "subtag", and "territory" instead of "region". The
1067    primary language subtag is also called the <i>base language
1068    code</i>. For example, the base language code for "en-US"
1069    (American English) is "en" (English). The <i>type</i> may also
1070    be referred to as a <i>value</i> or <i>key-value</i>.</p>
1071    <p>The identifiers can vary in case and in the separator
1072    characters. The "-" and "_" separators are treated as
1073    equivalent, although "-" is preferred.</p>
1074    <p>All identifier field values are case-insensitive. Although
1075    case distinctions do not carry any special meaning, an
1076    implementation of LDML should use the casing recommendations in
1077    [<a href="#BCP47">BCP47</a>], especially when a Unicode locale
1078    identifier is used for locale data exchange in software
1079    protocols.</p>
1080    <h4><a name="Canonical_Unicode_Locale_Identifiers" href="#Canonical_Unicode_Locale_Identifiers">3.2.1 Canonical Unicode Locale Identifiers</a></h4>
1081    <p>A <code><a href=
1082    "#unicode_locale_id">unicode_locale_id</a></code> has <em>canonical syntax</em> when:</p>
1083    <ul>
1084		<li>It starts with a language subtag (those beginning with a script subtag are only for specialized use)</li>
1085      <li>Casing
1086        <ul>
1087        <li>Any script subtag is in title case (eg, Hant)</li>
1088        <li>Any region subtag is in uppercase (eg, DE)</li>
1089        <li>All other subtags are in lowercase (eg, en, fonipa)</li>
1090        </ul>
1091      </li>
1092      <li>Order
1093        <ul>
1094          <li>Any variants are in alphabetical order (eg, en-fonipa-scouse,
1095            not en-scouse-fonipa)</li>
1096          <li>Any extensions are in alphabetical order by their singleton
1097            (eg, en-t-xxx-u-yyy, not en-u-yyy-t-xxx)</li>
1098          <li>All attributes are sorted in alphabetical order.</li>
1099          <li>All keywords and tfields  are sorted by alphabetical order of their keys, within their respective extensions.</li>
1100          <li>Any type or tfield value "true" is removed.</li>
1101        </ul>
1102      </li>
1103    </ul>
1104	  <p>For example, the canonical form of
1105      "en-u-foo-bar-nu-thai-ca-buddhist-kk-true" is
1106      "en-u-bar-foo-ca-buddhist-kk-nu-thai". The attributes "foo" and
1107      "bar" in this example are provided only for illustration; no
1108      attribute subtags are defined by the current CLDR
1109      specification.</p>
1110    <p><b>Note:</b> The current version of CLDR data uses some
1111    non-preferred <em>syntax</em> for backward compatibility. This might be
1112    changed in future CLDR releases.</p>
1113    <ul>
1114      <li>It uses uppercase letters for variant subtags, while the
1115      preferred forms are all lowercase.</li>
1116      <li>It uses "_" as the separator, while the preferred form of
1117      the separator is "-".</li>
1118      <li>It uses "root", while the preferred form is "und".</li>
1119    </ul>
1120
1121    <p>A <code><a href=
1122    "#unicode_locale_id">unicode_locale_id</a></code> is in <em>canonical form</em> when it has canonical syntax and contains no aliased subtags. A <code><a href=
1123    "#unicode_locale_id">unicode_locale_id</a></code> can be transformed into canonical form according to <a href="#LocaleId_Canonicalization" >Annex C. LocaleId Canonicalization</a>.</p>
1124
1125
1126	    <p>A <code><a href=
1127    "#unicode_locale_id">unicode_locale_id</a></code> is <em>maximal</em> when the  <code><a href=
1128    "#unicode_language_id">unicode_language_id</a></code> and tlang (if any) have been transformed by the Add Likely Subtags operation in <em>Section 4.3 <a href="#Likely_Subtags">Likely Subtags</a></em>, excluding &quot;und&quot;. </p>
1129	    <blockquote><em>Example:</em> the maxmal form of ja-Kana-t-it is ja-Kana-JP-t-it-Latn-IT</blockquote>
1130	    <p>Two <code><a href=
1131    "#unicode_locale_id">unicode_locale_ids</a></code> are <em>equivalent</em> when their maximal canonical forms are identical.</p>
1132			    <blockquote>
1133			      <p><em>Example:</em> &quot;IW-HEBR-u-ms-imperial&quot; ~ &quot;he-u-ms-uksystem&quot;</p>
1134			    </blockquote>
1135		<p>The equivalence relationship may change over time, such as when subtags are deprecated or likely subtag mappings change. For example, if two countries were to merge, then various subtags would become deprecated. These kinds of changes are generally very infrequent.</p>
1136
1137    <h3><a name="BCP_47_Conformance" href="#BCP_47_Conformance" id=
1138    "BCP_47_Conformance">3.3 BCP 47 Conformance</a></h3>
1139    <p>Unicode language and locale identifiers inherit the design
1140    and the repertoire of subtags from [<a href="#BCP47">BCP47</a>]
1141    Language Tags. There are some extensions and restrictions made
1142    for the use of the Unicode locale identifier in CLDR:</p>
1143    <ul>
1144      <li>It does not allow for the full syntax of [<a href=
1145      "#BCP47">BCP47</a>]:
1146        <ul>
1147          <li>No extlang subtags are allowed (as in the BCP 47
1148          canonical form, see BCP 47 <a href=
1149          "https://tools.ietf.org/html/bcp47#section-4.5">Section
1150          4.5</a> and <a href=
1151          "https://tools.ietf.org/html/bcp47#section-3.1.7" target=
1152          "_blank">Section 3.1.7</a>)</li>
1153          <li>No irregular BCP 47 legacy language tags
1154          (marked as “Type: grandfathered” in BCP 47) are allowed
1155          (these are all deprecated in BCP 47)</li>
1156          <li>A tag must not start with the subtag "x": thus a
1157          <em>privateuse</em> (eg x-abc) can only be after a
1158          language subtag, like "und"</li>
1159        </ul>
1160      </li>
1161      <li>It allows for certain semantic additions and constraints:
1162        <ul>
1163          <li>Certain codes that are private-use in BCP-47 and ISO
1164          are given semantics by LDML</li>
1165          <li>Each macrolanguage has an identified primary
1166          encompassed language, which is treated as an alias for
1167          the macrolanguage, and thus is replaced when
1168          canonicalizing (as allowed by BCP 47, see <a href=
1169          "https://tools.ietf.org/html/bcp47#section-4.1.2">Section
1170          4.1.2</a>)</li>
1171        </ul>
1172      </li>
1173      <li>It allows certain syntax for backwards compatibility (not
1174      BCP 47-compatible):
1175        <ul>
1176          <li>The "_" character for field separator characters, as
1177          well as the "-" used in [<a href="#BCP47">BCP47</a>]
1178          (however, the canonical form is with "-")</li>
1179          <li>The subtag "root" to indicate the generic locale used
1180          as the parent of all languages in the CLDR data model
1181          ("und" can be used instead)</li>
1182          <li>The language tag may begin with a script subtag
1183          rather than a language subtag. This is specialized use
1184          only, and not required for CLDR conformance.</li>
1185        </ul>
1186      </li>
1187    </ul>
1188    <p>There are thus two subtypes of Unicode locale
1189    identifiers:</p>
1190    <ul>
1191      <li>the term <em>Unicode CLDR locale identifier</em> applies
1192      where the backwards compatibility syntax is used.</li>
1193      <li>the term <em>Unicode BCP 47 locale identifier</em>
1194      applies otherwise. A <em>Unicode BCP 47 locale
1195      identifier</em> is also a valid BCP 47 language tag.</li>
1196    </ul>
1197    <h4><a name="BCP_47_Language_Tag_Conversion" href=
1198    "#BCP_47_Language_Tag_Conversion" id=
1199    "BCP_47_Language_Tag_Conversion">3.3.1 BCP 47 Language Tag
1200    Conversion</a></h4>
1201    <p>The different identifiers can be converted to one another as
1202    described in this section.</p>
1203    <h5><a name="Language_Tag_to_Locale_Identifier" href=
1204    "#Language_Tag_to_Locale_Identifier" id=
1205    "Language_Tag_to_Locale_Identifier">BCP 47 Language Tag to
1206    Unicode BCP 47 Locale Identifier</a></h5>
    <p>A valid [<a href="#BCP47">BCP47</a>] language tag can be
    converted to a valid Unicode BCP 47 locale identifier according to <a href="#LocaleId_Canonicalization">Annex C. LocaleId Canonicalization</a>.</p>
1209
1210    <p>The result is a Unicode BCP 47 locale identifier, in
1211    canonical form. It is both a BCP 47 language tag and a Unicode
1212    locale identifier. Because the process maps from all BCP 47
1213    language tags into a subset of BCP 47 language tags, the format
1214    changes are not reversible, much as a lowercase transformation
1215    of the string “McGowan” is not reversible.</p><br>
1216    <p><em>Examples</em></p>
1217    <table>
1218      <tr>
1219        <th style='width:10em'>BCP 47 language tag</th>
1220        <th style='width:10em'>Unicode BCP 47 locale
1221        identifier</th>
1222        <th>Comments</th>
1223      </tr>
1224      <tr>
1225        <td><code>en-US</code></td>
1226        <td><code>en-US</code></td>
1227        <td>no changes</td>
1228      </tr>
1229      <tr>
1230        <td><code>iw-FX</code></td>
1231        <td><code>he-FR</code></td>
1232        <td>BCP 47 canonicalization [1]</td>
1233      </tr>
1234      <tr>
1235        <td><code>cmn-TW</code></td>
1236        <td><code>zh-TW</code></td>
1237        <td>language alias [2]</td>
1238      </tr>
1239      <tr>
1240        <td><code>zh-cmn-TW</code></td>
1241        <td><code>zh-TW</code></td>
1242        <td>BCP 47 canonicalization [1], then language alias
1243        [2]</td>
1244      </tr>
1245      <tr>
1246        <td><code>sr-CS</code></td>
1247        <td><code>sr-RS</code></td>
1248        <td>territory alias [3]</td>
1249      </tr>
1250      <tr>
1251        <td><code>sh</code></td>
1252        <td><code>sr-Latn</code></td>
1253        <td>multiple replacement subtags [2.1]</td>
1254      </tr>
1255      <tr>
1256        <td><code>sh-Cyrl</code></td>
1257        <td><code>sr-Cyrl</code></td>
1258        <td>no replacement with multiple replacement subtags [2.1
1259        doesn't apply]</td>
1260      </tr>
1261      <tr>
1262        <td><code>hy-SU</code></td>
1263        <td><code>hy-AM</code></td>
1264        <td>multiple territory values [3.2]<br>
1265        <code>&lt;territoryAlias type="SU" replacement="RU AM AZ BY
1266        EE GE KZ KG LV LT MD TJ TM UA UZ" …/&gt;</code></td>
1267      </tr>
1268      <tr>
1269        <td><code>i-enochian</code></td>
1270        <td><code>und-x-i-enochian</code></td>
1271        <td>prefix any legacy language tags
1272          (marked as “Type: grandfathered” in BCP 47) with "und-x-" [4]</td>
1273      </tr>
1274      <tr>
1275        <td><code>x-abc</code></td>
1276        <td><code>und-x-abc</code></td>
1277        <td>prefix with "und-", so that there is always a base
1278        language subtag [5]</td>
1279      </tr>
1280    </table>
1281    <p>&nbsp;</p>
1282    <h5><a name="Unicode_Locale_Identifier_CLDR_to_BCP_47" href=
1283    "#Unicode_Locale_Identifier_CLDR_to_BCP_47" id=
1284    "Unicode_Locale_Identifier_CLDR_to_BCP_47">Unicode Locale
1285    Identifier: CLDR to BCP 47</a></h5>
1286    <p>A Unicode CLDR locale identifier can be converted to a valid
1287    [<a href="#BCP47">BCP47</a>] language tag (which is also a
1288    Unicode BCP 47 locale identifier) by performing the following
1289    transformation.</p>
1290    <ol>
1291      <li>Replace the "_" separators with "-"</li>
1292      <li>Replace the special language identifier "root" with the
1293      BCP 47 primary language tag "und"</li>
1294      <li>Add an initial "und" primary language subtag if the first
1295      subtag is a script.</li>
1296    </ol>
1297    <p><em>Examples:</em></p>
1298    <table>
1299      <tr>
1300        <th style='width:10em'>Unicode CLDR locale identifier</th>
1301        <th style='width:10em'>BCP 47 language tag</th>
1302        <th>Comments</th>
1303      </tr>
1304      <tr>
1305        <td><code>en_US</code></td>
1306        <td><code>en-US</code></td>
1307        <td>change separator [1]</td>
1308      </tr>
1309      <tr>
1310        <td><code>de_DE_u_co_phonebk</code></td>
1311        <td><code>de-DE-u-co-phonebk</code></td>
1312        <td>change separator [1]</td>
1313      </tr>
1314      <tr>
1315        <td><code>root</code></td>
1316        <td><code>und</code></td>
1317        <td>change to "und" [2]</td>
1318      </tr>
1319      <tr>
1320        <td><code>root_u_cu_usd</code></td>
1321        <td><code>und-u-cu-usd</code></td>
1322        <td>change to "und" [1, 2]</td>
1323      </tr>
1324      <tr>
1325        <td><code>Latn_DE</code></td>
1326        <td><code>und-Latn-DE</code></td>
1327        <td>add "und" [1, 3]</td>
1328      </tr>
1329    </table><br>
1330    <h5><a name="Unicode_Locale_Identifier_BCP_47_to_CLDR" href=
1331    "#Unicode_Locale_Identifier_BCP_47_to_CLDR" id=
1332    "Unicode_Locale_Identifier_BCP_47_to_CLDR">Unicode Locale
1333    Identifier: BCP 47 to CLDR</a></h5>
1334    <p>A Unicode BCP 47 locale identifier can be transformed into a
1335    Unicode CLDR locale identifier by performing the following
1336    transformation.</p>
1337    <ol>
1338      <li>the separator is changed to "_"</li>
1339      <li>the primary language subtag "und" is replaced with "root"
1340      if no script, region, or variant subtags are present.</li>
1341    </ol>
1342    <p><em>Examples:</em></p>
1343    <table>
1344      <tr>
1345        <th style='width:10em'>BCP 47 language tag</th>
1346        <th style='width:10em'>Unicode CLDR locale identifier</th>
1347        <th>Comments</th>
1348      </tr>
1349      <tr>
1350        <td><code>en-US</code></td>
1351        <td><code>en_US</code></td>
1352        <td>changes separator [1]</td>
1353      </tr>
1354      <tr>
1355        <td><code>und</code></td>
1356        <td><code>root</code></td>
1357        <td>changes to "root", because no script, region, or
1358        variant tag is present [2]</td>
1359      </tr>
1360      <tr>
1361        <td><code>und-US</code></td>
1362        <td><code>und_US</code></td>
1363        <td>no change to "und", because a region subtag is present
1364        [1]</td>
1365      </tr>
1366      <tr>
1367        <td nowrap><code>und-u-cu-USD</code></td>
1368        <td nowrap><code>root_u_cu_usd</code></td>
1369        <td>changes to "root", because no script, region, or
1370        variant tag is present [1, 2]</td>
1371      </tr>
1372    </table>
1373    <h3><a name="Field_Definitions" href="#Field_Definitions" id=
1374    "Field_Definitions">3.4 Language Identifier Field
1375    Definitions</a></h3>
1376    <p>Unicode language and locale identifier field values are
1377    provided in the following table. Note that some private-use BCP
1378    47 field values are given specific meanings in CLDR. While
1379    field values are based on [<a href="#BCP47">BCP47</a>] subtag
1380    values, their validity status in CLDR is specified by means of
1381    machine-readable files in the <a href=
1382    'https://github.com/unicode-org/cldr/releases/tag/latest/common/validity/'>common/validity/</a>
1383    subdirectory, such as language.xml. For the format of those
1384    files and more information, see <em><a href=
1385    '#Validity_Data'>Section 3.11 Validity Data</a></em>.</p>
1386    <table>
1387      <caption>
1388        <a name="Language_Locale_Field_Definitions" href=
1389        "#Language_Locale_Field_Definitions" id=
1390        "Language_Locale_Field_Definitions">Language Identifier
1391        Field Definitions</a>
1392      </caption>
1393      <tr>
1394        <th>Field</th>
1395        <th>Valid values</th>
1396      </tr>
1397      <tr>
1398        <td>
1399          <a href="#unicode_language_subtag_validity" name=
1400          "unicode_language_subtag_validity" id=
1401          "unicode_language_subtag_validity">unicode_language_subtag</a>
          <p>(also known as a <i>Unicode base language
          code</i>)</p>
1404        </td>
1405        <td>
1406          Subtags in the language.xml file (see <em>Section 3.11
          <a href="#Validity_Data">Validity Data</a></em>). These
          are based on [<a href="#BCP47">BCP47</a>] subtag values
          marked as <b>Type: language</b>.
1410          <p>ISO 639-3 introduces the notion of "macrolanguages",
1411          where certain ISO 639-1 or ISO 639-2 codes are given
1412          broad semantics, and additional codes are given for the
1413          narrower semantics. For backwards compatibility, Unicode
1414          language identifiers retain use of the narrower semantics
1415          for these codes. For example:</p>
1416          <table border="1" cellspacing="0" cellpadding="2" style=
1417          "margin: 0.5em">
1418            <tr>
1419              <th>For</th>
1420              <th>Use</th>
1421              <th><i>Not</i></th>
1422            </tr>
1423            <tr>
1424              <td>Standard Chinese (Mandarin)</td>
1425              <td><code>zh</code></td>
1426              <td><code>cmn</code></td>
1427            </tr>
1428            <tr>
1429              <td>Standard Arabic</td>
1430              <td><code>ar</code></td>
1431              <td><code>arb</code></td>
1432            </tr>
1433            <tr>
1434              <td>Standard Malay</td>
1435              <td><code>ms</code></td>
1436              <td><code>zsm</code></td>
1437            </tr>
1438            <tr>
1439              <td>Standard Swahili</td>
1440              <td><code>sw</code></td>
1441              <td><code>swh</code></td>
1442            </tr>
1443            <tr>
1444              <td>Standard Uzbek</td>
1445              <td><code>uz</code></td>
1446              <td><code>uzn</code></td>
1447            </tr>
1448            <tr>
1449              <td>Standard Konkani</td>
1450              <td><code>kok</code></td>
1451              <td><code>knn</code></td>
1452            </tr>
1453            <tr>
1454              <td>Northern Kurdish</td>
1455              <td><code>ku</code></td>
1456              <td><code>kmr</code></td>
1457            </tr>
1458          </table>
1459          <p>If a language subtag matches the type attribute of a
1460          languageAlias element, then the replacement value is used
1461          instead. For example, because "swh" occurs in
1462          <tt>&lt;languageAlias type="swh"
          replacement="sw"/&gt;</tt>, "sw" must be used instead of
1464          "swh". Thus Unicode language identifiers use "ar-EG" for
1465          Standard Arabic (Egypt), not "arb-EG"; they use "zh-TW"
1466          for Mandarin Chinese (Taiwan), not "cmn-TW".</p>
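          <p>The alias replacement described above can be sketched as
          a dictionary lookup on the language subtag. The mapping
          below is a hypothetical, hand-copied subset based on the
          macrolanguage table in this cell, not the full languageAlias
          data from CLDR. The sketch only handles hyphen-separated
          tags.</p>
          <pre>
```python
# Hypothetical subset of languageAlias data, copied from the
# macrolanguage table above; the real CLDR data has many more entries.
LANGUAGE_ALIASES = {
    "cmn": "zh",
    "arb": "ar",
    "zsm": "ms",
    "swh": "sw",
    "uzn": "uz",
    "knn": "kok",
    "kmr": "ku",
}


def replace_language_alias(tag):
    """Replace the language subtag of a hyphen-separated tag if it
    matches the type attribute of a languageAlias entry."""
    language, sep, rest = tag.partition("-")
    language = LANGUAGE_ALIASES.get(language.lower(), language)
    return language + sep + rest


print(replace_language_alias("arb-EG"))  # ar-EG
print(replace_language_alias("cmn-TW"))  # zh-TW
```
          </pre>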
          <p>The private use codes listed as
          <strong>excluded</strong> in <em>Section 3.5.3 <a href=
          "#Private_Use_Codes">Private Use Codes</a></em> will never be
          given specific semantics in Unicode identifiers, and are
          thus safe for other applications to use for their own
          purposes.</p>
1473          <p>The CLDR provides data for normalizing language/locale
1474          codes, including mapping overlong codes like "eng-840" or
1475          "eng-USA" to the correct code "en-US"; see the
1476          <strong><a href=
1477          "https://unicode-org.github.io/cldr-staging/charts/38/supplemental/aliases.html">
1478          Aliases</a></strong> Chart.</p>
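          <p>A minimal sketch of this kind of normalization follows.
          It assumes only simple language or language-region tags. The
          two alias dictionaries are hypothetical subsets written for
          illustration; an implementation would load the full alias
          data that the chart describes.</p>
          <pre>
```python
# Hypothetical subsets of CLDR alias data, hand-copied for illustration;
# the real data covers many more codes.
OVERLONG_LANGUAGES = {"eng": "en"}
OVERLONG_REGIONS = {"840": "US", "USA": "US"}


def normalize_overlong(tag):
    """Replace overlong language and region subtags in a simple
    language or language-region tag, such as "eng-840"."""
    language, _, region = tag.partition("-")
    language = OVERLONG_LANGUAGES.get(language.lower(), language)
    if not region:
        return language
    region = OVERLONG_REGIONS.get(region.upper(), region)
    return language + "-" + region


print(normalize_overlong("eng-840"))  # en-US
print(normalize_overlong("eng-USA"))  # en-US
```
          </pre>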
1479          <p>The following are special language subtags:</p>
1480          <table class="simple" border="1" cellspacing="0"
1481          cellpadding="2">
1482            <tr>
1483              <td>&nbsp;</td>
1484              <td><strong>Name</strong></td>
1485              <td><strong>Comment</strong></td>
1486            </tr>
1487            <tr>
1488              <td><code>mis</code></td>
1489              <td>Uncoded languages</td>
1490              <td>The content is in a language that doesn't yet
1491              have an ISO 639 code.</td>
1492            </tr>
1493            <tr>
1494              <td><code>mul</code></td>
1495              <td>Multiple languages</td>
1496              <td>The content contains more than one language or
1497              text that is simultaneously in multiple languages
1498              (such as brand names).</td>
1499            </tr>
1500            <tr>
1501              <td><code>zxx</code></td>
1502              <td>No linguistic content</td>
              <td>The content is not in any particular language
              (such as images or symbols).</td>
1505            </tr>
1506          </table>
1507        </td>
1508      </tr>
1509      <tr>
1510        <td>
1511          <a href="#unicode_script_subtag_validity" name=
1512          "unicode_script_subtag_validity" id=
1513          "unicode_script_subtag_validity">unicode_script_subtag</a>
          <p>(also known as a <i>Unicode script code</i>)</p>
1515        </td>
1516        <td>
1517          Subtags in the script.xml file (see <em>Section 3.11
1518          <a href="#Validity_Data">Validity Data</a></em>). These
1519          are based on [<a href="#BCP47">BCP47</a>] subtag values
          marked as <b>Type: script</b>.
          <p>In most cases the script subtag is not necessary, since
          most languages are customarily written in only one script.
          Examples of cases where it is used are:</p>
1524          <table border="1" cellspacing="0" cellpadding="2" style=
1525          "margin: 0.5em">
1526            <tr>
1527              <td><code>az_Arab</code></td>
1528              <td>Azerbaijani in Arabic script</td>
1529            </tr>
1530            <tr>
1531              <td><code>az_Cyrl</code></td>
1532              <td>Azerbaijani in Cyrillic script</td>
1533            </tr>
1534            <tr>
1535              <td><code>az_Latn</code></td>
1536              <td>Azerbaijani in Latin script</td>
1537            </tr>
1538            <tr>
1539              <td><code>zh_Hans</code></td>
1540              <td>Chinese, in simplified script (=zh, zh-Hans,
1541              zh-CN, zh-Hans-CN)</td>
1542            </tr>
1543            <tr>
1544              <td><code>zh_Hant</code></td>
1545              <td>Chinese, in traditional script</td>
1546            </tr>
1547          </table>
1548          <p>Unicode identifiers give specific semantics to certain
1549          Unicode Script values. For more information, see also
1550          [<a href=
1551          "https://www.unicode.org/reports/tr41/#UAX24">UAX24</a>]:</p>
1552          <table cellspacing="0" cellpadding="2" border="1" style=
1553          "margin: 0.5em">
1554            <tr>
1555              <td><code>Qaag</code></td>
1556              <td>Zawgyi</td>
1557              <td colspan="2">Qaag is a special script code for
1558              identifying the non-standard use of Myanmar
1559              characters for display with the Zawgyi font. The
1560              purpose of the code is to enable migration to
1561              standard, interoperable use of Unicode by providing
1562              an identifier for Zawgyi for tagging text,
1563              applications, input methods, font tables,
1564              transformations, and other mechanisms used for
1565              migration.</td>
1566            </tr>
1567            <tr>
1568              <td><code>Qaai</code></td>
1569              <td>Inherited</td>
1570              <td colspan="2"><strong>deprecated</strong>: the
1571              <em>canonicalized</em> form is Zinh</td>
1572            </tr>
1573            <tr>
1574              <td><code>Zinh</code></td>
1575              <td>Inherited</td>
1576              <td colspan="2">&nbsp;</td>
1577            </tr>
1578            <tr>
1579              <td><code>Zsye</code></td>
1580              <td>Emoji Style</td>
1581              <td colspan="2">Prefer emoji style for characters
1582              that have both text and emoji styles available.</td>
1583            </tr>
1584            <tr>
1585              <td><code>Zsym</code></td>
1586              <td>Text Style</td>
1587              <td colspan="2">Prefer text style for characters that
1588              have both text and emoji styles available.</td>
1589            </tr>
1590            <tr>
1591              <td rowspan="7"><code>Zxxx</code></td>
1592              <td rowspan="7">Unwritten</td>
1593              <td colspan="2">Indicates spoken or otherwise
1594              unwritten content. For example:</td>
1595            </tr>
1596            <tr>
1597              <th>Sample(s)</th>
1598              <th>Description</th>
1599            </tr>
1600            <tr>
1601              <td>uz</td>
1602              <td>either written or spoken content</td>
1603            </tr>
1604            <tr>
1605              <td>uz-Latn <em>or</em> uz-Arab</td>
1606              <td>written-only content (particular script)</td>
1607            </tr>
1608            <tr>
1609              <td>uz-Zyyy</td>
1610              <td>written-only content (unspecified script)</td>
1611            </tr>
1612            <tr>
1613              <td>uz-Zxxx</td>
1614              <td>spoken-only content</td>
1615            </tr>
1616            <tr>
1617              <td>uz-Latn, uz-Zxxx</td>
1618              <td>both specific written and spoken content (using a
1619              <em>language list</em>)</td>
1620            </tr>
1621            <tr>
1622              <td><code>Zyyy</code></td>
1623              <td>Common</td>
1624              <td colspan="2">&nbsp;</td>
1625            </tr>
1626            <tr>
1627              <td><code>Zzzz</code></td>
1628              <td>Unknown</td>
1629              <td colspan="2">&nbsp;</td>
1630            </tr>
1631          </table>
          <p>The private use subtags listed as
          <strong>excluded</strong> in <em>Section 3.5.3 <a href=
          "#Private_Use_Codes">Private Use Codes</a></em> will never be
          given specific semantics in Unicode identifiers, and are
          thus safe for other applications to use for their own
          purposes.</p>
1638        </td>
1639      </tr>
1640      <tr>
1641        <td>
1642          <a href="#unicode_region_subtag_validity" name=
1643          "unicode_region_subtag_validity" id=
1644          "unicode_region_subtag_validity">unicode_region_subtag</a>
          <p>(also known as a <i>Unicode region code</i> or <i>Unicode
          territory code</i>)</p>
1647        </td>
1648        <td>
1649          Subtags in the region.xml file (see <em>Section 3.11
1650          <a href="#Validity_Data">Validity Data</a></em>). These
1651          are based on [<a href="#BCP47">BCP47</a>] subtag values
          marked as <b>Type: region</b>.
1653          <p>Unicode identifiers give specific semantics to the
1654          following subtags:</p>
1655          <table border="1" cellspacing="0" cellpadding="2">
1656            <tr>
1657              <td>&nbsp;</td>
1658              <td><strong>Name</strong></td>
1659              <td><strong>Comment</strong></td>
1660              <td><strong>ISO 3166-1 status</strong></td>
1661            </tr>
1662            <tr>
1663              <td><code>QO</code></td>
1664              <td>Outlying Oceania</td>
1665              <td>countries in Oceania [009] that do not have a
1666              <a href=
1667              "https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_containment_un_m_49.html">
1668              subcontinent</a>.</td>
1669              <td>private use</td>
1670            </tr>
1671            <tr>
1672              <td><code>QU</code></td>
1673              <td>European Union</td>
1674              <td><strong>deprecated</strong>: the
1675              <em>canonicalized</em> form is EU</td>
1676              <td>private use</td>
1677            </tr>
1678            <tr>
1679              <td><code>UK</code></td>
1680              <td>United Kingdom</td>
1681              <td><strong>deprecated</strong>: the
1682              <em>canonicalized</em> form is GB</td>
1683              <td>exceptionally reserved</td>
1684            </tr>
1685            <tr>
1686              <td><code>XA</code></td>
1687              <td>Pseudo-Accents</td>
              <td>special code indicating derived testing locale
              with English plus added accents and lengthened text</td>
1690              <td>private use</td>
1691            </tr>
1692            <tr>
1693              <td><code>XB</code></td>
1694              <td>Pseudo-Bidi</td>
1695              <td>special code indicating derived testing locale
1696              with forced RTL English</td>
1697              <td>private use</td>
1698            </tr>
1699            <tr>
1700              <td><code>XK</code></td>
1701              <td>Kosovo</td>
1702              <td>industry practice</td>
1703              <td>private use</td>
1704            </tr>
1705            <tr>
1706              <td><code>ZZ</code></td>
1707              <td>Unknown or Invalid Territory</td>
1708              <td>used in APIs or as replacement for invalid
1709              code</td>
1710              <td>private use</td>
1711            </tr>
1712          </table>
          <p>The private use subtags listed as
          <strong>excluded</strong> in <em>Section 3.5.3 <a href=
          "#Private_Use_Codes">Private Use Codes</a></em> will normally
          not be given specific semantics in Unicode identifiers,
          and are thus normally safe for other applications to use for
          their own purposes. However, LDML may follow widespread
          industry practice in the use of some of these codes, such
          as for XK.</p>
1721          <p>The CLDR provides data for normalizing
1722          territory/region codes, including mapping overlong codes
1723          like "eng-840" or "eng-USA" to the correct code
1724          "en-US".</p>
1725          <p>Special Codes:</p>
1726          <ul>
            <li>The territory code 'UK' is exceptionally reserved in
            ISO 3166, and is used for the country's top-level domain
            name (.uk) instead of GB. It is thus recognized by CLDR as
            an alternate (unnormalized) form of 'GB'.</li>
1731            <li>The territory code '001' (the World) is used to
1732            indicate a standardized form, such as "ar-001" for
1733            Modern Standard Arabic.</li>
1734          </ul>
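          <p>The canonicalization of the deprecated codes in the table
          above can be sketched as follows. The dictionary holds only
          the two deprecated entries listed in this cell. The function
          name is hypothetical.</p>
          <pre>
```python
# Deprecated region subtags from the table above, mapped to their
# canonicalized forms (an illustrative subset of CLDR's region aliases).
DEPRECATED_REGIONS = {"QU": "EU", "UK": "GB"}


def canonicalize_region(region):
    """Return the canonical form of a region subtag.

    Subtags are case-insensitive, so the input is uppercased first;
    numeric codes such as "001" (the World) pass through unchanged.
    """
    region = region.upper()
    return DEPRECATED_REGIONS.get(region, region)


print(canonicalize_region("uk"))   # GB
print(canonicalize_region("QU"))   # EU
print(canonicalize_region("001"))  # 001
```
          </pre>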
1735        </td>
1736      </tr>
1737      <tr>
1738        <td>
1739          <a href="#unicode_variant_subtag_validity" name=
1740          "unicode_variant_subtag_validity" id=
1741          "unicode_variant_subtag_validity">unicode_variant_subtag</a>
          <p>(also known as a <i>Unicode language variant
          code</i>)</p>
1744        </td>
1745        <td>
1746          Subtags in the variant.xml file (see <em>Section 3.11
          <a href="#Validity_Data">Validity Data</a></em>). These
          are based on [<a href="#BCP47">BCP47</a>] subtag values
          marked as <b>Type: variant</b>.
          <p>CLDR provides data for normalizing variant codes.
          For handling of the "POSIX" variant, see <i>Section
          3.8.2, <a href="#Legacy_Variants">Legacy
          Variants</a></i>.</p>
1754        </td>
1755      </tr>
1756    </table>
1757    <p><i>Examples:</i></p>
1758    <blockquote>
1759      <pre>en
1760fr_BE
1761zh-Hant-HK</pre>
1762    </blockquote>
1763    <p><em>Deprecated</em> codes—such as QU above—are valid, but
1764    strongly discouraged.</p>
    <p>A locale that only has a language subtag (and optionally a
    script subtag) is called a <i>language locale</i>; one with
    both language and territory subtags is called a <i>territory
    locale</i> (or <i>country locale</i>).</p>
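    <p>The definitions above can be expressed as a small classifier.
    This is a sketch under simplifying assumptions:</p>
    <ul>
      <li>script subtags are recognized as four letters;</li>
      <li>region subtags are recognized as two letters or three
      digits;</li>
      <li>IDs the definitions do not cover, such as ones with a
      variant but no region, return "other".</li>
    </ul>
    <p>All names are hypothetical.</p>
    <pre>
```python
def is_script(subtag):
    # Script subtags are four letters, e.g. "Hant".
    return len(subtag) == 4 and subtag.isalpha()


def is_region(subtag):
    # Region subtags are two letters or three digits, e.g. "BE", "001".
    return (len(subtag) == 2 and subtag.isalpha()) or (
        len(subtag) == 3 and subtag.isdigit()
    )


def locale_kind(locale_id):
    """Classify a locale ID as a language or territory locale.

    Accepts "_" or "-" as the separator. Returns "territory locale" if a
    region subtag follows the language (and optional script) subtag,
    "language locale" if nothing follows them, and "other" otherwise.
    """
    subtags = locale_id.replace("-", "_").split("_")
    rest = subtags[1:]
    if rest and is_script(rest[0]):
        rest = rest[1:]
    if rest and is_region(rest[0]):
        return "territory locale"
    return "language locale" if not rest else "other"


print(locale_kind("fr_BE"))       # territory locale
print(locale_kind("zh-Hant-HK"))  # territory locale
print(locale_kind("zh_Hant"))     # language locale
```
    </pre>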
1769    <h3><a name="Special_Codes" href="#Special_Codes" id=
1770    "Special_Codes">3.5 Special Codes</a></h3>
1771    <h4><a name="Unknown_or_Invalid_Identifiers" href=
1772    "#Unknown_or_Invalid_Identifiers" id=
1773    "Unknown_or_Invalid_Identifiers">3.5.1 Unknown or Invalid
1774    Identifiers</a></h4>
    <p>The following identifiers are used to indicate an unknown or
    invalid code in Unicode language and locale identifiers. For
    Unicode identifiers, the region code uses a private use ISO
    3166 code, and the time zone code uses an additional code
    defined by CLDR; the others are defined by the relevant
    standards. When these codes are used in APIs connected with
    Unicode identifiers, they mean either that no identifier was
    available, or that at some point an input identifier value was
    determined to be invalid or ill-formed.</p>
1784    <table border="1" cellspacing="0" cellpadding="4" style=
1785    "margin-top: 0.5em; margin-bottom: 0.5em" id="table4">
1786      <tr>
1787        <th>Code Type</th>
1788        <th>Value</th>
1789        <th>Description in Referenced Standards</th>
1790      </tr>
1791      <tr>
1792        <td>Language</td>
1793        <td><code>und</code></td>
1794        <td>Undetermined language, also used for “root”</td>
1795      </tr>
1796      <tr>
1797        <td>Script</td>
1798        <td><code>Zzzz</code></td>
1799        <td>Code for uncoded script, Unknown [<a href=
1800        "https://www.unicode.org/reports/tr41/#UAX24">UAX24</a>]</td>
1801      </tr>
1802      <tr>
        <td>Region</td>
1804        <td><code>ZZ</code></td>
1805        <td>Unknown or Invalid Territory</td>
1806      </tr>
1807      <tr>
1808        <td>Currency</td>
1809        <td><code>XXX</code></td>
1810        <td>The codes assigned for transactions where no currency
1811        is involved</td>
1812      </tr>
1813      <tr>
1814        <td>Time Zone</td>
1815        <td><code>unk</code></td>
1816        <td>Unknown or Invalid Time Zone</td>
1817      </tr>
1818      <tr>
1819        <td>Subdivision</td>
1820        <td><em>&lt;region&gt;</em>zzzz</td>
1821        <td>Unknown or Invalid Subdivision</td>
1822      </tr>
1823    </table>
    <p>When only the script or region is known, a locale ID uses
    "und" as the language subtag. Thus the locale
    tag "und_Grek" represents the Greek script; "und_US" represents
    the US territory.</p>
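    <p>A hypothetical helper that follows this convention is sketched
    below. It assumes the caller passes subtags that are already
    valid.</p>
    <pre>
```python
def make_locale_id(language=None, script=None, region=None):
    """Build a locale ID from whichever parts are known.

    When the language is unknown, "und" is used as the language subtag,
    so a script alone gives "und_Grek" and a region alone "und_US".
    """
    parts = [language or "und"]
    if script:
        parts.append(script)
    if region:
        parts.append(region)
    return "_".join(parts)


print(make_locale_id(script="Grek"))  # und_Grek
print(make_locale_id(region="US"))    # und_US
```
    </pre>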
1828    <h4><a name="Numeric_Codes" href="#Numeric_Codes" id=
1829    "Numeric_Codes">3.5.2 Numeric Codes</a></h4>
    <p>For region codes, ISO and the UN establish a mapping to
    three-letter codes and numeric codes. However, this does not
    extend to the private use codes, which are the codes 900-999
    (total: 100), and AAA-AAZ, QMA-QZZ, XAA-XZZ, and ZZA-ZZZ
    (total: 1092). Unicode identifiers supply a standard mapping to
    these: the numeric codes are taken from the top of the numeric
    private use range, and the 3-letter codes double the final
    letter of the 2-letter code. These are the resulting mappings
    for all of the private use region codes:</p>
1839    <table border="1" cellspacing="0" cellpadding="4" style=
1840    "margin-top: 0.5em; margin-bottom: 0.5em" id="table19">
1841      <tr>
1842        <th>Region</th>
1843        <th>UN/ISO Numeric</th>
1844        <th>ISO 3-Letter</th>
1845      </tr>
1846      <tr>
1847        <td><code>AA</code></td>
1848        <td><code>958</code></td>
1849        <td><code>AAA</code></td>
1850      </tr>
1851      <tr>
1852        <td><code>QM..QZ</code></td>
1853        <td><code>959..972</code></td>
1854        <td><code>QMM..QZZ</code></td>
1855      </tr>
1856      <tr>
1857        <td><code>XA..XZ</code></td>
1858        <td><code>973..998</code></td>
1859        <td><code>XAA..XZZ</code></td>
1860      </tr>
1861      <tr>
1862        <td><code>ZZ</code></td>
1863        <td><code>999</code></td>
1864        <td><code>ZZZ</code></td>
1865      </tr>
1866    </table>
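    <p>The table above can be reproduced from the two rules just
    described. The sketch below does three things:</p>
    <ul>
      <li>it lists the 42 private use two-letter region codes in
      order;</li>
      <li>it assigns numeric codes so that the last one (ZZ) receives
      999;</li>
      <li>it doubles the final letter to form the three-letter
      code.</li>
    </ul>
    <p>Function names are illustrative only.</p>
    <pre>
```python
import string


def private_use_regions():
    """List the private use two-letter region codes in ascending order:
    AA, QM..QZ, XA..XZ, ZZ."""
    codes = ["AA"]
    codes += ["Q" + c for c in string.ascii_uppercase[12:]]  # QM..QZ
    codes += ["X" + c for c in string.ascii_uppercase]       # XA..XZ
    codes.append("ZZ")
    return codes


def private_use_mapping():
    """Map each private use region code to (numeric, three-letter).

    Numeric codes come from the top of the 900..999 range, so the last
    code (ZZ) gets 999; three-letter codes double the final letter.
    """
    codes = private_use_regions()
    first = 1000 - len(codes)  # 42 codes, so AA gets 958
    return {
        code: (str(first + i), code + code[-1])
        for i, code in enumerate(codes)
    }


mapping = private_use_mapping()
print(mapping["QM"])  # ('959', 'QMM')
```
    </pre>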
1867    <p>For script codes, ISO 15924 supplies a mapping (however, the
1868    numeric codes are not in common use):</p>
1869    <table border="1" cellspacing="0" cellpadding="4" style=
1870    "margin-top: 0.5em; margin-bottom: 0.5em" id="table21">
1871      <tr>
1872        <th>Script</th>
1873        <th>Numeric</th>
1874      </tr>
1875      <tr>
1876        <td><code>Qaaa..Qabx</code></td>
1877        <td><code>900..949</code></td>
1878      </tr>
1879    </table><br>
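    <p>This sketch assumes that codes in the Qaaa..Qabx range are
    numbered consecutively, as the range-to-range mapping in the table
    suggests. Under that assumption, the numeric value can be computed
    by treating the last two letters as a base-26 number. The function
    name is hypothetical.</p>
    <pre>
```python
def private_use_script_numeric(code):
    """Return the numeric code for a private use script code in
    Qaaa..Qabx, assuming consecutive numbering from 900 to 949."""
    code = code.lower()
    if not (len(code) == 4 and code.startswith("qa")):
        raise ValueError("not a private use script code: " + code)
    # The last two letters count in base 26 from "aa".
    offset = (ord(code[2]) - ord("a")) * 26 + (ord(code[3]) - ord("a"))
    if offset not in range(50):
        raise ValueError("outside Qaaa..Qabx: " + code)
    return 900 + offset


print(private_use_script_numeric("Qabx"))  # 949
```
    </pre>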
    <h4><a name="Private_Use_Codes" href="#Private_Use_Codes" id=
    "Private_Use_Codes">3.5.3 Private Use Codes</a></h4>
1882    <p>Private use codes fall into three groups.</p>
1883    <ul>
      <li><strong>defined:</strong> those that are currently given
      particular semantics in CLDR</li>
      <li><strong>reserved:</strong> those that may be given
      particular semantics in future versions of CLDR</li>
      <li><strong>excluded:</strong> those that will never be given
      particular CLDR semantics, and thus can normally be used by
      applications without worrying about collisions. However, CLDR
      may follow widespread industry practice in the use of some of
      these codes, such as for XA, XB, and XK.</li>
1894    </ul>
1895    <table>
1896      <caption>
1897        <a name="Private_Use_CLDR" href="#Private_Use_CLDR" id=
1898        "Private_Use_CLDR">Private Use Codes in CLDR</a>
1899      </caption>
1900      <tr>
1901        <th>category</th>
1902        <th>status</th>
1903        <th>codes</th>
1904      </tr>
1905      <tr>
1906        <td rowspan="3">base language</td>
1907        <td>defined</td>
1908        <td>none</td>
1909      </tr>
1910      <tr>
1911        <td>reserved</td>
1912        <td>qaa..qfy</td>
1913      </tr>
1914      <tr>
1915        <td>excluded</td>
1916        <td>qfz..qtz</td>
1917      </tr>
1918      <tr>
1919        <td rowspan="3">script</td>
1920        <td>defined</td>
        <td>Qaai (deprecated), Qaag</td>
1922      </tr>
1923      <tr>
1924        <td>reserved</td>
        <td>Qaaa..Qaaf, Qaah, Qaaj..Qaap</td>
1926      </tr>
1927      <tr>
1928        <td>excluded</td>
1929        <td>Qaaq..Qabx</td>
1930      </tr>
1931      <tr>
1932        <td rowspan="3">region</td>
1933        <td>defined</td>
1934        <td>QO, QU, UK, XA, XB, XK, ZZ</td>
1935      </tr>
1936      <tr>
1937        <td>reserved</td>
        <td>AA, QM..QN, QP..QT, QV..QZ</td>
1939      </tr>
1940      <tr>
1941        <td>excluded</td>
1942        <td>XC..XJ, XL..XZ</td>
1943      </tr>
1944      <tr>
1945        <td rowspan="3">timezone</td>
1946        <td>defined</td>
1947        <td>IANA: Etc/Unknown<br>
1948        bcp47: as listed in bcp47/timezone.xml</td>
1949      </tr>
1950      <tr>
1951        <td>reserved</td>
1952        <td>bcp47: all non-5 letter codes not starting with x</td>
1953      </tr>
1954      <tr>
1955        <td>excluded</td>
1956        <td>bcp47: all non-5 letter codes starting with x</td>
1957      </tr>
1958    </table>
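    <p>The region rows of the table above can be turned into a lookup,
    for example to check whether an application-chosen code falls in
    the excluded range. The sketch copies the region rows by hand and
    omits the base language, script, and time zone rows. The helper
    names are hypothetical.</p>
    <pre>
```python
import string


def letter_range(first, last):
    """Expand a two-letter range such as QM..QN that varies only in
    its second letter into a list of codes."""
    letters = string.ascii_uppercase
    start, end = letters.index(first[1]), letters.index(last[1])
    return [first[0] + c for c in letters[start:end + 1]]


# Region rows of the private use table, copied by hand.
REGION_STATUS = {
    "defined": ["QO", "QU", "UK", "XA", "XB", "XK", "ZZ"],
    "reserved": (
        ["AA"]
        + letter_range("QM", "QN")
        + letter_range("QP", "QT")
        + letter_range("QV", "QZ")
    ),
    "excluded": letter_range("XC", "XJ") + letter_range("XL", "XZ"),
}


def private_use_region_status(region):
    """Return "defined", "reserved", or "excluded" for a region code
    listed in the table, or None for any other code."""
    region = region.upper()
    for status, codes in REGION_STATUS.items():
        if region in codes:
            return status
    return None


print(private_use_region_status("XK"))  # defined
print(private_use_region_status("XQ"))  # excluded
```
    </pre>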
1959    <p>See also <em>Section 3.5.1 <a href=
1960    "#Unknown_or_Invalid_Identifiers">Unknown or Invalid
1961    Identifiers</a></em>.</p>
1962    <h3><a name="Locale_Extension_Key_and_Type_Data" id=
1963    "Locale_Extension_Key_and_Type_Data"></a><a name="u_Extension"
1964    href="#u_Extension" id="u_Extension">3.6 Unicode BCP 47 U
1965    Extension</a></h3>
    <p>[<a href="#BCP47">BCP47</a>] Language Tags provides a
    mechanism for extending language tags for use in various
    applications by means of extension subtags. Each extension is
    identified by a single alphanumeric character subtag (a
    singleton) assigned by IANA.</p>
1971    <p>The Unicode Consortium has registered and is the maintaining
1972    authority for two BCP 47 language tag extensions: the extension
1973    'u' for Unicode locale extension [<a href=
1974    "#RFC6067">RFC6067</a>] and extension 't' for transformed
1975    content [<a href="#RFC6497">RFC6497</a>]. The Unicode BCP 47
1976    extension data defines the complete list of valid subtags.</p>
    <p>These subtags are all in lowercase, which is their canonical
    casing; however, subtags are case-insensitive, and casing does
    not carry any specific meaning. All subtags within the Unicode
    extensions are two to eight alphanumeric characters long and
    meet the rule <code>extension</code> in [<a href=
    "#BCP47">BCP47</a>].</p>
    <p><strong>The -u- Extension.</strong> The syntax of 'u'
    extension subtags is defined by the rule
    <code>unicode_locale_extensions</code> in <a href=
    "#Unicode_locale_identifier">Section 3.2 Unicode locale
    identifier</a>, except that the subtag separator
    <code>sep</code> must always be a hyphen '-' when the extension
    is used as part of a BCP 47 language tag.</p>
    <p>A 'u' extension may contain multiple <code>attribute</code>s
    or <code>keyword</code>s as defined in <a href=
    "#Unicode_locale_identifier">Section 3.2 Unicode locale
    identifier</a>. The canonical syntax is defined in <a href=
    "#Canonical_Unicode_Locale_Identifiers">Section 3.2.1 Canonical
    Unicode Locale Identifiers</a>.</p>
1995    <p><em>See also <a href=
1996    "http://cldr.unicode.org/index/bcp47-extension">Unicode
1997    Extensions for BCP 47</a> on the CLDR site.</em></p>
1998    <h4><a href="#Key_And_Type_Definitions_" name=
1999    "Key_And_Type_Definitions_" id=
2000    "Key_And_Type_Definitions_">3.6.1 Key And Type
2001    Definitions</a></h4>
    <p>The following table lists the U extension keys that are
    currently available, with a description or a sampling of the
    type values for each key. Each category is associated with an
    XML file in the bcp47 directory.</p>
2006    <p>For the complete list of valid keys and types defined for
2007    Unicode locale extensions, see <a href=
2008    "#Unicode_Locale_Extension_Data_Files">Section 3.6.4 U
2009    Extension Data Files</a>. For information on the process for
2010    adding new <i>key</i>/<i>type</i>, see [<a href=
2011    "#localeProject">LocaleProject</a>].</p>
    <p>Most type values are represented by a single subtag in the
    current version of CLDR. There are exceptions, such as types
    used for key "ca" (calendar) and "kr" (collation reordering).
    If the type is omitted, the type value "true" is assumed. Note
    that the default for a key that can take the value "true" is
    often "false", but not always. Note also that "true"/"True" is
    not a valid script code (which matters for keys such as "kr",
    whose types can be script codes), since <a href=
    "https://www.unicode.org/iso15924/codelists.html">the ISO 15924
    Registration Authority has exceptionally reserved it</a>, which
    means that it will not be assigned for any purpose.</p>
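    <p>The following Python sketch illustrates the keyword structure
    and the "true" default described above by splitting the
    <code>u</code> extension of a BCP 47 tag into attributes and
    keywords. It lowercases the input, since casing carries no
    meaning, and it stops at the next singleton. It does not validate
    keys or types against the bcp47 data files, and it does not handle
    repeated keys. Function names are hypothetical.</p>
    <pre>
```python
def u_extension_subtags(tag):
    """Return the lowercased subtags of the "u" extension of a BCP 47
    tag, stopping at the next singleton or the end of the tag."""
    result = []
    inside = False
    for subtag in tag.lower().split("-")[1:]:
        if len(subtag) == 1:
            # Another singleton ends the u extension; "x" starts
            # private use, where a "u" would not be an extension.
            if inside or subtag == "x":
                break
            inside = subtag == "u"
            continue
        if inside:
            result.append(subtag)
    return result


def parse_u_extension(subtags):
    """Split u-extension subtags into attributes and keywords.

    Subtags before the first two-character key are attributes; each key
    takes the following 3-8 character subtags as its type, and a key
    with no type gets the value "true".
    """
    attributes = []
    keywords = {}
    key = None
    type_subtags = []
    for subtag in subtags:
        if len(subtag) == 2:
            if key is not None:
                keywords[key] = "-".join(type_subtags) or "true"
            key, type_subtags = subtag, []
        elif key is None:
            attributes.append(subtag)
        else:
            type_subtags.append(subtag)
    if key is not None:
        keywords[key] = "-".join(type_subtags) or "true"
    return attributes, keywords


print(parse_u_extension(u_extension_subtags("de-DE-u-co-phonebk-kn")))
# ([], {'co': 'phonebk', 'kn': 'true'})
```
    </pre>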
2022    <p>The BCP 47 form for keys and types is the canonical form,
2023    and recommended. Other aliases are included for backwards
2024    compatibility.</p>
2025    <table>
2026      <caption>
2027        <a name="Key_Type_Definitions" href="#Key_Type_Definitions"
2028        id="Key_Type_Definitions">Key/Type Definitions</a>
2029      </caption>
2030      <tr>
2031        <th>key<br>
2032        (old key name)</th>
2033        <th>key description</th>
2034        <th>example type<br>
2035        (old type name)</th>
2036        <th>type description</th>
2037      </tr>
2038      <tr>
2039        <td colspan="4"><strong>A <a href=
2040        "#UnicodeCalendarIdentifier" name=
2041        "UnicodeCalendarIdentifier" id=
2042        "UnicodeCalendarIdentifier">Unicode Calendar Identifier</a>
2043        defines a type of calendar. The valid values are those
2044        <em>name</em> attribute values in the <em>type</em>
2045        elements of key name="ca" in bcp47/<a target="_blank" href=
2046        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/calendar.xml">calendar.xml</a></strong>.</td>
2047      </tr>
2048      <tr>
2049        <td rowspan="10">"ca"<br>
2050        (calendar)</td>
2051        <td rowspan="10">Calendar algorithm<br>
2052        <br>
2053        <i>(For information on the calendar algorithms associated
2054        with the data used with these, see [<a href=
2055        "#Calendars">Calendars</a>].)</i></td>
2056        <td>"buddhist"</td>
2057        <td>Thai Buddhist calendar (same as Gregorian except for
2058        the year)</td>
2059      </tr>
2060      <tr>
2061        <td>"chinese"</td>
2062        <td>Traditional Chinese calendar</td>
2063      </tr>
2064      <tr>
2065        <td colspan="2">…</td>
2066      </tr>
2067      <tr>
2068        <td>"gregory"<br>
2069        (gregorian)</td>
2070        <td>Gregorian calendar</td>
2071      </tr>
2072      <tr>
2073        <td colspan="2">…</td>
2074      </tr>
2075      <tr>
2076        <td>"islamic"</td>
2077        <td>Islamic calendar</td>
2078      </tr>
2079      <tr>
2080        <td>"islamic-civil"</td>
2081        <td>Islamic calendar, tabular (intercalary years
2082        [2,5,7,10,13,16,18,21,24,26,29] - civil epoch)</td>
2083      </tr>
2084      <tr>
2085        <td>"islamic-umalqura"</td>
2086        <td>Islamic calendar, Umm al-Qura</td>
2087      </tr>
2088      <tr>
2089        <td colspan="2">…</td>
2090      </tr>
2091      <tr>
2092        <td colspan="2"><b>Note:</b> <i>Some calendar types are
2093        represented by two subtags. In such cases, the first subtag
2094        specifies a generic calendar type and the second subtag
2095        specifies a calendar algorithm variant. The CLDR uses
2096        generic calendar types (single subtag types) for tagging
2097        data when calendar algorithm variations within a generic
2098        calendar type are irrelevant. For example, type "islamic"
2099        is used for specifying Islamic calendar formatting data for
2100        all Islamic calendar types, including "islamic-civil" and
2101        "islamic-umalqura".</i></td>
2102      </tr>
2103      <tr>
2104        <td colspan="4"><strong>A <a href=
2105        "#UnicodeCurrencyFormatIdentifier" name=
2106        "UnicodeCurrencyFormatIdentifier" id=
2107        "UnicodeCurrencyFormatIdentifier">Unicode Currency Format
2108        Identifier</a> defines a style for currency formatting. The
2109        valid values are those <em>name</em> attribute values in
2110        the <em>type</em> elements of key name="cf" in
2111        bcp47/<a target="_blank" href=
2112        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/currency.xml">currency.xml</a></strong>.</td>
2113      </tr>
2114      <tr>
2115        <td rowspan="2">"cf"</td>
2116        <td rowspan="2">Currency Format style</td>
2117        <td>"standard"</td>
2118        <td>Negative numbers use the minusSign symbol (the
2119        default).</td>
2120      </tr>
2121      <tr>
2122        <td>"account"</td>
2123        <td>Negative numbers use parentheses or equivalent.</td>
2124      </tr>
2125      <tr>
2126        <td colspan="4"><strong>A <a href=
2127        "#UnicodeCollationIdentifier" name=
2128        "UnicodeCollationIdentifier" id=
2129        "UnicodeCollationIdentifier">Unicode Collation
2130        Identifier</a> defines a type of collation (sort order).
2131        The valid values are those <em>name</em> attribute values
2132        in the <em>type</em> elements of bcp47/<a target="_blank"
2133        href=
2134        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/collation.xml">collation.xml</a></strong>.</td>
2135      </tr>
2136      <tr>
2137        <td colspan="4"><i>For information on each collation
2138        setting parameter, from <strong>ka</strong> to
2139        <strong>vt</strong>, see <a href=
2140        "tr35-collation.html#Setting_Options">Setting
        Options</a>.</i></td>
2142      </tr>
2143      <tr>
2144        <td rowspan="9">"co"<br>
2145        (collation)</td>
2146        <td rowspan="9">Collation type</td>
2147        <td>"standard"</td>
2148        <td>The default ordering for each language. For root it is
2149        based on the [<a href="#DUCET">DUCET</a>] (Default Unicode
2150        Collation Element Table): see <em><a href=
2151        "tr35-collation.html#Root_Collation">Root
2152        Collation</a></em>. Each other locale is based on that,
2153        except for appropriate modifications to certain characters
2154        for that language.</td>
2155      </tr>
2156      <tr>
2157        <td>"search"</td>
        <td>A special collation type dedicated to string search; it
2159        is not used to determine the relative order of two strings,
2160        but only to determine whether they should be considered
2161        equivalent for the specified strength, using the string
2162        search matching rules appropriate for the language.
2163        Compared to the normal collator for the language, this may
2164        add or remove primary equivalences, may make additional
2165        characters ignorable or change secondary equivalences, and
2166        may modify contractions to allow matching within them,
2167        depending on the desired behavior. For example, in Czech,
2168        the distinction between ‘a’ and ‘á’ is secondary for normal
2169        collation, but primary for search; a search for ‘a’ should
2170        never match ‘á’ and vice versa. A search collator is
2171        normally used with strength set to PRIMARY or SECONDARY
2172        (should be SECONDARY if using “asymmetric” search as
2173        described in the [<a href=
2174        "https://www.unicode.org/reports/tr41/#UTS10">UCA</a>]
2175        section Asymmetric Search). The search collator in root
2176        supplies matching rules that are appropriate for most
        languages (and which differ from the root collation
2178        behavior); language-specific search collators may be
2179        provided to override the matching rules for a given
2180        language as necessary.</td>
2181      </tr>
2182      <tr>
2183        <td colspan="2">
          <p>Other keywords provide additional choices for certain
          locales; <i>they only have an effect in those
          locales.</i></p>
2187        </td>
2188      </tr>
2189      <tr>
2190        <td colspan="2">…</td>
2191      </tr>
2192      <tr>
2193        <td>"phonetic"</td>
2194        <td>Requests a phonetic variant if available, where text is
2195        sorted based on pronunciation. It may interleave different
2196        scripts, if multiple scripts are in common use.</td>
2197      </tr>
2198      <tr>
2199        <td>"pinyin"</td>
        <td>Pinyin ordering for Latin and for CJK characters; that
        is, an ordering for CJK characters based on a
        character-by-character transliteration into pinyin (used
        in Chinese).</td>
2204      </tr>
2205      <tr>
2206        <td>"reformed"</td>
2207        <td>Reformed collation (such as in Swedish)</td>
2208      </tr>
2209      <tr>
2210        <td>"searchjl"</td>
2211        <td>Special collation type for a modified string search in
2212        which a pattern consisting of a sequence of Hangul initial
2213        consonants (jamo lead consonants) will match a sequence of
2214        Hangul syllable characters whose initial consonants match
2215        the pattern. The jamo lead consonants can be represented
2216        using conjoining or compatibility jamo. This search
2217        collator is best used at SECONDARY strength with an
2218        "asymmetric" search as described in the [<a href=
2219        "https://www.unicode.org/reports/tr41/#UTS10">UCA</a>]
2220        section Asymmetric Search and obtained, for example, using
2221        ICU4C's usearch facility with attribute
2222        USEARCH_ELEMENT_COMPARISON set to value
2223        USEARCH_PATTERN_BASE_WEIGHT_IS_WILDCARD; this ensures that
2224        a full Hangul syllable in the search pattern will only
2225        match the same syllable in the searched text (instead of
2226        matching any syllable with the same initial consonant),
2227        while a Hangul initial consonant in the search pattern will
2228        match any Hangul syllable in the searched text with the
2229        same initial consonant.</td>
2230      </tr>
2231      <tr>
2232        <td colspan="2">…</td>
2233      </tr>
2234      <tr>
2235        <td colspan="4"><strong>A <a href=
2236        "#UnicodeCurrencyIdentifier" name=
2237        "UnicodeCurrencyIdentifier" id=
2238        "UnicodeCurrencyIdentifier">Unicode Currency Identifier</a>
2239        defines a type of currency. The valid values are those
2240        <em>name</em> attribute values in the <em>type</em>
2241        elements of key name="cu" in bcp47/<a target="_blank" href=
2242        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/currency.xml">currency.xml</a>.</strong></td>
2243      </tr>
2244      <tr>
2245        <td>"cu"<br>
2246        (currency)</td>
2247        <td>Currency type</td>
2248        <td>
2249          <i>ISO 4217 code,</i>
2250          <p><i>plus others in common use</i></p>
2251        </td>
2252        <td>
          <p>Codes consisting of 3 ASCII letters that are or have
          been valid in ISO 4217, plus certain additional codes
          that are or have been in common use. The countries and
          time periods associated with each currency value, as
          well as its default number of decimal places, are listed
          in <a href=
          "tr35-numbers.html#Supplemental_Currency_Data">Supplemental
          Currency Data</a>.</p>
2261          <p>The XXX code is given a broader interpretation as
2262          <em>Unknown or Invalid Currency</em>.</p>
2263        </td>
2264      </tr>
2265      <tr>
2266        <td colspan="4"><strong>A <a href=
2267        "#UnicodeDictionaryBreakExclusionIdentifier" name=
2268        "UnicodeDictionaryBreakExclusionIdentifier" id=
2269        "UnicodeDictionaryBreakExclusionIdentifier">Unicode Dictionary Break Exclusion Identifier</a>
2270        specifies scripts to be excluded from dictionary-based text break (for words and lines).
        The valid values are one or more items of type SCRIPT_CODE, as specified in the
2272        <em>name</em> attribute value in the <em>type</em>
2273        element of key name="dx" in bcp47/<a target="_blank" href=
2274        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/segmentation.xml">segmentation.xml</a>.</strong></td>
2275      </tr>
2276      <tr>
2277        <td>"dx"</td>
2278        <td>Dictionary break script exclusions</td>
2279        <td>
2280          <i><code><a href="#unicode_script_subtag">unicode_script_subtag</a></code> values</i>
2281        </td>
2282        <td>
2283          <p>One or more items of type SCRIPT_CODE, which are valid
2284          <code><a href="#unicode_script_subtag">unicode_script_subtag</a></code> values.</p>
2285          <p>The code Zyyy (Common) can be specified to exclude all scripts, in which case
2286          it should be the only SCRIPT_CODE value specified.</p>
2287        </td>
2288      </tr>
2289      <tr>
2290        <td colspan="4"><strong>A <a href=
2291        "#UnicodeEmojiPresentationStyleIdentifier" name=
2292        "UnicodeEmojiPresentationStyleIdentifier" id=
2293        "UnicodeEmojiPresentationStyleIdentifier">Unicode Emoji
2294        Presentation Style Identifier</a> specifies a request for
2295        the preferred emoji presentation style. This can be used as
2296        part of the value for an HTML lang attribute, for example
2297        <code>&lt;html lang="sr-Latn-u-em-emoji"&gt;</code>. The
2298        valid values are those <em>name</em> attribute values in
2299        the <em>type</em> elements of key name="em" in
2300        bcp47/<a target="_blank" href=
2301        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/variant.xml">variant.xml</a></strong>.</td>
2302      </tr>
2303      <tr>
2304        <td rowspan="3">"em"</td>
2305        <td rowspan="3">Emoji presentation style</td>
2306        <td>"emoji"</td>
2307        <td>Use an emoji presentation for emoji characters if
2308        possible.</td>
2309      </tr>
2310      <tr>
2311        <td>"text"</td>
2312        <td>Use a text presentation for emoji characters if
2313        possible.</td>
2314      </tr>
2315      <tr>
2316        <td>"default"</td>
2317        <td>Use the default presentation for emoji characters as
2318        specified in UTR #51 Section 4, <a href=
2319        "https://www.unicode.org/reports/tr51/#Presentation_Style">Presentation
2320        Style</a>.</td>
2321      </tr>
2322      <tr>
2323        <td colspan="4"><strong>A <a href=
2324        "#UnicodeFirstDayIdentifier" name=
2325        "UnicodeFirstDayIdentifier" id=
2326        "UnicodeFirstDayIdentifier">Unicode First Day
2327        Identifier</a> defines the preferred first day of the week
2328        for calendar display. Specifying "fw" in a locale
2329        identifier overrides the default value specified by
2330        supplemental week data (see Part 4 Dates, section 4.3
2331        <a href="tr35-dates.html#Week_Data">Week Data</a>). The
2332        valid values are those <em>name</em> attribute values in
2333        the <em>type</em> elements of key name="fw" in
2334        bcp47/<a target="_blank" href=
2335        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/calendar.xml">calendar.xml</a></strong>.</td>
2336      </tr>
2337      <tr>
2338        <td rowspan="4">"fw"</td>
2339        <td rowspan="4">First day of week</td>
2340        <td>"sun"</td>
2341        <td>Sunday</td>
2342      </tr>
2343      <tr>
2344        <td>"mon"</td>
2345        <td>Monday</td>
2346      </tr>
2347      <tr>
2348        <td colspan="2">…</td>
2349      </tr>
2350      <tr>
2351        <td>"sat"</td>
2352        <td>Saturday</td>
2353      </tr>
2354      <tr>
2355        <td colspan="4"><strong>A <a href=
2356        "#UnicodeHourCycleIdentifier" name=
2357        "UnicodeHourCycleIdentifier" id=
2358        "UnicodeHourCycleIdentifier">Unicode Hour Cycle
2359        Identifier</a> defines the preferred time cycle. Specifying
2360        "hc" in a locale identifier overrides the default value
2361        specified by supplemental time data (see Part 4 Dates,
2362        section 4.4 <a href="tr35-dates.html#Time_Data">Time
2363        Data</a>). The valid values are those <em>name</em>
2364        attribute values in the <em>type</em> elements of key
2365        name="hc" in bcp47/<a target="_blank" href=
2366        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/calendar.xml">calendar.xml</a></strong>.</td>
2367      </tr>
2368      <tr>
2369        <td rowspan="4">"hc"</td>
2370        <td rowspan="4">Hour cycle</td>
2371        <td>"h12"</td>
2372        <td>Hour system using 1–12; corresponds to 'h' in
2373        patterns</td>
2374      </tr>
2375      <tr>
2376        <td>"h23"</td>
2377        <td>Hour system using 0–23; corresponds to 'H' in
2378        patterns</td>
2379      </tr>
2380      <tr>
2381        <td>"h11"</td>
        patterns</td>
2383        patterns</td>
2384      </tr>
2385      <tr>
2386        <td>"h24"</td>
2387        <td>Hour system using 1–24; corresponds to 'k' in
2388        pattern</td>
2389      </tr>
2390      <tr>
2391        <td colspan="4"><strong>A <a href=
2392        "#UnicodeLineBreakStyleIdentifier" name=
2393        "UnicodeLineBreakStyleIdentifier" id=
2394        "UnicodeLineBreakStyleIdentifier">Unicode Line Break Style
2395        Identifier</a> defines a preferred line break style
2396        corresponding to the CSS level 3 <a href=
2397        "https://drafts.csswg.org/css-text/#line-break-property">line-break
2398        option</a>. Specifying "lb" in a locale identifier
        overrides the locale’s default style (which may correspond
2400        to "normal" or "strict"). The valid values are those
2401        <em>name</em> attribute values in the <em>type</em>
2402        elements of key name="lb" in bcp47/<a target="_blank" href=
2403        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/segmentation.xml">segmentation.xml</a></strong>.</td>
2404      </tr>
2405      <tr>
2406        <td rowspan="3">"lb"</td>
2407        <td rowspan="3">Line break style</td>
2408        <td>"strict"</td>
2409        <td>CSS level 3 line-break=strict, e.g. treat CJ as NS</td>
2410      </tr>
2411      <tr>
2412        <td>"normal"</td>
2413        <td>CSS level 3 line-break=normal, e.g. treat CJ as ID,
        break before hyphens for ja and zh</td>
2415      </tr>
2416      <tr>
2417        <td>"loose"</td>
        <td>CSS level 3 line-break=loose</td>
2419      </tr>
2420      <tr>
2421        <td colspan="4"><strong>A <a href=
2422        "#UnicodeLineBreakWordIdentifier" name=
2423        "UnicodeLineBreakWordIdentifier" id=
2424        "UnicodeLineBreakWordIdentifier">Unicode Line Break Word
2425        Identifier</a> defines preferred line break word handling
2426        behavior corresponding to the CSS level 3 <a href=
2427        "https://drafts.csswg.org/css-text/#word-break-property">word-break
2428        option</a>. The valid values are those <em>name</em>
2429        attribute values in the <em>type</em> elements of key
2430        name="lw" in bcp47/<a target="_blank" href=
2431        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/segmentation.xml">segmentation.xml</a></strong>.</td>
2432      </tr>
2433      <tr>
2434        <td rowspan="3">"lw"</td>
2435        <td rowspan="3">Line break word handling</td>
2436        <td>"normal"</td>
2437        <td>CSS level 3 word-break=normal, normal script/language
2438        behavior for midword breaks</td>
2439      </tr>
2440      <tr>
2441        <td>"breakall"</td>
2442        <td>CSS level 3 word-break=break-all, allow midword breaks
2443        unless forbidden by lb setting</td>
2444      </tr>
2445      <tr>
2446        <td>"keepall"</td>
2447        <td>CSS level 3 word-break=keep-all, prohibit midword
2448        breaks except for dictionary breaks</td>
2449      </tr>
2450      <tr>
2451        <td colspan="4"><strong>A <a href=
2452        "#UnicodeMeasurementSystemIdentifier" name=
2453        "UnicodeMeasurementSystemIdentifier" id=
2454        "UnicodeMeasurementSystemIdentifier">Unicode Measurement
2455        System Identifier</a> defines a preferred measurement
2456        system. Specifying "ms" in a locale identifier overrides
2457        the default value specified by supplemental measurement
2458        system data (see Part 2 General, section 5 <a href=
2459        "tr35-general.html#Measurement_System_Data">Measurement
2460        System Data</a>). The valid values are those <em>name</em>
2461        attribute values in the <em>type</em> elements of key
2462        name="ms" in bcp47/<a target="_blank" href=
2463        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/measure.xml">measure.xml</a></strong>.</td>
2464      </tr>
2465      <tr>
2466        <td rowspan="3">"ms"</td>
2467        <td rowspan="3">Measurement system</td>
2468        <td>"metric"</td>
2469        <td>Metric System</td>
2470      </tr>
2471      <tr>
2472        <td>"ussystem"</td>
2473        <td>US System of measurement: feet, pints, etc.; pints are
        16 fl oz</td>
2475      </tr>
2476      <tr>
2477        <td>"uksystem"</td>
2478        <td>UK System of measurement: feet, pints, etc.; pints are
        20 fl oz</td>
2480      </tr>
2481      <tr>
2482        <td colspan="4"><strong>A <a href=
2483        "#UnicodeNumberSystemIdentifier" name=
2484        "UnicodeNumberSystemIdentifier" id=
2485        "UnicodeNumberSystemIdentifier">Unicode Number System
2486        Identifier</a> defines a type of number system. The valid
2487        values are those <em>name</em> attribute values in the
2488        <em>type</em> elements of bcp47/<a target="_blank" href=
2489        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/number.xml">number.xml</a>.</strong></td>
2490      </tr>
2491      <tr>
2492        <td rowspan="7">"nu"<br>
2493        (numbers)</td>
2494        <td rowspan="7">Numbering system</td>
2495        <td><i>Unicode script subtag</i></td>
2496        <td>
          <p>Four-letter types indicating the primary numbering
          system for the corresponding script represented in
          Unicode. Unless otherwise specified, it is a decimal
          numbering system using digits [:GeneralCategory=Nd:]. For
          example, "latn" refers to the ASCII / Western digits 0–9,
          while "taml" is an algorithmic (non-decimal) numbering
          system. (The code "tamldec" indicates the modern Tamil
          decimal digits.)</p>
2505          <p class="note">For more information, see <a href=
2506          "tr35-numbers.html#Numbering_Systems">Numbering
2507          Systems</a>.</p>
2508        </td>
2509      </tr>
2510      <tr>
2511        <td>"arabext"</td>
2512        <td>Extended Arabic-Indic digits ("arab" means the base
2513        Arabic-Indic digits)</td>
2514      </tr>
2515      <tr>
2516        <td>"armnlow"</td>
2517        <td>Armenian lowercase numerals</td>
2518      </tr>
2519      <tr>
2520        <td colspan="2">…</td>
2521      </tr>
2522      <tr>
2523        <td>"roman"</td>
2524        <td>Roman numerals</td>
2525      </tr>
2526      <tr>
2527        <td>"romanlow"</td>
2528        <td>Roman lowercase numerals</td>
2529      </tr>
2530      <tr>
2531        <td>"tamldec"</td>
2532        <td>Modern Tamil decimal digits</td>
2533      </tr>
2534      <tr>
2535        <td colspan="4"><strong>A <a href="#RegionOverride" name=
2536        "RegionOverride" id="RegionOverride">Region Override</a>
2537        specifies an alternate region to use for obtaining certain
2538        region-specific default values (those specified by the
2539        <a href="tr35-info.html#rgScope">&lt;rgScope&gt;</a>
2540        element), instead of using the region specified by the
2541        <a href="#unicode_region_subtag">unicode_region_subtag</a>
2542        in the Unicode Language Identifier (or inferred from the
2543        <a href=
2544        "#unicode_language_subtag">unicode_language_subtag</a>).</strong></td>
2545      </tr>
2546      <tr>
2547        <td rowspan="2">"rg"</td>
2548        <td rowspan="2">Region Override</td>
2549        <td>"uszzzz"<br>
2550        <br></td>
2551        <td rowspan="2">The value is a <a
2552        href= "#unicode_subdivision_id">unicode_subdivision_id</a>
2553        of type “unknown” or “regular”; this consists of a
2554        <a href=
2555        "#unicode_region_subtag">unicode_region_subtag</a> for a
2556        regular region (not a macroregion), suffixed
2557        either by “zzzz” (case is not
2558        significant) to designate the region
2559        as a whole, or by a unicode_subdivision_suffix to provide
2560        more specificity. For example, “en-GB-u-rg-uszzzz”
2561        represents a locale for British English but with
2562        region-specific defaults set to US for items such as
2563        default currency, default calendar and week data, default
2564        time cycle, and default measurement system and unit
2565        preferences.</td>
2566      </tr>
2567      <tr>
2568        <td>…</td>
2569      </tr>
2570      <tr>
2571        <td colspan="4"><strong>A <a name=
2572        "unicode_subdivision_subtag_validity" id=
2573        "unicode_subdivision_subtag_validity"></a><a href=
2574        "#UnicodeSubdivisionIdentifier" name=
2575        "UnicodeSubdivisionIdentifier" id=
2576        "UnicodeSubdivisionIdentifier">Unicode Subdivision
2577        Identifier</a> defines a regional subdivision used for
2578        locales. The valid values are based on the
2579        <em>subdivisionContainment</em> element as described in
2580        <em>Section <a href="#Unicode_Subdivision_Codes">3.6.5
2581        Subdivision Codes</a></em>.</strong></td>
2582      </tr>
2583      <tr>
2584        <td rowspan="2">"sd"</td>
2585        <td rowspan="2">Regional Subdivision</td>
2586        <td>"gbsct"<br>
2587        <br></td>
2588        <td rowspan="2">A <a href=
2589        "#unicode_subdivision_id">unicode_subdivision_id</a>, which
2590        is a <a href=
2591        "#unicode_region_subtag">unicode_region_subtag</a>
2592        concatenated with a unicode_subdivision_suffix.<br>
2593        For example, <em>gbsct</em> is “gb”+“sct” (where sct
2594        represents the subdivision code for Scotland). Thus
2595        “en-GB-u-sd-gbsct” represents the language variant “English
2596        as used in Scotland”. And both “en-u-sd-usca” and
2597        “en-US-u-sd-usca” represent “English as used in
2598        California”. See <strong><em><a href=
2599        "#Unicode_Subdivision_Codes">3.6.5 Subdivision
2600        Codes</a></em></strong>.</td>
2601      </tr>
2602      <tr>
2603        <td>…</td>
2604      </tr>
2605      <tr>
2606        <td colspan="4"><strong>A <a href=
2607        "#UnicodeSentenceBreakSuppressionsIdentifier" name=
2608        "UnicodeSentenceBreakSuppressionsIdentifier" id=
2609        "UnicodeSentenceBreakSuppressionsIdentifier">Unicode
2610        Sentence Break Suppressions Identifier</a> defines a set of
2611        data to be used for suppressing certain sentence breaks
        that would otherwise be found by UAX #29 rules. The valid
2613        values are those <em>name</em> attribute values in the
2614        <em>type</em> elements of key name="ss" in bcp47/<a target=
2615        "_blank" href=
2616        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/segmentation.xml">segmentation.xml</a></strong>.</td>
2617      </tr>
2618      <tr>
2619        <td rowspan="2">"ss"</td>
2620        <td rowspan="2">Sentence break suppressions</td>
2621        <td>"none"</td>
2622        <td>Don’t use sentence break suppressions data (the
2623        default).</td>
2624      </tr>
2625      <tr>
2626        <td>"standard"</td>
2627        <td>Use sentence break suppressions data of type
2628        "standard"</td>
2629      </tr>
2630      <tr>
2631        <td colspan="4"><strong>A <a href=
2632        "#UnicodeTimezoneIdentifier" name=
2633        "UnicodeTimezoneIdentifier" id=
2634        "UnicodeTimezoneIdentifier">Unicode Timezone Identifier</a>
2635        defines a timezone. The valid values are those name
2636        attribute values in the <em>type</em> elements of
2637        bcp47/<a target="_blank" href=
2638        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/timezone.xml">timezone.xml</a>.</strong></td>
2639      </tr>
2640      <tr>
2641        <td>"tz"<br>
2642        (timezone)</td>
2643        <td>Time zone</td>
2644        <td><i>Unicode short time zone IDs</i></td>
2645        <td>
          <p>Short identifiers defined in terms of a TZ time zone
          database [<a href="#Olson">Olson</a>] identifier in the
          common/bcp47/timezone.xml file, plus a few extra
          values.</p>
          <p>For more information, see <a href=
          "#Time_Zone_Identifiers">Section 3.6.3 Time Zone
          Identifiers</a>.</p>
2653          <p>CLDR provides data for normalizing timezone codes.</p>
2654        </td>
2655      </tr>
2656      <tr>
2657        <td colspan="4"><strong>A <a href=
2658        "#UnicodeVariantIdentifier" name="UnicodeVariantIdentifier"
2659        id="UnicodeVariantIdentifier">Unicode Variant
2660        Identifier</a> defines a special variant used for locales.
2661        The valid values are those name attribute values in the
2662        <em>type</em> elements of bcp47/<a target="_blank" href=
2663        "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/variant.xml">variant.xml</a>.</strong></td>
2664      </tr>
2665      <tr>
2666        <td>"va"</td>
2667        <td>Common variant type</td>
2668        <td>"posix"</td>
        <td>POSIX style locale variant. For handling of the
        "POSIX" variant, see <i>Section 3.8.2, <a href=
        "#Legacy_Variants">Legacy Variants</a></i>.</td>
2672      </tr>
2673    </table>
2674    <p>For more information on the allowed keys and types, see the
2675    specific elements below, and <a href=
2676    "#Unicode_Locale_Extension_Data_Files">Section 3.6.4 U
2677    Extension Data Files</a>.</p>
    <p>Additional keys or types might be added in future versions.
    Implementations of LDML should be robust enough to handle any
    syntactically valid key or type value.</p>
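    <p>As an illustration of that robustness, here is a minimal Python sketch (the function names are hypothetical and not part of LDML or any CLDR library) that collects the keywords from the 'u' extension of a BCP 47 tag. It assumes the tag is already syntactically valid, keeps keys it does not recognize instead of rejecting them, joins multi-subtag types such as "islamic-umalqura", treats a key with no type as "true", and skips 'u' attributes. The second helper splits an "rg" or "sd" value into its unicode_region_subtag (2 letters or 3 digits) and its suffix.</p>

```python
def parse_u_keywords(tag):
    """Return a dict of key -> type for the 'u' extension of a BCP 47 tag.

    Sketch only: assumes a syntactically valid tag, skips 'u' attributes
    (subtags before the first key), and keeps unknown keys.
    """
    subtags = tag.lower().replace("_", "-").split("-")
    start = None
    for i, sub in enumerate(subtags):
        if sub == "x":  # private use: nothing after this is an extension
            break
        if sub == "u":
            start = i + 1
            break
    if start is None:
        return {}

    keywords = {}
    key, values = None, []
    for sub in subtags[start:]:
        if len(sub) == 1:  # the next singleton ends the 'u' extension
            break
        if len(sub) == 2:  # keys are exactly two characters long
            if key is not None:
                keywords[key] = "-".join(values) or "true"
            key, values = sub, []
        elif key is not None:  # 3-8 character subtags form the type
            values.append(sub)
    if key is not None:
        keywords[key] = "-".join(values) or "true"
    return keywords


def split_subdivision_id(value):
    """Split a unicode_subdivision_id such as "gbsct" or "uszzzz" into
    (unicode_region_subtag, suffix); region subtags are 2 letters or 3 digits."""
    value = value.lower()
    cut = 3 if value[:1].isdigit() else 2
    return value[:cut], value[cut:]


if __name__ == "__main__":
    print(parse_u_keywords("en-GB-u-rg-uszzzz-hc-h23"))  # {'rg': 'uszzzz', 'hc': 'h23'}
    print(split_subdivision_id("gbsct"))  # ('gb', 'sct')
```

    <p>Because an unrecognized two-letter key is passed through like any other, a caller can simply ignore keys it does not understand rather than rejecting the whole tag.</p>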
2681    <h4><a href="#Numbering%20System%20Data" name=
2682    "Numbering System Data">3.6.2 Numbering System Data</a></h4>
2683    <p>LDML supports multiple numbering systems. The identifiers
2684    for those numbering systems are defined in the file
2685    <strong>bcp47/number.xml</strong>. For example, for the 'trunk'
2686    version of the data see <a href=
2687    "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/number.xml">
    bcp47/number.xml</a>.</p>
2689    <p>Details about those numbering systems are defined in
2690    <strong>supplemental/numberingSystems.xml</strong>. For
2691    example, for the 'trunk' version of the data see <a href=
2692    "https://github.com/unicode-org/cldr/releases/tag/latest/common/supplemental/numberingSystems.xml">
    supplemental/numberingSystems.xml</a>.</p>
    <p>LDML makes the following stability guarantees for this
    data:</p>
2696    <ol>
      <li>Like other BCP 47 identifiers, once a numbering system identifier
2698      is added to <strong>bcp47/number.xml</strong> or
2699      <strong>numberingSystems.xml</strong>, it will never be
2700      removed from either of those files.</li>
2701      <li>If an identifier has type="numeric" in
2702      numberingSystems.xml, then
2703        <ol>
2704          <li>It is a decimal, positional numbering system with an
          attribute digits=X, where X is a string containing the 10
          digits used by the numbering system, in order from zero to nine.</li>
2707          <li>The values of the type and digits will never
2708          change.</li>
2709        </ol>
2710      </li>
2711    </ol>
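    <p>The second guarantee means that a type="numeric" numbering system can be applied with nothing more than its digits string. The following minimal Python sketch (the function name is hypothetical, not a CLDR API) renders a non-negative integer by replacing each ASCII digit with the digit at the same position in a digits=X value. The "arab" digits string in the example is assumed to be U+0660 through U+0669 and is built from code points rather than read from numberingSystems.xml.</p>

```python
def format_with_digits(n, digits):
    """Format a non-negative integer using a numbering system's digits string.

    `digits` is the value of the digits attribute of a type="numeric"
    numbering system: its ten digits in order, zero first.
    """
    if len(digits) != 10:
        raise ValueError("a numeric numbering system has exactly 10 digits")
    if n >= 0:
        # str(n) yields ASCII digits 0-9; map each one to the digit at that position
        return "".join(digits[int(ch)] for ch in str(n))
    raise ValueError("this sketch handles non-negative integers only")


LATN_DIGITS = "0123456789"
ARAB_DIGITS = "".join(chr(0x0660 + i) for i in range(10))  # U+0660..U+0669

if __name__ == "__main__":
    print(format_with_digits(2024, LATN_DIGITS))  # 2024
    print(format_with_digits(42, ARAB_DIGITS) == "\u0664\u0662")  # True
```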
2712    <h4><a href="#Time_Zone_Identifiers" name=
2713    "Time_Zone_Identifiers" id="Time_Zone_Identifiers">3.6.3 Time
2714    Zone Identifiers</a></h4>
2715    <p>LDML inherits time zone IDs from the tz database [<a href=
2716    "#Olson">Olson</a>]. Because these IDs from the tz database do
2717    not satisfy the BCP 47 language subtag syntax requirements,
    CLDR defines short identifiers for use in the Unicode
2719    locale extension. The short identifiers are defined in the file
2720    <strong>common/bcp47/timezone.xml</strong>.</p>
    <p>The short identifiers use UN/LOCODE [<a href=
    "#LOCODE">LOCODE</a>] codes, with the space removed, where
    possible. For example, the short identifier for
    "America/Los_Angeles" is "uslax" (the LOCODE for Los Angeles,
    US is "US LAX"). Identifiers whose length is not 5 are used
    where there is no corresponding UN/LOCODE, such as "usnavajo"
    for "America/Shiprock", or "utcw01" for "Etc/GMT+1", so that
    they do not overlap with future UN/LOCODE codes.</p>
    <p>Although the first two letters of a short identifier may
    match an ISO 3166 two-letter country code, a user should not
    assume that the time zone belongs to that country. The first two
    letters of an identifier whose length is not 5 have no
    meaning. Also, the identifiers are stable: they will not change,
    no matter what changes happen in the base standard. For example,
    if Hawaii left the US and joined Canada as a new province, the
    short time zone identifier "ushnl" would not change in CLDR,
    even if the UN/LOCODE changed to "cahnl" or something else.</p>
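    <p>The naming convention above can be sketched in Python as follows. The function names are hypothetical, and this is not how CLDR generates timezone.xml; it only shows how a five-letter short identifier relates to a UN/LOCODE. Because the short identifiers are stable, an implementation should look them up in common/bcp47/timezone.xml rather than recompute them from current UN/LOCODE data, and should not read the first two letters as a country code.</p>

```python
def short_id_from_locode(locode):
    """Derive a candidate short time zone ID from a UN/LOCODE.

    "US LAX" -> "uslax": the space is removed and the result lowercased.
    Illustration only; the authoritative IDs are those in timezone.xml.
    """
    return locode.replace(" ", "").lower()


def may_be_locode_based(short_id):
    """Only five-character short IDs can come from a UN/LOCODE.

    IDs of any other length (such as "usnavajo", "utcw01", or "unk") were
    chosen so that they cannot collide with future UN/LOCODE values, and
    their first two letters carry no meaning.
    """
    return len(short_id) == 5


if __name__ == "__main__":
    print(short_id_from_locode("US LAX"))  # uslax
    print(may_be_locode_based("usnavajo"))  # False
```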
2739    <p>There is a special code "unk" for an Unknown or Invalid time
2740    zone. This can be expressed in the tz database style ID
2741    "Etc/Unknown", although it is not defined in the tz
2742    database.</p>
2743    <p><b>Stability of Time Zone Identifiers</b></p>
    <p>Although the short time zone identifiers are guaranteed to
    be stable, the preferred IDs in the tz database (such as those
    found in the <strong>zone.tab</strong> file) may change from time
    to time. For example, "Asia/Calcutta" was replaced with
    "Asia/Kolkata" and moved to the <strong>backward</strong> file in
    the tz database. Because CLDR locale data uses time zone IDs
    from the tz database as keys, the stability of those IDs is
    critical.</p>
    <p>To maintain the stability of "long" IDs (those inherited
    from the tz database), a special rule applies to the
    <i>alias</i> attribute in the &lt;type&gt; elements for "tz":
    the first "long" ID is the CLDR canonical "long" time zone
    ID.</p>
2757    <p>For example:</p>
2758    <blockquote>
2759      &lt;type name="inccu" alias="Asia/Calcutta Asia/Kolkata"
2760      description="Kolkata, India"/&gt;
2761    </blockquote>
    <p>The &lt;type&gt; element above defines the short time zone ID
    "inccu" (for use in the Unicode locale extension), the
    corresponding <em>CLDR canonical "long" ID</em>
    "Asia/Calcutta", and an alias "Asia/Kolkata".</p>
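    <p>The alias rule can be applied mechanically. The following minimal Python sketch assumes the name and alias attribute values of the "tz" &lt;type&gt; elements have already been read from common/bcp47/timezone.xml (the parsing step is omitted, and the function names are hypothetical). It maps every "long" ID to its short ID, and every short ID to its CLDR canonical "long" ID, which is the first token of the alias value.</p>

```python
def build_tz_maps(types):
    """Build lookup tables from (name, alias) pairs of "tz" type elements.

    Returns (long_to_short, short_to_canonical). The alias attribute is a
    space-separated list of long IDs whose first entry is the CLDR
    canonical long ID.
    """
    long_to_short = {}
    short_to_canonical = {}
    for name, alias in types:
        long_ids = alias.split()
        if not long_ids:
            continue  # defensive: skip a type with an empty alias value
        short_to_canonical[name] = long_ids[0]
        for long_id in long_ids:
            long_to_short[long_id] = name
    return long_to_short, short_to_canonical


def canonical_long_id(long_id, long_to_short, short_to_canonical):
    """Map any known long ID (current or legacy) to the CLDR canonical one."""
    return short_to_canonical[long_to_short[long_id]]


if __name__ == "__main__":
    data = [("inccu", "Asia/Calcutta Asia/Kolkata")]
    l2s, s2c = build_tz_maps(data)
    print(l2s["Asia/Kolkata"])  # inccu
    print(canonical_long_id("Asia/Kolkata", l2s, s2c))  # Asia/Calcutta
```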
2766    <h4><a href="#Unicode_Locale_Extension_Data_Files" name=
2767    "Unicode_Locale_Extension_Data_Files" id=
2768    "Unicode_Locale_Extension_Data_Files">3.6.4 U Extension Data
2769    Files</a></h4>
    <p>The 'u' extension data is stored in multiple XML files
    located under the common/bcp47 directory in CLDR. Each file
    contains the locale extension key/type values and their
    backward compatibility mappings appropriate for a particular
    domain. For example, <a href=
2775    "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/collation.xml">
2776    common/bcp47/collation.xml</a> contains key/type values for
2777    collation, including optional collation parameters and valid
2778    type values for each key.</p>
2779    <p>The 't' extension data is stored in <a href=
2780    "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml">
2781    common/bcp47/transform.xml</a>.</p>
2782    <p class="dtd">&lt;!ELEMENT keyword ( key* )&gt;</p>
2783    <p class="dtd">&lt;!ELEMENT key ( type* )&gt;<br>
2784    &lt;!ATTLIST key extension NMTOKEN #IMPLIED&gt;<br>
2785    &lt;!ATTLIST key name NMTOKEN #REQUIRED&gt;<br>
2786    &lt;!ATTLIST key description CDATA #IMPLIED&gt;<br>
2787    &lt;!ATTLIST key deprecated ( true | false ) "false"&gt;<br>
2788    &lt;!ATTLIST key preferred NMTOKEN #IMPLIED&gt;<br>
2789    &lt;!ATTLIST key alias NMTOKEN #IMPLIED&gt;<br>
2790    &lt;!ATTLIST key valueType (single | multiple | incremental |
2791    any) #IMPLIED &gt;<br>
2792    &lt;!ATTLIST key since CDATA #IMPLIED&gt;</p>
2793    <p class="dtd">&lt;!ELEMENT type EMPTY&gt;<br>
2794    &lt;!ATTLIST type name NMTOKEN #REQUIRED&gt;<br>
2795    &lt;!ATTLIST type description CDATA #IMPLIED&gt;<br>
2796    &lt;!ATTLIST type deprecated ( true | false ) "false"&gt;<br>
2797    &lt;!ATTLIST type preferred NMTOKEN #IMPLIED&gt;<br>
2798    &lt;!ATTLIST type alias CDATA #IMPLIED&gt;<br>
2799    &lt;!ATTLIST type since CDATA #IMPLIED&gt;</p>
2800    <p class="dtd">&lt;!ELEMENT attribute EMPTY&gt;<br>
2801    &lt;!ATTLIST attribute name NMTOKEN #REQUIRED&gt;<br>
2802    &lt;!ATTLIST attribute description CDATA #IMPLIED&gt;<br>
2803    &lt;!ATTLIST attribute deprecated ( true | false )
2804    "false"&gt;<br>
2805    &lt;!ATTLIST attribute preferred NMTOKEN #IMPLIED&gt;<br>
2806    &lt;!ATTLIST attribute since CDATA #IMPLIED&gt;</p>
    <p>The extension attribute in the &lt;key&gt; element specifies the
2808    BCP 47 language tag extension type. The default value of the
2809    extension attribute is "u" (Unicode locale extension). The
2810    &lt;type&gt; element is only applicable to the enclosing
2811    &lt;key&gt;.</p>
    <p>In the 'u' and 't' extension data files, the
2813    common attributes for the &lt;key&gt;, &lt;type&gt; and
2814    &lt;attribute&gt; elements are as follows:</p>
2815    <dl>
2816      <dt><b>name</b></dt>
2817      <dd>
        <p>The key or type name used with the
        <a href="#Unicode_locale_identifier">'u' extension
        syntax</a> or the 't' extension syntax. When <i>alias</i>
        below is absent, this name can also be used with the old
        style <a href="#Old_Locale_Extension_Syntax">"@key=type"
        syntax</a>.</p>
2824        <p>Most type names are <strong>literal type names</strong>,
2825        which match exactly the same value. All of these have at
2826        least one lowercase letter, such as "buddhist". There are a
2827        small number of <strong>indirect type names</strong>, such
2828        as "RG_KEY_VALUE". These have no lowercase letters. The
2829        interpretation of each one is listed below.</p>
2830        <h5><a name="CODEPOINTS" href="#CODEPOINTS" id=
2831        "CODEPOINTS">CODEPOINTS</a></h5>
2832        <p>The type name <strong>"CODEPOINTS"</strong> is reserved
2833        for a variable representing Unicode code point(s). The
2834        syntax is:</p>
2835        <table border="0">
2836          <tr>
2837            <th>&nbsp;</th>
2838            <th>
2839              <div align="center">
2840                EBNF
2841              </div>
2842            </th>
2843          </tr>
2844          <tr>
2845            <td>
2846              <pre>codepoints</pre>
2847            </td>
2848            <td>
            <pre>= codepoint (sep codepoint)*</pre>
2850            </td>
2851          </tr>
2852          <tr>
2853            <td>
2854              <pre>codepoint</pre>
2855            </td>
2856            <td>
2857              <pre>= [0-9 A-F a-f]{4,6}</pre>
2858            </td>
2859          </tr>
2860        </table>
2861        <p>In addition, no codepoint may exceed 10FFFF. For
2862        example, "00A0", "300b", "10D40C" and "00C1-00E1" are
2863        valid, but "A0", "U060C" and "110000" are not.</p>
2864        <p>In the current version of CLDR, the type "CODEPOINTS" is
2865        only used for the deprecated locale extension key "vt"
2866        (variableTop). The subtags forming the type for "vt"
2867        represent an arbitrary string of characters. There is no
        formal limit on the number of characters, although
        practically anything above 1 will be rare, and anything
        longer than 4 might be useless. Repetition is allowed; for
        example, 0061-0061 ("aa") is a valid type value for "vt",
        since the sequence may be a collating element. Order is
        vital: 0061-0062 ("ab") is different from 0062-0061 ("ba").
2874        Note that for variableTop any character sequence must be a
2875        contraction which yields exactly one primary weight.</p>
2876        <p>For example,</p>
2877        <blockquote>
          <p><strong>en-u-vt-00A4</strong>: this indicates
          English, with any characters sorting at or below "¤" (at
          a primary level) considered variable.</p>
2881        </blockquote>
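        <p>The following Python sketch checks a CODEPOINTS value
        against the syntax above and decodes it into the string it
        represents. It assumes that <code>sep</code> is "-" or "_"
        as in the unicode_locale_id syntax and, following the
        statement that there is no formal limit on the number of
        characters for "vt", it accepts any number of codepoints.
        The function names are hypothetical and not part of any
        CLDR or ICU API.</p>

```python
import re

# One codepoint: 4 to 6 hexadecimal digits, in either case.
_CODEPOINT = r"[0-9A-Fa-f]{4,6}"
# Codepoints joined by "-" or "_" (assumed to be the sep characters).
_CODEPOINTS = re.compile(_CODEPOINT + r"(?:[-_]" + _CODEPOINT + r")*")


def is_valid_codepoints(value: str) -> bool:
    """True if value matches the CODEPOINTS syntax and no codepoint exceeds 10FFFF."""
    if not _CODEPOINTS.fullmatch(value):
        return False
    # The pattern alone would admit values up to FFFFFF.
    return all(int(cp, 16) <= 0x10FFFF for cp in re.split(r"[-_]", value))


def decode_codepoints(value: str) -> str:
    """Decode a valid CODEPOINTS value, e.g. "0061-0062" becomes "ab"."""
    if not is_valid_codepoints(value):
        raise ValueError(f"not a valid CODEPOINTS value: {value!r}")
    return "".join(chr(int(cp, 16)) for cp in re.split(r"[-_]", value))


print(is_valid_codepoints("00C1-00E1"), is_valid_codepoints("110000"))  # True False
print(ascii(decode_codepoints("00A4")))  # '\xa4', the vt value of en-u-vt-00A4
```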
2882        <p>By default in UCA, variable characters are ignored in
2883        sorting at a primary, secondary, and tertiary level. But in
2884        CLDR, they are not ignorable by default. For more
2885        information, see <a href=
2886        "tr35-collation.html#Setting_Options">Collation: Section
        3.3 <em>Setting Options</em></a>.</p>
2888        <h5><a name="REORDER_CODE" href="#REORDER_CODE" id=
2889        "REORDER_CODE">REORDER_CODE</a></h5>
2890        <p>The type name <strong>"REORDER_CODE"</strong> is
2891        reserved for reordering block names (e.g. "latn", "digit"
2892        and "others") defined in the <i><a href=
2893        "tr35-collation.html#Root_Collation">Root
2894        Collation</a></i>. The type "REORDER_CODE" is used for
2895        locale extension key "kr" (colReorder). The value of type
2896        for "kr" is represented by one or more reordering block
2897        names such as "latn-digit". For more information, see
2898        <a href="tr35-collation.html#Script_Reordering">Collation:
        Section 3.12 <em>Collation Reordering</em></a>.</p>
2900        <h5><a name="RG_KEY_VALUE" href="#RG_KEY_VALUE" id=
2901        "RG_KEY_VALUE">RG_KEY_VALUE</a></h5>
2902        <p>The type name <strong>"RG_KEY_VALUE"</strong> is
2903        reserved for region codes in the format required by the
2904        "rg" key; this is a subdivision
2905        code with idStatus='unknown' or 'regular' from the
2906        idValidity data in common/validity/subdivision.xml.</p>
2907        <h5><a name="SCRIPT_CODE" href="#SCRIPT_CODE" id=
2908        "SCRIPT_CODE">SCRIPT_CODE</a></h5>
2909        <p>The type name <strong>"SCRIPT_CODE"</strong> is
2910        reserved for <code><a href="#unicode_script_subtag">unicode_script_subtag</a></code>
2911        values (e.g. "thai", "laoo").
2912        The type "SCRIPT_CODE" is used for locale extension key "dx".
2913        The value of type for "dx" is represented by one or more SCRIPT_CODEs,
2914        such as "thai-laoo".</p>
2915        <h5><a name="SUBDIVISION_CODE" href="#SUBDIVISION_CODE" id=
2916        "SUBDIVISION_CODE">SUBDIVISION_CODE</a></h5>
2917        <p>The type name <strong>"SUBDIVISION_CODE"</strong> is
2918        reserved for subdivision codes in the format required by
2919        the "sd" key; this is a subdivision code from the
2920        idValidity data in common/validity/subdivision.xml,
2921        excluding those with idStatus='unknown'. Codes with
2922        idStatus='deprecated' should not be generated, and those
2923        with idStatus='private_use' are only to be used with prior
2924        agreement.</p>
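        <p>RG_KEY_VALUE and SUBDIVISION_CODE differ only in which
        idStatus values they accept. The Python sketch below assumes
        the idValidity data has already been expanded into a mapping
        from idStatus to a set of codes; "usca", "gbsct" and
        "uszzzz" come from this document, while the deprecated and
        private-use entries are hypothetical placeholders, as are
        the function names.</p>

```python
from typing import Dict, Set

# Assumed shape: idStatus -> codes, after expanding subdivision.xml.
# The deprecated and private_use entries are hypothetical placeholders.
ID_VALIDITY: Dict[str, Set[str]] = {
    "regular": {"usca", "gbsct"},
    "unknown": {"uszzzz"},
    "deprecated": {"xxold1"},    # hypothetical
    "private_use": {"xxpriv"},   # hypothetical
}


def statuses_of(code: str) -> Set[str]:
    """Return every idStatus under which code is listed."""
    return {status for status, codes in ID_VALIDITY.items() if code in codes}


def is_valid_rg_value(code: str) -> bool:
    """RG_KEY_VALUE: only idStatus 'unknown' or 'regular'."""
    return bool(statuses_of(code) & {"unknown", "regular"})


def is_valid_sd_value(code: str) -> bool:
    """SUBDIVISION_CODE: any listed code except idStatus 'unknown'."""
    return bool(statuses_of(code) - {"unknown"})


def may_generate_sd_value(code: str, private_use_agreed: bool = False) -> bool:
    """Deprecated codes should not be generated; private-use codes need prior agreement."""
    statuses = statuses_of(code)
    if "deprecated" in statuses:
        return False
    if "private_use" in statuses:
        return private_use_agreed
    return is_valid_sd_value(code)
```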
2925        <h5><a name="PRIVATE_USE" href="#PRIVATE_USE" id=
2926        "PRIVATE_USE">PRIVATE_USE</a></h5>
2927        <p>The type name <strong>"PRIVATE_USE"</strong> is reserved
2928        for private use types. A valid type value is composed of
2929        one or more subtags separated by hyphens and each subtag
2930        consists of three to eight ASCII alphanumeric characters.
2931        In the current version of CLDR,
2932        <strong>"PRIVATE_USE"</strong> is only used for transform
2933        extension "x0".</p>
2934      </dd>
2935      <dt><b>valueType</b></dt>
2936      <dd>
2937        <p>The valueType attribute indicates how many subtags are
2938        valid for a given key:</p>
2939        <table class='simple' width="100%" border="1">
2940          <tbody>
2941            <tr>
2942              <th>single</th>
2943              <td>Either exactly one type value, or no type value
2944              (but only if the value of "true" would be valid).
2945              This is the default if no valueType attribute is
2946              present.</td>
2947            </tr>
2948            <tr>
2949              <th>incremental</th>
2950              <td>Multiple type values are allowed, but only if a
2951              prefix is also present, and the sequence is
2952              explicitly listed. Each successive type value
2953              indicates a refinement of its prefix. For
2954              example:<br>
2955              &lt;key name="ca" description="Calendar algorithm
2956              key" <strong>valueType="incremental"</strong>&gt;<br>
2957              &nbsp;&nbsp;&lt;type name="islamic"
2958              description="Islamic calendar"/&gt;<br>
2959              &nbsp;&nbsp;&lt;type name="islamic-umalqura"
2960              description="Islamic calendar, Umm al-Qura"/&gt;<br>
2961              Thus <em>ca-islamic-umalqura</em> is valid. However,
2962              <em>ca-gregory-japanese</em> is not valid, because
2963              "gregory-japanese" is not listed as a type.</td>
2964            </tr>
2965            <tr>
2966              <th>multiple</th>
2967              <td>Multiple type values are allowed, but each may
2968              only occur once. For example:<br>
2969              &lt;key name="kr" description="Collation reorder
2970              codes" <strong>valueType="multiple"</strong>&gt;<br>
2971              &nbsp;&nbsp;&lt;type name="REORDER_CODE" …/&gt;</td>
2972            </tr>
2973            <tr>
2974              <th>any</th>
2975              <td>Any number of type values are allowed, with none
2976              of the above restrictions. For example:<br>
2977              &lt;key extension="t" name="x0" description="Private
2978              use transform type key."
2979              <strong>valueType="any"</strong>&gt;<br>
2980              &nbsp;&nbsp;&lt;type name="PRIVATE_USE" …/&gt;</td>
2981            </tr>
2982          </tbody>
2983        </table>
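        <p>A Python sketch of these rules for a single key follows.
        It takes the type subtags that follow the key and the key's
        listed literal type names (indirect names such as
        REORDER_CODE are assumed to be expanded already). Reading
        "only if a prefix is also present" as "every shorter prefix
        is itself a listed type" is an interpretation, as is
        treating an empty type value as acceptable only for
        "single"; the function name is hypothetical.</p>

```python
from typing import Sequence, Set


def matches_value_type(value_type: str, subtags: Sequence[str],
                       listed: Set[str]) -> bool:
    """Check the type subtags that follow one key against its valueType.

    listed holds the key's literal type names (e.g. "islamic-umalqura").
    """
    if not subtags:
        # No type value: allowed for "single" only if "true" would be valid
        # (assumption: the other valueTypes need at least one subtag).
        return value_type == "single" and "true" in listed
    if value_type == "single":
        return len(subtags) == 1 and subtags[0] in listed
    if value_type == "incremental":
        # The full sequence and every shorter prefix must be listed.
        return all("-".join(subtags[:n]) in listed
                   for n in range(1, len(subtags) + 1))
    if value_type == "multiple":
        # Each subtag must be listed and may occur only once.
        return (len(set(subtags)) == len(subtags)
                and all(s in listed for s in subtags))
    if value_type == "any":
        # Repeats allowed; each subtag must still be a listed type.
        return all(s in listed for s in subtags)
    raise ValueError(f"unknown valueType: {value_type!r}")
```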
2984      </dd>
2985      <dt><b>description</b></dt>
2986      <dd>
2987        <p>The description of the key, type or attribute element.
2988        There is also some informative text about certain keys and
        types in Section 3.5 <a href=
2990        "#Key_And_Type_Definitions_">Key And Type
2991        Definitions</a>.</p>
2992      </dd>
2993      <dt><b>deprecated</b></dt>
2994      <dd>
2995        <p>The deprecation status of the key, type or attribute
2996        element. The value "true" indicates the element is
        deprecated and no longer used in the current version of CLDR. The
2998        default value is "false".</p>
2999      </dd>
3000      <dt><b>preferred</b></dt>
3001      <dd>
3002        <p>The preferred value of the deprecated key, type or
3003        attribute element. When a key, type or attribute element is
3004        deprecated, this attribute is used for specifying a new
3005        canonical form if available.</p>
3006      </dd>
3007      <dt><b>alias</b> (Not applicable to &lt;attribute&gt;)</dt>
3008      <dd>
        <p>The BCP 47 form is the canonical form and is recommended.
3010        Other aliases are included only for backwards
3011        compatibility.</p>
3012      </dd>
3013      <dd><em>Example:</em></dd>
3014      <dd>
3015        <p>&lt;type name="phonebk"
3016        <strong>alias="phonebook"</strong> description="Phonebook
3017        style ordering (such as in German)"/&gt;<br></p>The
3018        preferred term, and the only one to be used in BCP 47, is
3019        the name: in this example, "phonebk".<br>
3020      </dd>
3021      <dd>
3022        <p>The alias is a key or type name used by Unicode locale
3023        extensions with the old <a href=
3024        "#Old_Locale_Extension_Syntax">"@key=type" syntax</a>. The
3025        attribute value for type may contain multiple names
3026        delimited by ASCII space characters. Of those aliases, the
3027        first name is the preferred value.</p>
3028      </dd>
3029      <dt><b>since</b></dt>
3030      <dd>The version of CLDR in which this key or type was
3031      introduced. Absence of this attribute value implies the key
3032      or type was available in CLDR 1.7.2.</dd>
3033    </dl>
3034    <p><em>Note: There are no values defined for the locale
3035    extension attribute in the current CLDR release.</em></p>
3036    <p>For example,</p>
3037    <pre>
3038&lt;key name="co" alias="collation" description="Collation type key"&gt;
3039  &lt;type name="pinyin" description="Pinyin ordering for Latin and for CJK characters (used in Chinese)"/&gt;
3040&lt;/key&gt;
3041
3042&lt;key name="ka" alias="colAlternate" description="Collation parameter key for alternate handling"&gt;
3043  &lt;type name="noignore" alias="non-ignorable" description="Variable collation elements are not reset to ignorable"/&gt;
3044  &lt;type name="shifted" description="Variable collation elements are reset to zero at levels one through three"/&gt;
3045&lt;/key&gt;
3046
3047&lt;key name="tz" alias="timezone"&gt;
3048  ...
3049  &lt;type name="aumel" alias="Australia/Melbourne Australia/Victoria" description="Melbourne, Australia"/&gt;
3050  &lt;type name="aumqi" alias="Antarctica/Macquarie" description="Macquarie Island Station, Macquarie Island" since="1.8.1"/&gt;
3051  ...
3052&lt;/key&gt;
3053    </pre>The data above indicates:
3054    <ul>
3055      <li>type "pinyin" is valid for key "co", thus "u-co-pinyin"
3056      is a valid Unicode locale extension.</li>
3057      <li>type "pinyin" is not valid for key "ka", thus
3058      "u-ka-pinyin" is not a valid Unicode locale extension.</li>
3059      <li>type "pinyin" has no <i>alias</i>, so
3060      "zh@collation=pinyin" is a valid Unicode locale identifier
3061      according to the old syntax.</li>
3062      <li>type "noignore" has an alias attribute, so
3063      "en@colAlternate=noignore" is not a valid Unicode locale
3064      identifier according to the old syntax.</li>
3065      <li>type "aumel" is valid for key "tz", supported by CLDR
3066      1.7.2 (default value) or later versions.</li>
3067      <li>type "aumqi" is valid for key "tz", supported by CLDR
3068      1.8.1 or later versions.</li>
3069    </ul>
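    <p>The conclusions above can be checked mechanically. This
    Python sketch parses the example data with the standard
    <code>xml.etree.ElementTree</code> module; it drops the "..."
    lines and wraps the keys in a &lt;keyword&gt; root as in the DTD
    above. Following the definition of <i>name</i>, an element's own
    name is treated as usable with the old syntax only when it has
    no alias. The helper names are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# The example data above, trimmed to well-formed XML under a <keyword> root.
BCP47_XML = """<keyword>
  <key name="co" alias="collation" description="Collation type key">
    <type name="pinyin" description="Pinyin ordering for Latin and for CJK characters (used in Chinese)"/>
  </key>
  <key name="ka" alias="colAlternate" description="Collation parameter key for alternate handling">
    <type name="noignore" alias="non-ignorable" description="Variable collation elements are not reset to ignorable"/>
    <type name="shifted" description="Variable collation elements are reset to zero at levels one through three"/>
  </key>
  <key name="tz" alias="timezone">
    <type name="aumel" alias="Australia/Melbourne Australia/Victoria" description="Melbourne, Australia"/>
    <type name="aumqi" alias="Antarctica/Macquarie" description="Macquarie Island Station, Macquarie Island" since="1.8.1"/>
  </key>
</keyword>"""

DEFAULT_SINCE = "1.7.2"  # implied when the since attribute is absent


def load_keys(xml_text):
    """Map each key name (e.g. "co") to its <key> element."""
    return {key.get("name"): key for key in ET.fromstring(xml_text).iter("key")}


def find_type(keys, key_name, type_name):
    """Return the <type> element for key/type, or None if the pair is not valid."""
    key = keys.get(key_name)
    return None if key is None else key.find(f"type[@name='{type_name}']")


def old_syntax_names(elem):
    """Names usable with "@key=type": the aliases if present, else the name."""
    alias = elem.get("alias")
    return alias.split() if alias else [elem.get("name")]


def is_valid_old_pair(keys, old_key, old_type):
    """Check one "old_key=old_type" pair of the old syntax."""
    for key in keys.values():
        if old_key in old_syntax_names(key):
            return any(old_type in old_syntax_names(t) for t in key.findall("type"))
    return False


def since(type_elem):
    return type_elem.get("since", DEFAULT_SINCE)


keys = load_keys(BCP47_XML)
print(find_type(keys, "co", "pinyin") is not None)           # True: u-co-pinyin
print(find_type(keys, "ka", "pinyin") is not None)           # False: u-ka-pinyin
print(is_valid_old_pair(keys, "collation", "pinyin"))        # True
print(is_valid_old_pair(keys, "colAlternate", "noignore"))   # False
print(since(find_type(keys, "tz", "aumqi")))                 # 1.8.1
```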
3070    <p>It is strongly recommended that all API methods accept all
3071    possible aliases for keywords and types, but generate the
3072    canonical form. For example, "ar-u-ca-islamicc" would be
3073    equivalent to "ar-u-ca-islamic-civil" on input, but the latter
3074    should be output. The one exception is where an alias would
3075    only be well-formed with the old syntax, such as "gregorian"
3076    (for "gregory").</p>
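    <p>A minimal Python sketch of that input/output policy for the
    'u' extension follows. The alias table holds only the "islamicc"
    example from the text and is hypothetical; real data would come
    from the alias attributes in common/bcp47. Subtags are first
    checked for BCP 47 well-formedness (three to eight
    alphanumerics), so an old-syntax-only alias such as "gregorian"
    is rejected rather than mapped.</p>

```python
import re

# Hypothetical alias table: (key, alias) -> canonical type.
TYPE_ALIASES = {("ca", "islamicc"): "islamic-civil"}

_TYPE_SUBTAG = re.compile(r"[0-9a-z]{3,8}")


def is_well_formed_type(value: str) -> bool:
    """True if every subtag of a 'u' type value has 3 to 8 alphanumerics."""
    return all(_TYPE_SUBTAG.fullmatch(s) for s in value.lower().split("-"))


def canonicalize_type(key: str, value: str) -> str:
    """Accept an alias on input, but return the canonical type for output."""
    if not is_well_formed_type(value):
        # e.g. "gregorian" is only well-formed in the old "@key=type" syntax.
        raise ValueError(f"type is not well-formed in BCP 47: {value!r}")
    value = value.lower()
    return TYPE_ALIASES.get((key.lower(), value), value)


print(canonicalize_type("ca", "islamicc"))  # islamic-civil
```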
3077    <h4><a href="#Unicode_Subdivision_Codes" name=
3078    "Unicode_Subdivision_Codes" id=
3079    "Unicode_Subdivision_Codes">3.6.5 Subdivision Codes</a></h4>
3080    <p>The subdivision codes designate a subdivision of a country
3081    or region. They are called various names, such as a
3082    <em>state</em> in the United States, or a <em>province</em> in
3083    Canada. The codes in CLDR are based on ISO 3166-2 subdivision
3084    codes. The ISO codes have a region code followed by a hyphen,
3085    then a suffix consisting of 1..3 ASCII letters or digits.</p>
3086    <p>The CLDR codes are designed to work in a <a href=
3087    '#unicode_locale_id'>unicode_locale_id</a> (BCP47), and are
3088    thus all lowercase, with no hyphen. For example, the following
3089    are valid, and mean “English as used in California, USA”.</p>
3090    <ul>
3091      <li>en-u-sd-<strong>usca</strong></li>
3092      <li>en-US-u-sd-<strong>usca</strong></li>
3093    </ul>
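    <p>Because the CLDR form only drops the hyphen and lowercases,
    converting an ISO 3166-2 code is mechanical. The Python sketch
    below covers only codes with a two-letter region and a 1..3
    character suffix, as described above; CLDR-specific codes have
    no ISO counterpart and would come from the subdivision data
    instead. The function names are hypothetical.</p>

```python
def iso_to_cldr_subdivision(iso_code: str) -> str:
    """Convert an ISO 3166-2 code such as "US-CA" to the CLDR form "usca"."""
    region, sep, suffix = iso_code.partition("-")
    region_ok = len(region) == 2 and region.isascii() and region.isalpha()
    suffix_ok = 1 <= len(suffix) <= 3 and suffix.isascii() and suffix.isalnum()
    if not (sep and region_ok and suffix_ok):
        raise ValueError(f"not an ISO 3166-2 code: {iso_code!r}")
    # CLDR form: same characters, lowercase, no hyphen.
    return (region + suffix).lower()


def unknown_subdivision(region: str) -> str:
    """CLDR code for an unknown subdivision, e.g. "US" becomes "uszzzz"."""
    return region.lower() + "zzzz"


print("en-US-u-sd-" + iso_to_cldr_subdivision("US-CA"))  # en-US-u-sd-usca
```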
3094    <p>CLDR has additional subdivision codes. These may start with
3095    a 3-digit region code or use a suffix of 4 ASCII letters or
3096    digits, so they will not collide with the ISO codes.
3097    Subdivision codes for unknown values are the region code plus
3098    "zzzz", such as "uszzzz" for an unknown subdivision of the US.
3099    Other codes may be added for stability.</p>
3100    <p>Like BCP 47, CLDR requires stable codes, which are not
3101    guaranteed for ISO 3166-2 (nor have the ISO 3166-2 codes been
3102    stable in the past). If an ISO 3166-2 code is removed, it
    remains valid (though marked as deprecated) in CLDR. If an ISO
    3166-2 code is reused (for the same region), then CLDR will
    define a new equivalent code using a 4-character
    suffix.</p>
3107    <h5><a name="Validity" href="#Validity" id="Validity">3.6.5.1
3108    Validity</a></h5>
3109    <p>A <a href=
3110    "#unicode_subdivision_id">unicode_subdivision_id</a> is only
3111    valid when it is present in the subdivision.xml file as
3112    described in <em>Section 3.11 <a href="#Validity_Data">Validity
3113    Data</a></em>. The data is in a compressed form, and thus needs
3114    to be expanded before such a test is made.</p>
    <p><em>Examples:</em></p>
3116    <ul>
3117      <li><strong>usca</strong> is valid — there is an
3118      <strong>id</strong>
3119      element<code>&lt;id&nbsp;type="subdivision"…&gt;… usca
3120      …&lt;/id&gt;</code></li>
3121      <li><strong>ussct</strong> is invalid — there is no
3122      <strong>id</strong> element
3123      <code>&lt;id&nbsp;type="subdivision"…&gt;… ussct
3124      …&lt;/id&gt;</code></li>
3125    </ul>
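    <p>The compressed form is defined in Section 3.11. As a sketch,
    assuming that convention writes a run of codes that differ only
    in the last character as a range such as "ad02~8" (ad02 through
    ad08), expansion followed by a membership test could look like
    this in Python. The sample data is hypothetical, not copied from
    subdivision.xml.</p>

```python
from typing import Set


def expand_ids(text: str) -> Set[str]:
    """Expand whitespace-separated ids, where "ad02~8" means ad02..ad08.

    Assumption: in a range, the part after "~" is a single character that
    replaces the last character of the start code.
    """
    ids: Set[str] = set()
    for token in text.split():
        if "~" not in token:
            ids.add(token)
            continue
        start, _, end = token.partition("~")
        if len(end) != 1 or not start:
            raise ValueError(f"unsupported range: {token!r}")
        prefix = start[:-1]
        for c in range(ord(start[-1]), ord(end) + 1):
            ids.add(prefix + chr(c))
    return ids


# Hypothetical sample, not real subdivision.xml content.
valid = expand_ids("ad02~4 usak usal~m usca")
print("usca" in valid, "ussct" in valid)  # True False
```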
3126    <p>If a <a href='#unicode_locale_id'>unicode_locale_id</a>
3127    contains both a <a href=
3128    "#unicode_region_subtag">unicode_region_subtag</a> and a
3129    <a href="#unicode_subdivision_id">unicode_subdivision_id</a>,
3130    it is only valid if the <a href=
3131    "#unicode_subdivision_id">unicode_subdivision_id</a> starts
3132    with the <a href=
3133    "#unicode_region_subtag">unicode_region_subtag</a>
    (case-insensitively).</p>
3135    <p>It is recommended that a <a href=
3136    '#unicode_locale_id'>unicode_locale_id</a> contain a <a href=
3137    "#unicode_region_subtag">unicode_region_subtag</a> if it
3138    contains a <a href=
3139    "#unicode_subdivision_id">unicode_subdivision_id</a> and the
3140    region would not be added by adding likely subtags. That
3141    produces better behavior if the <a href=
3142    "#unicode_subdivision_id">unicode_subdivision_id</a> is ignored
3143    by an implementation or if the language tag is truncated.</p>
    <p>Examples:</p>
3145    <ul>
3146      <li>en-<strong>US</strong>-u-sd-<strong>us</strong>ca is
3147      valid — the region "US" matches the first part of "usca"</li>
3148      <li>en-u-sd-<strong>us</strong>ca is valid — it still works
3149      after adding likely subtags.</li>
3150      <li>en-<strong>CA</strong>-u-sd-<strong>gb</strong>sct is
3151      invalid — the region "CA" does not match the first part of
3152      "gbsct". An implementation should disregard the subdivision
3153      id (or return an error).</li>
3154      <li>en-u-sd-<strong>gb</strong>sct is valid but not
3155      recommended — an implementation that ignores the <a href=
3156      "#unicode_subdivision_id">unicode_subdivision_id</a> can get
3157      the wrong fallback behavior, or could add likely subtags and
3158      get the invalid
3159      en<strong>-Latn-US</strong>-u-sd-<strong>gb</strong>sct</li>
3160    </ul>
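    <p>The region check can be sketched in Python as follows. The
    parser is deliberately simplified (it assumes the region is the
    first two-letter or three-digit subtag before any extension, and
    looks for "sd" only inside -u-), and it covers only the validity
    rule, not the recommendation above, which would also require
    likely-subtags data. The function names are hypothetical.</p>

```python
import re
from typing import Optional, Tuple


def region_and_subdivision(locale_id: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (region subtag, sd value), lowercased; None where absent."""
    subtags = re.split(r"[-_]", locale_id.lower())
    region = None
    for s in subtags[1:]:
        if len(s) == 1:  # an extension singleton such as "u" ends the base
            break
        if re.fullmatch(r"[a-z]{2}|[0-9]{3}", s):
            region = s
            break
    sd = None
    if "u" in subtags:
        ext = subtags[subtags.index("u") + 1:]
        for i, s in enumerate(ext[:-1]):
            if len(s) == 1:  # start of the next extension
                break
            if s == "sd":
                sd = ext[i + 1]
                break
    return region, sd


def region_matches_subdivision(locale_id: str) -> bool:
    """Validity rule: if both are present, sd must start with the region."""
    region, sd = region_and_subdivision(locale_id)
    return region is None or sd is None or sd.startswith(region)
```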
3161    <p>In version 28.0, the subdivisions in the validity files used
3162    the ISO format, uppercase with a hyphen separating two
3163    components, instead of the BCP 47 format.</p>
3164    <h3><a name="t_Extension" id="t_Extension"></a><a name=
3165    "BCP47_T_Extension" href="#BCP47_T_Extension" id=
3166    "BCP47_T_Extension">3.7 Unicode BCP 47 T Extension</a></h3>
3167    <p>The Unicode Consortium has registered and is the maintaining
3168    authority for two BCP 47 language tag extensions: the extension
3169    'u' for Unicode locale extension [<a href=
3170    "#RFC6067">RFC6067</a>] and extension 't' for transformed
3171    content [<a href="#RFC6497">RFC6497</a>]. The Unicode BCP 47
3172    extension data defines the complete list of valid subtags.
3173    While the title of the RFC is “Transformed Content”, the
3174    abstract makes it clear that the scope is broader than the term
3175    "transformed" might indicate to a casual
3176    reader:&nbsp;“including content that has been transliterated,
3177    transcribed, or translated, or&nbsp;<em>in some other way
3178    influenced by the source. It also provides for additional
3179    information used for identification.</em>”</p>
3180    <p><strong>The -t- Extension.</strong> The syntax of 't'
3181    extension subtags is defined by the rule
3182    <code>unicode_locale_extensions</code> in <a href=
3183    "#Unicode_locale_identifier"><em>Section 3.2 Unicode locale
3184    identifier</em></a>, except the separator of subtags
    <code>sep</code> must always be a hyphen '-' when the extension
    is used as part of a BCP 47 language tag. For information about
3187    the registration process, meaning, and usage of the 't'
3188    extension, see [<a href="#RFC6497">RFC6497</a>].</p>
    <p>These subtags are all in lowercase (the canonical casing
    for these subtags); however, subtags are case-insensitive and
    casing does not carry any specific meaning. All subtags within
    the Unicode extensions consist of two to eight alphanumeric
    characters and meet the rule <code>extension</code> in [<a href=
    "#BCP47">BCP47</a>].</p>
3196    <p>The following keys are defined for the -t- extension:</p>
3197    <table class='simple'>
3198      <tbody>
3199        <tr>
3200          <th>Keys</th>
3201          <th>Description</th>
3202          <th>Values in latest release</th>
3203        </tr>
3204        <tr>
3205          <td>m0</td>
3206          <td><strong>Transform extension mechanism:</strong> to
3207          reference an authority or rules for a type of
3208          transformation</td>
3209          <td><a href=
3210          "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml">
3211transform.xml</a></td>
3212        </tr>
3213        <tr>
3214          <td nowrap>s0, d0</td>
3215          <td><strong>Transform source/destination:</strong> for
3216          non-languages/scripts, such as fullwidth-halfwidth
3217          conversion.</td>
3218          <td><a href=
3219          "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform-destination.xml">
3220transform-destination.xml</a></td>
3221        </tr>
3222        <tr>
3223          <td>i0</td>
3224          <td><strong>Input Method Engine transform:</strong> Used
3225          to indicate an input method transformation, such as one
3226          used by a client-side input method. The first subfield in
3227          a sequence would typically be a 'platform' or vendor
3228          designation.</td>
3229          <td><a href=
3230          "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform_ime.xml">
3231transform_ime.xml</a></td>
3232        </tr>
3233        <tr>
3234          <td>k0</td>
3235          <td><strong>Keyboard transform:</strong> Used to indicate
3236          a keyboard transformation, such as one used by a
3237          client-side virtual keyboard. The first subfield in a
3238          sequence would typically be a 'platform' designation,
3239          representing the platform that the keyboard is intended
3240          for. The keyboard might or might not correspond to a
3241          keyboard mapping shipped by the vendor for the platform.
3242          One or more subsequent fields may occur, but are only
3243          added where needed to distinguish from others.</td>
3244          <td><a href=
3245          "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform_keyboard.xml">
3246transform_keyboard.xml</a></td>
3247        </tr>
3248        <tr>
3249          <td>t0</td>
3250          <td><strong>Machine Translation:</strong> Used to
3251          indicate content that has been machine translated, or a
3252          request for a particular type of machine translation of
3253          content. The first subfield in a sequence would typically
3254          be a 'platform' or vendor designation.</td>
3255          <td><a href=
3256          "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform_mt.xml">
3257transform_mt.xml</a></td>
3258        </tr>
3259        <tr>
3260          <td nowrap>h0</td>
3261          <td><strong>Hybrid Locale Identifiers:</strong> h0 with
3262          the value 'hybrid' indicates that the -t- value is a
3263          language that is mixed into the main language tag to form
3264          a hybrid. For more information, and examples, see
3265          <em>Section 3.10.2 <a href="#Hybrid_Locale">Hybrid Locale
3266          Identifiers</a>.</em></td>
3267          <td><a href=
3268          "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform_hybrid.xml">
3269transform_hybrid.xml</a></td>
3270        </tr>
3271        <tr>
3272          <td>x0</td>
3273          <td><strong>Private use transform</strong></td>
3274          <td><a href=
3275          "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform_private_use.xml">
3276transform_private_use.xml</a></td>
3277        </tr>
3278      </tbody>
3279    </table>
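    <p>To illustrate how a -t- extension divides into a source
    language (tlang) followed by fields introduced by the keys
    listed above, here is a simplified Python sketch. It assumes a
    key is one letter followed by one digit (as with m0, s0, d0, i0,
    k0, t0, h0 and x0), uses the hyphen separator required in
    BCP 47, and ignores private-use subtags; the sample tags are
    illustrative and the function name is hypothetical.</p>

```python
import re
from typing import Dict, List, Optional, Tuple

_TKEY = re.compile(r"[a-z][0-9]")  # assumed key shape, e.g. m0, t0, x0


def parse_t_extension(tag: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Split the -t- extension of a BCP 47 tag into (tlang, {key: value})."""
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return None
    # The extension runs until the next singleton or the end of the tag.
    start = end = subtags.index("t") + 1
    while end < len(subtags) and len(subtags[end]) > 1:
        end += 1
    ext = subtags[start:end]
    # Subtags before the first key form the source language, if any.
    i = 0
    while i < len(ext) and not _TKEY.fullmatch(ext[i]):
        i += 1
    fields: Dict[str, List[str]] = {}
    key = ""
    for s in ext[i:]:
        if _TKEY.fullmatch(s):
            key = s
            fields[key] = []
        else:
            fields[key].append(s)
    return "-".join(ext[:i]), {k: "-".join(v) for k, v in fields.items()}


print(parse_t_extension("ja-t-it"))  # ('it', {})
print(parse_t_extension("und-Cyrl-t-und-latn-m0-ungegn-2007"))  # ('und-latn', {'m0': 'ungegn-2007'})
```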
3280    <h4><a href="#Transformed_Content_Data_File" name=
3281    "Transformed_Content_Data_File" id=
3282    "Transformed_Content_Data_File">3.7.1 T Extension Data
3283    Files</a></h4>
    <p>The overall structure of the data files is similar to that of
    the U Extension data files, with the following exceptions.</p>
3286    <p>In the transformed content 't' data file, the name attribute
3287    in a &lt;key&gt; element defines a valid field separator
3288    subtag. The name attribute in an enclosed &lt;type&gt; element
3289    defines a valid field subtag for the field separator subtag.
3290    For example:</p>
3291    <pre>
3292&lt;key extension="t" name="m0"
3293    description="Transform extension mechanism"&gt;
3294        &lt;type name="ungegn"
3295                description="United Nations Group of Experts on Geographical Names"
3296      since="21"/&gt;
3297&lt;key&gt;
3298</pre>The data above indicates:
3299    <ul>
3300      <li>"m0" is a valid field separator for the transformed
3301      content extension 't'.</li>
3302      <li>field subtag "ungegn" is valid for field separator
3303      "m0".</li>
3304      <li>field subtag "ungegn" was introduced in CLDR 21.</li>
3305    </ul>
3306    <p>The attributes are:</p>
3307    <dl>
3308      <dt><b>name</b></dt>
3309      <dd>The name of the mechanism, limited to 3-8 characters (or
3310      sequences of them). Any indirect type names are listed in
      Section 3.6.4 <a href="#Unicode_Locale_Extension_Data_Files">U
3312      Extension Data Files</a>.</dd>
3313      <dt><b>description</b></dt>
3314      <dd>A description of the name, with all and only that
      information necessary to distinguish one name from others
      with which it might be confused. Descriptions
3317      are not intended to provide general background
3318      information.</dd>
3319      <dt><b>since</b></dt>
3320      <dd>Indicates the first version of CLDR where the name
3321      appears. (Required for new items.)</dd>
3323      <dt><b>alias</b></dt>
3324      <dd>Alternative name, not limited in number of characters.
3325      Aliases are intended for compatibility, not to provide all
3326      possible alternate names or designations.
3327      <em>(Optional)</em></dd>
3328    </dl>
3329    <p>For information about the registration process, meaning, and
3330    usage of the 't' extension, see [<a href=
3331    "#RFC6497">RFC6497</a>].</p>
3332    <h3><a name="Compatibility_with_Older_Identifiers" href=
3333    "#Compatibility_with_Older_Identifiers" id=
3334    "Compatibility_with_Older_Identifiers">3.8 Compatibility with
3335    Older Identifiers</a></h3>
    <p>LDML versions before 1.7.2 used a slightly different syntax
    for variant subtags and locale extensions. Implementations of
    LDML may provide backward-compatible identifier support as
    described in the following sections.</p>
3340    <h4><a name="Old_Locale_Extension_Syntax" href=
3341    "#Old_Locale_Extension_Syntax" id=
3342    "Old_Locale_Extension_Syntax">3.8.1 Old Locale Extension
3343    Syntax</a></h4>
    <p>The LDML 1.7 and older specifications used a different syntax
    for representing Unicode locale extensions. The previous
    definition of Unicode locale extensions had the following
    structure:</p>
3347    <table border="0">
3348      <tr>
3349        <th>&nbsp;</th>
3350        <th>
3351          <div align="center">
3352            EBNF
3353          </div>
3354        </th>
3355      </tr>
3356      <tr>
3357        <td>old_unicode_locale_extensions</td>
3358        <td>
3359          <pre>= "@" old_key "=" old_type
3360 (";" old_key "=" old_type)*</pre>
3361        </td>
3362      </tr>
3363    </table>
3364    <p>The new specification mandates keys to be two alphanumeric
3365    characters and types to be three to eight alphanumeric
    characters. As a result, new codes were assigned to all
    existing keys and some types. For example, a new key "co"
    replaced the previous key "collation", and a new type "phonebk"
    replaced the previous type "phonebook". However, the existing
3370    collation type "big5han" already satisfied the new requirement,
3371    so no new type code was assigned to the type. All new keys and
3372    types introduced after LDML 1.7 satisfy the new requirement, so
3373    they do not have aliases dedicated for the old syntax, except
3374    time zone types. The conversion between old types and new types
3375    can be done regardless of key, with one known exception (old
3376    type "traditional" is mapped to new type "trad" for collation
3377    and "traditio" for numbering system), and this relationship
    will be maintained in future versions unless otherwise
3379    noted.</p>
3380    <p>The new specification introduced a new field
3381    <code>attribute</code> in addition to key/type pairs in the
3382    Unicode locale extension. When it is necessary to map a new
3383    Unicode locale identifier with <code>attribute</code> field to
3384    a well-formed old locale identifier, a special key name
3385    <i>attribute</i> with the value of entire
3386    <code>attribute</code> subtags in the new identifier is used.
3387    For example, a new identifier
3388    <code>ja-u-xxx-yyy-ca-japanese</code> is mapped to an old
    identifier <code>ja@attribute=xxx-yyy;calendar=japanese</code>.</p>
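    <p>A minimal Python sketch of the new-to-old direction,
    including the special <i>attribute</i> key, follows. It assumes
    the -u- extension is the last part of the identifier, and its
    alias tables hold only the key and type aliases that appear in
    this section; a full implementation would read them from the
    common/bcp47 data. Time zone IDs and the POSIX variant need
    additional data and are not handled. The function name is
    hypothetical.</p>

```python
import re

# New key -> old key, and (key, new type) -> old type: only aliases that
# appear in this section; real data would come from common/bcp47.
OLD_KEYS = {"ca": "calendar", "co": "collation", "ka": "colAlternate",
            "nu": "numbers", "tz": "timezone"}
OLD_TYPES = {("co", "phonebk"): "phonebook", ("ca", "gregory"): "gregorian",
             ("co", "trad"): "traditional", ("nu", "traditio"): "traditional"}


def to_old_syntax(locale_id: str) -> str:
    """E.g. "ja-u-xxx-yyy-ca-japanese" -> "ja@attribute=xxx-yyy;calendar=japanese"."""
    subtags = re.split(r"[-_]", locale_id)
    lowered = [s.lower() for s in subtags]
    if "u" not in lowered:
        return "_".join(subtags)
    u = lowered.index("u")
    ext = lowered[u + 1:]  # assumes -u- is the last extension
    keywords, attributes, i = [], [], 0
    while i < len(ext) and len(ext[i]) != 2:  # attributes precede the first key
        attributes.append(ext[i])
        i += 1
    if attributes:
        keywords.append("attribute=" + "-".join(attributes))
    while i < len(ext):
        key = ext[i]
        i += 1
        types = []
        while i < len(ext) and len(ext[i]) != 2:
            types.append(ext[i])
            i += 1
        value = "-".join(types) or "true"  # a key with no type means "true"
        keywords.append(f"{OLD_KEYS.get(key, key)}={OLD_TYPES.get((key, value), value)}")
    base = "_".join(subtags[:u])
    return base + ("@" + ";".join(keywords) if keywords else "")


print(to_old_syntax("ja-u-xxx-yyy-ca-japanese"))  # ja@attribute=xxx-yyy;calendar=japanese
```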
3391    <p>The chart below shows some example mappings between the new
3392    syntax and the old syntax.</p>
3393    <table>
3394      <caption>
3395        <a name="Locale_Extension_Mappings" href=
3396        "#Locale_Extension_Mappings" id=
3397        "Locale_Extension_Mappings">Locale Extension Mappings</a>
3398      </caption>
3399      <tr>
3400        <th>Old (LDML 1.7 or older)</th>
3401        <th>New</th>
3402      </tr>
3403      <tr>
3404        <td>de_DE@collation=phonebook</td>
3405        <td>de_DE_u_co_phonebk</td>
3406      </tr>
3407      <tr>
3408        <td>zh_Hant_TW@collation=big5han</td>
3409        <td>zh_Hant_TW_u_co_big5han</td>
3410      </tr>
3411      <tr>
3412        <td>th_TH@calendar=gregorian;numbers=thai</td>
3413        <td>th_TH_u_ca_gregory_nu_thai</td>
3414      </tr>
3415      <tr>
3416        <td>en_US_POSIX@timezone=America/Los_Angeles</td>
3417        <td>en_US_u_tz_uslax_va_posix</td>
3418      </tr>
3419    </table>
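    <p>The reverse direction, old to new, can be sketched the same
    way in Python. Again the alias tables contain only aliases
    mentioned in this section, including the key-dependent
    "traditional" exception; keys outside the table, such as
    "timezone" (which would need the full time zone alias data),
    raise an error instead of being guessed. The function name is
    hypothetical.</p>

```python
# Old key -> new key, and old type -> new type; only aliases from this
# section. "traditional" maps differently per key, as noted above.
NEW_KEYS = {"calendar": "ca", "collation": "co", "colalternate": "ka", "numbers": "nu"}
NEW_TYPES = {"phonebook": "phonebk", "gregorian": "gregory", "non-ignorable": "noignore"}
NEW_TYPES_BY_KEY = {("co", "traditional"): "trad", ("nu", "traditional"): "traditio"}


def to_new_syntax(old_id: str) -> str:
    """E.g. "de_DE@collation=phonebook" -> "de-DE-u-co-phonebk"."""
    base, _, keywords = old_id.partition("@")
    attributes, pairs = [], []
    for item in filter(None, keywords.split(";")):
        old_key, _, old_type = item.partition("=")
        old_key, old_type = old_key.lower(), old_type.lower()
        if old_key == "attribute":
            attributes.append(old_type)
            continue
        if old_key not in NEW_KEYS:
            # Report an error rather than guess at an unknown key.
            raise ValueError(f"unsupported old key: {old_key!r}")
        key = NEW_KEYS[old_key]
        new_type = NEW_TYPES_BY_KEY.get((key, old_type), NEW_TYPES.get(old_type, old_type))
        pairs.append(f"{key}-{new_type}")
    parts = base.split("_")
    if attributes or pairs:
        parts += ["u"] + attributes + pairs
    return "-".join(parts)


print(to_new_syntax("th_TH@calendar=gregorian;numbers=thai"))  # th-TH-u-ca-gregory-nu-thai
```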
    <p>Where an API designed for the old syntax is supplied a BCP 47
    language tag, or vice versa, the recommendation is to:</p>
3422    <ol>
3423      <li>Have all methods that take the old syntax also take the
3424      new syntax, interpreted correctly. For example,
3425      "zh-TW-u-co-pinyin" and "zh_TW@collation=pinyin" would both
3426      be interpreted as meaning the same.</li>
3427      <li>Have all methods (both for old and new syntax) accept all
3428      possible aliases for keywords and types. For example,
3429      "ar-u-ca-islamicc" would be equivalent to
3430      "ar-u-ca-islamic-civil".
3431        <ul>
3432          <li>The one exception is where an alias would only be
3433          well-formed with the old syntax, such as "gregorian" (for
3434          "gregory").</li>
3435        </ul>
3436      </li>
3437      <li>Where an API cannot successfully accept the alternate
3438      syntax, throw an exception (or otherwise indicate an error)
3439      so that people can detect that they are using the wrong
3440      method (or wrong input).</li>
3441      <li>Provide a method that tests a purported locale ID string
3442      to determine its status:
3443        <ol>
3444          <li><strong>well-formed</strong> - syntactically
3445          correct</li>
3446          <li><strong>valid</strong> - well-formed and only uses
3447          registered language subtags, extensions, keywords,
3448          types...</li>
3449          <li><strong>canonical</strong> - valid and no deprecated
3450          codes or structure.</li>
3451        </ol>
3452      </li>
3453    </ol>
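    <p>The status test in the last recommendation could look
    roughly like the following Python sketch. The function
    <code>locale_status</code> and the small registries are
    hypothetical stand-ins for the CLDR validity data (see Validity
    Data). The regular expression covers only a simplified
    language/script/region/variant grammar and does not handle
    extensions. Following the first recommendation, the sketch
    accepts either separator.</p>
    <pre>
```python
import re

# Simplified grammar: language, optional script, optional region, variants.
# Extensions (-u-, -t-, ...) are deliberately not handled in this sketch.
LANGUAGE_ID = re.compile(
    r"([a-z]{2,3}|[a-z]{5,8})"                      # language
    r"(?:-([a-z]{4}))?"                              # script
    r"(?:-([a-z]{2}|[0-9]{3}))?"                     # region
    r"((?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)",   # variants
    re.IGNORECASE,
)

# Hypothetical, tiny registries standing in for the validity files.
VALID = {
    "language": {"en", "zh", "he", "iw", "el"},
    "script": {"hant", "hans", "latn"},
    "region": {"us", "tw", "il", "gr"},
    "variant": {"polyton"},
}
DEPRECATED_LANGUAGES = {"iw": "he"}  # "iw" is the deprecated code for Hebrew


def locale_status(locale_id):
    """Return 'canonical', 'valid', 'well-formed', or 'ill-formed'."""
    match = LANGUAGE_ID.fullmatch(locale_id.replace("_", "-"))
    if not match:
        return "ill-formed"
    language, script, region, variants = match.groups()
    checks = [("language", language), ("script", script), ("region", region)]
    checks += [("variant", v) for v in variants.split("-") if v]
    for kind, code in checks:
        if code and code.lower() not in VALID[kind]:
            return "well-formed"
    if language.lower() in DEPRECATED_LANGUAGES:
        return "valid"
    return "canonical"


print(locale_status("zh_Hant_TW"))  # canonical
print(locale_status("iw-IL"))       # valid
```
    </pre>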
    <h4><a name="Legacy_Variants" href="#Legacy_Variants" id=
    "Legacy_Variants">3.8.2 Legacy Variants</a></h4>
    <p>The old LDML specification allowed codes other than
    registered [<a href="#BCP47">BCP47</a>] variant subtags to be
    used in Unicode language and locale identifiers to represent
    variations of locale data. Unicode locale identifiers that
    include such variant codes can be converted to the new
    [<a href="#BCP47">BCP47</a>]-compatible identifiers as
    described below:</p>
    <table>
      <caption>
        <a name="Legacy_Variant_Mappings" href=
        "#Legacy_Variant_Mappings" id=
        "Legacy_Variant_Mappings">Legacy Variant Mappings</a>
      </caption>
      <tr>
        <th>Variant Code</th>
        <th>Description</th>
      </tr>
      <tr>
        <td>AALAND</td>
        <td>Åland, variant of "sv" Swedish used in Finland. Use
        "sv_AX" to indicate this.</td>
      </tr>
      <tr>
        <td>BOKMAL</td>
        <td>Bokmål, variant of "no" Norwegian. Use primary language
        subtag "nb" to indicate this.</td>
      </tr>
      <tr>
        <td>NYNORSK</td>
        <td>Nynorsk, variant of "no" Norwegian. Use primary
        language subtag "nn" to indicate this.</td>
      </tr>
      <tr>
        <td>POSIX</td>
        <td>POSIX variation of locale data. Use Unicode locale
        extension "-u-va-posix" to indicate this.</td>
      </tr>
      <tr>
        <td>POLYTONI</td>
        <td>Polytonic, variant of "el" Greek. Use [<a href=
        "#BCP47">BCP47</a>] variant subtag "polyton" to indicate
        this.</td>
      </tr>
      <tr>
        <td>SAAHO</td>
        <td>The Saaho variant of Afar. Use primary language subtag
        "ssy" to indicate this.</td>
      </tr>
    </table>
    <p>When converting to old syntax, the Unicode locale extension
    "-u-va-posix" should be converted to the "POSIX" variant,
    <i>not</i> to old extension syntax like "@va=posix". This is an
    exception: the other mappings above should not be reversed.</p>
    <p>Examples:</p>
    <ul>
      <li>en_US_POSIX ↔ en-US-u-va-posix</li>
      <li>en_US_POSIX@colNumeric=yes ↔ en-US-u-kn-va-posix</li>
      <li>en-US-POSIX-u-kn-true → en-US-u-kn-va-posix</li>
      <li>en-US-POSIX-u-kn-va-posix → en-US-u-kn-va-posix</li>
    </ul>
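    <p>The conversions toward the new syntax, covering both the
    Legacy Variant Mappings table and the POSIX examples above, can
    be sketched minimally in Python. The function
    <code>legacy_to_bcp47</code> and the <code>LEGACY</code> table
    are hypothetical, and the way each table entry is applied is
    our reading of the table. The sketch assumes the input carries
    at most a <code>-u-</code> extension with no attributes. As in
    the examples, it omits a keyword's <code>true</code> value and
    sorts keywords by key.</p>
    <pre>
```python
# How each legacy variant is rewritten, per the Legacy Variant Mappings table.
# POSIX is handled separately because it becomes -u-va-posix.
LEGACY = {
    "AALAND": {"language": "sv", "region": "AX"},
    "BOKMAL": {"language": "nb"},
    "NYNORSK": {"language": "nn"},
    "POLYTONI": {"variant": "polyton"},
    "SAAHO": {"language": "ssy"},
}


def legacy_to_bcp47(locale_id):
    """Convert e.g. 'no_NO_NYNORSK' or 'en-US-POSIX-u-kn-true' to the new syntax."""
    base, _, ext = locale_id.replace("_", "-").partition("-u-")
    language, *rest = base.split("-")
    script = region = None
    variants = []
    posix = False
    for subtag in rest:
        upper = subtag.upper()
        if upper == "POSIX":
            posix = True
        elif upper in LEGACY:
            mapping = LEGACY[upper]
            language = mapping.get("language", language)
            region = mapping.get("region", region)
            if "variant" in mapping:
                variants.append(mapping["variant"])
        elif len(subtag) == 4 and subtag.isalpha():
            script = subtag
        elif len(subtag) == 2 or (len(subtag) == 3 and subtag.isdigit()):
            region = subtag
        else:
            variants.append(subtag)
    # Parse the -u- keywords (assumed to contain no attributes).
    keywords = {}
    key = None
    for subtag in (ext.split("-") if ext else []):
        if len(subtag) == 2:
            key = subtag
            keywords[key] = []
        elif key:
            keywords[key].append(subtag)
    if posix:
        keywords["va"] = ["posix"]
    tag = "-".join(s for s in [language, script, region, *variants] if s)
    if keywords:
        parts = []
        for key in sorted(keywords):
            types = [] if keywords[key] == ["true"] else keywords[key]
            parts.append("-".join([key] + types))
        tag += "-u-" + "-".join(parts)
    return tag


print(legacy_to_bcp47("en-US-POSIX-u-kn-true"))  # en-US-u-kn-va-posix
print(legacy_to_bcp47("no_NO_NYNORSK"))          # nn-NO
```
    </pre>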
    <h4><a name="Relation_to_OpenI18n" href="#Relation_to_OpenI18n"
    id="Relation_to_OpenI18n">3.8.3 Relation to OpenI18n</a></h4>
    <p>The locale id format generally follows the description in
    the <i>OpenI18N Locale Naming Guideline</i> [<a href=
    "#NamingGuideline">NamingGuideline</a>], with some
    enhancements. The main differences from those guidelines are
    that the locale id:</p>
    <ol type="a">
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">does not
      include a charset, since the data in LDML format always
      provides a representation of all Unicode characters (the
      repository is stored in UTF-8, although it can be
      transcoded to other encodings as well);</li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">adds the
      ability to have a variant, as in Java;</li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">adds the
      ability to discriminate the written language by script (or
      script variant); and</li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">is a
      superset of [<a href="#BCP47">BCP47</a>] codes.</li>
    </ol>
    <h3><a name="Transmitting_Locale_Information" href=
    "#Transmitting_Locale_Information" id=
    "Transmitting_Locale_Information">3.9 Transmitting Locale
    Information</a></h3>
    <p>In a world of on-demand software components, with arbitrary
    connections between those components, it is important to get a
    sense of where localization should be done, and how to transmit
    enough information so that it can be done at the appropriate
    place. End-users need to get messages localized to their
    languages, messages that not only contain a translation of
    text, but also contain variables such as date, time, number
    formats, and currencies formatted according to the users'
    conventions. The strategy for doing the so-called <i>JIT
    (just-in-time) localization</i> is made up of two parts:</p>
    <ol>
      <li>Store and transmit <i>neutral-format</i> data wherever
      possible.
        <ul>
          <li>Neutral-format data is data that is kept in a
          standard format, no matter what the local user's
          environment is. Neutral-format is also (loosely) called
          <i>binary data</i>, even though it actually could be
          represented in many different ways, including a textual
          representation such as in XML.</li>
          <li>Such data should use accepted standards where
          possible, such as for currency codes.</li>
          <li>Textual data should also be in a uniform character
          set (Unicode/10646) to avoid possible data corruption
          problems when converting between encodings.</li>
        </ul>
      </li>
      <li>Localize that data as "<i>close</i>" to the end-user as
      possible.</li>
    </ol>
    <p>There are a number of advantages to this strategy. The
    longer the data is kept in a neutral format, the more flexible
    the entire system is. On a practical level, if transmitted data
    is neutral-format, then it is much easier to manipulate the
    data, debug the processing of the data, and maintain the
    software connections between components.</p>
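    <p>The following Python sketch illustrates both parts of this
    strategy. It keeps an amount in a neutral format (a plain
    decimal string plus an ISO 4217 currency code, serialized as
    JSON) and formats it only at the end, using the user's
    conventions. The function names and the two-entry
    <code>NUMBER_SYMBOLS</code> table are hypothetical. Real code
    would take the separators and the currency placement from CLDR
    locale data.</p>
    <pre>
```python
import json
from decimal import Decimal

# Hypothetical per-locale (grouping, decimal) separators; real data would
# come from CLDR number symbols for the user's locale.
NUMBER_SYMBOLS = {"en": (",", "."), "de": (".", ",")}


def to_neutral(amount, currency):
    """Neutral format: decimal string plus ISO 4217 code, as JSON."""
    return json.dumps({"amount": str(amount), "currency": currency})


def localize(payload, locale):
    """Format the neutral payload as late as possible, for one user."""
    data = json.loads(payload)
    group, decimal_sep = NUMBER_SYMBOLS[locale]
    text = format(Decimal(data["amount"]), ",.2f")  # e.g. '1,234.50'
    # Swap in the locale's separators via a placeholder to avoid clashes.
    text = text.replace(",", "\x00").replace(".", decimal_sep).replace("\x00", group)
    return text + " " + data["currency"]


payload = to_neutral(Decimal("1234.5"), "EUR")  # what travels between components
print(localize(payload, "de"))  # 1.234,50 EUR
print(localize(payload, "en"))  # 1,234.50 EUR
```
    </pre>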
    <p>Once data has been localized into a given language, it can
    be quite difficult to programmatically convert that data into
    another format, if required. This is especially true if the
    data contains a mixture of translated text and formatted
    variables. Once information has been localized into, say,
    Romanian, it is much more difficult to localize that data into,
    say, French. Parsing is more difficult than formatting, and may
    run up against different ambiguities in interpreting text that
    has been localized, even if the original translated message
    text is available (which it may not be).</p>
    <p>Moreover, the closer we are to the end-user, the more we
    know about that user's preferred formats. If we format dates,
    for example, at the user's machine, then it can easily take into
    account any customizations that the user has specified. If the
    formatting is done elsewhere, either we have to transmit
    whatever user customizations are in play, or we only transmit
    the user's locale code, which may only approximate the desired
    format. Thus the closer the localization is to the end user,
    the less we need to ship all of the user's preferences around
    to all the places that localization could possibly need to be
    done.</p>
    <p>Even though localization should be done as close to the
    end-user as possible, there will be cases where different
    components need to be aware of whatever settings are
    appropriate for doing the localization. Thus information such
    as a locale code or time zone needs to be communicated between
    different components.</p>
    <h4><a name="Message_Formatting_and_Exceptions" href=
    "#Message_Formatting_and_Exceptions" id=
    "Message_Formatting_and_Exceptions">3.9.1 Message Formatting
    and Exceptions</a></h4>
    <p>Windows (<a href=
    "https://msdn.microsoft.com/en-us/library/ms679351.aspx">FormatMessage</a>,
    <a href=
    "https://msdn.microsoft.com/en-us/library/aa331875.aspx">String.Format</a>),
    Java (<a href=
    "https://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html">MessageFormat</a>)
    and ICU (<a href=
    "http://www.icu-project.org/apiref/icu4c/classMessageFormat.html">MessageFormat</a>,
    <a href=
    "http://www.icu-project.org/apiref/icu4c/umsg_8h.html">umsg</a>)
    all provide methods of formatting variables (dates, times, etc.)
    and inserting them at arbitrary positions in a string. This
    avoids the manual string concatenation that causes severe
    problems for localization. The question is, where should this
    be done? This is especially important because the code site
    that originates a particular message may be far down in the
    bowels of a component, and the message may be passed up to the
    top of the component with an exception. So we will take that
    case as representative of this class of issues.</p>
    <p>There are circumstances where the message can be
    communicated with a language-neutral code, such as a numeric
    error code or mnemonic string key, that is understood outside
    of the component. If there are arguments that need to accompany
    that message, such as a number of files or a datetime, those
    need to accompany the code so that when the localization is
    finally done, the full information can be presented to the
    end-user. This is the best case for localization.</p>
    <p>More often, the exact messages that could originate from
    within the component are not known outside of the component
    itself; or at least they may not be known by the component that
    is finally displaying text to the user. In such a case, the
    information as to the user's locale needs to be communicated in
    some way to the component that is doing the localization. That
    locale information does not necessarily need to be communicated
    deep within the component; ideally, any exceptions should
    bundle up some language-neutral message ID, plus the arguments
    needed to format the message (for example, datetime), but not
    do the localization at the throw site. This approach has the
    advantages noted above for JIT localization.</p>
    <p>In addition, exceptions are often caught at a higher level;
    they do not end up being displayed to any end-user at all. By
    avoiding the localization at the throw site, one avoids the
    cost of doing formatting when that formatting is not really
    necessary. In fact, in many running programs most of the
    exceptions that are thrown at a low level never end up being
    presented to an end-user, so this can have considerable
    performance benefits.</p>
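    <p>The following minimal Python sketch shows this approach. The
    exception carries only a language-neutral message ID and its
    raw arguments, and text is produced only where the message is
    actually shown. The class <code>LocalizableError</code>, the
    function names, and the two-entry <code>CATALOG</code> are
    hypothetical. A real system would use a message formatter such
    as ICU MessageFormat, which also handles plurals, rather than
    <code>str.format</code>.</p>
    <pre>
```python
# Hypothetical catalog: locale, then language-neutral message ID, then pattern.
CATALOG = {
    "en": {"FILES_NOT_FOUND": "{count} files could not be found."},
    "fr": {"FILES_NOT_FOUND": "{count} fichiers sont introuvables."},
}


class LocalizableError(Exception):
    """Carries a message ID plus raw arguments; no text is formatted here."""

    def __init__(self, message_id, **message_args):
        super().__init__(message_id)
        self.message_id = message_id
        self.message_args = message_args


def open_files(paths):
    # Deep inside a component: raise with the ID and arguments only.
    raise LocalizableError("FILES_NOT_FOUND", count=len(paths))


def localize(error, locale):
    """Format only where the message is presented to the end-user."""
    pattern = CATALOG[locale][error.message_id]
    return pattern.format(**error.message_args)


try:
    open_files(["a.txt", "b.txt"])
except LocalizableError as error:
    # Only the handler that actually shows the message pays the formatting cost.
    print(localize(error, "fr"))  # 2 fichiers sont introuvables.
```
    </pre>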
    <h3><a name="Language_and_Locale_IDs" href=
    "#Language_and_Locale_IDs" id="Language_and_Locale_IDs">3.10
    Unicode Language and Locale IDs</a></h3>
    <p>People have very slippery notions of what distinguishes a
    language code from a locale code. The problem is that both
    are somewhat nebulous concepts.</p>
    <p>In practice, many people use [<a href="#BCP47">BCP47</a>]
    codes to mean locale codes instead of strictly language codes.
    It is easy to see why this came about: because [<a href=
    "#BCP47">BCP47</a>] includes an explicit region (territory)
    code, for most people it was sufficient for use as a locale
    code as well. For example, when typical web software receives
    an [<a href="#BCP47">BCP47</a>] code, it will use it as a
    locale code. Other typical software will do the same: in
    practice, language codes and locale codes are treated
    interchangeably. Some people recommend distinguishing on the
    basis of "-" versus "_" (for example, <i>zh-TW</i> for language
    code, <i>zh_TW</i> for locale code), but in practice that does
    not work because of the free variation out in the world in the
    use of these separators. Notice that Windows, for example, uses
    "-" as a separator in its locale codes. So pragmatically one is
    forced to treat "-" and "_" as equivalent when interpreting
    either one on input.</p>
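    <p>Treating the two separators as equivalent on input can be as
    simple as the following Python sketch (the function names are
    hypothetical). It also compares subtags case-insensitively,
    since [<a href="#BCP47">BCP47</a>] subtags are
    case-insensitive. It does not handle old-syntax "@"
    keywords.</p>
    <pre>
```python
def split_locale_code(code):
    """Split a language or locale code, accepting '-' or '_' as separators."""
    return code.replace("_", "-").split("-")


def same_locale_code(a, b):
    """True if two codes differ only in separators or letter case."""
    def normalize(code):
        return [subtag.lower() for subtag in split_locale_code(code)]
    return normalize(a) == normalize(b)


print(same_locale_code("zh-TW", "zh_TW"))  # True
print(split_locale_code("zh_Hant-TW"))     # ['zh', 'Hant', 'TW']
```
    </pre>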
    <p>Another reason for the conflation of these codes is that
    <i>very</i> little data in most systems is distinguished by
    region alone; currency codes and measurement systems are among
    the few. Sometimes date or number formats are mentioned as
    regional, but that really does not make much sense. If people
    see the sentence "You will have to adjust the value to
    १,२३४.५६७ from ૭૧,૨૩૪.૫૬" (using Indic digits), they would say
    that sentence is simply not English. Number format is far more
    closely associated with language than it is with region. The
    same is true for date formats: people would never expect to see
    intermixed a date in the format "2003年4月1日" (using Kanji) in
    text purporting to be purely English. There are regional
    differences in date and number format — differences which can
    be important — but those are different in kind from other
    language differences between regions.</p>
    <p>As far as we are concerned — <i>as a completely practical
    matter</i> — two languages are different if they require
    substantially different localized resources. Distinctions
    according to spoken form are important in some contexts, but
    the written form is by far and away the most important issue
    for data interchange. Unfortunately, this is not the principle
    used in [<a href="#ISO639">ISO639</a>], which has the fairly
    unproductive notion (for data interchange) that only spoken
    language matters (it is also not completely consistent about
    this, however).</p>
    <p>[<a href="#BCP47">BCP47</a>] <i><b>can</b></i> express a
    difference if the use of written languages happens to
    correspond to region boundaries expressed as [<a href=
    "#ISO3166">ISO3166</a>] region codes, and has recently added
    codes that allow it to express some important cases that are
    not distinguished by [<a href="#ISO3166">ISO3166</a>] codes.
    These written languages include simplified and traditional
    Chinese (both used in Hong Kong S.A.R.); Serbian in Latin
    script; Azerbaijani in Arabic script, and so on.</p>
    <p>Notice also that <i>currency codes</i> are different from
    <i>currency localizations</i>. The currency localizations
    should largely be in the language-based resource bundles, not
    in the territory-based resource bundles. Thus, the resource
    bundle <i>en</i> contains the localized mappings in English for
    a range of different currency codes: USD → US$, RUR → Rub, AUD
    → $A and so on. Of course, some currency symbols are used for
    more than one currency, and in such cases specializations
    appear in the territory-based bundles. Continuing the example,
    <i>en_US</i> would have USD → $, while <i>en_AU</i> would have
    AUD → $. (In protocols, the currency codes should always
    accompany any currency amounts; otherwise the data is
    ambiguous, and software is forced to use the user's territory
    to guess at the currency. For some informal discussion of this,
    see <a href=
    "http://source.icu-project.org/repos/icu/icuhtml/trunk/design/jit_localization.html">
    JIT Localization</a>.)</p>
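    <p>The split between language-based and territory-based bundles
    in this example can be sketched in Python as a lookup that falls
    back by truncating the locale id. This is a simplification of
    the locale inheritance described in Section 4. The bundle
    contents are the example mappings above. The function name, and
    the choice to fall back to the ISO 4217 code itself when no
    symbol is found, are assumptions of this sketch.</p>
    <pre>
```python
# The example mappings from the text, as hypothetical minimal bundles.
CURRENCY_SYMBOLS = {
    "en": {"USD": "US$", "RUR": "Rub", "AUD": "$A"},
    "en_US": {"USD": "$"},
    "en_AU": {"AUD": "$"},
}


def currency_symbol(locale_id, currency_code):
    """Find a symbol, trying en_AU, then en, and so on."""
    parts = locale_id.split("_")
    while parts:
        bundle = CURRENCY_SYMBOLS.get("_".join(parts), {})
        if currency_code in bundle:
            return bundle[currency_code]
        parts.pop()
    # Assumed fallback: show the unambiguous ISO 4217 code itself.
    return currency_code


print(currency_symbol("en_AU", "AUD"))  # $
print(currency_symbol("en_AU", "USD"))  # US$
print(currency_symbol("en_US", "AUD"))  # $A
```
    </pre>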
    <h4><a name="Written_Language" href="#Written_Language" id=
    "Written_Language">3.10.1 Written Language</a></h4>
    <p>Criteria for what makes a written language should be purely
    pragmatic; <i>what would copy-editors say?</i> If one gave them
    text like the following, they would respond that it is far from
    acceptable English for publication, and ask for it to be
    redone:</p>
    <ol>
      <li type="A">"Theatre Center News: The date of the last
      version of this document was 2003年3月20日. A copy can be
      obtained for $50,0 or 1.234,57 грн. We would like to
      acknowledge contributions by the following authors (in
      alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed
      Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug
      Felt."</li>
    </ol>
    <p>So one would change it to either B or C below, depending on
    which orthographic variant of English was the target for the
    publication:</p>
    <ol type="A" start="2">
      <li>"Theater Center News: The date of the last version of
      this document was 3/20/2003. A copy can be obtained for
      $50.00 or 1,234.57 Ukrainian Hryvni. We would like to
      acknowledge contributions by the following authors (in
      alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus
      Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric
      Mader."</li>
      <li>"Theatre Centre News: The date of the last version of
      this document was 20/3/2003. A copy can be obtained for
      $50.00 or 1,234.57 Ukrainian Hryvni. We would like to
      acknowledge contributions by the following authors (in
      alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus
      Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric
      Mader."</li>
    </ol>
    <p>Clearly there are many acceptable variations on this text.
    For example, copy editors might still quibble with the use of
    first versus last name sorting in the list, but clearly the
    first list was <i>not</i> acceptable English alphabetical
    order. And in quoting a name, like "Theatre Centre News", one
    may leave it in the source orthography even if it differs from
    the publication target orthography. And so on. However, just as
    clearly, there are limits on what is acceptable English, and
    "2003年3月20日", for example, is <i>not</i>.</p>
    <p>Note that the language of locale data may differ from the
    language of localized software or web sites, when those latter
    are not localized into the user's preferred language. In such
    cases, the kind of incongruous juxtapositions described above
    may well appear, but this situation is usually preferable to
    forcing unfamiliar date or number formats on the user as
    well.</p>
    <h4><a name="Hybrid_Locale" href="#Hybrid_Locale" id=
    "Hybrid_Locale">3.10.2 Hybrid Locale Identifiers</a></h4>
    <p>Hybrid locales have intermixed content from 2 (or more)
    languages, often with one language's grammatical structure
    applied to words in another. These are commonly referred to
    with portmanteau words such as <em>Franglais, <a href=
    "https://en.oxforddictionaries.com/definition/spanglish">Spanglish</a></em>
    or <em>Denglish</em>. Hybrid locales do <em>not</em> refer to
    text simply containing two languages: a book of parallel text
    containing English and French, such as the following, is not
    Franglais:</p>
    <table style='margin-left:2em; margin-right:2em'>
      <tbody>
        <tr>
          <td width='50%' style='font-family:serif'>On the 24th of
          May, 1863, my uncle, Professor Liedenbrock, rushed into
          his little house, No. 19 Königstrasse, one of the oldest
          streets in the oldest portion of the city of
          Hamburg…</td>
          <td style='font-family:serif'>Le 24 mai 1863, un
          dimanche, mon oncle, le professeur Lidenbrock, revint
          précipitamment vers sa petite maison située au numéro 19
          de Königstrasse, l’une des plus anciennes rues du vieux
          quartier de Hambourg…</td>
        </tr>
      </tbody>
    </table>
    <p>While text in a document can be tagged as partly in one
    language and partly in another, that is not the same as having
    a hybrid locale. There is a difference between having a
    Spanglish document, and a Spanish document that has some
    passages quoted in English. Fine-grained tagging doesn't handle
    grammatical combinations like Denglish “<a href=
    "https://www.duden.de/rechtschreibung/downloaden">gedownloadet</a>”,
    which is neither English nor German; similarly the Franglais
    “<a href=
    'https://www.le-dictionnaire.com/definition.php?mot=downloader'>downloadé</a>”.
    More importantly, it doesn’t work for the very common use case
    for a <a href="#unicode_locale_id">unicode_locale_id</a>:
    <i>locale selection</i>.</p>
    <p>Locales are used to communicate requests for localized
    content and internationalization services. When people pick a
    language from a menu, internally they are picking a locale
    (en-GB, es-419, etc.). To allow an application to support
    Spanglish or Hinglish locale selection, <a href=
    "#unicode_locale_id">unicode_locale_id</a>s can represent
    hybrid locales using the T extension key-value 'h0-hybrid'.
    (For more information on the T extension, see <em>Section 3.7
    <a href="#t_Extension">Unicode BCP 47 T
    Extension</a>.</em>)</p>
    <p>Examples:</p>
    <table class='simple'>
      <tbody>
        <tr>
          <td>hi-t-<u>en-h0-hybrid</u></td>
          <td>Hinglish</td>
          <td>Hindi-English hybrid locale</td>
        </tr>
        <tr>
          <td>ta-t-<u>en-h0-hybrid</u></td>
          <td>Tanglish</td>
          <td>Tamil-English hybrid locale</td>
        </tr>
        <tr>
          <td>bn-t-<u>en-h0-hybrid</u></td>
          <td>Banglish</td>
          <td>Bangla-English hybrid locale</td>
        </tr>
        <tr>
          <td colspan="3">…</td>
        </tr>
        <tr>
          <td>en-t-<u>hi-h0-hybrid</u></td>
          <td>Hinglish</td>
          <td>English-Hindi hybrid locale</td>
        </tr>
        <tr>
          <td>en-t-<u>zh-h0-hybrid</u></td>
          <td>Chinglish</td>
          <td>English-Chinese hybrid locale</td>
        </tr>
        <tr>
          <td colspan="3">…</td>
        </tr>
      </tbody>
    </table>
    <blockquote>
      <p><em>Note: The <a href=
      "#unicode_language_id">unicode_language_id</a> should be the
      language used as the ‘scaffold’: the fallback locale for
      internationalization services, which typically supplies more
      of the core vocabulary/structure in the content. Thus Hinglish
      should be represented as hi-t-en-h0-hybrid where Hindi is the
      scaffold, and as en-t-hi-h0-hybrid where English is.</em></p>
    </blockquote>
    <p>The value of -t- is a full <em><a href=
    "#unicode_language_id">unicode_language_id</a></em>, and can
    contain subtags for script or region where it is important to
    include them, as in the following. It may be useful to include
    the script even where it is the default script for that
    language, if it differs from the script of the main language
    tag.</p>
    <table class='simple'>
      <tbody>
        <tr>
          <td>ru-t-<u>en-latn-gb-h0-hybrid</u></td>
          <td>Runglish</td>
          <td>Russian with an admixture of British English in Latin
          script</td>
        </tr>
        <tr>
          <td>ru-t-<u>en-cyrl-gb-h0-hybrid</u></td>
          <td>Runglish</td>
          <td>Russian with an admixture of British English in
          Cyrillic script</td>
        </tr>
      </tbody>
    </table>
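    <p>The following Python sketch shows how software might
    recognize such identifiers. It splits a hybrid locale id into
    the scaffold language and the source language carried in the
    -t- extension. The function name <code>parse_hybrid</code> is
    hypothetical, and the sketch assumes -t- is the only extension
    in the id.</p>
    <pre>
```python
import re

TFIELD_KEY = re.compile(r"[a-z][0-9]")  # tfield keys look like h0, m0, s0


def parse_hybrid(locale_id):
    """Return (scaffold, source language) for an 'h0-hybrid' id, else None."""
    subtags = locale_id.lower().replace("_", "-").split("-")
    if "t" not in subtags:
        return None
    t = subtags.index("t")
    scaffold, rest = subtags[:t], subtags[t + 1:]
    # The source language id runs up to the first tfield key.
    n = next((i for i, s in enumerate(rest) if TFIELD_KEY.fullmatch(s)), len(rest))
    source, tfields = rest[:n], rest[n:]
    if "h0" not in tfields:
        return None
    h0 = tfields.index("h0")
    if tfields[h0 + 1 : h0 + 2] != ["hybrid"]:
        return None
    return "-".join(scaffold), "-".join(source)


print(parse_hybrid("hi-t-en-h0-hybrid"))          # ('hi', 'en')
print(parse_hybrid("ru-t-en-latn-gb-h0-hybrid"))  # ('ru', 'en-latn-gb')
```
    </pre>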
    <p>Should there ever be a strong need for hybrids of more than
    two languages or for other purposes, such as hybrid languages
    as the source of translated content, additional structure could
    be added.</p>
    <h3><a name="Validity_Data" href="#Validity_Data" id=
    "Validity_Data">3.11 Validity Data</a></h3>
    <p class='dtd'>&lt;!ELEMENT idValidity (id*) &gt;<br>
    &lt;!ELEMENT id ( #PCDATA ) &gt;<br>
    &lt;!ATTLIST id type NMTOKEN #REQUIRED &gt;<br>
    &lt;!ATTLIST id idStatus NMTOKEN #REQUIRED &gt;</p>
    <p>The directory <a href=
    'https://github.com/unicode-org/cldr/releases/tag/latest/common/validity/'>common/validity</a>
    contains machine-readable data for validating the language,
    region, script, and variant subtags, as well as currencies,
    subdivisions, and measure units. Each file contains a number of
    subtags with the following <strong>idStatus</strong>
    values:</p>
    <ul>
      <li><strong>regular</strong> — the standard codes used for
      the specific type of subtag</li>
      <li><strong>special</strong> — certain exceptional language
      codes like 'mul' <em>(languages only)</em></li>
      <li><strong>unknown</strong> — the code used to indicate the
      "unknown", "undetermined" or "invalid" values. For more
      information, see <em>Section 3.5.1 <a href=
      "#Unknown_or_Invalid_Identifiers">Unknown or Invalid
      Identifiers</a></em>.</li>
      <li>
        <strong>macroregion</strong> — the standard codes that are
        macroregions <em>(for regions only).</em>
        <ul>
          <li>Note that some two-letter region codes are
          macroregions, and (in the future) some three-digit codes
          may be regular codes.</li>
          <li>For details as to which regions are contained within
          which macroregions, see the
          <strong>&lt;containment&gt;</strong> element of the
          supplemental data.</li>
        </ul>
      </li>
      <li><strong>deprecated</strong> — codes that should not be
      used. The <strong>&lt;alias&gt;</strong> element in the
      supplementalMeta file contains more information about these
      codes, and which codes should be used instead.</li>
      <li><strong>private_use</strong> — codes that, for CLDR, are
      considered private use. Note that some private-use codes in a
      source standard such as BCP47 have defined CLDR semantics, and
      are considered regular codes. For more information, see
      <em>Section 3.5.3 <a href="#Private_Use_Codes">Private Use
      Codes</a>.</em></li>
      <li><strong>reserved</strong> — codes that are private use in
      a source standard, but are reserved for future use as regular
      codes by CLDR.</li>
    </ul>
    <p>The list of subtags for each idStatus uses a compact format:
    a space-delimited list of StringRanges, as defined in
    <em>Section <a href="#String_Range">5.3.4 String
    Range</a>.</em> The separator for each StringRange is a
    "~".</p>
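    <p>The following Python sketch expands such a list. It reflects
    our reading of the String Range rule in Section 5.3.4: the end
    string replaces the same number of trailing characters of the
    start string, and each of those positions ranges independently
    from the start character to the end character. The function
    names and the sample list are illustrative and are not taken
    from the validity files.</p>
    <pre>
```python
from itertools import product


def expand_string_range(item):
    """Expand one StringRange such as 'AC~G' into the strings it abbreviates."""
    if "~" not in item:
        return [item]
    start, end = item.split("~")
    cut = len(start) - len(end)
    prefix, tail = start[:cut], start[cut:]
    # Each trailing position varies independently from its start to end character.
    choices = [[chr(c) for c in range(ord(a), ord(b) + 1)] for a, b in zip(tail, end)]
    return [prefix + "".join(chars) for chars in product(*choices)]


def expand_id_list(text):
    """Expand a space-delimited list of StringRanges, as used for an idStatus."""
    return [s for item in text.split() for s in expand_string_range(item)]


print(expand_id_list("AC~F AI"))  # ['AC', 'AD', 'AE', 'AF', 'AI']
```
    </pre>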
    <p>Each measure unit is a sequence of subtags, such as
    “angle-arc-minute”. The first subtag provides a general
    “category” of the unit.</p>
    <p>In version 28.0, the subdivisions in the validity files used
    the ISO format, uppercase with a hyphen separating two
    components, instead of the BCP 47 format.</p>
    <h2><a name="Locale_Inheritance" href="#Locale_Inheritance" id=
    "Locale_Inheritance">4 Locale Inheritance and Matching</a></h2>
    <p>The XML format relies on an inheritance model, whereby the
    resources are collected into <i>bundles</i>, and the bundles
    organized into a tree. Data for the many Spanish locales does
    not need to be duplicated across all of the countries having
    Spanish as a national language. Instead, common data is
    collected in the Spanish language locale, and territory locales
    only need to supply differences. The parent of all of the
    language locales is a generic locale known as <i>root</i>.
    Wherever possible, the resources in the root are language &amp;
    territory neutral. For example, the collation (sorting) order
    in the root is based on the [<a href="#DUCET">DUCET</a>]
    (see <em><a href="tr35-collation.html#Root_Collation">Root
    Collation</a></em>). Since English language collation has the
    same ordering as the root locale, the 'en' locale data does not
    need to supply any collation data, nor does the data for
    'en_US', 'en_GB', or any of the various other locales that use
    English.</p>
    <p>Given a particular locale id "en_US_someVariant", the search
    chain for a particular resource is the following.</p>
    <blockquote>
      <pre>en_US_someVariant
en_US
en
root</pre>
    </blockquote>
    <p><em>The inheritance is often not simple truncation, as will
    be seen later in this section.</em></p>
    <p>If a type and key are supplied in the locale id, then
    logically the chain from that id to the root is searched for a
    resource tag with the given type, all the way up to root. If no
    resource is found with that tag and type, then the chain is
    searched again without the type.</p>
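    <p>The simple truncation chain and the two-pass search by type
    can be sketched in Python as follows. The sketch ignores the
    non-truncation cases mentioned above, such as explicit parent
    locales. <code>BUNDLES</code>, its contents, and the function
    names are hypothetical.</p>
    <pre>
```python
def truncation_chain(locale_id):
    """Simple truncation: en_US_someVariant, en_US, en, then root."""
    parts = locale_id.split("_")
    return ["_".join(parts[:n]) for n in range(len(parts), 0, -1)] + ["root"]


# Hypothetical bundles mapping (resource tag, type) to a value;
# a type of None means the resource without a type.
BUNDLES = {
    "root": {("collation", None): "root collation"},
    "de": {("collation", "phonebook"): "German phonebook collation"},
}


def lookup(locale_id, tag, type_=None):
    """Search the whole chain with the type first, then again without it."""
    chain = truncation_chain(locale_id)
    for wanted in ([type_, None] if type_ else [None]):
        for locale in chain:
            value = BUNDLES.get(locale, {}).get((tag, wanted))
            if value is not None:
                return locale, value
    return None


print(truncation_chain("en_US_someVariant"))  # ['en_US_someVariant', 'en_US', 'en', 'root']
print(lookup("de_AT", "collation", "phonebook"))  # ('de', 'German phonebook collation')
print(lookup("en_US", "collation", "phonebook"))  # ('root', 'root collation')
```
    </pre>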
    <p>Thus the data for any given locale will only contain
    resources that are different from the parent locale. For
    example, most territory locales will inherit the bulk of their
    data from the language locale: "en" will contain the bulk of
    the data, while "en_IE" will only contain a few items like
    currency. All data that is inherited from a parent is presumed
    to be valid, just as valid as if it were physically present in
    the file. This provides for much smaller resource bundles, and
    much simpler (and less error-prone) maintenance. At the script
    or region level, the "primary" child locale will be empty,
    since its parent will contain all of the appropriate resources
    for it. For more information see <i>CLDR Information: Section
    9.3 <a href="tr35-info.html#Default_Content">Default
    Content</a>.</i></p>
    <p>Certain data items depend only on the region specified in a
    locale id (by a <a href=
    "#unicode_region_subtag_validity">unicode_region_subtag</a> or
    an “rg” <a href="#RegionOverride">Region Override</a> key),
    and are obtained from supplemental data rather than through
    locale resources. For example:</p>
    <ul>
      <li>The currency for the specified region (see <a href=
      "tr35-numbers.html#Supplemental_Currency_Data">Supplemental
      Currency Data</a>)</li>
      <li>The measurement system for the specified region (see
      <a href=
      "tr35-general.html#Measurement_System_Data">Measurement
      System Data</a>)</li>
      <li>The week conventions for the specified region (see
      <a href="tr35-dates.html#Week_Data">Week Data</a>)</li>
    </ul>
4024    <p>(For more information on the specific items handled this
4025    way, see <a href=
4026    "tr35-info.html#Territory_Based_Preferences">Territory-Based
4027    Preferences</a>.) These items will be correct for the specified
4028    region regardless of whether a locale bundle actually exists
4029    with the same combination of language and region as in the
4030    locale id. For example, suppose data is requested for the
4031    locale id "fr_US" and there is no bundle for that combination.
4032    Data obtained via locale inheritance, such as currency patterns
4033    and currency symbols, will be obtained from the parent locale
4034    "fr". However, currency amounts would be formatted by default
4035    using US dollars, just displayed in the manner governed by the
4036    locale "fr". When a locale id does not specify a region, the
4037    region-specific items such as those above are obtained from the
4038    likely region for the locale (obtained via <a href=
4039    "#Likely_Subtags">Likely Subtags</a>).</p>
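<p>The region selection described above can be pictured as follows: take the region from an "rg" override if present, otherwise from the region subtag, otherwise from the likely region, and then read region-based items from supplemental tables rather than from the bundle chain. This Python sketch is illustrative only. <code>LIKELY_REGION</code> and <code>REGION_CURRENCY</code> are small hypothetical stand-ins for the supplemental data, the override is assumed to take precedence over the region subtag, and the "rg" value is assumed to have already been reduced to a plain region code (parsing its real syntax is out of scope).</p>

```python
from typing import Dict, Optional

# Hypothetical excerpts standing in for supplemental data; real
# implementations use the full Likely Subtags and currency tables.
LIKELY_REGION: Dict[str, str] = {"fr": "FR", "en": "US"}
REGION_CURRENCY: Dict[str, str] = {"FR": "EUR", "US": "USD", "CA": "CAD"}


def is_region_subtag(subtag: str) -> bool:
    """unicode_region_subtag: two letters or three digits."""
    return (len(subtag) == 2 and subtag.isalpha()) or (
        len(subtag) == 3 and subtag.isdigit())


def region_for(locale_id: str, rg_region: Optional[str] = None) -> str:
    """Choose the region that governs region-based items."""
    if rg_region:  # assumed: an "rg" override, already reduced to a region code
        return rg_region.upper()
    language, *rest = locale_id.split("_")
    for subtag in rest:
        if is_region_subtag(subtag):
            return subtag.upper()
    return LIKELY_REGION[language]  # no region subtag: use the likely region


def default_currency(locale_id: str, rg_region: Optional[str] = None) -> str:
    # Independent of whether a bundle such as fr_US actually exists.
    return REGION_CURRENCY[region_for(locale_id, rg_region)]


if __name__ == "__main__":
    print(default_currency("fr_US"))  # USD
    print(default_currency("fr"))     # EUR
```

<p>The currency code comes from the region alone; how that currency is displayed would still come from the "fr" bundle reached through ordinary inheritance.</p>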
    <p>For the relationship between Inheritance, DefaultContent,
    LikelySubtags, and LocaleMatching, see Section 4.2.6 <a href=
    "tr35.html#Inheritance_vs_Related">Inheritance vs Related
    Information</a>.</p>
    <h3><a href="#Lookup" name="Lookup" id="Lookup">4.1
    Lookup</a></h3>
    <p>If a language has more than one script in customary modern
    use, then the CLDR file structure in common/main follows this
    model:</p>
    <blockquote>
      <p>lang<br>
      lang_script<br>
      lang_script_region<br>
      lang_region <i>(aliases to lang_script_region)</i></p>
    </blockquote>
    <h4><a href="#Bundle_vs_Item_Lookup" name=
    "Bundle_vs_Item_Lookup" id="Bundle_vs_Item_Lookup">4.1.1 Bundle
    vs Item Lookup</a></h4>
    <p>There are actually two different kinds of inheritance
    fallback: <em>resource&nbsp;bundle&nbsp;lookup</em> and
    <em>resource&nbsp;item&nbsp;lookup</em>. For the former, a
    process is looking to find the first, best resource bundle it
    can; for the latter, it is fallback&nbsp;within&nbsp;bundles on
    individual items, like the translated name for the region "CN"
    in Breton.</p>
    <p>These are closely related, but distinct, processes. They are
    illustrated in the table <a href="#Lookup-Differences">Lookup
    Differences</a>, where "key" stands for zero or more key/type
    pairs. Logically speaking, when looking up an item for a given
    locale, you first do a resource bundle lookup to find the best
    bundle for the locale, then you do an inherited item lookup
    starting with that resource bundle.</p>
    <p>The table <a href="#Lookup-Differences">Lookup
    Differences</a> uses the naïve resource bundle lookup for
    illustration. More sophisticated systems will get far better
    results for resource bundle lookup if they use the algorithm
    described in <em>Section 4.4 <a href=
    "#LanguageMatching">Language Matching</a></em>. That algorithm
    takes into account both the user’s desired locale(s) and the
    application’s supported locales, in order to get the best
    match.</p>
    <p>If the naïve resource bundle lookup is used, the desired
    locale needs to be canonicalized using Section 4.3 <a href=
    "#Likely_Subtags">Likely Subtags</a> and the supplemental alias
    information, so that locales that CLDR considers identical are
    treated as such. Thus eng-Latn-GB should be mapped to en-GB,
    and cmn-TW mapped to zh-Hant-TW.</p>
    <p>For the purposes of CLDR, everything with the &lt;ldml&gt;
    DTD is treated logically as if it is one resource bundle, even
    if the implementation separates data into separate physical
    resource bundles. For example, suppose that there is a main XML
    file for Nama (naq), but there are no &lt;unit&gt; elements for
    it because the units are all inherited from root. If the
    &lt;unit&gt; elements are separated into a separate data tree
    for modularity in the implementation, the Nama &lt;unit&gt;
    resource bundle would be empty. However, resource bundle
    lookup still stops at naq.xml.</p>
    <div id="iqaw2" style="margin-top: 0px; margin-bottom: 0px;">
      <table class='simple' id="a1bn" border="1" cellpadding="3"
      cellspacing="0">
        <caption>
          <a href="#Lookup-Differences" name="Lookup-Differences"
          id="Lookup-Differences">Lookup Differences</a>
        </caption>
        <tbody id="iqaw3">
          <tr id="x40y0">
            <th id="x40y1" style="vertical-align: top;" nowrap>
            Lookup Type</th>
            <th id="x40y3" style="vertical-align: top;" nowrap>
            Example</th>
            <th id="x40y5" style="vertical-align: top;">
            Comments</th>
          </tr>
          <tr id="iqaw4">
            <td id="iqaw5" style="vertical-align: top;" nowrap>
              <p id="rkc40"><strong>Resource bundle</strong>
              lookup</p>
            </td>
            <td id="iqaw7" style="vertical-align: top;" nowrap>
              <p>se-FI&nbsp;→</p>
              <p>se&nbsp;→</p>
              <p><em>default-locale*&nbsp;&nbsp;→</em></p>
              <p>root</p>
            </td>
            <td id="rkc41" style="vertical-align: top;">
              <p>* The default-locale may have its own inheritance
              chain; for example, it may be "en-GB&nbsp;→&nbsp;en".
              In that case, the chain is expanded by inserting the
              default-locale's chain, resulting in:</p>
              <blockquote>
                <p>se-FI →</p>
                <p>se →</p>
                <p><em>en-GB →</em></p>
                <p><em>en →</em></p>
                <p>root</p>
              </blockquote>
            </td>
          </tr>
          <tr id="iqaw9">
            <td id="iqaw10" style="vertical-align: top;" nowrap>
              <p><strong>Inherited item</strong> lookup</p>
            </td>
            <td id="iqaw12" style="vertical-align: top;" nowrap>
              <p>se-FI+key&nbsp;→</p>
              <p>se+key →</p>
              <p><em>root_alias*+key</em>&nbsp;→</p>
              <p>root+key</p>
            </td>
            <td id="rkc43" style="vertical-align: top;">
              <p>* If there is a root_alias to another key or
              locale, then insert that entire chain. For example,
              suppose that months for another calendar system have
              a root alias to Gregorian months. In that case, the
              root alias would change the key, and retry from se-FI
              downward. This can happen multiple times.</p>
              <blockquote>
                <p>se-FI+key&nbsp;→</p>
                <p>se+key →</p>
                <p>root_alias*+key →</p>
                <p><em>se-FI+key2&nbsp;→</em></p>
                <p><em>se+key2 →</em></p>
                <p>root_alias*+key2 →</p>
                <p>root+key2</p>
              </blockquote>
            </td>
          </tr>
        </tbody>
      </table>
    </div>
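<p>The inherited item lookup row of the table can be sketched as follows. This is a hypothetical Python model, not an actual CLDR or ICU API: paths are plain strings, <code>chain</code> lists the locales below root (for example se_FI, se), and <code>root_aliases</code> maps an aliased path in root to its target path. When the lookup reaches a root alias, it switches to the new path and restarts from the first locale in the chain; a set of visited paths guards against alias cycles.</p>

```python
from typing import Dict, List, Optional


def inherited_item(chain: List[str], data: Dict[str, Dict[str, str]],
                   root_aliases: Dict[str, str], path: str) -> Optional[str]:
    """Look up path along chain (root excluded), following root aliases."""
    visited = set()
    while path not in visited:
        visited.add(path)
        for locale in chain:  # e.g. ["se_FI", "se"]
            value = data.get(locale, {}).get(path)
            if value is not None:
                return value
        if path in root_aliases:
            # The alias changes the key; retry from the requested locale.
            path = root_aliases[path]
            continue
        return data.get("root", {}).get(path)
    raise ValueError(f"alias cycle at {path}")


if __name__ == "__main__":
    data = {"se": {"months/gregorian": "gregorian months"}}
    aliases = {"months/buddhist": "months/gregorian"}
    print(inherited_item(["se_FI", "se"], data, aliases, "months/buddhist"))
    # gregorian months
```

<p>Because the alias restarts from se_FI rather than continuing in root, a locale-specific value for the target key always wins over root's value, which matches the expanded chain in the table.</p>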
    <p>Both the resource bundle inheritance and the inherited item
    inheritance use the parentLocale data, where available, instead
    of simple truncation.</p>
    <p>The fallback is a bit different for these two cases:
    internal aliases and keys are not involved in the bundle
    lookup, and the default locale is not involved in the item
    lookup. If the default-locale were used in the resource-item
    lookup, then strange results would occur. For example, suppose
    that the default locale is Swedish, and there is a Nama locale
    but no specific inherited item for collation. If the
    default-locale were used in resource-item lookup, it would
    produce odd and unexpected results for Nama sorting.</p>
    <p>The default locale is not even always used in resource
    bundle inheritance. For the following services, the fallback is
    always directly to the root locale rather than through the default
    locale:</p>
    <ul>
      <li>collation</li>
      <li>break iteration</li>
      <li>case mapping</li>
      <li>transliteration
        <ul>
          <li>The lookup for transliteration is yet more
          complicated because of the interplay of source and target
          locales: see <em>Part 2 General, Section
          10.1&nbsp;<a href=
          "https://www.unicode.org/reports/tr35/tr35-general.html#Inheritance">Inheritance</a>.</em></li>
        </ul>
      </li>
    </ul>
    <p>Thus if there is no Akan locale, for example, asking for a
    collation for Akan should produce the root collation, <em>not
    the Swedish collation.</em></p>
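<p>A sketch of the naïve resource bundle lookup with these rules, under stated assumptions: plain truncation (no parentLocale data), a hypothetical <code>available</code> set of existing bundles, and hypothetical service labels. The default locale's own chain is inserted before root except for the services listed above. Transliteration is included only approximately, since its real lookup also involves source and target locales.</p>

```python
from typing import List, Set

# Hypothetical service labels for the root-direct services listed above.
ROOT_DIRECT_SERVICES = {"collation", "break_iteration", "case_mapping",
                        "transliteration"}


def truncate(locale_id: str) -> List[str]:
    subtags = locale_id.split("_")
    return ["_".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def bundle_chain(requested: str, default_locale: str, service: str,
                 available: Set[str]) -> List[str]:
    """Naive resource bundle lookup order, listing only existing bundles."""
    chain = [loc for loc in truncate(requested) if loc in available]
    if service not in ROOT_DIRECT_SERVICES:
        # Insert the default locale's own chain before root.
        for loc in truncate(default_locale):
            if loc in available and loc not in chain:
                chain.append(loc)
    chain.append("root")
    return chain


if __name__ == "__main__":
    avail = {"se", "en", "en_GB", "sv"}
    print(bundle_chain("se_FI", "en_GB", "dates", avail))
    # ['se', 'en_GB', 'en', 'root']
    print(bundle_chain("ak", "sv", "collation", avail))
    # ['root']
```

<p>The first entry is the bundle that the lookup settles on; the subsequent item lookup then runs from that bundle without consulting the default locale.</p>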
    <p>The inherited item lookup must remain stable, because the
    resources are built with a certain fallback in mind; changing
    the core fallback order can render the bundle structure
    incoherent.</p>
    <p>Resource bundle lookup, on the other hand, is more flexible;
    changes in the view of the "best" match between the input
    request and the output bundle are more tolerable when they represent
    overall improvements for users. For more information, see
    <i><a href="#Fallback_Elements">A.1 Element
    fallback</a></i>.</p>
    <p>Where the LDML inheritance relationship does not match a
    target system, such as POSIX, the data logically should be
    fully resolved in converting to a format for use by that
    system, by adding <i>all</i> inherited data to each locale data
    set.</p>
    <p>For a more complete description of how inheritance applies
    to data, and the use of keywords, see <i><a href=
    "#Inheritance_and_Validity">Section 4.2 Inheritance and Validity</a></i>.</p>
    <p>The locale data does not contain general character
    properties that are derived from the <i>Unicode Character
    Database</i> [<a href=
    "https://unicode.org/reports/tr41/#UAX44">UAX44</a>]. Because that data
    is common across locales, it is not duplicated in the
    bundles. Constructing a POSIX locale from the CLDR data
    requires use of UCD data. In addition, POSIX locales may also
    specify the character encoding, which requires the data to be
    transformed into that target encoding.</p>
    <p><b>Warning:</b> If a locale has a different script than its
    parent (for example, sr_Latn), then special attention must be
    paid to make sure that all inheritance is covered. For example,
    auxiliary exemplar characters may need to be empty ("[]") to
    block inheritance.</p>
    <p><strong>Empty Override:</strong> There is one special value
    reserved in LDML to indicate that a child locale is to have no
    value for a path, even if the parent locale has a value for
    that path. That value is "∅∅∅". For example, if there is no
    phrase for "two days ago" in a language, that can be indicated
    with:</p>
    <pre>&lt;field type="day"&gt;
  &lt;relative type="-2"&gt;∅∅∅&lt;/relative&gt;
&lt;/field&gt;
</pre>
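<p>In an inherited item lookup, the empty override acts as a value that stops the search but yields nothing. A minimal Python sketch, assuming a hypothetical locale "xx" and simplified path strings:</p>

```python
from typing import Dict, List, Optional

EMPTY_OVERRIDE = "\u2205\u2205\u2205"  # the reserved value "∅∅∅"


def resolve(chain: List[str], data: Dict[str, Dict[str, str]],
            path: str) -> Optional[str]:
    """Inherited item lookup that honors the empty override."""
    for locale in chain:
        value = data.get(locale, {}).get(path)
        if value is None:
            continue  # nothing here: keep inheriting
        if value == EMPTY_OVERRIDE:
            return None  # explicitly no value; do not use the parent
        return value
    return None


if __name__ == "__main__":
    data = {"root": {"day/-2": "2 days ago"}, "xx": {"day/-2": EMPTY_OVERRIDE}}
    print(resolve(["xx", "root"], data, "day/-2"))  # None
```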
    <h4><a name="Multiple_Inheritance" id=
    "Multiple_Inheritance"></a><a name="Lateral_Inheritance" href=
    "#Lateral_Inheritance" id="Lateral_Inheritance">4.1.2 Lateral
    Inheritance</a></h4>
    <p>In the following instances, resources may inherit from
    within the same locale, <em>before inheriting from the parent</em>.</p>

    <table border="1" cellpadding="3" cellspacing=
    "0" class='simple' >
      <tbody>
        <tr>
          <th nowrap style="vertical-align: top;">Element</th>
          <th nowrap style="vertical-align: top;">Source</th>
          <th nowrap style="vertical-align: top;">Context</th>
        </tr>
        <tr>
          <td style="vertical-align: top;">currency/pattern</td>
          <td style="vertical-align: top;">currencyFormat</td>
          <td style="vertical-align: top;">numberSystem = defaultNumberingSystem, unless otherwise specified*<br>
            currencyFormatLength type=none, unless otherwise specified<br>
            currencyFormat type=&quot;standard&quot;, unless otherwise specified</td>
        </tr>
        <tr>
          <td style="vertical-align: top;">currency/decimal</td>
          <td style="vertical-align: top;">symbols/decimal</td>
          <td style="vertical-align: top;">numberSystem = defaultNumberingSystem, unless otherwise specified</td>
        </tr>
        <tr>
          <td style="vertical-align: top;">currency/group</td>
          <td style="vertical-align: top;">symbols/group</td>
          <td style="vertical-align: top;">numberSystem = defaultNumberingSystem, unless otherwise specified</td>
        </tr>
      </tbody>
    </table>
    <p>* The &quot;unless otherwise specified&quot; clause is for when an API or other context indicates a different choice, such as <span style="vertical-align: top;">currencyFormat type=&quot;accounting&quot;</span>.</p>
    <p>For example, with /currency[@type=&quot;CVE&quot;], the decimal symbol for almost all locales is the value from symbols/decimal, but for pt_CV it is explicitly &lt;decimal&gt;$&lt;/decimal&gt;.</p>
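<p>The currency/decimal row can be sketched as a lookup that tries the currency-specific path and then symbols/decimal in each locale before moving to the parent. This is a simplified reading, not a definitive implementation: numberSystem is ignored, the path strings and data values are hypothetical, and "before inheriting from the parent" is interpreted as exhausting both paths within one locale first.</p>

```python
from typing import Dict, List, Optional


def currency_decimal(chain: List[str], data: Dict[str, Dict[str, str]],
                     currency: str) -> Optional[str]:
    """Try the currency-specific decimal, then symbols/decimal, per locale."""
    lateral = [f'currency[@type="{currency}"]/decimal', "symbols/decimal"]
    for locale in chain:
        for path in lateral:  # same-locale (lateral) fallback first
            value = data.get(locale, {}).get(path)
            if value is not None:
                return value
    return None


if __name__ == "__main__":
    data = {
        "pt_CV": {'currency[@type="CVE"]/decimal': "$"},
        "pt": {"symbols/decimal": ","},
        "root": {"symbols/decimal": "."},
    }
    print(currency_decimal(["pt_CV", "pt", "root"], data, "CVE"))  # $
    print(currency_decimal(["pt_PT", "pt", "root"], data, "CVE"))  # ,
```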
    <p>&nbsp;</p>
    <p>The following attributes use lateral inheritance for all elements with the DTD root = ldml, except where otherwise noted. The process is applied recursively.</p>
    <table border="1" cellpadding="3" cellspacing=
    "0" class='simple' >
      <tbody>
        <tr>
          <th nowrap style="vertical-align: top;">Attribute</th>
          <th nowrap style="vertical-align: top;">Fallback</th>
          <th nowrap style="vertical-align: top;">Exception Elements</th>
        </tr>
        <tr>
          <td style="vertical-align: top;">case</td>
          <td style="vertical-align: top;">&quot;nominative&quot; → ∅</td>
          <td style="vertical-align: top;">caseMinimalPairs</td>
        </tr>
        <tr>
          <td style="vertical-align: top;">gender</td>
          <td style="vertical-align: top;">default_gender(locale) → ∅</td>
          <td style="vertical-align: top;">genderMinimalPairs</td>
        </tr>
        <tr>
          <td style="vertical-align: top;">count</td>
          <td style="vertical-align: top;">plural_rules(locale, x) → &quot;other&quot; → ∅</td>
          <td style="vertical-align: top;">minDays, pluralMinimalPairs</td>
        </tr>
        <tr>
          <td style="vertical-align: top;">ordinal</td>
          <td style="vertical-align: top;">plural_rules(locale, x) → &quot;other&quot; → ∅</td>
          <td style="vertical-align: top;">ordinalMinimalPairs</td>
        </tr>
      </tbody>
    </table>
    <p>The gender fallback is to neuter if the locale has a neuter gender, otherwise masculine. This may be extended in the future if necessary. See also <a href="tr35-general.html#Grammatical_Features">Part 2, Section 15, Grammatical Features</a>.</p>

    <p>For example, if there is no value for a path, and that path has a
      [@count="x"] attribute and value, then:</p>
    <ol>
      <li>If &quot;x&quot; is numeric, the path falls back to the path with [@count=«the plural rules category for x for that locale»], within the same locale.
        <ol>
          <li>For example, [@count="0"] for English falls back to [@count="other"], while for French it falls back to [@count="one"].</li>
        </ol>
      </li>
      <li>If "x" is anything but "other", it falls back to
        a path [@count="other"], within the same locale.</li>
      <li>If &quot;x&quot; is &quot;other&quot;,
       it falls back to the path
      that is completely missing the count item, within the same locale.</li>
      <li>If there is no value for that path in the same locale, the same
      process is used for the original path in the parent locale.</li>
    </ol>
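<p>The four count steps can be sketched in Python as follows. The plural rules here are simplified, hypothetical stand-ins (integers only, English and French) for real CLDR plural rule evaluation, the language is passed in explicitly, and paths are simplified strings. Decimal values such as "1.5" are not handled.</p>

```python
from typing import Callable, Dict, List, Optional

# Simplified, hypothetical plural rules for non-negative integers only.
PLURAL_RULES: Dict[str, Callable[[int], str]] = {
    "en": lambda n: "one" if n == 1 else "other",
    "fr": lambda n: "one" if n in (0, 1) else "other",
}


def count_candidates(language: str, x: str) -> List[Optional[str]]:
    """Count values to try within one locale; None means no count attribute."""
    candidates: List[Optional[str]] = [x]
    if x.isdigit():  # step 1: numeric -> plural category
        x = PLURAL_RULES[language](int(x))
        candidates.append(x)
    if x != "other":  # step 2: anything but "other" -> "other"
        candidates.append("other")
    candidates.append(None)  # step 3: "other" -> path without count
    return candidates


def lookup_count(chain: List[str], language: str,
                 data: Dict[str, Dict[str, str]], base: str,
                 x: str) -> Optional[str]:
    """Step 4: exhaust one locale before moving to its parent."""
    for locale in chain:
        for count in count_candidates(language, x):
            path = base if count is None else f'{base}[@count="{count}"]'
            value = data.get(locale, {}).get(path)
            if value is not None:
                return value
    return None


if __name__ == "__main__":
    print(count_candidates("en", "0"))  # ['0', 'other', None]
    print(count_candidates("fr", "0"))  # ['0', 'one', 'other', None]
```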

    <p>A path may have multiple attributes with lateral inheritance. In such a case, all of the combinations are tried, in the order supplied above. For example (this is the very worst case):</p>
    <p>/compoundUnitPattern1[@count=&quot;few&quot;][@gender=&quot;feminine&quot;][@case=&quot;accusative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;few&quot;][@gender=&quot;feminine&quot;][@case=&quot;nominative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;few&quot;][@gender=&quot;feminine&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;few&quot;][@gender=&quot;neuter&quot;][@case=&quot;accusative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;few&quot;][@gender=&quot;neuter&quot;][@case=&quot;nominative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;few&quot;][@gender=&quot;neuter&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;few&quot;][@case=&quot;accusative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;few&quot;][@case=&quot;nominative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;few&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>&nbsp;</p>
    <p>/compoundUnitPattern1[@count=&quot;other&quot;][@gender=&quot;feminine&quot;][@case=&quot;accusative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;other&quot;][@gender=&quot;feminine&quot;][@case=&quot;nominative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;other&quot;][@gender=&quot;feminine&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;other&quot;][@gender=&quot;neuter&quot;][@case=&quot;accusative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;other&quot;][@gender=&quot;neuter&quot;][@case=&quot;nominative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;other&quot;][@gender=&quot;neuter&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;other&quot;][@case=&quot;accusative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;other&quot;][@case=&quot;nominative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@count=&quot;other&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>&nbsp;</p>
    <p>/compoundUnitPattern1[@gender=&quot;feminine&quot;][@case=&quot;accusative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@gender=&quot;feminine&quot;][@case=&quot;nominative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@gender=&quot;feminine&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@gender=&quot;neuter&quot;][@case=&quot;accusative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@gender=&quot;neuter&quot;][@case=&quot;nominative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@gender=&quot;neuter&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@case=&quot;accusative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1[@case=&quot;nominative&quot;]<span style="vertical-align: top;"> →</span></p>
    <p>/compoundUnitPattern1</p>
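<p>The sequence above is a Cartesian product of each attribute's own fallback list, with count varying slowest and case fastest. A Python sketch that reproduces the listing, assuming the per-attribute fallback lists from the example (count: few, other, none; gender: feminine, then neuter as the assumed default gender, then none; case: accusative, nominative, none), with <code>None</code> meaning the attribute is absent:</p>

```python
from itertools import product
from typing import List, Optional, Sequence


def lateral_paths(base: str,
                  counts: Sequence[Optional[str]],
                  genders: Sequence[Optional[str]],
                  cases: Sequence[Optional[str]]) -> List[str]:
    """All attribute combinations, count varying slowest and case fastest."""
    paths = []
    for count, gender, case in product(counts, genders, cases):
        path = base
        if count is not None:
            path += f'[@count="{count}"]'
        if gender is not None:
            path += f'[@gender="{gender}"]'
        if case is not None:
            path += f'[@case="{case}"]'
        paths.append(path)
    return paths


if __name__ == "__main__":
    order = lateral_paths("/compoundUnitPattern1",
                          ["few", "other", None],
                          ["feminine", "neuter", None],
                          ["accusative", "nominative", None])
    print(len(order))  # 27
    print(order[-1])   # /compoundUnitPattern1
```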

    <p>&nbsp;</p>
    <p><em>Examples:</em></p>
    <table class='simple' border="1" cellpadding="3" cellspacing=
    "0" id="a1bn3">
      <caption>
        <a name="Count_Fallback_normal" href=
        "#Count_Fallback_normal" id="Count_Fallback_normal">Count
        Fallback: normal</a>
      </caption>
      <tbody>
        <tr>
          <th nowrap style="vertical-align: top;">Locale</th>
          <th nowrap style="vertical-align: top;">Path</th>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">fr-CA</td>
          <td nowrap id="iqaw" style="vertical-align: top;">
          <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="x"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">fr-CA</td>
          <td nowrap id="iqaw16" style="vertical-align: top;">
          <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="other"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">fr</td>
          <td nowrap id="iqaw19" style="vertical-align: top;">
          <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="x"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">fr</td>
          <td nowrap id="iqaw18" style="vertical-align: top;">
          <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="other"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">root</td>
          <td nowrap id="iqaw21" style="vertical-align: top;">
          <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="x"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">root</td>
          <td nowrap id="iqaw20" style="vertical-align: top;">
          <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="other"]</strong></code></td>
        </tr>
      </tbody>
    </table>
    <p>Note that there may be an alias in root that changes the
    path and starts again from the requested locale, such as:</p>
    <p><code>&lt;unitLength type="<strong>narrow</strong>"&gt;<br>
    &nbsp;&nbsp;&nbsp;&lt;alias source="locale"
    path="../unitLength[@type='<strong>short</strong>']"/&gt;<br>
    &lt;/unitLength&gt;</code></p>
    <table class='simple' border="1" cellpadding="3" cellspacing=
    "0" id="a1bn2">
      <caption>
        <a name="Count_Fallback_currency" href=
        "#Count_Fallback_currency" id=
        "Count_Fallback_currency">Count Fallback: currency</a>
      </caption>
      <tbody>
        <tr>
          <th nowrap style="vertical-align: top;">Locale</th>
          <th nowrap style="vertical-align: top;">Path</th>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">fr-CA</td>
          <td nowrap id="iqaw11" style="vertical-align: top;">
          <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="x"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">fr-CA</td>
          <td nowrap id="iqaw6" style="vertical-align: top;">
          <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="other"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">fr-CA</td>
          <td nowrap id="iqaw8" style="vertical-align: top;">
          <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName</code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">fr</td>
          <td nowrap id="iqaw15" style="vertical-align: top;">
          <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="x"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">fr</td>
          <td nowrap id="iqaw14" style="vertical-align: top;">
          <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="other"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">fr</td>
          <td nowrap id="iqaw13" style="vertical-align: top;">
          <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName</code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">root</td>
          <td nowrap id="iqaw25" style="vertical-align: top;">
          <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="x"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">root</td>
          <td nowrap id="iqaw24" style="vertical-align: top;">
          <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="other"]</strong></code></td>
        </tr>
        <tr>
          <td nowrap style="vertical-align: top;">root</td>
          <td nowrap id="iqaw23" style="vertical-align: top;">
          <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName</code></td>
        </tr>
      </tbody>
    </table><br>
    <h4><a name="Parent_Locales" href="#Parent_Locales" id=
    "Parent_Locales">4.1.3 Parent Locales</a></h4>
    <p class="dtd">&lt;!ELEMENT parentLocales ( parentLocale* )
    &gt;<br>
    &lt;!ELEMENT parentLocale EMPTY &gt;<br>
    &lt;!ATTLIST parentLocale parent NMTOKEN #REQUIRED &gt;<br>
    &lt;!ATTLIST parentLocale locales NMTOKENS #REQUIRED &gt;</p>
    <p>In some cases, the normal truncation inheritance does not
    function well. This happens when:</p>
    <ol>
      <li>The child locale is of a different script. In this case,
      mixing elements from the parent into the child data results
      in a mishmash.</li>
      <li>A large number of child locales behave similarly, and
      differently from the truncation parent.</li>
    </ol>
    <p>The <span class="element">parentLocale</span> element is
    used to override the normal inheritance when accessing CLDR
    data.</p>
    <p>For case 1, the children are script locales, and the parent
    is "root". For example:</p>
    <pre>
    &lt;parentLocale parent="root" locales="az_Cyrl ha_Arab … zh_Hant"/&gt;</pre>
    <p>For case 2, the children and parent share the same primary
    language, but the region is changed. For example:</p>
    <pre>
    &lt;parentLocale parent="es_419" locales="es_AR es_BO … es_UY es_VE"/&gt;</pre>
    <p>Collation data, however, is an exception. Since collation
    rules do not truly inherit data from the parent, the
    parentLocale element is not necessary and not used for
    collation. Thus, for a locale like zh_Hant in the example
    above, the parentLocale element would dictate the parent as
    "root" when referring to main locale data, but for collation
    data, the parent locale would still be "zh", even though the
    parentLocale element is present for that locale.</p>
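<p>Parent computation with parentLocale data can be sketched as "explicit parentLocale first, truncation otherwise", with collation skipping the explicit data. This Python sketch assumes <code>PARENT_LOCALES</code> holds only the locales named in the two examples above; the real supplemental data lists more (elided there with "…").</p>

```python
from typing import Dict, List, Optional

# Only the locales named in the examples above; the real data has more.
PARENT_LOCALES: Dict[str, str] = {
    "az_Cyrl": "root", "ha_Arab": "root", "zh_Hant": "root",
    "es_AR": "es_419", "es_BO": "es_419", "es_UY": "es_419",
    "es_VE": "es_419",
}


def parent(locale_id: str, for_collation: bool = False) -> Optional[str]:
    """parentLocale data first (except for collation), else truncation."""
    if locale_id == "root":
        return None
    if not for_collation and locale_id in PARENT_LOCALES:
        return PARENT_LOCALES[locale_id]
    if "_" in locale_id:
        return locale_id.rsplit("_", 1)[0]
    return "root"


def inheritance_chain(locale_id: str, for_collation: bool = False) -> List[str]:
    chain: List[str] = []
    current: Optional[str] = locale_id
    while current is not None:
        chain.append(current)
        current = parent(current, for_collation)
    return chain


if __name__ == "__main__":
    print(inheritance_chain("es_AR"))  # ['es_AR', 'es_419', 'es', 'root']
    print(inheritance_chain("zh_Hant", for_collation=True))
    # ['zh_Hant', 'zh', 'root']
```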
    <p>Since parentLocale information is not localizable on a
    per-locale basis, the parentLocale information is contained in
    CLDR’s <a href="tr35-info.html">supplemental data</a>.</p>
    <p>When a <span class="element">parentLocale</span> element is
    used to override normal inheritance, the following invariants
    must always be true:</p>
    <ol>
      <li>If X is the parentLocale of Y, then either X is the root
      locale, or X has the same base language code as Y. For
      example, the parent of "en" cannot be "fr", and the parent of
      "en_YY" cannot be "fr" or "fr_XX".</li>
      <li>If X is the parentLocale of Y, Y must not be a base
      language locale. For example, the parent of "en" cannot be
      "en_XX".</li>
      <li>There can never be cycles, such as: X parent of Y ...
      parent of X.</li>
    </ol>
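<p>The three invariants lend themselves to a mechanical check. A Python sketch, assuming the parentLocale data has been loaded into a hypothetical dictionary from child to parent; cycle detection follows the explicit parents and falls back to truncation for locales without an entry.</p>

```python
from typing import Dict, List, Optional


def base_language(locale_id: str) -> str:
    return locale_id.split("_")[0]


def effective_parent(locale_id: str, parents: Dict[str, str]) -> Optional[str]:
    if locale_id == "root":
        return None
    if locale_id in parents:
        return parents[locale_id]
    return locale_id.rsplit("_", 1)[0] if "_" in locale_id else "root"


def check_parent_locales(parents: Dict[str, str]) -> List[str]:
    """Report violations of the three parentLocale invariants."""
    problems = []
    for child, par in parents.items():
        # Invariant 1: parent is root or shares the base language.
        if par != "root" and base_language(par) != base_language(child):
            problems.append(f"{child}: parent {par} has a different base language")
        # Invariant 2: a base language locale cannot be given a parentLocale.
        if "_" not in child:
            problems.append(f"{child}: base language locale has a parentLocale")
    # Invariant 3: no cycles when following parents.
    for start in parents:
        seen = {start}
        current = effective_parent(start, parents)
        while current is not None:
            if current in seen:
                problems.append(f"cycle involving {start}")
                break
            seen.add(current)
            current = effective_parent(current, parents)
    return problems


if __name__ == "__main__":
    print(check_parent_locales({"es_AR": "es_419", "zh_Hant": "root"}))  # []
```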
4527    <h3><a name="Inheritance_and_Validity" href=
4528    "#Inheritance_and_Validity" id="Inheritance_and_Validity">4.2
4529    Inheritance and Validity</a></h3>
4530    <p>The following describes in more detail how to determine the
4531    exact inheritance of elements, and the validity of a given
4532    element in LDML.</p>
4533    <h4><a name="Definitions" href="#Definitions" id=
4534    "Definitions">4.2.1 Definitions</a></h4>
4535    <p><i>Blocking</i> elements are those whose subelements do not
4536    inherit from parent locales. For example, a &lt;collation&gt;
4537    element is a blocking element: everything in a
4538    &lt;collation&gt; element is treated as a single lump of data,
4539    as far as inheritance is concerned. For more information, see
4540    <a href="#Valid_Attribute_Values">Section 5.5 Valid Attribute
4541    Values</a>.</p>
4542    <p>Attributes that serve to distinguish multiple elements at
4543    the same level are called <i>distinguishing</i> attributes. For
4544    example, the <i>type</i> attribute distinguishes different
4545    elements in lists of translations, such as:</p>
4546    <pre>&lt;language type="aa"&gt;Afar&lt;/language&gt;
4547&lt;language type="ab"&gt;Abkhazian&lt;/language&gt;</pre>
4548    <p>Distinguishing attributes affect inheritance; two elements
4549    with different distinguishing attributes are treated as
4550    different for purposes of inheritance. For more information,
4551    see <a href="#Valid_Attribute_Values">Section 5.5 Valid
4552    Attribute Values</a>. Other attributes are called
4553    nondistinguishing (or informational) attributes. These carry
4554    separate information, and do not affect inheritance.</p>
4555    <p>For any element in an XML file, <i>an element chain</i> is a
4556    resolved [<a href="#XPath">XPath</a>] leading from the root to
4557    an element, with attributes on each element in alphabetical
4558    order. So in, say, <a href=
4559    "https://github.com/unicode-org/cldr/blob/master/common/main/el.xml">https://github.com/unicode-org/cldr/blob/master/common/main/el.xml</a>
4560    we may have:</p>
4561    <pre>&lt;ldml&gt;
4562  &lt;identity&gt;
4563    &lt;version number="1.1" /&gt;
4564    &lt;language type="el" /&gt;
4565  &lt;/identity&gt;
4566  &lt;localeDisplayNames&gt;
4567    &lt;languages&gt;
4568      &lt;language type="ar"&gt;Αραβικά&lt;/language&gt;
4569...</pre>
4570    <p>Which gives the following element chains (among others):</p>
4571    <ul>
4572      <li>//ldml/identity/version[@number="1.1"]</li>
4573      <li>
4574      //ldml/localeDisplayNames/languages/language[@type="ar"]</li>
4575    </ul>
4576    <p>An element chain A is an <i>extension</i> of an element
4577    chain B if B is equivalent to an initial portion of A. For
4578    example, #2 below is an extension of #1. (Depending on the
4579    tree, "equivalent" may not mean "identical"; see below for an
4580    example.)</p>
4581    <ol>
4582      <li>//ldml/localeDisplayNames</li>
4583      <li>
4584      //ldml/localeDisplayNames/languages/language[@type="ar"]</li>
4585    </ol>
4586    <p>An LDML file can be thought of as an ordered list of
4587    <i>element pairs</i>: &lt;element chain, data&gt;, where the
4588    element chains are all the chains for the end-nodes. (This
4589    works because of restrictions on the structure of LDML,
4590    including that it does not allow mixed content.) The ordering
4591    is the ordering that the element chains are found in the file,
4592    and thus determined by the DTD.</p>
4593    <p>For example, some of those pairs would be the following.
4594    Notice that the first has the empty string as its element
4595    contents.</p>
4596    <ul>
4597      <li><b>&lt;</b>//ldml/identity/version[@number="1.1"]<b>,</b>
4598      ""<b>&gt;</b></li>
4599      <li>
4600      <b>&lt;</b>//ldml/localeDisplayNames/languages/language[@type="ar"]<b>,</b>
4601      "Αραβικά"<b>&gt;</b></li>
4602    </ul>
4603    <blockquote>
4604      <p><b>Note:</b> There are two exceptions to this:</p>
4605      <ol>
4606        <li>Blocking nodes and their contents are treated as a
4607        single end node.</li>
4608        <li>In terms of computing inheritance, the element pair
4609        consists of the element chain plus all distinguishing
4610        attributes; the value consists of the value (if any) plus
4611        any nondistinguishing attributes.</li>
4612      </ol>
4613      <blockquote>
4614        <p>Thus instead of the element pair being (a) below, it is
4615        (b):</p>
4616        <ol type="a">
4617          <li>
4618          <b>&lt;</b>//ldml/dates/calendars/calendar[@type='gregorian']/week/weekendStart[@day='sun'][@time='00:00']<b>,</b><br>
4619
4620          <b>""&gt;</b></li>
4621          <li>
4622          <b>&lt;</b>//ldml/dates/calendars/calendar[@type='gregorian']/week/weekendStart<b>,</b><br>
4623
4624          [@day='sun'][@time='00:00']<b>&gt;</b></li>
4625        </ol>
4626      </blockquote>
4627    </blockquote>
4628    <p>Two LDML element chains are <i>equivalent</i> when they
4629    would be identical if all attributes and their values were
4630    removed — except for distinguishing attributes. Thus the
4631    following are equivalent:</p>
4632    <ul>
4633      <li>
4634      <code>//ldml/localeDisplayNames/languages/language[@type="ar"]</code></li>
4635      <li>
4636      <code>//ldml/localeDisplayNames/languages/language[@type="ar"][@draft="unconfirmed"]</code></li>
4637    </ul>
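<p>Equivalence and extension can be sketched by reducing each element chain to its element names plus its distinguishing attributes, sorted alphabetically. In this sketch a chain is a list of <code>(element, attributes)</code> pairs. The <code>DISTINGUISHING</code> set is an illustrative assumption. The actual set of distinguishing attributes is defined by the LDML metadata, not by this code.</p>

```python
# Illustrative subset of distinguishing attributes (an assumption; the
# real list comes from LDML metadata). Anything else, such as "draft",
# is treated as nondistinguishing.
DISTINGUISHING = {"type", "alt"}


def distinguishing_key(chain):
    """Reduce [(element, {attr: value}), ...] to a hashable key that keeps
    element names and distinguishing attributes in alphabetical order."""
    return tuple(
        (element, tuple(sorted((a, v) for a, v in attrs.items() if a in DISTINGUISHING)))
        for element, attrs in chain
    )


def equivalent(chain_a, chain_b):
    """Equivalent if identical once nondistinguishing attributes are dropped."""
    return distinguishing_key(chain_a) == distinguishing_key(chain_b)


def is_extension(chain_a, chain_b):
    """True if chain_a is an extension of chain_b, i.e. chain_b is
    equivalent to an initial portion of chain_a."""
    key_a, key_b = distinguishing_key(chain_a), distinguishing_key(chain_b)
    return key_a[:len(key_b)] == key_b
```

<p>For instance, the two chains listed above, with and without <code>[@draft="unconfirmed"]</code>, reduce to the same key.</p>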
4638    <p>For any locale ID, a <i>locale chain</i> is an ordered list
4639    starting with the root and leading down to the ID. For
4640    example:</p>
4641    <blockquote>
4642      <p>&lt;root, de, de_DE, de_DE_xxx&gt;</p>
4643    </blockquote>
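<p>The example above matches simple truncation, where the last subtag is dropped repeatedly until only the language remains and root is then appended. The following is a minimal sketch of that approach. It assumes that truncation is the normal inheritance and that explicit parentLocale entries override it. The helper names are hypothetical, and <code>_</code> is used as the separator.</p>

```python
def parent_of(locale_id, parent_locales):
    """Explicit parentLocale if present, else truncate the last subtag."""
    if locale_id in parent_locales:
        return parent_locales[locale_id]
    if "_" in locale_id:
        return locale_id.rsplit("_", 1)[0]
    return "root"


def locale_chain(locale_id, parent_locales=None):
    """Return the locale chain from root down to locale_id.

    Relies on the no-cycles invariant; a cyclic parent_locales would loop.
    """
    parent_locales = parent_locales or {}
    chain = [locale_id]
    while chain[-1] != "root":
        chain.append(parent_of(chain[-1], parent_locales))
    chain.reverse()
    return chain
```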
4644    <h4><a name="Resolved_Data_File" href="#Resolved_Data_File" id=
4645    "Resolved_Data_File">4.2.2 Resolved Data File</a></h4>
4646    <p>To produce a fully resolved locale data file from CLDR for a
4647    locale ID L, start with L and successively add unique
4648    items from the parent locales until you reach root. More
4649    formally, this can be expressed as the following procedure.</p>
4650    <ol>
4651      <li>Let Result be initially L.</li>
4652      <li>For each Li in the locale chain for L, starting at L and
4653      going up to root:
4654        <ol>
4655          <li>Let Temp be a copy of the pairs in the LDML file for
4656          Li.</li>
4657          <li>Replace each alias in Temp by the resolved list of
4658          pairs it points to.
4659            <ol>
4660              <li>The resolved list of pairs is obtained by
4661              recursively applying this procedure.</li>
4662              <li>That alias now blocks any inheritance from the
4663              parent. (See <i><a href="#Common_Elements">Section
4664              5.1 Common Elements</a></i> for an example.)</li>
4665            </ol>
4666          </li>
4667          <li>For each element pair P in Temp:
4668            <ol>
4669              <li>If P does not contain a blocking element, and
4670              Result does not have an element pair Q with an
4671              equivalent element chain, add P to Result.</li>
4672            </ol>
4673          </li>
4674        </ol>
4675      </li>
4676    </ol>
4677    <p><b>Notes:</b></p>
4678    <ul>
4679      <li>When adding an element pair to a result, it has to go in
4680      the right order for it to be valid according to the DTD.</li>
4681      <li>The identity element and its children are unaffected by
4682      resolution.</li>
4683      <li>The LDML data must be constructed so as to avoid
4684      circularity in step 2.2.</li>
4685    </ul>
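<p>The resolution loop can be sketched in a few lines if aliases and blocking elements are left out. The sketch assumes that each element chain has already been reduced to a string containing only distinguishing attributes, so equivalent chains compare equal. The function name, data layout, and sample values are hypothetical.</p>

<p>Result starts empty here. Because the loop visits L first, this has the same effect as seeding Result with L in this simplified model. Python dictionaries preserve insertion order, which only approximates the DTD ordering required by the first note above.</p>

```python
def resolve(chain_up_to_root, files):
    """Simplified resolution: aliases and blocking elements are ignored.

    chain_up_to_root: locale IDs from L up to root, e.g. ["el", "root"].
    files: maps a locale ID to its (element_chain, data) pairs, where each
        element_chain is a string already reduced to distinguishing
        attributes, so equivalent chains compare equal.
    """
    result = {}  # insertion-ordered; stands in for DTD ordering
    for locale_id in chain_up_to_root:
        for element_chain, data in files.get(locale_id, []):
            # Add P only if Result has no pair with an equivalent chain,
            # so values from more specific locales win.
            if element_chain not in result:
                result[element_chain] = data
    return result
```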
4686    <h4><a name="Valid_Data" href="#Valid_Data" id=
4687    "Valid_Data">4.2.3 Valid Data</a></h4>
4688    <p>The attribute <i>draft="x"</i> in LDML means that the data
4689    has not been approved by the subcommittee. (For more
4690    information, see <a href=
4691    "http://cldr.unicode.org/index/process">Process</a>). However,
4692    some data that is not explicitly marked as <i>draft</i> may be
4693    implicitly <i>draft</i>, either because it inherits it from a
4694    parent, or from an enclosing element.</p>
4695    <p><b>Example 2.</b> Suppose that new locale data is added for
4696    af (Afrikaans). To indicate that all of the data is
4697    <i>unconfirmed</i>, the attribute can be added to the top
4698    level.</p>
4699    <p><code>&lt;ldml version="1.1" draft="unconfirmed"&gt;<br>
4700    &nbsp;&lt;identity&gt;<br>
4701    &nbsp; &lt;version number="1.1" /&gt;<br>
4702    &nbsp; &lt;language type="af" /&gt;<br>
4703    &nbsp;&lt;/identity&gt;<br>
4704    &nbsp;&lt;characters&gt;...&lt;/characters&gt;<br>
4705    &nbsp;&lt;localeDisplayNames&gt;...&lt;/localeDisplayNames&gt;<br>
4706
4707    &lt;/ldml&gt;</code></p>
4708    <p>Any data can be added to that file, and all of it will have
4709    the status draft=<i>unconfirmed</i>. Once an item is vetted (<i>whether
4710    it is inherited or explicitly in the file</i>), its status
4711    can be changed to <i>approved</i>. This can be done by
4712    leaving draft="unconfirmed" on the enclosing element and
4713    marking the child with draft="approved", such as:</p>
4714    <p><code>&lt;ldml version="1.1" draft="unconfirmed"&gt;<br>
4715    &nbsp;&lt;identity&gt;<br>
4716    &nbsp; &lt;version number="1.1" /&gt;<br>
4717    &nbsp; &lt;language type="af" /&gt;<br>
4718    &nbsp;&lt;/identity&gt;<br>
4719    &nbsp;&lt;characters
4720    draft="approved"&gt;...&lt;/characters&gt;<br>
4721    &nbsp;&lt;localeDisplayNames&gt;...&lt;/localeDisplayNames&gt;<br>
4722
4723    &nbsp;&lt;dates/&gt;<br>
4724    &nbsp;&lt;numbers/&gt;<br>
4725    &nbsp;&lt;collations/&gt;<br>
4726    &lt;/ldml&gt;</code></p>
4727    <p>However, normally the draft attributes should be
4728    canonicalized, which means they are pushed down to leaf nodes
4729    as described in <i><a href="#Canonical_Form">Section 5.6
4730    Canonical Form</a></i>. If an LDML file does have draft
4731    attributes that are not on leaf nodes, the file should be
4732    interpreted as if it were the canonicalized version of that
4733    file.</p>
4734    <p>More formally, here is how to determine whether data for an
4735    element chain E is implicitly or explicitly draft, given a
4736    locale L. Steps 1, 2, and 4 simply formalize what is already
4737    in LDML; step 3 handles the deprecated validSubLocales attribute.</p>
4738    <h4><a name="Checking_for_Draft_Status" href=
4739    "#Checking_for_Draft_Status" id=
4740    "Checking_for_Draft_Status">4.2.4 Checking for Draft
4741    Status</a></h4>
4742    <ol>
4743      <li>
4744        <b>Parent Locale Inheritance</b>
4745        <ol>
4746          <li>Walk through the locale chain until you find a locale
4747          ID L' with a data file D. (L' may equal L).</li>
4748          <li>Produce the fully resolved data file D' for D.</li>
4749          <li>In D', find the first element pair whose element
4750          chain E' is either equivalent to or an extension of
4751          E.</li>
4752          <li>If there is no such E', return <i>true</i></li>
4753          <li>If E' is not equivalent to E, truncate E' to the
4754          length of E.</li>
4755        </ol>
4756      </li>
4757      <li>
4758        <b>Enclosing Element Inheritance</b>
4759        <ol>
4760          <li>Walk through the elements in E', from back to front.
4761            <ol>
4762              <li>If you ever encounter draft=<i>x</i>, return
4763              <i>x</i></li>
4764            </ol>
4765          </li>
4766          <li>If L' = L, return <i>false</i></li>
4767        </ol>
4768      </li>
4769      <li>
4770        <b>Missing File Inheritance</b>
4771        <ol>
4772          <li>Otherwise, walk again through the elements in E',
4773          from back to front.
4774            <ol>
4775              <li>If you encounter a validSubLocales attribute
4776              (deprecated):
4777                <ol>
4778                  <li>If L is in the attribute value, return
4779                  <i>false</i></li>
4780                  <li>Otherwise return <i>true</i></li>
4781                </ol>
4782              </li>
4783            </ol>
4784          </li>
4785        </ol>
4786      </li>
4787      <li>
4788        <b>Otherwise</b>
4789        <ol>
4790          <li>Return <i>true</i></li>
4791        </ol>
4792      </li>
4793    </ol>
4794    <p>The validSubLocales value in the most specific locale file
4795    (the one farthest from the root file) "wins" through the full
4796    resolution step, because data from more specific files replaces
4797    data from less specific ones.</p>
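<p>The four steps of the draft-status check can be sketched as follows. This is a simplified illustration, not a reference implementation. It makes these assumptions:</p>
<ul>
  <li>The fully resolved pairs for each locale that has a data file are already available, so step 1.2 is done.</li>
  <li>A chain is a tuple of <code>(element, attributes)</code> pairs.</li>
  <li>Equivalence is approximated with a hypothetical <code>DISTINGUISHING</code> set.</li>
  <li>validSubLocales is assumed to be space-separated.</li>
</ul>
<p>As in the text, the function returns <i>true</i>, <i>false</i>, or the draft value found in step 2.</p>

```python
# Hypothetical subset of distinguishing attributes used to approximate
# "equivalent"; draft and validSubLocales are nondistinguishing here.
DISTINGUISHING = {"type"}


def _key(chain):
    """Element names plus distinguishing attributes, for comparisons."""
    return tuple(
        (name, tuple(sorted((a, v) for a, v in attrs.items() if a in DISTINGUISHING)))
        for name, attrs in chain
    )


def draft_status(locale_chain_up, resolved, e):
    """Return True, False, or a draft value string for element chain e.

    locale_chain_up: locale IDs from L up to root.
    resolved: maps each locale that has a data file to its fully resolved
        list of (chain, value) pairs; root is assumed to be present.
    """
    locale = locale_chain_up[0]
    # Step 1: first locale L' (possibly L itself) that has a data file.
    l_prime = next(loc for loc in locale_chain_up if loc in resolved)
    e_key = _key(e)
    e_prime = None
    for chain, _value in resolved[l_prime]:
        # Equivalent to E, or an extension of E.
        if _key(chain)[:len(e_key)] == e_key:
            e_prime = tuple(chain)[:len(e)]  # truncate to the length of E
            break
    if e_prime is None:
        return True
    # Step 2: enclosing element inheritance, from back to front.
    for _name, attrs in reversed(e_prime):
        if "draft" in attrs:
            return attrs["draft"]
    if l_prime == locale:
        return False
    # Step 3: missing file inheritance via the deprecated validSubLocales
    # (assumed here to be a space-separated list of locale IDs).
    for _name, attrs in reversed(e_prime):
        if "validSubLocales" in attrs:
            return locale not in attrs["validSubLocales"].split()
    # Step 4: otherwise.
    return True
```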
4798    <h4><a name="Keyword_and_Default_Resolution" href=
4799    "#Keyword_and_Default_Resolution" id=
4800    "Keyword_and_Default_Resolution">4.2.5 Keyword and Default
4801    Resolution</a></h4>
4802    <p>When accessing data based on keywords, the following process
4803    is used. Consider the following example:</p>
4804    <ul>
4805      <li>The locale 'de' has collation types A, B, C, and no
4806      &lt;default&gt; element</li>
4807      <li>The locale 'de_CH' has &lt;default type='B'&gt;</li>
4808    </ul>
4809    <p>Here are the searches for various combinations.</p>
4810    <table class='simple' border="1" cellpadding="0" cellspacing=
4811    "0">
4812      <tr>
4813        <td><strong>User Input</strong></td>
4814        <td><strong>Lookup in Locale</strong></td>
4815        <td><strong>For</strong></td>
4816        <td><strong>Comment</strong></td>
4817      </tr>
4818      <tr>
4819        <td rowspan="3">de_CH<br>
4820        <em>no keyword</em></td>
4821        <td>de_CH</td>
4822        <td>default collation type</td>
4823        <td>finds "B"</td>
4824      </tr>
4825      <tr>
4826        <td>de_CH</td>
4827        <td>collation type=B</td>
4828        <td>not found</td>
4829      </tr>
4830      <tr>
4831        <td>de</td>
4832        <td>collation type=B</td>
4833        <td><em>found</em></td>
4834      </tr>
4835      <tr>
4836        <td rowspan="4">de<br>
4837        <em>no keyword</em></td>
4838        <td>de</td>
4839        <td>default collation type</td>
4840        <td>not found</td>
4841      </tr>
4842      <tr>
4843        <td>root</td>
4844        <td>default collation type</td>
4845        <td>finds "standard"</td>
4846      </tr>
4847      <tr>
4848        <td>de</td>
4849        <td>collation type=standard</td>
4850        <td>not found</td>
4851      </tr>
4852      <tr>
4853        <td>root</td>
4854        <td>collation type=standard</td>
4855        <td><i>found</i></td>
4856      </tr>
4857      <tr>
4858        <td>de_u_co_A</td>
4859        <td>de</td>
4860        <td>collation type=A</td>
4861        <td><i>found</i></td>
4862      </tr>
4863      <tr>
4864        <td rowspan="2">de_u_co_standard</td>
4865        <td>de</td>
4866        <td>collation type=standard</td>
4867        <td>not found</td>
4868      </tr>
4869      <tr>
4870        <td>root</td>
4871        <td>collation type=standard</td>
4872        <td><i>found</i></td>
4873      </tr>
4874      <tr>
4875        <td rowspan="6">de_u_co_foobar</td>
4876        <td>de</td>
4877        <td>collation type=foobar</td>
4878        <td>not found</td>
4879      </tr>
4880      <tr>
4881        <td>root</td>
4882        <td>collation type=foobar</td>
4883        <td>not found, starts looking for default</td>
4884      </tr>
4885      <tr>
4886        <td>de</td>
4887        <td>default collation type</td>
4888        <td>not found</td>
4889      </tr>
4890      <tr>
4891        <td>root</td>
4892        <td>default collation type</td>
4893        <td>finds "standard"</td>
4894      </tr>
4895      <tr>
4896        <td>de</td>
4897        <td>collation type=standard</td>
4898        <td>not found</td>
4899      </tr>
4900      <tr>
4901        <td>root</td>
4902        <td>collation type=standard</td>
4903        <td><i>found</i></td>
4904      </tr>
4905    </table>
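<p>The searches in the table above can be reproduced with a small sketch. It assumes a hypothetical data model in which each locale lists the collation types it defines plus an optional &lt;default&gt; type. The function name and sample data are illustrative only. After the default type is found, the search for that type starts again from the requested locale.</p>

```python
def find_collation(chain, data, keyword=None):
    """Return (locale_id, collation_type) as in the tables above.

    chain: locale IDs from the requested locale up to root.
    data: hypothetical model mapping a locale ID to
        {"types": set of collation types, "default": type or None}.
    """
    def find_type(coll_type):
        # Search for a collation type, starting at the requested locale.
        for loc in chain:
            if coll_type in data.get(loc, {}).get("types", set()):
                return loc
        return None

    if keyword is not None:
        loc = find_type(keyword)
        if loc is not None:
            return loc, keyword
    # No keyword, or the keyword was not found anywhere: find the default.
    # Root is assumed to have a default, so one is always found.
    default = next(
        data[loc]["default"] for loc in chain if data.get(loc, {}).get("default")
    )
    return find_type(default), default
```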
4906    <p>Examples of "search" collator lookup; 'de' has a
4907    language-specific version, but 'en' does not:</p>
4908    <table class='simple' border="1" cellpadding="0" cellspacing=
4909    "0">
4910      <tr>
4911        <td><strong>User Input</strong></td>
4912        <td><strong>Lookup in Locale</strong></td>
4913        <td><strong>For</strong></td>
4914        <td><strong>Comment</strong></td>
4915      </tr>
4916      <tr>
4917        <td rowspan="2">de_CH_u_co_search</td>
4918        <td>de_CH</td>
4919        <td>collation type=search</td>
4920        <td>not found</td>
4921      </tr>
4922      <tr>
4923        <td>de</td>
4924        <td>collation type=search</td>
4925        <td><i>found</i></td>
4926      </tr>
4927      <tr>
4928        <td rowspan="3">en_US_u_co_search</td>
4929        <td>en_US</td>
4930        <td>collation type=search</td>
4931        <td>not found</td>
4932      </tr>
4933      <tr>
4934        <td>en</td>
4935        <td>collation type=search</td>
4936        <td>not found</td>
4937      </tr>
4938      <tr>
4939        <td>root</td>
4940        <td>collation type=search</td>
4941        <td><i>found</i></td>
4942      </tr>
4943    </table>
4944    <p>Examples of lookup for Chinese collation types. Note:</p>
4945    <ul>
4946      <li>All of the Chinese-specific collation types are provided
4947      in the 'zh' locale</li>
4948      <li>For 'zh' the &lt;default&gt; element specifies "pinyin";
4949      for 'zh_Hant' the &lt;default&gt; element specifies "stroke".
4950      However, any of the available Chinese collation types can be
4951      explicitly requested for any Chinese locale.</li>
4952    </ul>
4953    <table class='simple' border="1" cellpadding="0" cellspacing=
4954    "0">
4955      <tr>
4956        <td><strong>User Input</strong></td>
4957        <td><strong>Lookup in Locale</strong></td>
4958        <td><strong>For</strong></td>
4959        <td><strong>Comment</strong></td>
4960      </tr>
4961      <tr>
4962        <td rowspan="3">zh_Hant<br>
4963        <em>no keyword</em></td>
4964        <td>zh_Hant</td>
4965        <td>default collation type</td>
4966        <td>finds "stroke"</td>
4967      </tr>
4968      <tr>
4969        <td>zh_Hant</td>
4970        <td>collation type=stroke</td>
4971        <td>not found</td>
4972      </tr>
4973      <tr>
4974        <td>zh</td>
4975        <td>collation type=stroke</td>
4976        <td><i>found</i></td>
4977      </tr>
4978      <tr>
4979        <td rowspan="3">zh_Hant_HK_u_co_pinyin</td>
4980        <td>zh_Hant_HK</td>
4981        <td>collation type=pinyin</td>
4982        <td>not found</td>
4983      </tr>
4984      <tr>
4985        <td>zh_Hant</td>
4986        <td>collation type=pinyin</td>
4987        <td>not found</td>
4988      </tr>
4989      <tr>
4990        <td>zh</td>
4991        <td>collation type=pinyin</td>
4992        <td><i>found</i></td>
4993      </tr>
4994      <tr>
4995        <td rowspan="2">zh<br>
4996        <em>no keyword</em></td>
4997        <td>zh</td>
4998        <td>default collation type</td>
4999        <td>finds "pinyin"</td>
5000      </tr>
5001      <tr>
5002        <td>zh</td>
5003        <td>collation type=pinyin</td>
5004        <td><i>found</i></td>
5005      </tr>
5006    </table>
5007    <blockquote>
5008      <p><b>Note:</b> It is an invariant that the default in root
5009      for a given element must
5010      always be a value that exists in root. So you cannot have
5011      the following in root:</p>
5012    </blockquote>
5013    <p><code>&lt;someElements&gt;<br>
5014    &nbsp; &lt;default type='a'/&gt;<br>
5015    &nbsp; &lt;someElement type='b'&gt;...&lt;/someElement&gt;<br>
5016    &nbsp; &lt;someElement type='c'&gt;...&lt;/someElement&gt;<br>
5017    <b>&nbsp; &lt;!-- no 'a' --&gt;</b><br>
5018    &lt;/someElements&gt;</code></p>
5019    <p>For identifiers, such as language codes, script codes,
5020    region codes, variant codes, types, keywords, currency symbols
5021    or currency display names, the default value is the identifier
5022    itself whenever no value is found in root. Thus if there
5023    is no display name for the region code 'QA' in root, then the
5024    display name is simply 'QA'.</p>
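<p>The identifier fallback can be sketched as a lookup along the locale chain. It returns the code itself when no locale, including root, supplies a display name. The data layout and function name are assumptions for illustration:</p>

```python
def display_name(code, chain, names):
    """Look up a display name along the locale chain, falling back to the code.

    code: an identifier such as a region code.
    chain: locale IDs from the requested locale up to root.
    names: hypothetical mapping of locale ID -> {code: display name}.
    """
    for loc in chain:
        name = names.get(loc, {}).get(code)
        if name is not None:
            return name
    # No value anywhere, including root: the identifier is its own default.
    return code
```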
5025    <h4><a name="Inheritance_vs_Related" href=
5026    "#Inheritance_vs_Related" id="Inheritance_vs_Related">4.2.6
5027    Inheritance vs Related Information</a></h4>
5028    <p>There are related types of data and processing that are easy
5029    to confuse:</p>
5030    <table class='simple'>
5031      <tr>
5032        <td rowspan="4">
5033          <p><strong>Inheritance</strong></p>
5034        </td>
5035        <td colspan="2">Part of the internal mechanism used by CLDR
5036        to organize and manage locale data. This is used to share
5037        common resources, ease maintenance, and provide the
5038        best fallback behavior in the absence of data. <em>Should
5039        not be used for locale matching or likely
5040        subtags.</em></td>
5041      </tr>
5042      <tr>
5043        <td><em>Example:</em></td>
5044        <td>parent(en_AU) ⇒ en_001<br>
5045        parent(en_001) ⇒ en<br>
5046        parent(en) ⇒ root</td>
5047      </tr>
5048      <tr>
5049        <td><em>Data:</em></td>
5050        <td>supplementalData.xml &lt;parentLocale&gt;</td>
5051      </tr>
5052      <tr>
5053        <td><em>Spec:</em></td>
5054        <td><strong>Section <a href="#Inheritance_and_Validity">4.2
5055        Inheritance and Validity</a></strong></td>
5056      </tr>
5057      <tr>
5058        <td rowspan="4"><strong>DefaultContent</strong></td>
5059        <td colspan="2">Part of the internal mechanism used by CLDR
5060        to manage locale data. A particular sublocale is designated
5061        the defaultContent for a parent, so that the parent
5062        exhibits consistent behavior. <em>Should not be used for
5063        locale matching or likely subtags.</em></td>
5064      </tr>
5065      <tr>
5066        <td><em>Example:</em></td>
5067        <td>addLikelySubtags(sr-ME) ⇒ sr-Latn-ME,
5068        minimize(de-Latn-DE) ⇒ de</td>
5069      </tr>
5070      <tr>
5071        <td><em>Data:</em></td>
5072        <td>supplementalMetadata.xml &lt;defaultContent&gt;</td>
5073      </tr>
5074      <tr>
5075        <td><em>Spec:</em></td>
5076        <td><strong>Part 6: Section 9.3&nbsp;<a href=
5077        "tr35-info.html#Default_Content">Default
5078        Content</a></strong></td>
5079      </tr>
5080      <tr>
5081        <td rowspan="4"><strong>LikelySubtags</strong></td>
5082        <td colspan="2">Provides the most likely full subtags (script
5083        and region) in the absence of other information. A core
5084        component of LocaleMatching.</td>
5085      </tr>
5086      <tr>
5087        <td><em>Example:</em></td>
5088        <td>addLikelySubtags(zh) ⇒ zh-Hans-CN<br>
5089        addLikelySubtags(zh-TW) ⇒ zh-Hant-TW<br>
5090        minimize(zh-Hant, favorRegion) ⇒ zh-TW</td>
5091      </tr>
5092      <tr>
5093        <td><em>Data:</em></td>
5094        <td>likelySubtags.xml &lt;likelySubtags&gt;</td>
5095      </tr>
5096      <tr>
5097        <td><em>Spec:</em></td>
5098        <td><strong>Section <a href="#Likely_Subtags">4.3 Likely
5099        Subtags</a></strong></td>
5100      </tr>
5101      <tr>
5102        <td rowspan="4"><strong>LocaleMatching</strong></td>
5103        <td colspan="2">Provides the best match for the user’s
5104        language(s) among an application’s supported
5105        languages.</td>
5106      </tr>
5107      <tr>
5108        <td><em>Example:</em></td>
5109        <td>bestLocale(userLangs=&lt;en, fr&gt;,
5110        appLangs=&lt;fr-CA, ru&gt;) ⇒ fr-CA</td>
5111      </tr>
5112      <tr>
5113        <td><em>Data:</em></td>
5114        <td>languageInfo.xml &lt;languageMatching&gt;</td>
5115      </tr>
5116      <tr>
5117        <td><em>Spec:</em></td>
5118        <td><strong>Section <a href="#LanguageMatching">4.4
5119        Language Matching</a></strong></td>
5120      </tr>
5121    </table>
5122    <h3><a name="Likely_Subtags" href="#Likely_Subtags" id=
5123    "Likely_Subtags">4.3 Likely Subtags</a></h3>
5124    <p class="dtd">&lt;!ELEMENT likelySubtag EMPTY &gt;<br>
5125    &lt;!ATTLIST likelySubtag from NMTOKEN #REQUIRED&gt;<br>
5126    &lt;!ATTLIST likelySubtag to NMTOKEN #REQUIRED&gt;</p>
5127    <p>There are a number of situations where it is useful to be
5128    able to find the most likely language, script, or region. For
5129    example, given the language "zh" and the region "TW", what is
5130    the most likely script? Given the script "Thai", what is the
5131    most likely language or region? Given the region "TW", what is
5132    the most likely language and script?</p>
5133    <p>Conversely, given a locale, it is useful to find out which
5134    fields (language, script, or region) may be superfluous, in the
5135    sense that they contain the likely tags. For example, "en_Latn"
5136    can be simplified down to "en" since "Latn" is the likely
5137    script for "en"; "ja_Jpan_JP" can be simplified down to
5138    "ja".</p>
5139    <p>The <i>likelySubtag</i> supplemental data provides default
5140    information for computing these values. This data is based on
5141    the default content data, the population data, and the
5142    suppress-script data in [<a href="#BCP47">BCP47</a>]. It is
5143    heuristically derived, and may change over time.</p>
5144    <p>For the relationship between Inheritance, DefaultContent,
5145    LikelySubtags, and LocaleMatching, see <strong><em>Section
5146    4.2.6 <a href="tr35.html#Inheritance_vs_Related">Inheritance vs
5147    Related Information</a></em></strong>.</p>
5148    <p>To look up data in the table, see if a locale matches one of
5149    the <b>from</b> attribute values. If so, fetch the
5150    corresponding <b>to</b> attribute value. For example, the
5151    Chinese data looks like the following:</p>
5152    <blockquote>
5153      <p class="example">&lt;likelySubtag from="zh"
5154      to="zh_Hans_CN"/&gt;<br>
5155      &lt;likelySubtag from="zh_HK" to="zh_Hant_HK"/&gt;<br>
5156      &lt;likelySubtag from="zh_Hani" to="zh_Hani_CN"/&gt;<br>
5157      &lt;likelySubtag from="zh_Hant" to="zh_Hant_TW"/&gt;<br>
5158      &lt;likelySubtag from="zh_MO" to="zh_Hant_MO"/&gt;<br>
5159      &lt;likelySubtag from="zh_TW" to="zh_Hant_TW"/&gt;</p>
5160    </blockquote>
5161    <p>So looking up "zh_TW" returns "zh_Hant_TW", while looking up
5162    "zh" returns "zh_Hans_CN".</p>
5163    <p>In more detail, the data is designed to be used in the
5164    following operations.</p>
5165    <p>Note that as of CLDR v24, any field present in the 'from'
5166    field is also present in the 'to' field, so an input field
5167    will not change in the "Add Likely Subtags" operation. The data and
5168    operations can also be used with language tags using [<a href=
5169    "#BCP47">BCP47</a>] syntax, with the appropriate changes. In
5170    addition, certain common 'denormalized' language subtags such
5171    as 'iw' (for 'he') may occur in both the 'from' and 'to'
5172    fields. This allows for implementations that use those
5173    denormalized subtags to use the data with only minor changes to
5174    the operations.</p>
5175    <p>An implementation may choose to exclude language tags with the language subtag &quot;und&quot; from the following operation. In such a case, only the canonicalization is done. An implementation can declare that it is doing the exclusion, or can take a parameter that controls whether or not to do it.</p>
5177    <p><i><b>Add Likely Subtags:</b></i> <em>Given a source locale
5178    X, return a locale Y where the empty subtags have been
5179    filled in by the most likely subtags.</em> This is written as X
5180    ⇒ Y ("X maximizes to Y").</p>
5181    <p>A subtag is called <em>empty</em> if it is a missing script
5182    or region subtag, or it is a base language subtag with the
5183    value "und". In the description below, a subscript on a subtag
5184    <em>x</em> indicates which tag it is from:
5185    <em>x<sub>s</sub></em> is in the source,
5186    <em>x<sub>m</sub></em> is in a match, and <em>x<sub>r</sub></em>
5187    is in the final result.</p>
5188    <p>This operation is performed in the following way.</p>
5189    <ol>
5190      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5191        <strong>Canonicalize.</strong>
5192        <ol>
5193          <li>Make sure the input locale is in canonical form: uses
5194          the right separator, and has the right casing.</li>
5195          <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5196          Replace any deprecated subtags with their canonical
5197          values using the &lt;alias&gt; data in supplemental
5198          metadata. Use the first value in the replacement list, if
5199          it exists. Language tag replacements may have multiple
5200          parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such
5201          a case, the original script and/or region are retained if
5202          present. Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not
5203          "sr_Latn_AQ".</li>
5204          <li>If the tag is a legacy language tag
5205          (marked as “Type: grandfathered” in BCP 47; see &lt;variable
5206          id="$grandfathered" type="choice"&gt; in the supplemental
5207          data), then return it.</li>
5208          <li>Remove the script code 'Zzzz' and the region code
5209          'ZZ' if they occur.</li>
5210          <li>Get the components of the cleaned-up source tag
5211          <em>(language<sub>s</sub>, script<sub>s</sub>,</em> and
5212          <em>region<sub>s</sub></em>), plus any variants and
5213          extensions.</li>
5214        </ol>
5215      </li>
5216      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5217        <strong>Lookup.</strong> Look up each of the following in
5218        order, and stop on the first match:
5219        <ol>
5220          <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5221          <em>language<sub>s</sub>_script<sub>s</sub>_region<sub>s</sub></em></li>
5222          <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5223          <em>language<sub>s</sub>_region<sub>s</sub></em></li>
5224          <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5225          <em>language<sub>s</sub>_script<sub>s</sub></em></li>
5226          <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5227          <em>language<sub>s</sub></em></li>
5228          <li>und<em>_script<sub>s</sub></em></li>
5229        </ol>
5230      </li>
5231      <li>
5232        <strong>Return</strong>
5233        <ol>
5234          <li>If there is no match, either return
5235            <ol>
5236              <li>an error value, or</li>
5237              <li>the match for "und" (in APIs where a valid
5238              language tag is required).</li>
5239            </ol>
5240          </li>
5241          <li>Otherwise, let the match be
5242          <em>language<sub>m</sub>_script<sub>m</sub>_region<sub>m</sub></em>.</li>
5243          <li>Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is
5244          not empty, and x<sub>m</sub> otherwise.</li>
5245          <li>Return the
5246          language tag composed of
5247          <em>language<sub>r</sub>_script<sub>r</sub>_region<sub>r</sub></em>
5248          + variants +
5249          extensions.</li>
5250        </ol>
5251      </li>
5252    </ol>
5253    <p>The lookup can be optimized. For example, if any of the tags
5254    in Step 2 are the same as previous ones in that list, they do
5255    not need to be tested.</p>
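<p>The steps above can be sketched as follows. This is a deliberately simplified illustration under these stated assumptions:</p>
<ul>
  <li>The table holds only entries quoted in this section: the Chinese data plus und_TW and und_AF.</li>
  <li>Deprecated-subtag replacement and legacy tags are not handled.</li>
  <li>Variants and extensions are not supported.</li>
  <li>A missing match returns <code>None</code> as the error value.</li>
</ul>
<p>The names <code>LIKELY</code>, <code>canonicalize</code>, and <code>add_likely_subtags</code> are hypothetical.</p>

```python
# Only entries quoted in this section; a real table is much larger.
LIKELY = {
    "zh": "zh_Hans_CN",
    "zh_HK": "zh_Hant_HK",
    "zh_Hani": "zh_Hani_CN",
    "zh_Hant": "zh_Hant_TW",
    "zh_MO": "zh_Hant_MO",
    "zh_TW": "zh_Hant_TW",
    "und_TW": "zh_Hant_TW",
    "und_AF": "fa_Arab_AF",
}


def parse(tag):
    """Split an '_'-separated tag into (language, script, region).

    A 4-letter subtag is taken as the script; any other as the region.
    Missing subtags are returned as ''.
    """
    parts = tag.split("_")
    language, script, region = parts[0], "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part
        else:
            region = part
    return language, script, region


def join(*subtags):
    """Join the non-empty subtags with '_'."""
    return "_".join(s for s in subtags if s)


def canonicalize(tag):
    """Steps 1.1 and 1.4: fix separator and casing, drop Zzzz and ZZ."""
    language, script, region = parse(tag.replace("-", "_"))
    language, script, region = language.lower(), script.title(), region.upper()
    if script == "Zzzz":
        script = ""
    if region == "ZZ":
        region = ""
    return language, script, region


def add_likely_subtags(tag):
    lang_s, script_s, region_s = canonicalize(tag)
    # Step 2: try each candidate in order; stop on the first match.
    candidates = [
        join(lang_s, script_s, region_s),
        join(lang_s, region_s),
        join(lang_s, script_s),
        lang_s,
        join("und", script_s),
    ]
    match = next((LIKELY[c] for c in candidates if c in LIKELY), None)
    if match is None:
        return None  # the "error value" choice in step 3
    lang_m, script_m, region_m = parse(match)
    # Step 3: keep each source subtag unless it is empty ('und' counts as empty).
    lang_r = lang_s if lang_s != "und" else lang_m
    return join(lang_r, script_s or script_m, region_s or region_m)
```

<p>With this table, the input ZH-ZZZZ-SG canonicalizes to zh_SG, finds no entry for zh_SG, matches zh, and yields zh_Hans_SG.</p>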
5256    <p><i>Example 1:</i></p>
5257    <ul>
5258      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5259        <p>Input is ZH-ZZZZ-SG.</p>
5260      </li>
5261      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5262        <p>Normalize to zh_SG.</p>
5263      </li>
5264      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5265        <p>Look up zh_SG in the table. No match.</p>
5266      </li>
5267      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
5268        <p>Look up zh, and get the match (zh_Hans_CN). Substitute
5269        SG, and return zh_Hans_SG.</p>
5270      </li>
5271    </ul>
5272    <p>To find the most likely language for a country, or language
5273    for a script, use "und" as the language subtag. For example,
5274    looking up "und_TW" returns zh_Hant_TW.</p>
5275    <p>A goal of the algorithm is that if X ⇒ Y, and X' results
5276    from replacing an empty subtag in X by the corresponding
5277    subtag in Y, then X' ⇒ Y. For example, if und_AF ⇒ fa_Arab_AF,
5278    then:</p>
5279    <ul>
5280      <li>fa_Arab_AF ⇒ fa_Arab_AF</li>
5281      <li>und_Arab_AF ⇒ fa_Arab_AF</li>
5282      <li>fa_AF ⇒ fa_Arab_AF</li>
5283    </ul>
5284    <p>There are a small number of exceptions to this goal in the
5285    current data, where X ∈ {und_Bopo, und_Brai, und_Cakm,
5286    und_Limb, und_Shaw}.</p>
    <p><b><i>Remove</i></b> <i><b>Likely Subtags:</b> Given a
    locale, remove any fields that Add Likely Subtags would
    add.</i></p>
    <p>The reverse operation removes fields that would be added by
    the first operation.</p>
    <ol>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">First get
      max = AddLikelySubtags(inputLocale). If an error is signaled,
      return it.</li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">Remove
      the variants from max.</li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">Get the
      components of max (<em>language<sub>max</sub></em>,
      <em>script<sub>max</sub></em>, <em>region<sub>max</sub></em>).</li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">Then for
      <i>trial</i> in {<em>language<sub>max</sub></em>,
      <em>language<sub>max</sub>_region<sub>max</sub></em>,
      <em>language<sub>max</sub>_script<sub>max</sub></em>}
        <ul>
          <li style="margin-top: 0.5em; margin-bottom: 0.5em">If
          AddLikelySubtags(<i>trial</i>) = max, then return
          <i>trial</i> + variants.</li>
        </ul>
      </li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">If you do
      not get a match, return max + variants.</li>
    </ol>
    <p>Example:</p>
    <ul>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
        <p>Input is zh_Hant. Maximize to get zh_Hant_TW.</p>
      </li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
        <p>zh ⇒ zh_Hans_CN. No match, so continue.</p>
      </li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
        <p>zh_TW ⇒ zh_Hant_TW. Matches, so return zh_TW.</p>
      </li>
    </ul>
    <p>A variant of this favors the script over the region, thus
    using {language, language_script, language_region} in the
    above. If that variant is used, then the result in this example
    would be zh_Hant instead of zh_TW.</p>
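<p>A minimal Python sketch of Remove Likely Subtags follows. It relies on a simplified maximizer with a hypothetical three-entry LIKELY table (real code would use the full likelySubtags data and the complete lookup order), and favor_script is an illustrative parameter name for the script-first variant just described.</p>

```python
# Hypothetical excerpt of likely-subtags data (illustration only).
LIKELY = {
    "zh": "zh_Hans_CN",
    "zh_TW": "zh_Hant_TW",
    "zh_Hant": "zh_Hant_TW",
}


def add_likely_subtags(language, script="", region=""):
    """Simplified maximizer: the first key found in the assumed lookup order wins."""
    for parts in ((language, script, region), (language, region), (language, script), (language,)):
        key = "_".join(p for p in parts if p)  # skip empty subtags
        if key in LIKELY:
            lang_m, script_m, region_m = LIKELY[key].split("_")
            return (language or lang_m, script or script_m, region or region_m)
    raise LookupError(f"no likely subtags for {language}")


def remove_likely_subtags(language, script="", region="", variants=(), favor_script=False):
    max_tag = add_likely_subtags(language, script, region)  # errors propagate
    lang_m, script_m, region_m = max_tag
    trials = [(lang_m, "", ""), (lang_m, "", region_m), (lang_m, script_m, "")]
    if favor_script:
        # Variant: try language_script before language_region.
        trials[1], trials[2] = trials[2], trials[1]
    for trial in trials:
        if add_likely_subtags(*trial) == max_tag:
            return "_".join([p for p in trial if p] + list(variants))
    return "_".join(list(max_tag) + list(variants))


print(remove_likely_subtags("zh", "Hant"))                     # zh_TW
print(remove_likely_subtags("zh", "Hant", favor_script=True))  # zh_Hant
```

<p>The two calls reproduce the example above: the default order returns zh_TW, while the script-first variant returns zh_Hant.</p>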
    <h3><a name="LanguageMatching" href="#LanguageMatching" id=
    "LanguageMatching">4.4 Language Matching</a></h3>
    <p class="dtd">&lt;!ELEMENT languageMatching ( languageMatches*
    ) &gt;<br>
    &lt;!ELEMENT languageMatches ( paradigmLocales*,
    matchVariable*, languageMatch* ) &gt;<br>
    &lt;!ATTLIST languageMatches type NMTOKEN #REQUIRED &gt;</p>
    <p class="dtd">&lt;!ELEMENT languageMatch EMPTY &gt;<br>
    &lt;!ATTLIST languageMatch desired CDATA #REQUIRED &gt;<br>
    &lt;!ATTLIST languageMatch supported CDATA #REQUIRED &gt;<br>
    &lt;!ATTLIST languageMatch percent NMTOKEN #REQUIRED &gt;<br>
    &lt;!ATTLIST languageMatch distance NMTOKEN #IMPLIED &gt;<br>
    &lt;!ATTLIST languageMatch oneway ( true | false ) #IMPLIED
    &gt;</p>
    <p class="dtd">&lt;!ELEMENT paradigmLocales EMPTY &gt;<br>
    &lt;!ATTLIST paradigmLocales locales NMTOKENS #REQUIRED
    &gt;</p>
    <p>Implementers are often faced with the issue of how to match
    the user's requested languages with their product's supported
    languages. For example, suppose that a product supports {ja-JP,
    de, zh-TW}. If the user understands written American English,
    German, French, Swiss German, and Italian, then
    <strong>de</strong> would be the best match; if the user
    understands only Chinese (zh), then zh-TW would be the best
    match.</p>
    <p>The standard truncation-fallback algorithm does not work
    well when faced with the complexities of natural language. The
    language matching data is designed to fill that gap. Stated in
    those terms, language matching can have the effect of a more
    complex fallback, such as:</p>
    <p>sr-Cyrl-RS<br>
    sr-Cyrl<br>
    sr-Latn-RS<br>
    sr-Latn<br>
    sr<br>
    hr-Latn<br>
    hr</p>
    <p>Language matching is used to find the best supported locale
    ID given a requested list of languages. The requested list
    could come from different sources, such as the user's list of
    preferred languages in the OS settings, or a browser
    Accept-Language list. For example, if my native tongue is
    English, I can understand Swiss German and German, my French is
    rusty but usable, and my Italian is basic, then ideally an
    implementation would allow me to select {gsw, de, fr} as my
    preferred list of languages, skipping Italian because my
    comprehension is not good enough for arbitrary content.</p>
    <p>Language matching can also be used to get fallback data
    elements. In many cases, there may not be full data for a
    particular locale. For example, for a Breton speaker, the best
    fallback if data is unavailable might be French. That is,
    suppose we have found a Breton bundle, but it does not contain
    a translation for the key "CN" (for the country China). It is
    best to return "Chine", rather than falling back to a default
    language such as Russian and getting "Китай". The language
    matching data can be used to get the closest fallback locales
    (of those supported) to a given language.</p>
    <p>For the relationship between Inheritance, DefaultContent,
    LikelySubtags, and LocaleMatching, see <strong><em>Section
    4.2.6 <a href="tr35.html#Inheritance_vs_Related">Inheritance vs
    Related Information</a></em></strong>.</p>
    <p>When such fallback is used for inherited item lookup, the
    normal order of inheritance is followed, except that before
    using any data from <strong>root</strong>, the data for the
    fallback locales is used if available. Language matching does
    not interact with the fallback of resources <em>within the
    locale-parent chain</em>. For example, suppose that we are
    looking for the value for a particular path <strong>P</strong>
    in <strong>nb-NO</strong>. In the absence of aliases, the
    following lookup is normally used:</p>
    <blockquote>
      <p><strong>nb-NO</strong> → <strong>nb</strong> →
      <strong>root</strong></p>
    </blockquote>
    <p>That is, we first look in <strong>nb-NO</strong>. If there
    is no value for <strong>P</strong> there, then we look in
    <strong>nb</strong>. If there is no value for
    <strong>P</strong> there, we return the value for
    <strong>P</strong> in root (or a code value, if there is
    nothing there). Remember that if there is an alias element
    along this path, then the lookup may restart with a different
    path in <strong>nb-NO</strong> (or another locale).</p>
    <p>However, suppose that <strong>nb-NO</strong> has the
    fallback values <strong>[nn da sv en]</strong>, derived from
    language matching. In that case, an implementation <em>may</em>
    progressively look up each of the listed locales, with the
    appropriate substitutions, returning the first value that does
    not come from <strong>root</strong>. This roughly follows the
    pseudocode below:</p>
    <ul>
      <li>value = lookup(P, nb-NO); if (locationFound != root)
      return value;</li>
      <li>value = lookup(P, nn-NO); if (locationFound != root)
      return value;</li>
      <li>value = lookup(P, da-NO); if (locationFound != root)
      return value;</li>
      <li>value = lookup(P, sv-NO); if (locationFound != root)
      return value;</li>
      <li>value = lookup(P, en-NO); return value;</li>
    </ul>
    <p>The locales in the fallback list are not used recursively.
    For example, for the lookup of a path in nb-NO, if
    <strong>fr</strong> were a fallback value for
    <strong>da</strong>, it would not matter for the above process.
    Only the original language matters.</p>
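<p>The fallback pseudocode above can be turned into a small runnable Python sketch. The bundle contents, the function names, and the truncation-only parent chain are hypothetical simplifications: aliases and explicit parent locales are ignored, and a missing value is returned as None instead of a code value.</p>

```python
ROOT = "root"

# Hypothetical bundles: locale -> {path: value}. Real data has many more paths.
BUNDLES = {
    "root": {"P": "root-P", "Q": "root-Q", "Z": "root-Z"},
    "nb": {},
    "nn": {"P": "nn-P"},
    "da": {"P": "da-P", "Q": "da-Q"},
}


def parent_chain(locale):
    """nb-NO -> [nb-NO, nb, root] by truncation (aliases and parentLocales ignored)."""
    parts = locale.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)] + [ROOT]


def lookup(path, locale):
    """Normal inheritance lookup; returns (value, locale where it was found)."""
    for loc in parent_chain(locale):
        if path in BUNDLES.get(loc, {}):
            return BUNDLES[loc][path], loc
    return None, None  # a real implementation might return a code value here


def lookup_with_fallback(path, locale, fallback_languages):
    """Try locale, then each fallback language with the same region; skip root hits."""
    rest = locale.split("-")[1:]  # region (and anything after the language)
    candidates = [locale] + ["-".join([lang] + rest) for lang in fallback_languages]
    for candidate in candidates[:-1]:
        value, found_in = lookup(path, candidate)
        if found_in not in (ROOT, None):
            return value
    return lookup(path, candidates[-1])[0]  # last candidate: accept root too


print(lookup_with_fallback("P", "nb-NO", ["nn", "da", "sv", "en"]))  # nn-P
print(lookup_with_fallback("Q", "nb-NO", ["nn", "da", "sv", "en"]))  # da-Q
print(lookup_with_fallback("Z", "nb-NO", ["nn", "da", "sv", "en"]))  # root-Z
```

<p>Only the fallback list of the original locale is consulted, so any fallback values that da itself might have play no part, as the paragraph above requires.</p>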
    <p>The language matching data is intended to be used according
    to the following algorithm. This is a logical description, and
    can be optimized for production in many ways. In this
    algorithm, the languageMatching data is interpreted as an
    ordered list.</p>
    <p>Distances between a given pair of subtags can be larger or smaller than the typical distances. For example, the distance between en and en-GB can be greater than that between en-GB and en-IE. In some cases, language and/or script differences can be as small as the typical region difference (for example, sr-Latn vs. sr-Cyrl).</p>
    <p>The distances resulting from the table are not linear, but rather are chosen to produce expected results. So a distance of 10 is not necessarily twice as &quot;bad&quot; as a distance of 5. Implementations may want to have a mode where script distances swamp language distances. The tables are built such that this can be accomplished by multiplying the language distance by 0.25.</p>
    <p>The language matching algorithm takes a list of a user’s
    desired languages and a list of the application’s supported
    languages.</p>
    <ul>
      <li>Set the best weighted distance BWD to ∞</li>
      <li>Set the best desired language BD to null</li>
      <li>Set the best supported language BS to null</li>
      <li>For each desired language D
        <ul>
          <li>Compute a demotion value F, based on the position of D
          in the list.
            <ul>
              <li>This demotion value is up to the implementation,
              but is typically a positive value that increases
              according to how far D is from the start of the
              desired language list.</li>
            </ul>
          </li>
          <li>For each supported language S
            <ul>
              <li>Find the matching distance MD as described
              below.</li>
              <li>Compute the weighted distance WD = F + MD</li>
              <li>If WD &lt; BWD
                <ul>
                  <li>BWD = WD</li>
                  <li>BD = D</li>
                  <li>BS = S</li>
                </ul>
              </li>
            </ul>
          </li>
        </ul>
      </li>
      <li>If BWD is less than a threshold, return &lt;BD, BS&gt;
        <ul>
          <li>The threshold is implementation-defined, typically
          set to greater than a default region difference, and less
          than a default script difference.</li>
        </ul>
      </li>
      <li>Otherwise, set BS to the default supported language (such
      as English) and return &lt;null, BS&gt;</li>
    </ul>
    <p>To find the matching distance MD between any two languages,
    perform the following steps.</p>
    <ol>
      <li>Maximize each language using Section 4.3 <a href=
      "#Likely_Subtags">Likely Subtags</a>.
        <ul>
          <li>und is a special case: see below.</li>
        </ul>
      </li>
      <li>Set the match-distance MD to 0</li>
      <li>For each subtag in {language, script, region}
        <ol>
          <li>If respective subtags in each language tag are
          identical, remove the subtag from each (logically) and
          continue.</li>
          <li>Traverse the languageMatching data in order until a
          match is found.
            <ul>
              <li>* matches any field.</li>
              <li>If the oneway flag is false, then the match is
              symmetric; otherwise only match one direction.</li>
              <li>For region matching, use the mechanisms in
              <strong>Section 4.4.1 <a href=
              "#EnhancedLanguageMatching">Enhanced Language
              Matching</a></strong>.</li>
            </ul>
          </li>
          <li>Add the <strong>distance</strong> attribute value to MD.
            <ul>
              <li>This used to be a <strong>percent</strong> attribute
              value, which was 100 minus the distance attribute
              value.</li>
            </ul>
          </li>
          <li>Remove the subtag from each (logically).</li>
        </ol>
      </li>
      <li>Return MD</li>
    </ol>
    <p>It is typically useful to set the increase in the demotion
    value F between successive elements of the desired languages
    list to be slightly greater than the default region difference.
    That avoids the following problem:</p>
    <p><em>Supported languages:</em> "de, fr, ja"</p>
    <p><em>User's desired languages:</em> "de-AT, fr"</p>
    <p>This user would expect to get "de", not "fr". In practice,
    when users select a list of preferred languages, they don't
    include all the regional variants ahead of their second base
    language. Although the list of desired languages does not state
    how much better the user understands one language than the
    next, the fall-off between the user's languages is normally
    substantially greater than the difference between regional
    variants. Unless F is greater than the distance between de-AT
    and de-DE, the user’s second-choice language would be
    returned.</p>
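<p>The best-match loop and the effect of the demotion value can be illustrated with the following Python sketch. All numbers in it are hypothetical: the maximized forms are hard-coded instead of coming from likely subtags, the toy distance simply adds 99, 80, and 4 for language, script, and region differences (mirroring the 1%, 20%, and 96% default rules in this section), F grows by a fixed step per position, and the threshold of 50 is chosen to lie between the region and script differences.</p>

```python
import math

# Hypothetical maximized forms (normally produced by Add Likely Subtags).
MAXIMIZED = {
    "de": ("de", "Latn", "DE"),
    "de-AT": ("de", "Latn", "AT"),
    "fr": ("fr", "Latn", "FR"),
    "ja": ("ja", "Jpan", "JP"),
}


def toy_distance(d, s):
    """Hypothetical MD: 99 per language, 80 per script, 4 per region difference."""
    return sum(w for a, b, w in zip(d, s, (99, 80, 4)) if a != b)


def best_match(desired, supported, default, step=5, threshold=50):
    """Return (BD, BS); step is how much F grows per position in the desired list."""
    bwd, bd, bs = math.inf, None, None
    for position, d in enumerate(desired):
        f = position * step  # demotion value F for this desired language
        for s in supported:
            wd = f + toy_distance(MAXIMIZED[d], MAXIMIZED[s])
            if wd < bwd:  # strict comparison: earlier candidates win ties
                bwd, bd, bs = wd, d, s
    if bwd < threshold:
        return bd, bs
    return None, default  # nothing close enough: fall back to the default


print(best_match(["de-AT", "fr"], ["de", "fr", "ja"], "en"))          # ('de-AT', 'de')
print(best_match(["de-AT", "fr"], ["de", "fr", "ja"], "en", step=2))  # ('fr', 'fr')
```

<p>With a step of 5 (greater than the region distance of 4), de-AT gets de; with a step of 2, fr wins, which is exactly the problem described above.</p>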
    <p>The base language subtag "und" is a special case. Suppose we
    have the following situation:</p>
    <ul>
      <li>desired languages: {und, it}</li>
      <li>supported languages: {en, it}</li>
      <li>resulting language: en</li>
    </ul>
    <p>This happens because likely subtags maximize und to
    en-Latn-US, which matches en exactly. Part of the problem is
    that 'und' has a special function in BCP 47: it stands in for
    'no supplied base language'. To prevent this from happening, if
    the desired base language is und, the language matcher should
    not apply likely subtags to it.</p>
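<p>The effect of the und special case can be checked with a small Python sketch. The maximized forms and the toy distance weights are hypothetical. The only point is to compare the result with and without skipping likely subtags for a desired und.</p>

```python
# Hypothetical maximized forms; und maximizes to en_Latn_US in CLDR data.
LIKELY = {"und": ("en", "Latn", "US"), "en": ("en", "Latn", "US"), "it": ("it", "Latn", "IT")}


def maximize_desired(tag, special_case_und=True):
    if special_case_und and tag == "und":
        return ("und", "", "")  # leave und alone: "no supplied base language"
    return LIKELY[tag]


def toy_distance(d, s):
    """Hypothetical weights: 99 per language, 80 per script, 4 per region difference.

    Empty subtags simply count as different here.
    """
    return sum(w for a, b, w in zip(d, s, (99, 80, 4)) if a != b)


def best_supported(desired, supported, step=5, special_case_und=True):
    scored = [
        (position * step + toy_distance(maximize_desired(d, special_case_und), LIKELY[s]), s)
        for position, d in enumerate(desired)
        for s in supported
    ]
    return min(scored)[1]  # lowest weighted distance wins


print(best_supported(["und", "it"], ["en", "it"], special_case_und=False))  # en
print(best_supported(["und", "it"], ["en", "it"]))                          # it
```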
    <p>Example:</p>
    <p>Suppose that nn-DE and nb-FR are being compared. They are
    first maximized to nn-Latn-DE and nb-Latn-FR, respectively. The
    list is searched. The first match is with "*-*-*", for a match
    of 96%. The languages are truncated to nn-Latn and nb-Latn,
    then to nn and nb. The first match is also for a value of 96%,
    so the result is 92% (in terms of the distance attribute,
    4 + 4 = 8, and 100 minus 8 is 92).</p>
    <p>Note that language matching is orthogonal to how closely
    two languages are related linguistically. For example, Breton
    is more closely related to Welsh than to French, but French is
    the better match (because it is more likely that a Breton
    reader will understand French than Welsh). This also
    illustrates that the matches are often asymmetric: it is not
    likely that a French reader will understand Breton.</p>
    <p>The "*" acts as a wildcard, as shown in the following
    example:</p>
    <p class="example">&lt;languageMatch desired="es-*-ES"
    supported="es-*-ES" percent="100"/&gt;<br>
    &lt;!-- Latin American Spanishes are closer to each other.
    Approximate by having es-ES be further from everything
    else.--&gt;</p>
    <p class="example">&lt;languageMatch desired="es-*-ES"
    supported="es-*-*" percent="93"/&gt;</p>
    <p class="example"><br>
    &lt;languageMatch desired="*" supported="*"
    percent="1"/&gt;<br>
    &lt;!-- [Default value - must be at end!] Normally there is no
    comprehension of different languages.--&gt;</p>
    <p class="example"><br>
    &lt;languageMatch desired="*-*" supported="*-*"
    percent="20"/&gt;<br>
    &lt;!-- [Default value - must be at end!] Normally there is
    little comprehension of different scripts.--&gt;</p>
    <p class="example"><br>
    &lt;languageMatch desired="*-*-*" supported="*-*-*"
    percent="96"/&gt;<br>
    &lt;!-- [Default value - must be at end!] Normally there are
    small differences across regions.--&gt;</p>
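<p>The per-subtag matching distance can be sketched in Python using the example rules above, with percent values converted to distances (100 minus the percent). The nn/nb rule is a hypothetical stand-in for the 96% language match in the nn-DE / nb-FR example, the inputs are assumed to be already maximized, and the $variable syntax of Section 4.4.1 is not supported. The subtags are compared in the truncation order of the worked example (region, then script, then language); because the distances are added, the order does not change the result.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageMatch:
    desired: str          # "-"-separated pattern; "*" matches any field
    supported: str
    distance: int         # distance = 100 - legacy percent
    oneway: bool = False  # False: the rule also applies with the two sides swapped


# Ordered rules: the example rules above plus a hypothetical nn/nb rule.
RULES = [
    LanguageMatch("nn", "nb", 100 - 96),
    LanguageMatch("es-*-ES", "es-*-ES", 100 - 100),
    LanguageMatch("es-*-ES", "es-*-*", 100 - 93),
    LanguageMatch("*", "*", 100 - 1),
    LanguageMatch("*-*", "*-*", 100 - 20),
    LanguageMatch("*-*-*", "*-*-*", 100 - 96),
]


def fields_match(pattern, fields):
    parts = pattern.split("-")
    return len(parts) == len(fields) and all(p in ("*", f) for p, f in zip(parts, fields))


def rule_distance(desired, supported):
    for rule in RULES:  # the first matching rule wins
        if fields_match(rule.desired, desired) and fields_match(rule.supported, supported):
            return rule.distance
        if not rule.oneway and fields_match(rule.desired, supported) and fields_match(rule.supported, desired):
            return rule.distance
    raise LookupError(f"no languageMatch rule for {desired} / {supported}")


def matching_distance(desired, supported):
    """desired, supported: maximized (language, script, region) tuples."""
    md = 0
    for level in (3, 2, 1):  # region, then script, then language
        d, s = desired[:level], supported[:level]
        if d[-1] != s[-1]:  # identical subtags add nothing
            md += rule_distance(d, s)
    return md


print(matching_distance(("nn", "Latn", "DE"), ("nb", "Latn", "FR")))  # 8
print(matching_distance(("es", "Latn", "MX"), ("es", "Latn", "ES")))  # 7
```

<p>The first call reproduces the nn-DE / nb-FR example (4 + 4 = 8, corresponding to 92%). The second shows the es-*-ES rule matching with desired and supported swapped, because the rule is not oneway.</p>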
    <p>When the language+region is not matched, and there is
    otherwise no reason to pick among the supported regions for
    that language, then some measure of geographic "closeness" can
    be used. The results may be more understandable to users.
    Looking for en-SK, for example, should fall back to something
    within Europe (e.g., en-GB) in preference to something far away
    and unrelated (e.g., en-SG). Such a closeness metric does not
    need to be exact; a small amount of data can be used to give an
    approximate distance between any two regions. However, any such
    data must be used carefully; although Hong Kong is closer to
    India than to the UK, it is unlikely that en-IN would be a
    better match to en-HK than en-GB would.</p>
    <h4><a name="EnhancedLanguageMatching" href=
    "#EnhancedLanguageMatching" id="EnhancedLanguageMatching">4.4.1
    Enhanced Language Matching</a></h4>
    <p>The enhanced format for language matching adds structure to
    enable better matching of languages. It is distinguished by
    having a suffix "_new" on the type, as in the example below.
    The extended structure allows matching to take into account
    broad similarities that give better results. For example, for
    English the regions that are or inherit from US
    (AS|GU|MH|MP|PR|UM|VI|US) form a “cluster”. The regions in that
    cluster should be closer to each other than to any region
    outside it, and a region outside the cluster should be closer
    to another region outside that cluster than to one inside. This
    issue arises with “world languages” such as English, Spanish,
    Portuguese, and Arabic.</p>
    <p><em>Example:</em></p>
    <pre>
&lt;languageMatches type="written_new"&gt;
  &lt;paradigmLocales locales="en en-GB es es-419 pt-BR pt-PT"/&gt;
  &lt;matchVariable id="$enUS" value="AS+GU+MH+MP+PR+UM+US+VI"/&gt;
  &lt;matchVariable id="$cnsar" value="HK+MO"/&gt;
  &lt;matchVariable id="$americas" value="019"/&gt;
  &lt;matchVariable id="$maghreb" value="MA+DZ+TN+LY+MR+EH"/&gt;
  &lt;languageMatch desired="no" supported="nb" distance="1"/&gt;&lt;!-- no ⇒ nb --&gt;
  …
  &lt;languageMatch desired="ar_*_$maghreb" supported="ar_*_$maghreb" distance="4"/&gt;
  &lt;!-- ar; *; $maghreb ⇒ ar; *; $maghreb --&gt;
  &lt;languageMatch desired="ar_*_$!maghreb" supported="ar_*_$!maghreb" distance="4"/&gt;
  &lt;!-- ar; *; $!maghreb ⇒ ar; *; $!maghreb --&gt;
  …</pre>
    <p>The <strong>matchVariable</strong> allows a rule to match
    multiple regions, as illustrated by
    <strong>$maghreb</strong>. The syntax is simple: it allows
    + for <em>union</em> and - for <em>set difference</em>, but no
    precedence. So A+B-A+D is interpreted as (((A+B)-A)+D), not as
    (A+B)-(A+D). The variable <strong>id</strong> has a value of
    the form [$][a-zA-Z0-9]+. If $X is defined, then $!X
    automatically means all those regions that are not in $X.</p>
    <p dir="ltr">When the set is interpreted, macroregions are
    (logically) transformed into a list of their contents, so
    “053+GB” → “AU+GB+NF+NZ”. This is done recursively, so 009 →
    “053+054+057+061+QO” → “AU+NF+NZ+FJ+NC+PG+SB+VU...”. Note that
    we use 019 for all of the Americas in the variables above,
    because en-US should be in the same cluster as es-419 and its
    contents.</p>
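<p>The left-to-right set syntax and the recursive expansion of macroregions can be sketched in Python as follows. The containment table and the universe used for $!X are small hypothetical excerpts (the real 009 also contains 057, 061, and QO), and variables are assumed to be stored under their ids, including the leading $.</p>

```python
import re

# Hypothetical, deliberately partial containment data (macroregion -> contents).
# The real 009 also contains 057, 061 and QO; they are omitted here.
CONTAINMENT = {
    "009": ["053", "054"],
    "053": ["AU", "NF", "NZ"],
    "054": ["FJ", "NC", "PG", "SB", "VU"],
}
# Hypothetical universe of regions, used only to evaluate $!X.
ALL_REGIONS = {"AU", "NF", "NZ", "FJ", "NC", "PG", "SB", "VU", "GB", "MA", "DZ", "EG"}


def expand(code):
    """Recursively replace a macroregion by the regions it contains."""
    if code not in CONTAINMENT:
        return {code}
    return set().union(*(expand(child) for child in CONTAINMENT[code]))


def evaluate(expr, variables):
    """Evaluate a matchVariable value strictly left to right, with no precedence."""
    result, op = set(), "+"
    for token in re.findall(r"[+-]|[^+-]+", expr):
        if token in ("+", "-"):
            op = token
            continue
        if token.startswith("$!"):
            operand = ALL_REGIONS - variables["$" + token[2:]]
        elif token.startswith("$"):
            operand = variables[token]
        else:
            operand = expand(token)
        result = (result | operand) if op == "+" else (result - operand)
    return result


print(sorted(evaluate("053+GB", {})))       # ['AU', 'GB', 'NF', 'NZ']
print(sorted(evaluate("AU+NZ-AU+FJ", {})))  # ['FJ', 'NZ']
```

<p>The second call shows the lack of precedence: AU+NZ-AU+FJ yields {FJ, NZ}, whereas (A+B)-(A+D) would yield only {NZ}.</p>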
    <p>In the rules, the percent value (100..0) is replaced by a
    <strong>distance</strong> value, which is its complement
    (0..100): the distance is 100 minus the percent value.</p>
    <p dir="ltr">These new variables and rules divide up the world
    into clusters, where items in the same cluster (for specific
    languages) get the normal regional difference, and items in
    different clusters get different weights.</p>
    <p dir="ltr">Each cluster can have one or more associated
    <strong>paradigmLocales</strong>. These are locales that are
    preferred within a cluster. So when matching desired=[en-SA]
    against [en-GU en en-IN en-GB], the value en-GB is returned.
    Both en-GU and en are in a different cluster from en-SA.
    Although en-IN and en-GB are both in the same cluster as en-SA,
    and at the same distance from it, the preference is given to
    en-GB because it is one of the paradigm locales. It would be
    possible to express this in rules, but using this mechanism
    handles these very common cases without bulking up the
    tables.</p>
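<p>The paradigm-locale tie-break can be sketched in Python. The two cluster distances are made-up values, the cluster test only knows about the $enUS variable, and en is assumed to stand for en-US; the sketch shows only how paradigm locales break ties between equally distant candidates.</p>

```python
# Hypothetical data: the $enUS cluster and paradigm locales from the example
# above; IN_CLUSTER and CROSS_CLUSTER are made-up distances (exact matches ignored).
EN_US_CLUSTER = {"AS", "GU", "MH", "MP", "PR", "UM", "US", "VI"}
PARADIGM_LOCALES = {"en", "en-GB", "es", "es-419", "pt-BR", "pt-PT"}
IN_CLUSTER, CROSS_CLUSTER = 4, 5


def region_of(tag):
    # Assumption: a bare "en" stands for en-US.
    return tag.split("-")[1] if "-" in tag else "US"


def distance(desired, supported):
    same = (region_of(desired) in EN_US_CLUSTER) == (region_of(supported) in EN_US_CLUSTER)
    return IN_CLUSTER if same else CROSS_CLUSTER


def best(desired, supported):
    # Smaller distance first; among equal distances, paradigm locales first.
    return min(supported, key=lambda s: (distance(desired, s), s not in PARADIGM_LOCALES))


print(best("en-SA", ["en-GU", "en", "en-IN", "en-GB"]))  # en-GB
```

<p>Without the second element of the sort key, min would return en-IN, the first of the two equally distant candidates.</p>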
    <p dir="ltr">The <strong>paradigmLocales</strong> also allow
    matching to macroregions. For example, desired=[es-419] should
    match to {es-MX} more closely than to {es}, and vice versa:
    {es-MX} should match more closely to {es-419} than to {es}.
    Moreover, es-MX should match more closely to es-419 than to any
    of the other es-419 sublocales. In general, in the absence of
    other distance data, there is a ‘paradigm’ in each cluster that
    the others should match more closely to: en(-US), en-GB,
    es(-ES), es-419, ru(-RU)...</p>
    <h2><a name="XML_Format" href="#XML_Format" id="XML_Format">5
    XML Format</a></h2>
    <p>There are two kinds of data that can be expressed in LDML:
    language-dependent data and supplementary data. In either case,
    data can be split across multiple files, which can be in
    multiple directory trees.</p>
    <p>For example, the language-dependent data for Japanese in
    CLDR is present in the following files:</p>
    <ul>
      <li>common/collation/ja.xml</li>
      <li>common/main/ja.xml</li>
      <li>common/rbnf/ja.xml</li>
      <li>common/segmentations/ja.xml</li>
    </ul>
    <p>Data for cased languages such as French is in files such
    as:</p>
    <ul>
      <li>common/casing/fr.xml</li>
    </ul>
    <p>The status of the data is the same, whether or not data is
    split. That is, for the purpose of validation and lookup, all
    of the data for the above ja.xml files is treated as if it were
    in a single file. These files have the &lt;ldml&gt; root
    element and use ldml.dtd. The file name must match the identity
    element. For example, the &lt;ldml&gt; file pa_Arab_PK.xml must
    contain the following elements:</p>
    <pre>
<strong>&lt;ldml&gt;</strong>
    &lt;identity&gt;
        …
        <strong>&lt;language type="pa"/&gt;
        &lt;script type="Arab"/&gt;
        &lt;territory type="PK"/&gt;</strong>
    &lt;/identity&gt;
…</pre>
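<p>A rough Python check of the file-name rule might look like the following, using the standard library's xml.etree.ElementTree. It is a sketch, not a validator. It assumes that the file name is built from the type attributes of the language, script, territory, and variant children of the identity element, joined by underscores, and it parses an inline sample rather than a real CLDR file (which would also declare the DTD).</p>

```python
import os
import xml.etree.ElementTree as ET


def identity_matches_filename(path, xml_text):
    """Sketch: compare the identity subtags of an ldml document with its file name."""
    expected = os.path.splitext(os.path.basename(path))[0]  # e.g. "pa_Arab_PK"
    identity = ET.fromstring(xml_text).find("identity")
    if identity is None:
        return False
    subtags = []
    for name in ("language", "script", "territory", "variant"):  # assumed order
        element = identity.find(name)
        if element is not None:
            subtags.append(element.get("type", ""))
    return "_".join(subtags) == expected


SAMPLE = """<ldml>
  <identity>
    <language type="pa"/>
    <script type="Arab"/>
    <territory type="PK"/>
  </identity>
</ldml>"""

print(identity_matches_filename("common/main/pa_Arab_PK.xml", SAMPLE))  # True
print(identity_matches_filename("common/main/pa_PK.xml", SAMPLE))       # False
```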
    <p>Supplemental data can have different root elements,
    currently: ldmlBCP47, supplementalData, keyboard, and platform.
    Keyboard and platform files are considered distinct. The
    ldmlBCP47 files and supplementalData files that have the same
    root are all logically part of the same file; they are simply
    split into separate files for convenience. Implementations may
    split the files in different ways, also for their convenience.
    The files in /properties are also supplemental data files, but
    are structured like UCD properties.</p>
    <p>For example, supplemental data relating to Japan or the
    Japanese writing system is in:</p>
    <ul>
      <li>common/supplemental/ (in many files, such as
      supplementalData.xml)</li>
      <li>common/transforms/Hiragana-Katakana.xml</li>
      <li>common/transforms/Hiragana-Latin.xml</li>
      <li>common/properties/scriptMetadata.txt</li>
      <li>common/bcp47/calendar.xml</li>
      <li>uca/allkeys_CLDR.txt (sorting)</li>
      <li>/keyboards/chromeos/ja-t-k0-chromeos.xml</li>
      <li>...</li>
    </ul>
    <p>Like the &lt;ldml&gt; files, the keyboard file names must
    match internal data: in particular, the locale attribute on the
    keyboard element must have a value that corresponds to the file
    name, such as &lt;keyboard locale="af-t-k0-android"&gt; for the
    file af-t-k0-android.xml.</p>
    <p>The following sections describe the structure of the XML
    format for language-dependent data. The more precise syntax is
    in the ldml.dtd file; <i>however, the DTD does not describe all
    the constraints on the structure.</i></p>
    <p>To start with, the root element is &lt;ldml&gt;, with the
    following DTD entry:</p>
    <p class='dtd'>&lt;!ELEMENT ldml
    (identity,(alias|(fallback*,localeDisplayNames?,layout?,contextTransforms?,characters?,<br>
    delimiters?,measurement?,dates?,numbers?,units?,listPatterns?,collations?,posix?,<br>
    segmentations?,rbnf?,annotations?,metadata?,references?,special*)))&gt;</p>
    <p>The XML structure is stable over releases. Elements and
    attributes may be deprecated: they are retained in the DTD but
    their usage is strongly discouraged. In most cases, an
    alternate structure is provided for expressing the information.
    There is only one exception: newer DTDs cannot be used with
    version 1.1 files without some modification.</p>
    <p>In general, all translatable text in this format is in
    element contents, while attributes are reserved for types and
    non-translated information (such as numbers or dates). The
    reason that attributes are not used for translatable text is
    that spaces are not preserved, and we cannot predict where
    spaces may be significant in translated material.</p>
    <p>There are two kinds of elements in LDML: <i>rule</i>
    elements and <i>structure</i> elements. For structure elements,
    there are restrictions to allow for effective inheritance and
    processing:</p>
    <ol>
      <li>There is no "mixed" content: if an element has textual
      content, then it cannot contain any elements.</li>
      <li>The [<a href="#XPath">XPath</a>] leading to the content
      is unique; no two different pieces of textual content have
      the same [<a href="#XPath">XPath</a>].</li>
    </ol>
    <p>Rule elements do not have this restriction, but also do not
    inherit, except as an entire block. The rule elements are
    listed in serialElements in the supplemental metadata. See also
    <i><a href="#Inheritance_and_Validity">Section 4.2 Inheritance
    and Validity</a></i>. For more technical details, see <a href=
    "http://cldr.unicode.org/development/updating-dtds">Updating-DTDs</a>.</p>
    <p>Note that the data in examples given below is purely
    illustrative, and does not match any particular language. For a
    more detailed example of this format, see [<a href=
    "#LDML">Example</a>]. There is also a DTD for this format, but
    <i>remember that the DTD alone is not sufficient to understand
    the semantics, the constraints, or the interrelationships
    between the different elements and attributes</i>. You may wish
    to have copies of each of these to hand as you proceed through
    the rest of this document.</p>
    <p>In particular, all elements allow for draft versions to
    coexist in the file at the same time. Thus most elements are
    marked in the DTD as allowing multiple instances. However,
    unless an element is listed as a serialElement, or has a
    distinguishing attribute, it can only occur once as a
    subelement of a given element. Thus, for example, the following
    is illegal even though allowed by the DTD:</p>
    <p>&lt;languages&gt;<br>
    &nbsp; &lt;language type="aa"&gt;...&lt;/language&gt;<br>
    &nbsp; &lt;language type="aa"&gt;...&lt;/language&gt;<br>
    &lt;/languages&gt;</p>
    <p>There must be only one instance of these per parent, unless
    there are other distinguishing attributes (such as an alt
    attribute).</p>
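<p>The uniqueness constraint can be checked mechanically. The following Python sketch flags sibling elements that share an element name and the same values for distinguishing attributes. The DISTINGUISHING and SERIAL_ELEMENTS sets are hypothetical stand-ins for the lists defined in the supplemental metadata.</p>

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical subsets: the real lists of distinguishing attributes and
# serialElements come from the CLDR supplemental metadata.
DISTINGUISHING = {"type", "alt"}
SERIAL_ELEMENTS = set()


def duplicate_children(xml_text):
    """Report children that repeat the same element name and distinguishing attributes."""
    problems = []
    for parent in ET.fromstring(xml_text).iter():
        keys = Counter(
            (child.tag, tuple(sorted((k, v) for k, v in child.attrib.items() if k in DISTINGUISHING)))
            for child in parent
            if child.tag not in SERIAL_ELEMENTS
        )
        problems.extend((parent.tag, key) for key, count in keys.items() if count > 1)
    return problems


print(duplicate_children('<languages><language type="aa">...</language>'
                         '<language type="aa">...</language></languages>'))
```

<p>The call reports the repeated language element with type="aa"; adding an alt attribute to one of the two would make them distinct.</p>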
    <p>In general, LDML data should be in NFC format. However,
    certain elements may need to contain characters that are not in
    NFC, including exemplars, transforms, segmentations, and
    p/s/t/i/pc/sc/tc/ic rules in collation. These elements must not
    be normalized (either to NFC or NFD), or their meaning may be
    changed. Thus LDML documents must not be normalized as a whole.
    To prevent problems with normalization, no element value can
    start with a combining slash (U+0338 COMBINING LONG SOLIDUS
    OVERLAY).</p>
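<p>A per-value check for the two normalization rules might look like this Python sketch, which relies on unicodedata.is_normalized (Python 3.8 or later). The EXEMPT set of element names is a hypothetical placeholder for the element categories listed above; a real tool would derive it from the DTD and metadata.</p>

```python
import unicodedata

# Hypothetical element names standing in for the exempt categories above
# (exemplars, transforms, segmentations, collation rules).
EXEMPT = {"exemplarCharacters", "tRule", "rule", "cr"}


def check_value(element_name, value):
    """Return a list of normalization problems for one element value."""
    problems = []
    if value.startswith("\u0338"):
        problems.append("starts with U+0338 COMBINING LONG SOLIDUS OVERLAY")
    if element_name not in EXEMPT and not unicodedata.is_normalized("NFC", value):
        problems.append("not in NFC")
    return problems


print(check_value("language", "e\u0301"))            # ['not in NFC']
print(check_value("exemplarCharacters", "e\u0301"))  # []
```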
    <p>Lists, such as <span class=
    "attribute">singleCountries</span>, are space-delimited. That
    means that their items are separated by one or more XML
    whitespace characters. Such lists include:</p>
    <ul>
      <li>singleCountries</li>
      <li>preferenceOrdering</li>
      <li>references</li>
    </ul>
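<p>Splitting such a list value requires only the four XML whitespace characters (space, tab, carriage return, and line feed). A minimal Python sketch follows; the function name is illustrative.</p>

```python
import re

# XML whitespace is exactly space, tab, carriage return and line feed.
XML_WS = re.compile(r"[ \t\r\n]+")


def split_list(value):
    """Split a space-delimited LDML list value into its items."""
    stripped = value.strip(" \t\r\n")
    return XML_WS.split(stripped) if stripped else []


print(split_list("  en\tfr\n de  "))  # ['en', 'fr', 'de']
```

<p>Python's str.split() with no argument would also split on other Unicode whitespace, such as U+00A0 NO-BREAK SPACE, which is why the sketch uses an explicit character class.</p>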
    <h3><a name="Common_Elements" href="#Common_Elements" id=
    "Common_Elements">5.1 Common Elements</a></h3>
    <p>At any level in any element, two special elements are
    allowed.</p>
    <h4><a name="special" href="#special" id="special">5.1.1
    Element special</a></h4>
    <p>This element is designed to allow for arbitrary additional
    annotation and data that is product-specific. It has one
    required attribute <span class="attribute">xmlns</span>, which
    specifies the XML <a href=
    "https://www.w3.org/TR/REC-xml-names/">namespace</a> of the
    special data. For example, the following uses the version 1.0
    POSIX special element.</p>
    <pre>&lt;!DOCTYPE ldml SYSTEM "<span style=
    "color: blue">https://unicode.org/cldr/dtd/1.0/ldml.dtd</span>" [
    &lt;!ENTITY % posix SYSTEM "<span style=
"color: blue">https://unicode.org/cldr/dtd/1.0/ldmlPOSIX.dtd</span>"&gt;
<span style="color: blue">%posix;</span>
]&gt;
&lt;ldml&gt;
...
&lt;special xmlns:posix="<span style=
"color: blue">https://www.opengroup.org/regproducts/xu.htm</span>"&gt;
        <span style=
"color: green">&lt;!-- old abbreviations for pre-GUI days --&gt;</span>
        &lt;posix:messages&gt;
            &lt;posix:yesstr&gt;<span style=
"color: blue">Yes</span>&lt;/posix:yesstr&gt;
            &lt;posix:nostr&gt;<span style=
"color: blue">No</span>&lt;/posix:nostr&gt;
            &lt;posix:yesexpr&gt;<span style=
"color: blue">^[Yy].*</span>&lt;/posix:yesexpr&gt;
            &lt;posix:noexpr&gt;<span style=
"color: blue">^[Nn].*</span>&lt;/posix:noexpr&gt;
        &lt;/posix:messages&gt;
    &lt;/special&gt;
&lt;/ldml&gt;
</pre>
    <h5><a name="Sample_Special_Elements" href=
    "#Sample_Special_Elements" id="Sample_Special_Elements">5.1.1.1
    Sample Special Elements</a></h5>
    <p>The elements in this section are <i><b>not</b></i> part of
    the Locale Data Markup Language 1.0 specification. Instead,
    they are special elements used for application-specific data to
    be stored in the Common Locale Data Repository. They may change or
    be removed in future versions of this document, and are present
    here more as examples of how to extend the format. (Some of
    these items may move into a future version of the Locale Data
    Markup Language specification.)</p>
    <ul>
      <li><a href=
      "https://unicode.org/cldr/dtd/1.1/ldmlICU.dtd">https://unicode.org/cldr/dtd/1.1/ldmlICU.dtd</a></li>
      <li><a href=
      "https://unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd">https://unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd</a></li>
    </ul>
    <p>The DTDs listed above are old versions; consult the
    documentation for the specific application to see which should
    be used.</p>
    <p>These DTDs use namespaces and the special element. To
    include one or more of them, use the following pattern to import the
    special DTDs that are used in the file:</p>
    <pre>&lt;?xml version="<span style=
    "color: blue">1.0</span>" encoding="<span style=
    "color: blue">UTF-8</span>" ?&gt;
&lt;!DOCTYPE ldml SYSTEM "<span style=
"color: blue">https://unicode.org/cldr/dtd/1.1/ldml.dtd</span>" [
    &lt;!ENTITY % <span style=
"color: blue">icu</span> SYSTEM "<span style=
"color: blue">https://unicode.org/cldr/dtd/1.1/ldmlICU.dtd</span>"&gt;
    &lt;!ENTITY % <span style=
"color: blue">openOffice</span> SYSTEM "<span style=
"color: blue">https://unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd</span>"&gt;
<span style="color: blue">%icu;
%openOffice;
</span>]&gt;</pre>
    <p>Thus, to include just the ICU DTD, one uses:</p>
    <pre>&lt;?xml version="<span style=
    "color: blue">1.0</span>" encoding="<span style=
    "color: blue">UTF-8</span>" ?&gt;
&lt;!DOCTYPE ldml SYSTEM "<span style=
"color: blue">https://unicode.org/cldr/dtd/1.1/ldml.dtd</span>" [
    &lt;!ENTITY % icu SYSTEM "<span style=
"color: blue">https://unicode.org/cldr/dtd/1.1/ldmlICU.dtd</span>"&gt;
<span style="color: blue">%icu;
</span>]&gt;</pre>
    <blockquote>
      <p><b>Note:</b> A previous version of this document contained
      a special element for <a href=
      "http://www.open-std.org/jtc1/sc22/wg20/docs/n897-14652w25.pdf">
      ISO TR 14652</a> compatibility data. That element has been
      withdrawn, pending further investigation, since 14652 is a
      Type 1 TR: "when the required support cannot be obtained for
      the publication of an International Standard, despite
      repeated effort". See the ballot comments on <a href=
      "http://www.open-std.org/jtc1/sc22/wg20/docs/n948-J1N6769-14652.pdf">
      14652 Comments</a> for details on the 14652 defects. For
      example, most of these patterns make little provision for
      substantial changes in format when elements are empty, and so
      are not particularly useful in practice. Compare, for example,
      the mail-merge capabilities of production software such as
      Microsoft Word or OpenOffice.</p>
      <p><b>Note:</b> While the CLDR specification guarantees
      backwards compatibility, the definition of specials is up to
      other organizations. Any assurance of backwards compatibility
      is up to those organizations.</p>
    </blockquote>
    <p>A number of the elements above can have extra information
    for <a name="OpenOffice" href="#OpenOffice" id=
    "OpenOffice">openoffice.org</a>, such as the following
    example:</p>
    <pre>    &lt;special xmlns:openOffice="<span style=
    "color: blue">https://www.openoffice.org</span>"&gt;
        &lt;openOffice:search&gt;
            &lt;openOffice:searchOptions&gt;
                &lt;openOffice:transliterationModules&gt;<span style="color: blue">IGNORE_CASE</span>&lt;/openOffice:transliterationModules&gt;
            &lt;/openOffice:searchOptions&gt;
        &lt;/openOffice:search&gt;
    &lt;/special&gt;
</pre>
    <h4><a name="Alias_Elements" href="#Alias_Elements" id=
    "Alias_Elements">5.1.2 Element alias</a></h4>
    <p class="dtd">&lt;!ELEMENT alias (special*) &gt;<br>
    &lt;!ATTLIST alias source NMTOKEN #REQUIRED &gt;<br>
    &lt;!ATTLIST alias path CDATA #IMPLIED&gt;</p>
    <p>The contents of any element in root can be replaced by an
    alias, which points to the path where the data can be
    found.</p>
    <p>Aliases will only ever appear in root with the form
    //ldml/.../alias[@source="locale"][@path="..."].</p>
    <p>Consider the following example in root:</p>
    <pre>
&lt;calendar type="gregorian"&gt;
  &lt;months&gt;
    &lt;default choice="format"/&gt;
    &lt;monthContext type="format"&gt;
      &lt;default choice="wide"/&gt;
      &lt;monthWidth type="abbreviated"&gt;
        <strong>&lt;alias source="locale" path="../monthWidth[@type='wide']"/&gt;</strong>
      &lt;/monthWidth&gt;</pre>
    <p>If the locale "de_DE" is being accessed for a month name for
    format/abbreviated, then a resource bundle at "de_DE" will be
    searched for a resource element at that path. If not found
    there, then the resource bundle at "de" will be searched, and
    so on. When the alias is found in root, the search is
    restarted, but this time for the format/<strong>wide</strong>
    element instead of format/abbreviated.</p>
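    <p>The lookup just described can be modeled in a few lines. This Python sketch rests on simplifying assumptions: each locale is a plain dictionary keyed by path strings, the parent chain de_DE → de → root is hard-coded, an alias is stored as an <code>("alias", relative_path)</code> pair, and the names <code>BUNDLES</code>, <code>apply_relative_path</code> and <code>lookup</code> are hypothetical. It does not guard against circular aliases.</p>

```python
# Hypothetical in-memory bundles: each maps a path to a value, or to an
# ("alias", relative_path) pair. As in LDML, only root holds aliases.
BASE = "calendar[@type='gregorian']/months/monthContext[@type='format']"
ABBREVIATED = BASE + "/monthWidth[@type='abbreviated']"
WIDE = BASE + "/monthWidth[@type='wide']"

PARENT = {"de_DE": "de", "de": "root", "root": None}
BUNDLES = {
    "de_DE": {},
    "de": {WIDE: "wide month names (de)"},
    "root": {
        ABBREVIATED: ("alias", "../monthWidth[@type='wide']"),
        WIDE: "wide month names (root)",
    },
}


def apply_relative_path(base: str, relative: str) -> str:
    """Resolve an alias path against the path of the element it replaces.

    Each '..' step moves to the parent. Splitting on '/' assumes that no
    predicate value itself contains a '/'.
    """
    segments = base.split("/")
    for step in relative.split("/"):
        if step == "..":
            segments.pop()
        else:
            segments.append(step)
    return "/".join(segments)


def lookup(locale: str, path: str):
    """Walk the parent chain; on hitting an alias, restart the search from
    the original locale with the rewritten path."""
    current = locale
    while current is not None:
        entry = BUNDLES[current].get(path)
        if isinstance(entry, tuple):
            return lookup(locale, apply_relative_path(path, entry[1]))
        if entry is not None:
            return entry
        current = PARENT[current]
    return None
```

    <p>Here <code>lookup("de_DE", ABBREVIATED)</code> finds nothing in de_DE or de, reaches the alias in root, and restarts at de_DE with the wide path, so it returns the value stored in de.</p>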
    <p>If the <b>path</b> attribute is present, then its value is
    an [<a href="#XPath">XPath</a>] that points to a different node
    in the tree. For example:</p>
    <pre>
    &lt;alias source="locale" path="../monthWidth[@type='wide']"/&gt;</pre>
    <p>If the path is not present, the default is the same
    position in the tree. All of the attributes in the [<a href=
    "#XPath">XPath</a>] must be <i>distinguishing</i> attributes. For
    more details, see <a href="#Inheritance_and_Validity">Section
    4.2 Inheritance and Validity</a>.</p>
    <p>There is a special value for the source attribute, the
    constant <b>source="locale"</b>. This special value is
    equivalent to the locale being resolved. For example, consider
    the following, where locale data for 'de' is being
    resolved:</p>
    <div align="center">
      <center>
        <table border="1" cellpadding="0" cellspacing="1">
          <caption>
            <a name="Inheritance_with_source_locale_" href=
            "#Inheritance_with_source_locale_" id=
            "Inheritance_with_source_locale_">Inheritance with
            source="locale"</a>
          </caption>
          <tr>
            <th>Root</th>
            <th>de</th>
            <th bgcolor="#C0C0C0">Resolved</th>
          </tr>
          <tr>
            <td><code>&lt;x&gt;<br>
            &nbsp;&lt;a&gt;1&lt;/a&gt;<br>
            &nbsp;&lt;b&gt;2&lt;/b&gt;<br>
            &nbsp;&lt;c&gt;3&lt;/c&gt;<br>
            <br>
            &lt;/x&gt;</code></td>
            <td><code>&lt;x&gt;<br>
            &nbsp;&lt;a&gt;11&lt;/a&gt;<br>
            &nbsp;&lt;b&gt;12&lt;/b&gt;<br>
            <br>
            &nbsp;&lt;d&gt;14&lt;/d&gt;<br>
            &lt;/x&gt;</code></td>
            <td bgcolor="#C0C0C0"><code>&lt;x&gt;<br>
            &nbsp;&lt;a&gt;11&lt;/a&gt;<br>
            &nbsp;&lt;b&gt;12&lt;/b&gt;<br>
            &nbsp;<span style=
            "background-color: #FFFF00"><span class=
            "inherited"><span style=
            "font-weight: 400;">&lt;c&gt;3&lt;/c&gt;</span></span></span><br>
            &nbsp;&lt;d&gt;14&lt;/d&gt;<br>
            &lt;/x&gt;</code></td>
          </tr>
          <tr>
            <td><code>&lt;y&gt;<br>
            &nbsp;&lt;alias source="locale" path="../x"/&gt;<br>
            &lt;/y&gt;</code></td>
            <td><code>&lt;y&gt;<br>
            <br>
            &nbsp;&lt;b&gt;22&lt;/b&gt;<br>
            <br>
            <br>
            &nbsp;&lt;e&gt;25&lt;/e&gt;<br>
            &lt;/y&gt;</code></td>
            <td bgcolor="#C0C0C0"><code>&lt;y&gt;<br>
            &nbsp;<span style=
            "background-color: #FFFF00"><span class=
            "inherited"><span style=
            "font-weight: 400;">&lt;a&gt;11&lt;/a&gt;</span></span></span><br>
            &nbsp;&lt;b&gt;22&lt;/b&gt;<br>
            &nbsp;<span style=
            "background-color: #FFFF00"><span class=
            "inherited"><span style=
            "font-weight: 400;">&lt;c&gt;3&lt;/c&gt;</span></span></span><br>
            &nbsp;<span style=
            "background-color: #FFFF00"><span class=
            "inherited"><span style=
            "font-weight: 400;">&lt;d&gt;14&lt;/d&gt;</span></span></span><br>
            &nbsp;&lt;e&gt;25&lt;/e&gt;<br>
            &lt;/y&gt;</code></td>
          </tr>
        </table>
      </center>
    </div>
    <p>The first row shows the inheritance within the &lt;x&gt;
    element, whereby &lt;c&gt; is inherited from root. The second
    row shows the inheritance within the &lt;y&gt; element, whereby
    &lt;a&gt;, &lt;c&gt;, and &lt;d&gt; are also inherited from
    root, but via the alias there. The alias in root is logically
    replaced not by the elements in root itself, but by elements in
    the 'target' locale.</p>
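    <p>The resolved column of the table can be reproduced with a short Python sketch. It assumes each element is a flat dictionary of child values, marks root's alias with a hypothetical <code>"@alias"</code> key, and handles only the one-step <code>"../x"</code> path used here. The point it illustrates is that the alias is replaced by the <em>resolved</em> target in 'de', not by root's own &lt;x&gt;.</p>

```python
# Hypothetical data reproducing the table: root and de each hold <x> and <y>;
# root's <y> contains only an alias to "../x".
ROOT = {
    "x": {"a": "1", "b": "2", "c": "3"},
    "y": {"@alias": "../x"},
}
DE = {
    "x": {"a": "11", "b": "12", "d": "14"},
    "y": {"b": "22", "e": "25"},
}
CHAIN = [DE, ROOT]  # 'de' falls back to root


def resolve(element: str) -> dict:
    """Resolve one element for 'de'; values found earlier in CHAIN win."""
    resolved = {}
    for bundle in CHAIN:
        children = bundle.get(element, {})
        target = children.get("@alias")
        if target is not None:
            # source="locale": substitute the target as resolved for 'de',
            # not root's own copy. Only a single "../" step is handled.
            children = resolve(target.removeprefix("../"))
        for key, value in children.items():
            resolved.setdefault(key, value)
    return resolved
```

    <p>Calling <code>resolve("y")</code> keeps b=22 and e=25 from 'de' and fills in a=11, c=3, and d=14 from the resolved &lt;x&gt;, matching the table. The sketch requires Python 3.9 or later for <code>str.removeprefix</code>.</p>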
    <p>For more details on data resolution, see <a href=
    "#Inheritance_and_Validity">Section 4.2 Inheritance and
    Validity</a>.</p>
    <p>Aliases must be resolved recursively. An alias may point to
    another path that results in another alias being found, and so
    on. For example, looking up Thai Buddhist abbreviated months
    for the locale <strong>xx-YY</strong> may result in the
    following chain of aliases being followed:</p>
    <blockquote>
      <p>
      ../../calendar[@type="buddhist"]/months/monthContext[@type="format"]/monthWidth[@type="abbreviated"]</p>
      <p>xx-YY → xx → root // finds alias that changes path to:</p>
      <p>
      ../../calendar[@type="gregorian"]/months/monthContext[@type="format"]/monthWidth[@type="abbreviated"]</p>
      <p>xx-YY → xx → root // finds alias that changes path to:</p>
      <p>
      ../../calendar[@type="gregorian"]/months/monthContext[@type="format"]/monthWidth[@type="wide"]</p>
      <p>xx-YY → xx // finds value here</p>
    </blockquote>
    <p>It is an error to have a circular chain of aliases. That is,
    a collection of LDML XML documents must not have situations
    where a sequence of alias lookups (including inheritance and
    lateral inheritance) can be followed indefinitely without
    terminating.</p>
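    <p>A resolver can enforce the no-cycles rule by remembering which aliased paths it has already followed during one lookup. In this Python sketch the paths are shortened labels rather than real XPaths, the data mirrors the xx-YY chain above, and <code>ROOT_ALIASES</code>, <code>LOCALE_VALUES</code> and <code>lookup</code> are hypothetical names; lateral inheritance is not modeled.</p>

```python
# Hypothetical data: root holds only aliases; other locales hold values.
ROOT_ALIASES = {
    "buddhist/format/abbreviated": "gregorian/format/abbreviated",
    "gregorian/format/abbreviated": "gregorian/format/wide",
}
LOCALE_VALUES = {
    "xx-YY": {},
    "xx": {"gregorian/format/wide": "wide month names (xx)"},
}
CHAIN = ["xx-YY", "xx"]  # searched, in order, before root


def lookup(path: str) -> str:
    """Follow root aliases recursively, raising an error on a circular chain."""
    seen = set()
    while True:
        for locale in CHAIN:
            if path in LOCALE_VALUES[locale]:
                return LOCALE_VALUES[locale][path]
        if path not in ROOT_ALIASES:
            raise KeyError(f"no value or alias for {path}")
        if path in seen:
            raise ValueError(f"circular alias chain at {path}")
        seen.add(path)
        path = ROOT_ALIASES[path]
```

    <p>The Buddhist lookup follows two aliases and stops at xx; a pair of aliases pointing at each other raises <code>ValueError</code> instead of looping forever.</p>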
    <h4><a name="Element_displayName" href="#Element_displayName"
    id="Element_displayName">5.1.3 Element displayName</a></h4>
    <p>Many elements can have a display name. This is a translated
    name that can be presented to users when discussing the
    particular service. For example, a number format, used to
    format numbers using the conventions of that locale, can have a
    translated name for presentation in GUIs.</p>
    <pre>  &lt;numberFormat&gt;
    &lt;displayName&gt;<span style=
"color: blue">Prozentformat</span>&lt;/displayName&gt;
...
  &lt;/numberFormat&gt;</pre>
    <p>Where present, the display names must be unique; that is,
    two distinct codes must not get the same display name.
    (There is one exception to this: in time zones, where parsing
    results would give the same GMT offset, the standard and
    daylight display names can be the same across different time
    zone IDs.) Any translations should follow customary practice
    for the locale in question. For more information, see [<a href=
    "#DataFormats">Data Formats</a>].</p>
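    <p>A data checker might test the uniqueness requirement as in the following Python sketch. It takes a hypothetical mapping from codes to display names and reports every name used by more than one code; it ignores the time zone exception and compares names exactly, without case folding or normalization.</p>

```python
from collections import defaultdict


def duplicate_display_names(names: dict) -> dict:
    """Return {display name: sorted codes} for names shared by several codes."""
    codes_by_name = defaultdict(list)
    for code, display_name in names.items():
        codes_by_name[display_name].append(code)
    return {
        name: sorted(codes)
        for name, codes in codes_by_name.items()
        if len(codes) > 1
    }


# Hypothetical territory names; "XA" deliberately reuses "Deutschland".
territories = {"DE": "Deutschland", "AT": "Österreich", "XA": "Deutschland"}
```

    <p>For the sample data, the checker reports <code>{"Deutschland": ["DE", "XA"]}</code>.</p>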
    <h4><a name="Escaping_Characters" href="#Escaping_Characters"
    id="Escaping_Characters">5.1.4 Escaping Characters</a></h4>
    <p>Unfortunately, XML cannot directly contain all Unicode code
    points. As a result, in certain instances extra syntax is
    required to represent those code points that cannot otherwise
    be represented in element content. The escaping syntax is
    defined only for a few types of elements, such as collation
    rules or exemplar sets, and uses the syntax appropriate to that
    type.</p>
    <p>The element &lt;cp&gt;, which was formerly used for this
    purpose, has been deprecated.</p>
    <h3><a name="Common_Attributes" href="#Common_Attributes" id=
    "Common_Attributes">5.2 Common Attributes</a></h3>
    <h4><a name="Attribute_type" href="#Attribute_type" id=
    "Attribute_type">5.2.1 Attribute type</a></h4>
    <p>The attribute <i>type</i> is also used to indicate an
    alternate resource that can be selected with a matching
    type=option in the locale id modifiers, or be referenced by a
    default element. For example:</p>
    <pre>&lt;ldml&gt;
  ...
  &lt;currencies&gt;
    &lt;currency&gt;<span style=
"color: blue">...</span>&lt;/currency&gt;
    &lt;currency type="<span style=
"color: blue">preEuro</span>"&gt;<span style=
"color: blue">...</span>&lt;/currency&gt;
  &lt;/currencies&gt;
&lt;/ldml&gt;</pre>
    <h4><a name="Attribute_draft" href="#Attribute_draft" id=
    "Attribute_draft">5.2.2 Attribute draft</a></h4>
    <p>If this attribute is present, it indicates the status of all
    the data in this element and any subelements (unless they have
    a contrary <i>draft</i> value), as per the following:</p>
    <ul>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <i>approved</i>: fully approved by the technical committee
      (equals the CLDR 1.3 value of <i>false</i>, or an absent
      <i>draft</i> attribute). This does not mean that the data is
      guaranteed to be error-free; it is the best judgment of the
      committee.</li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <i>contributed</i>: partially approved by the technical
      committee.</li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <i>provisional</i>: partially confirmed. Implementations may
      choose to accept the provisional data, especially if there is
      no translated alternative.</li>
      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <i>unconfirmed</i>: no confirmation available.</li>
    </ul>
    <p>For more information on precisely how these values are
    computed for any given release, see
    <a href=
    "http://cldr.unicode.org/index/process#TOC-Data--Submission-and-Vetting">
    Data Submission and Vetting Process</a> on the CLDR
    website.</p>
    <p>The draft attribute should only occur on "leaf" elements,
    and is deprecated elsewhere. For a more formal description of
    how elements are inherited, and what their draft status is, see
    <i><a href="#Inheritance_and_Validity">Section 4.2 Inheritance
    and Validity</a></i>.</p>
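    <p>The rule that a <i>draft</i> value applies to subelements unless they carry a contrary value can be sketched in Python as follows. The sketch assumes that the order in which the four values are listed above runs from most to least confirmed, and that an element with no <i>draft</i> value anywhere on its path is approved; <code>effective_draft</code> and <code>meets_threshold</code> are hypothetical helpers, not part of any CLDR tool.</p>

```python
# Statuses in the order listed above, assumed most to least confirmed.
STATUS_ORDER = ["approved", "contributed", "provisional", "unconfirmed"]


def effective_draft(draft_values: list) -> str:
    """Return the draft status that applies to a leaf element.

    draft_values holds the draft attribute (or None when absent) for each
    element on the path from the outermost ancestor down to the leaf; the
    innermost explicit value wins, and no value at all means "approved".
    """
    status = "approved"
    for value in draft_values:
        if value is not None:
            status = value
    return status


def meets_threshold(status: str, minimum: str) -> bool:
    """True if status is at least as confirmed as the minimum requested."""
    return STATUS_ORDER.index(minimum) >= STATUS_ORDER.index(status)
```

    <p>An implementation that accepts provisional data, for example, would keep a leaf whose effective status is <i>contributed</i> and drop one whose status is <i>unconfirmed</i>.</p>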
    <h4><a name="alt_attribute" href="#alt_attribute" id=
    "alt_attribute">5.2.3 Attribute alt</a></h4>
    <p>This attribute labels an alternative value for an element.
    The value is a <i>descriptor</i> that indicates what kind of
    alternative it is, and takes one of the following forms:</p>
    <ul>
      <li><i>variantname</i>, meaning that the value is a variant of
      the normal value, and may be used in its place in certain
      circumstances. If a variant value is absent for a particular
      locale, the normal value is used. The variant mechanism
      should only be used when such a fallback is acceptable.</li>
      <li><span style="color: blue">proposed</span>, optionally
      followed by a number, indicating that the value is a proposed
      replacement for an existing value.</li>
      <li><i>variantname</i><span style=
      "color: blue">-proposed</span>, optionally followed by a
      number, indicating that the value is a proposed replacement
      variant value.</li>
    </ul>
    <p>"<span style="color: blue">proposed</span>" should only be
    present if the draft status is not "approved". It indicates
    that the data is proposed replacement data that has been added
    provisionally until the differences between it and the other
    data can be vetted. For example, suppose that the translation
    for September for some language is "Settembru", and a bug
    report is filed saying that it should be "Settembro". The new
    data can be entered, but marked as <i>alt="proposed"</i> until
    it is vetted.</p>
    <pre>...
&lt;month type="9"&gt;Settembru&lt;/month&gt;
&lt;month type="9" draft="unconfirmed" alt="proposed"&gt;Settembro&lt;/month&gt;
&lt;month type="10"&gt;...</pre>
    <p>Now assume another bug report comes in, saying that the
    correct form is actually "Settembre". Another alternative can
    be added:</p>
    <pre>...
&lt;month type="9" draft="unconfirmed" alt="proposed2"&gt;Settembre&lt;/month&gt;
...</pre>
    <p>The values for <i>variantname</i> at this time include
    "<span style="color: blue">variant</span>", "<span style=
    "color: blue">list</span>", "<span style=
    "color: blue">email</span>", "<span style=
    "color: blue">www</span>", "<span style=
    "color: blue">short</span>", and "<span style=
    "color: blue">secondary</span>".</p>
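    <p>The three descriptor forms can be told apart mechanically. This Python sketch assumes variant names consist of lowercase ASCII letters, as all the values listed above do, and does not check a name against that list; <code>ALT_RE</code> and <code>parse_alt</code> are hypothetical names.</p>

```python
import re

# variantname | proposed[N] | variantname-proposed[N]
ALT_RE = re.compile(r"(?:(?P<variant>[a-z]+)-)?proposed(?P<number>[0-9]*)")


def parse_alt(descriptor: str):
    """Return (variantname or None, proposed flag, proposal number or None)."""
    match = ALT_RE.fullmatch(descriptor)
    if match is None:
        # Plain variant name such as "short" or "variant".
        return descriptor, False, None
    number = int(match["number"]) if match["number"] else None
    return match["variant"], True, number
```

    <p>With this sketch, <code>alt="proposed2"</code> from the Settembre example parses as an unnumbered-variant proposal with number 2, while <code>alt="short"</code> is a plain variant.</p>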
    <p>For a more complete description of how draft applies to
    data, see <i><a href="#Inheritance_and_Validity">Section 4.2
    Inheritance and Validity</a></i>.</p>
    <p class="element2">Attribute <a name="references_attribute"
    href="#references_attribute" id=
    "references_attribute">references</a></p>
    <p>The value of this attribute is a token representing a
    reference for the information in the element, including
    standards that it may conform to; the tokens are defined in the
    &lt;references&gt; element. (In older versions of CLDR, the value
    of the attribute was freeform text. That format is
    deprecated.)</p>
    <p><i>Example:</i></p>
    <p class="example">&lt;territory type="UM"
    references="R222"&gt;USAs yttre öar&lt;/territory&gt;</p>
    <p>The reference element may be inherited. Thus, for example,
    R222 may be used in sv_SE.xml even though it is not defined
    there, if it is defined in sv.xml.</p>
    <p>&lt;... allow="verbatim" ...&gt; (deprecated)</p>
    <p>This attribute was originally intended for use in marking
    display names whose capitalization differed from what was
    indicated by the now-deprecated &lt;inText&gt; element
    (perhaps, for example, because the names included a proper
    noun). It was never supported in the DTD and is not needed for
    use with the new &lt;contextTransforms&gt; element.</p>
    <h3><a name="Common_Structures" href="#Common_Structures" id=
    "Common_Structures">5.3 Common Structures</a></h3>
    <h4><a name="Date_Ranges" href="#Date_Ranges" id=
    "Date_Ranges">5.3.1 Date and Date Ranges</a></h4>
    <p>When attributes specify date ranges, this is usually done
    with the attributes <i>from</i> and <i>to</i>. The <i>from</i>
    attribute specifies the starting point, and the <i>to</i>
    attribute specifies the end point. The deprecated <i>time</i>
    attribute was formerly used to specify time with the deprecated
    weekEndStart and weekEndEnd elements, which were themselves
    inherently <i>from</i> or <i>to</i>.</p>
    <p>The values use a restricted ISO 8601 format, limited to the
    fields <i>year, month, day, hour, minute,</i> and
    <i>second</i> in that order, with "-" used as a separator
    between date fields, a space used as the separator between the
    date and the time fields, and ":" used as a separator between
    the time fields. If the second, or both the minute and the
    second, are absent, the missing fields are interpreted as zero.
    If the hour is also missing, then the time is interpreted based
    on whether the attribute is <i>from</i> or <i>to</i>.</p>
    <ul>
      <li>
        <p class="note"><i>from</i> defaults to "00:00:00"
        (midnight at the start of the day).</p>
      </li>
      <li>
        <p class="note"><i>to</i> defaults to "24:00:00" (midnight
        at the end of the day).</p>
      </li>
    </ul>
    <p class="note">That is, Friday at 24:00:00 is the same time as
    Saturday at 00:00:00. Thus when the hour is missing, the
    <i>from</i> and <i>to</i> values are interpreted inclusively:
    the range includes all of the day mentioned.</p>
    <p class="note">For example, the following are equivalent:</p>
    <table style="margin-top: 0.5em; margin-bottom: 0.5em" id=
    "table25">
      <tr>
        <td>&lt;usesMetazone from="1991-10-27" to="2006-04-02"
        .../&gt;</td>
      </tr>
      <tr>
        <td>&lt;usesMetazone from="1991-10-27 00:00:00"
        to="2006-04-02 24:00:00" .../&gt;</td>
      </tr>
      <tr>
        <td>&lt;usesMetazone from="1991-10-<font color=
        "#FF0000"><b>26 24</b></font>:00:00"
        to="2006-04-<font color="#FF0000"><b>03
        00</b></font>:00:00" .../&gt;</td>
      </tr>
    </table>
    <p>If the <i>from</i> attribute is missing, the range is assumed
    to extend as far backwards in time as there is data for; if the
    <i>to</i> attribute is missing, the range extends from the start
    point onwards, with no known end point.</p>
    <p>The dates and times are specified in local time, unless
    otherwise noted. (In particular, the metazone values are in UTC,
    also known as GMT.)</p>
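    <p>The defaulting rules above can be checked with a small Python parser. This is a sketch under stated assumptions: <code>parse_boundary</code> is a hypothetical helper, it returns naive datetimes and leaves the local-time versus UTC distinction to the caller, and because Python's <code>datetime</code> cannot hold an hour of 24, the time of day is added as an offset to midnight.</p>

```python
from datetime import datetime, timedelta


def parse_boundary(value: str, is_to: bool) -> datetime:
    """Parse a restricted ISO 8601 'from'/'to' value into a naive datetime.

    Missing minute/second fields become zero; a missing time defaults to
    00:00:00 for 'from' and 24:00:00 (the next day's midnight) for 'to'.
    """
    date_part, _, time_part = value.partition(" ")
    year, month, day = (int(field) for field in date_part.split("-"))
    if time_part:
        fields = [int(field) for field in time_part.split(":")]
        fields += [0] * (3 - len(fields))
        hour, minute, second = fields
    else:
        hour, minute, second = (24, 0, 0) if is_to else (0, 0, 0)
    # Adding a timedelta lets "24:00:00" roll over to the following day.
    return datetime(year, month, day) + timedelta(
        hours=hour, minutes=minute, seconds=second
    )
```

    <p>All three <i>from</i> values in the table parse to 1991-10-27 00:00:00, and all three <i>to</i> values parse to 2006-04-03 00:00:00.</p>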
    <h4><a name="Text_Directionality" href="#Text_Directionality"
    id="Text_Directionality">5.3.2 Text Directionality</a></h4>
    <p>The content of certain elements, such as date or number
    formats, may consist of several sub-elements with an inherent
    order (for example, the year, month, and day for dates). In
    some cases, the order of these sub-elements may be changed
    depending on the bidirectional context in which the element is
    embedded.</p>
    <p>For example, short date formats in languages such as Arabic
    may contain neutral or weak characters at the beginning or end
    of the element content. In such a case, the overall order of
    the sub-elements may change depending on the surrounding
    text.</p>
    <p>Element content whose display may be affected in this way
    should include an explicit direction mark, such as U+200E
    LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT MARK, at the
    beginning or end of the element content, or both.</p>
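    <p>One way a tool might flag content that needs a direction mark is to look at the bidirectional class of its first and last characters. This Python sketch uses <code>unicodedata.bidirectional</code> and treats only the strong classes L, R and AL as safe. The helpers <code>needs_direction_mark</code> and <code>add_marks</code> are hypothetical, and placing the same mark at both ends is a simplifying choice. A real check on a date or number pattern would also need to consider the characters that formatting substitutes for pattern letters.</p>

```python
import unicodedata

STRONG = {"L", "R", "AL"}  # strong bidirectional classes (UAX #9)
RLM = "\u200f"  # U+200F RIGHT-TO-LEFT MARK


def needs_direction_mark(content: str) -> bool:
    """True if the first or last character is not strongly directional."""
    if not content:
        return False
    edges = (content[0], content[-1])
    return any(unicodedata.bidirectional(ch) not in STRONG for ch in edges)


def add_marks(content: str, mark: str = RLM) -> str:
    """Wrap content in the given mark when either edge is weak or neutral."""
    return mark + content + mark if needs_direction_mark(content) else content
```
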
    <h4><a name="Unicode_Sets" href="#Unicode_Sets" id=
    "Unicode_Sets">5.3.3 Unicode Sets</a></h4>
    <p>Some attribute values or element contents use
    <em>UnicodeSet</em> notation. A UnicodeSet represents a finite
    set of Unicode code points and strings, and is defined by lists
    of code points and strings, Unicode property sets, and set
    operators, all bounded by square brackets. In this context, a
    code point means a string consisting of exactly one code
    point.</p>
    <p>A UnicodeSet implements the semantics in <i>UTS #18: Unicode
    Regular Expressions</i> [<a href=
    "https://www.unicode.org/reports/tr41/#UTS18">UTS18</a>] Levels
    1 &amp; 2 that are relevant to determining sets of characters.
    Note, however, that it may deviate from the syntax provided in
    [<a href=
    "https://www.unicode.org/reports/tr41/#UTS18">UTS18</a>], which
    is illustrative rather than a requirement. There is one
    exception to the supported semantics: Section <a href=
    "https://unicode.org/reports/tr18/#RL2.6">RL2.6</a>
    <em>Wildcards in Property Values</em>. That feature can be
    supported in clients such as ICU by implementing a “hook”, as
    is done in the <a href=
    "https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7Bname%3D%2FAPPLE%2F%7D">
    online UnicodeSet utilities</a>.</p>
    <p>A UnicodeSet may be cited in specifications outside of the
    domain of LDML. In such a case, the specification may specify a
    subset of the syntax provided here.</p>
    <p>The following provides EBNF syntax for a UnicodeSet:</p>
    <div align='center'>
      <table class='simple'>
        <tr>
          <th>Symbol</th>
          <th>Expression</th>
          <th>Examples</th>
        </tr>
        <tr>
          <th>root</th>
          <td><code>= prop<br>
          | '[-]'<br>
          | '[' [\-\^]? s seq+ ']'</code></td>
          <td>\p{x=y},<br>
          [abc]</td>
        </tr>
        <tr>
          <th>seq</th>
          <td><code>= root (s [\&amp;\-] s root)* s<br>
          | range s</code></td>
          <td>[abc]-[cde], a<br></td>
        </tr>
        <tr>
          <th>range</th>
          <td><code>= char ('-' char)?<br>
          | '{' (s char)+ s '}'</code></td>
          <td>a, a-c, {abc}</td>
        </tr>
        <tr>
          <th>prop</th>
          <td><code>= '\' [pP] '{' propName ([≠=] s value1+)?
          '}'<br>
          | '[:' '^'? propName ([≠=] s value2+)? ':]'</code></td>
          <td>\p{x=y}, [:x=y:]<br></td>
        </tr>
        <tr>
          <th>propName</th>
          <td><code>= s [A-Za-z0-9] [A-Za-z0-9_\x20]* s</code></td>
          <td>General_Category,<br>
          General Category</td>
        </tr>
        <tr>
          <th>value1</th>
          <td><code>= [^\}]<br>
          | '\' quoted</code></td>
          <td>Lm,<br>
          \n,<br>
          \}</td>
        </tr>
        <tr>
          <th>value2</th>
          <td><code>= [^:]<br>
          | '\' quoted</code></td>
          <td>Lm,<br>
          \n,<br>
          \:</td>
        </tr>
        <tr>
          <th>char</th>
          <td><code>= [^\&amp; \- \[ \] \\ \} \{ [:Pat_WS:]]<br>
          | '\' quoted</code></td>
          <td>a, b, c, \n</td>
        </tr>
        <tr>
          <th>quoted</th>
          <td><code>= 'u' (hex{4} | bracketedHex)<br>
          | 'x' (hex{2} | bracketedHex)<br>
          | 'U00' ('0' hex{5} | '10' hex{4})<br>
          | 'N{' charName '}'<br>
          | [[\u0000-\U0010FFFF]-[uxUN]]</code></td>
          <td><em><strong>error</strong> if lengths not exact</em></td>
        </tr>
        <tr>
          <th>charName</th>
          <td><code>= s [A-Za-z0-9] [-A-Za-z0-9_\x20]* s</code></td>
          <td>TIBETAN LETTER -A</td>
        </tr>
        <tr>
          <th>bracketedHex</th>
          <td><code>= '{' s hexCodePoint (s hexCodePoint)* s
          '}'</code></td>
          <td>{61 2019 62}</td>
        </tr>
        <tr>
          <th>hexCodePoint</th>
          <td><code>= hex{1,5} | '10' hex{4}</code></td>
          <td>&nbsp;</td>
        </tr>
        <tr>
          <th>hex</th>
          <td><code>= [0-9A-Fa-f]</code></td>
          <td>&nbsp;</td>
        </tr>
        <tr>
          <th>s</th>
          <td><code>= [:Pattern_White_Space:]*</code></td>
          <td>optional whitespace</td>
        </tr>
      </table>
    </div>
    <p>Some constraints on UnicodeSet syntax are not captured by
    this EBNF. Notably, property names and values are restricted to
    those supported by the implementation, and have additional
    constraints imposed by [<a href=
    "https://unicode.org/reports/tr41/#UAX44">UAX44</a>]. In
    addition, quoted values that resolve to more than one code point
    are disallowed in ranges of the form <code>char '-' char</code>.</p>
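    <p>As a worked instance of the <code>bracketedHex</code> and <code>hexCodePoint</code> rules, the following Python sketch decodes a braced list of hex code points into the string it denotes. The function name is hypothetical; it follows the EBNF above (one to five hex digits, or "10" followed by four), treats Pattern_White_Space as the separator, and does not reject surrogate code points.</p>

```python
import re

# hexCodePoint = hex{1,5} | '10' hex{4}, as in the EBNF above.
HEX_CODE_POINT = re.compile(r"[0-9A-Fa-f]{1,5}|10[0-9A-Fa-f]{4}")
# Pattern_White_Space, which rule 's' allows between items.
PATTERN_WS = re.compile("[\t\n\u000b\f\r \u0085\u200e\u200f\u2028\u2029]+")


def decode_bracketed_hex(text: str) -> str:
    """Decode a bracketedHex such as '{61 2019 62}' into the string it denotes."""
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("bracketedHex must be enclosed in braces")
    items = [item for item in PATTERN_WS.split(text[1:-1]) if item]
    if not items:
        raise ValueError("bracketedHex needs at least one hexCodePoint")
    for item in items:
        if not HEX_CODE_POINT.fullmatch(item):
            raise ValueError(f"invalid hexCodePoint: {item!r}")
    return "".join(chr(int(item, 16)) for item in items)
```

    <p>The table's example <code>{61 2019 62}</code> decodes to "a", U+2019, "b"; a value such as <code>{110000}</code> is rejected because it is beyond U+10FFFF.</p>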
    <p>The syntax characters are listed in the table below:</p>
    <table>
      <tbody>
        <tr>
          <th>Char</th>
          <th>Hex</th>
          <th>Name</th>
          <th>Usage</th>
        </tr>
        <tr>
          <td>$</td>
          <td>U+0024</td>
          <td>DOLLAR SIGN</td>
          <td>Equivalent of \uFFFF (for implementations
          that return \uFFFF when accessing before the first or
          after the last character)</td>
        </tr>
        <tr>
          <td>&amp;</td>
          <td>U+0026</td>
          <td>AMPERSAND</td>
          <td>Intersecting UnicodeSets</td>
        </tr>
        <tr>
          <td>-</td>
          <td>U+002D</td>
          <td>HYPHEN-MINUS</td>
          <td>Ranges of characters; also set difference</td>
        </tr>
        <tr>
          <td>:</td>
          <td>U+003A</td>
          <td>COLON</td>
          <td>POSIX-style property syntax</td>
        </tr>
        <tr>
          <td>[</td>
          <td>U+005B</td>
          <td>LEFT SQUARE BRACKET</td>
          <td>Grouping; POSIX property syntax</td>
        </tr>
        <tr>
          <td>]</td>
          <td>U+005D</td>
          <td>RIGHT SQUARE BRACKET</td>
          <td>Grouping; POSIX property syntax</td>
        </tr>
        <tr>
          <td>\</td>
          <td>U+005C</td>
          <td>REVERSE SOLIDUS</td>
          <td>Escaping</td>
        </tr>
        <tr>
          <td>^</td>
          <td>U+005E</td>
          <td>CIRCUMFLEX ACCENT</td>
          <td>POSIX negation syntax</td>
        </tr>
        <tr>
          <td>{</td>
          <td>U+007B</td>
          <td>LEFT CURLY BRACKET</td>
          <td>Strings in set; Perl property syntax</td>
        </tr>
        <tr>
          <td>}</td>
          <td>U+007D</td>
          <td>RIGHT CURLY BRACKET</td>
          <td>Strings in set; Perl property syntax</td>
        </tr>
        <tr>
          <td>&nbsp;</td>
          <td>U+0020 U+0009..U+000D U+0085<br>
          U+200E U+200F<br>
          U+2028 U+2029</td>
          <td>ASCII whitespace, NEL,<br>
          LRM, RLM,<br>
          LINE/PARAGRAPH SEPARATOR</td>
          <td>Ignored except when escaped</td>
        </tr>
      </tbody>
    </table><br>
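    <p>When a UnicodeSet pattern is generated from arbitrary characters, each syntax character in the table, and each whitespace character that would otherwise be ignored, must be quoted. This Python sketch uses the <code>'\' quoted</code> form from the EBNF: a backslash before a syntax character, and <code>\uhhhh</code> for whitespace. The helper names are hypothetical, and the sketch handles only a flat list of single characters.</p>

```python
# Syntax characters from the table above.
SYNTAX_CHARS = set("$&-:[]\\^{}")
# Whitespace that is ignored unless escaped (Pattern_White_Space).
IGNORED_WS = set(" \t\n\u000b\f\r\u0085\u200e\u200f\u2028\u2029")


def escape_for_unicodeset(ch: str) -> str:
    """Quote a single code point so that it is read literally inside [...]."""
    if ch in SYNTAX_CHARS:
        return "\\" + ch
    if ch in IGNORED_WS:
        return f"\\u{ord(ch):04X}"
    return ch


def literal_set(chars: str) -> str:
    """Build a UnicodeSet pattern listing each character literally."""
    return "[" + "".join(escape_for_unicodeset(ch) for ch in chars) + "]"
```

    <p>For example, the three characters "a", "-", "b" become the pattern <code>[a\-b]</code>, a set of three code points rather than the range a through b.</p>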
6504    <h5><a href="#Lists_of_Code_Points" name="Lists_of_Code_Points"
6505    id="Lists_of_Code_Points">5.3.3.1 Lists of Code Points</a></h5>
6506    <p>Lists are a sequence of strings that may include ranges,
6507    which are indicated by a '-' between two code points, as in
6508    "a-z". The sequence <em>start-end</em> specifies the range of
6509    all code points from the start to end, inclusive, in Unicode
6510    order. For example, <b>[a c d-f m]</b> is equivalent to <b>[a c
6511    d e f m]</b>. Whitespace can be freely used for clarity, as
6512    <b>[a c d-f m]</b> means the same as <b>[acd-fm]</b>.</p>
6513    <p>A string with multiple code points is represented in a list
6514    by being surrounded by curly braces, such as in <strong>[a-z
6515    {ch}]</strong>. It can be used with the range notation, as
    described in <em>Section <a href="#String_Range">5.3.4 String
    Range</a></em>. There is an additional restriction on string
    ranges in a UnicodeSet: the number of code points in the first
6519    string of the range must be identical to the number in the
6520    second. Thus [{ab}-{c}] and [{ab}-c] are invalid.</p>
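    <p>As an illustration only, the following Python sketch expands a simple list into its elements. It assumes plain literal code points, whitespace, single-code-point ranges, and <code>{…}</code> strings. Escapes, quoting, properties, nested sets, and set operators are not handled, and string ranges are rejected with an error rather than expanded. The function name <code>parse_simple_list</code> is hypothetical.</p>

```python
def parse_simple_list(body: str) -> set[str]:
    """Expand the inside of a simple list such as "a c d-f m {ch}".

    Returns each element, whether a single code point or a
    multi-code-point string, as a Python str.
    """
    items: set[str] = set()
    last_cp = None  # most recent single code point: a possible range start
    i, n = 0, len(body)
    while i < n:
        c = body[i]
        if c.isspace():
            i += 1  # whitespace outside {...} carries no meaning
        elif c == "{":
            end = body.index("}", i)  # raises ValueError if unclosed
            items.add(body[i + 1:end])
            last_cp = None
            i = end + 1
        elif c == "-":
            if last_cp is None:
                raise ValueError("range must start with a single code point")
            j = i + 1
            while j < n and body[j].isspace():
                j += 1
            if j >= n or body[j] in "{}-":
                raise ValueError("range must end with a single code point")
            start, stop = ord(last_cp), ord(body[j])
            if start > stop:
                raise ValueError("range start is after range end")
            # Python str indexing is by code point, so this is Unicode order.
            items.update(chr(cp) for cp in range(start, stop + 1))
            last_cp = None
            i = j + 1
        else:
            items.add(c)
            last_cp = c
            i += 1
    return items
```

    <p>For example, <code>parse_simple_list("acd-fm")</code> and <code>parse_simple_list("a c d-f m")</code> return the same six elements.</p>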
6521    <p>In UnicodeSets, there are two ways to quote syntax code
6522    points:</p>
6523    <p><a name="Backslash_Escapes" id=
6524    "Backslash_Escapes"></a>Outside of single quotes, certain
6525    backslashed code point sequences can be used to quote code
6526    points:</p>
6527    <table class='simple'>
6528      <tr>
6529        <td>\x{h...h}<br>
6530        \u{h...h}</td>
        <td>list of code points, each of 1-6 hex digits
        ([0-9A-Fa-f]), separated by spaces</td>
6533      </tr>
6534      <tr>
6535        <td>\xhh</td>
        <td>Exactly 2 hex digits</td>
6537      </tr>
6538      <tr>
6539        <td>\uhhhh</td>
6540        <td>Exactly 4 hex digits</td>
6541      </tr>
6542      <tr>
6543        <td>\Uhhhhhhhh</td>
6544        <td>Exactly 8 hex digits</td>
6545      </tr>
6546      <tr>
6547        <td>\a</td>
6548        <td>U+0007 (BEL / ALERT)</td>
6549      </tr>
6550      <tr>
6551        <td>\b</td>
6552        <td>U+0008 (BACKSPACE)</td>
6553      </tr>
6554      <tr>
6555        <td>\t</td>
6556        <td>U+0009 (TAB / CHARACTER TABULATION)</td>
6557      </tr>
6558      <tr>
6559        <td>\n</td>
6560        <td>U+000A (LINE FEED)</td>
6561      </tr>
6562      <tr>
6563        <td>\v</td>
6564        <td>U+000B (LINE TABULATION)</td>
6565      </tr>
6566      <tr>
6567        <td>\f</td>
6568        <td>U+000C (FORM FEED)</td>
6569      </tr>
6570      <tr>
6571        <td>\r</td>
6572        <td>U+000D (CARRIAGE RETURN)</td>
6573      </tr>
6574      <tr>
6575        <td>\\</td>
6576        <td>U+005C (BACKSLASH / REVERSE SOLIDUS)</td>
6577      </tr>
6578      <tr>
6579        <td>\N{name}</td>
6580        <td>The Unicode code point named "name".</td>
6581      </tr>
6582      <tr>
6583        <td>\p{…},\P{…}</td>
6584        <td>Unicode property (see below)</td>
6585      </tr>
6586    </table><br>
6587    <p>Anything else following a backslash is mapped to itself,
6588    except the property syntax described below, or in an
6589    environment where it is defined to have some special
6590    meaning.</p>
6591    <p>Any code point formed as the result of a backslash escape
6592    loses any special meaning and is treated as a literal. In
6593    particular, note that \x, \u and \U escapes create literal code
6594    points. (In contrast, Java treats Unicode escapes as just a way
6595    to represent arbitrary code points in an ASCII source file, and
6596    any resulting code points are <i><b>not</b></i> tagged as
6597    literals.)</p>
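    <p>The backslash escapes in the table above can be sketched in Python as follows. This is a hypothetical helper, <code>decode_escape</code>, and not an API of any library. It does not handle <code>\N{name}</code> or the property syntax, and it leaves to the caller the rule that decoded code points are literals.</p>

```python
import re

# Single-character escapes from the table above.
SIMPLE_ESCAPES = {"a": "\a", "b": "\b", "t": "\t", "n": "\n",
                  "v": "\v", "f": "\f", "r": "\r", "\\": "\\"}
HEX_1_TO_6 = re.compile(r"[0-9A-Fa-f]{1,6}")
HEX_DIGITS = "0123456789ABCDEFabcdef"


def _fixed_hex(text: str, start: int, count: int) -> str:
    """Decode exactly `count` hex digits starting at text[start]."""
    digits = text[start:start + count]
    if len(digits) != count or any(d not in HEX_DIGITS for d in digits):
        raise ValueError(f"expected exactly {count} hex digits at index {start}")
    return chr(int(digits, 16))  # chr() rejects values above 0x10FFFF


def decode_escape(text: str, i: int) -> tuple[str, int]:
    """Decode the backslash escape that starts at text[i].

    Returns (code_points, index_just_after_the_escape).
    """
    if text[i:i + 1] != "\\" or i + 1 >= len(text):
        raise ValueError("expected a complete backslash escape")
    c = text[i + 1]
    if c in "xu" and text[i + 2:i + 3] == "{":
        end = text.index("}", i + 3)
        parts = text[i + 3:end].split()  # code points separated by spaces
        if not parts or not all(HEX_1_TO_6.fullmatch(p) for p in parts):
            raise ValueError("bad \\x{...} or \\u{...} list")
        return "".join(chr(int(p, 16)) for p in parts), end + 1
    if c == "x":
        return _fixed_hex(text, i + 2, 2), i + 4
    if c == "u":
        return _fixed_hex(text, i + 2, 4), i + 6
    if c == "U":
        return _fixed_hex(text, i + 2, 8), i + 10
    if c in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[c], i + 2
    if c in "NpP":
        raise NotImplementedError("\\N{...}, \\p{...} and \\P{...} are not handled here")
    return c, i + 2  # anything else following a backslash maps to itself
```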
    <p>Unicode property sets are defined as described
6599    in <i>UTS #18: Unicode Regular Expressions</i> [<a href=
6600    "https://www.unicode.org/reports/tr41/#UTS18">UTS18</a>], Level
6601    1 and RL2.5, including the syntax where given. For an example
6602    of a concrete implementation of this, see [<a href=
6603    "#ICUUnicodeSet">ICUUnicodeSet</a>].</p>
6604    <h5><a href="#Unicode_Properties" name="Unicode_Properties" id=
6605    "Unicode_Properties">5.3.3.2 Unicode Properties</a></h5>
6606    <p>Briefly, Unicode property sets are specified by any Unicode
6607    property and a value of that property, such as
    <b>[:General_Category=Letter:]</b> for Unicode letters, or
    <b>\p{uppercase}</b> for the set of uppercase characters in
6610    Unicode. The property names are defined by the
6611    PropertyAliases.txt file and the property values by the
6612    PropertyValueAliases.txt file. For more information, see
6613    [<a href="https://unicode.org/reports/tr41/#UAX44">UAX44</a>].
6614    The syntax for specifying the property sets is an extension of
6615    either POSIX or Perl syntax, by the addition of
6616    "=&lt;value&gt;". For example, you can match letters by using
6617    the POSIX-style syntax:</p>
6618    <p><b>[:General_Category=Letter:]</b></p>
6619    <p>or by using the Perl-style syntax</p>
6620    <p><b>\p{General_Category=Letter}</b>.</p>
6621    <p>Property names and values are case-insensitive, and
6622    whitespace, "-", and "_" are ignored. The property name can be
6623    omitted for the <strong>General_Category</strong> and
6624    <strong>Script</strong> properties, but is required for other
6625    properties. If the property value is omitted, it is assumed to
6626    represent a boolean property with the value "true". Thus
6627    <b>[:Letter:]</b> is equivalent to
6628    <b>[:General_Category=Letter:]</b>, and <b>[:Wh-ite-s
6629    pa_ce:]</b> is equivalent to <b>[:Whitespace=true:]</b>.</p>
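    <p>A minimal Python sketch of the loose matching of names and of the omitted-name and omitted-value rules, for the POSIX-style syntax only. The sets <code>GENERAL_CATEGORY_VALUES</code> and <code>BINARY_PROPERTIES</code> are tiny hypothetical excerpts standing in for PropertyAliases.txt and PropertyValueAliases.txt. Omitted Script names and the unification of aliases (such as Lu with Uppercase_Letter) are not handled.</p>

```python
# Hypothetical excerpts of alias data, already in loose form. A real
# implementation would load PropertyAliases.txt and PropertyValueAliases.txt.
GENERAL_CATEGORY_VALUES = {"l", "letter", "lu", "uppercaseletter"}
BINARY_PROPERTIES = {"wspace", "whitespace", "upper", "uppercase"}


def loose(name: str) -> str:
    """Case-insensitive form that ignores whitespace, '-' and '_'."""
    return "".join(ch for ch in name.casefold()
                   if not ch.isspace() and ch not in "-_")


def parse_posix_property(expr: str) -> tuple[bool, str, str]:
    """Split '[:name=value:]' or '[:^name=value:]' into (negated, name, value)."""
    if not (expr.startswith("[:") and expr.endswith(":]")):
        raise ValueError("expected [:...:] syntax")
    body = expr[2:-2]
    negated = body.startswith("^")
    if negated:
        body = body[1:]
    if "=" in body:
        name, value = body.split("=", 1)
        return negated, loose(name), loose(value)
    key = loose(body)
    if key in GENERAL_CATEGORY_VALUES:  # property name omitted
        return negated, "generalcategory", key
    if key in BINARY_PROPERTIES:  # property value omitted: boolean "true"
        return negated, key, "true"
    raise ValueError(f"unknown property or value: {body!r}")
```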
6630    <p>The table below shows the two kinds of syntax: POSIX and
6631    Perl style. Also, the table shows the "Negative" version, which
6632    is a property that excludes all code points of a given kind.
6633    For example, <b>[:^Letter:]</b> matches all code points that
6634    are not <b>[:Letter:]</b>.</p>
6635    <table>
6636      <tr>
6637        <th>&nbsp;</th>
6638        <th>Positive</th>
6639        <th>Negative</th>
6640      </tr>
6641      <tr>
6642        <td>POSIX-style Syntax</td>
6643        <td>[:type=value:]</td>
6644        <td>[:^type=value:]</td>
6645      </tr>
6646      <tr>
6647        <td>Perl-style Syntax</td>
6648        <td>\p{type=value}</td>
6649        <td>\P{type=value}</td>
6650      </tr>
6651    </table>
6652    <h5><a href="#Boolean_Operations" name="Boolean_Operations" id=
6653    "Boolean_Operations">5.3.3.3 Boolean Operations</a></h5>
6654    <p>The low-level lists or properties then can be freely
6655    combined with the normal set operations (union, inverse,
6656    difference, and intersection):</p>
6657    <ul>
6658      <li>To union two sets, simply concatenate them. For example,
6659      <b>[[:letter:] [:number:]]</b></li>
6660      <li>To intersect two sets, use the '&amp;' operator. For
6661      example, <b>[[:letter:] &amp; [a-z]]</b></li>
6662      <li>To take the set-difference of two sets, use the '-'
6663      operator. For example, <b>[[:letter:] - [a-z]]</b></li>
6664      <li>To invert a set, place a '^' immediately after the
6665      opening '['. For example, <b>[^a-z]</b>. In any other
6666      location, the '^' does not have a special meaning. The
6667      inversion [^X] is equivalent to [[\x{0}-\x{10FFFF}]-[X]].
6668      Thus multi-code point strings are discarded.</li>
6669      <li>Symmetric difference (~) is not supported.</li>
6670    </ul>
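    <p>The operations above can be modeled with a small Python sketch that stores code points as sorted, merged ranges and keeps multi-code-point strings separately. The class <code>USet</code> and its methods are hypothetical and are not the ICU UnicodeSet API. Intersection and difference are derived from the complement (De Morgan's laws), and inversion drops strings as described above.</p>

```python
from dataclasses import dataclass

MAX_CP = 0x10FFFF


def _normalize(ranges):
    """Sort inclusive (start, end) ranges and merge overlapping or adjacent ones."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def _complement(ranges):
    """Complement normalized ranges within U+0000..U+10FFFF."""
    result, next_cp = [], 0
    for start, end in ranges:
        if start > next_cp:
            result.append((next_cp, start - 1))
        next_cp = end + 1
    if next_cp <= MAX_CP:
        result.append((next_cp, MAX_CP))
    return result


@dataclass(frozen=True)
class USet:
    ranges: tuple = ()                # normalized code point ranges
    strings: frozenset = frozenset()  # multi-code-point strings

    @staticmethod
    def of(ranges=(), strings=()):
        return USet(tuple(_normalize(ranges)), frozenset(strings))

    def union(self, other):  # [X Y]
        return USet.of(self.ranges + other.ranges, self.strings | other.strings)

    def intersect(self, other):  # [X & Y], via De Morgan on code points
        both = _normalize(_complement(self.ranges) + _complement(other.ranges))
        return USet(tuple(_complement(both)), self.strings & other.strings)

    def minus(self, other):  # [X - Y]
        kept = USet(self.ranges).intersect(USet(tuple(_complement(other.ranges))))
        return USet(kept.ranges, self.strings - other.strings)

    def invert(self):  # [^X] == [[\x{0}-\x{10FFFF}] - [X]]; strings are dropped
        return USet(tuple(_complement(self.ranges)))
```

    <p>With this model, inverting <code>[a-z]</code> yields the two ranges U+0000..U+0060 and U+007B..U+10FFFF, matching the <code>[^a-z]</code> entry in the examples table below.</p>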
6671    <p>The binary operators '&amp;', '-', and the implicit union
6672    have equal precedence and bind left-to-right. Thus
6673    <b>[[:letter:]-[a-z]-[\u0100-\u01FF]]</b> is equal to
6674    <b>[[[:letter:]-[a-z]]-[\u0100-\u01FF]]</b>. Another example is
6675    the set <b>[[ace][bdf] - [abc][def]]</b>, which is not the
6676    empty set, but instead equal to <b>[[[[ace] [bdf]] - [abc]]
6677    [def]]</b>, which equals <b>[[[abcdef] - [abc]] [def]]</b>,
6678    which equals <b>[[def] [def]]</b>, which equals
6679    <b>[def]</b>.</p>
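    <p>The left-to-right rule amounts to a left fold over (operator, operand) pairs. The Python sketch below uses plain sets of single characters and a hypothetical <code>evaluate</code> helper. It reproduces the <b>[[ace][bdf] - [abc][def]]</b> example and contrasts it with the grouping that would give the empty set.</p>

```python
from functools import reduce

# '' is the implicit union (concatenation), '&' is intersection, '-' is difference.
OPERATORS = {"": set.union, "&": set.intersection, "-": set.difference}


def evaluate(first, rest):
    """Apply (operator, operand) pairs strictly left to right, all at equal precedence."""
    return reduce(lambda acc, pair: OPERATORS[pair[0]](acc, pair[1]), rest, first)


# [[ace][bdf] - [abc][def]]
steps = [("", set("bdf")), ("-", set("abc")), ("", set("def"))]
left_to_right = evaluate(set("ace"), steps)  # equals set("def")
# Grouping each side first would instead give the empty set:
grouped = (set("ace") | set("bdf")) - (set("abc") | set("def"))
```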
6680    <p><strong>One caution:</strong> the '&amp;' and '-' operators
6681    operate between sets. That is, they must be immediately
6682    preceded and immediately followed by a set. For example, the
6683    pattern <b>[[:Lu:]-A]</b> is illegal, since it is interpreted
6684    as the set <b>[:Lu:]</b> followed by the incomplete range
6685    <b>-A</b>. To specify the set of upper case letters except for
6686    'A', enclose the 'A' in brackets: <b>[[:Lu:]-[A]]</b>.</p>
6687    <h5><a href="#UnicodeSet_Examples" name="UnicodeSet_Examples"
6688    id="UnicodeSet_Examples">5.3.3.4 UnicodeSet Examples</a></h5>
6689    <p>The following table summarizes the syntax that can be
6690    used.</p>
6691    <table style="margin-top: 0.5em; margin-bottom: 0.5em" id=
6692    "table18">
6693      <tr>
6694        <th>Example</th>
6695        <th>Description</th>
6696      </tr>
6697      <tr>
6698        <td nowrap>[a]</td>
6699        <td>The set containing 'a' alone</td>
6700      </tr>
6701      <tr>
6702        <td nowrap>[a-z]</td>
6703        <td>The set containing 'a' through 'z' and all letters in
6704        between, in Unicode order.<br>
6705        Thus it is the same as [\u0061-\u007A].</td>
6706      </tr>
6707      <tr>
6708        <td nowrap>[^a-z]</td>
6709        <td>The set containing all code points but 'a' through
6710        'z'.<br>
6711        Thus it is the same as [\u0000-\u0060
6712        \u007B-\x{10FFFF}].</td>
6713      </tr>
6714      <tr>
6715        <td nowrap>[[pat1][pat2]]</td>
6716        <td>The union of sets specified by pat1 and pat2</td>
6717      </tr>
6718      <tr>
6719        <td nowrap>[[pat1]&amp;[pat2]]</td>
6720        <td>The intersection of sets specified by pat1 and
6721        pat2</td>
6722      </tr>
6723      <tr>
6724        <td nowrap>[[pat1]-[pat2]]</td>
6725        <td>The asymmetric difference of sets specified by pat1 and
6726        pat2</td>
6727      </tr>
6728      <tr>
6729        <td nowrap>[a {ab} {ac}]</td>
6730        <td>The code point 'a' and the multi-code point strings
6731        "ab" and "ac"</td>
6732      </tr>
6733      <tr>
6734        <td nowrap>[x\u{61 2019 62}y]</td>
6735        <td>Equivalent to [x\u0061\u2019\u0062y] (= [xa’by])</td>
6736      </tr>
6737      <tr>
6738        <td nowrap>[{ax}-{bz}]</td>
6739        <td>The set containing [{ax} {ay} {az} {bx} {by} {bz}],
6740        using the range syntax to get all the strings from {ax} to
6741        {bz} as described in <em>Section <a href=
6742        "#String_Range">5.3.4 String Range</a></em>.</td>
6743      </tr>
6744      <tr>
6745        <td nowrap>[:Lu:]</td>
6746        <td>The set of code points with a given property value, as
6747        defined by PropertyValueAliases.txt. In this case, these
6748        are the Unicode upper case letters. The long form for this
6749        is <b>[:General_Category=Uppercase_Letter:]</b>.</td>
6750      </tr>
6751      <tr>
6752        <td nowrap>[:L:]</td>
6753        <td>The set of code points belonging to all Unicode
6754        categories starting with 'L', that is,
6755        <b>[[:Lu:][:Ll:][:Lt:][:Lm:][:Lo:]]</b>. The long form for
6756        this is <b>[:General_Category=Letter:]</b>.</td>
6757      </tr>
6758    </table><br>
6759    <h4><a name="String_Range" href="#String_Range" id=
6760    "String_Range">5.3.4 String Range</a></h4>
6761    <p>A String Range is a compact format for specifying a list of
6762    strings.</p>
6763    <p><strong>Syntax:<br></strong></p>
6764    <blockquote>
6765      <p>X <em>sep</em> Y<br></p>
6766    </blockquote>
6767    <p>The separator and the format of strings X, Y may vary
6768    depending on the domain. For example,</p>
6769    <ul>
6770      <li>for the validity files the separator is ~,</li>
6771      <li>for UnicodeSet the separator is -, and any
6772      multi-codepoint string is enclosed in {…}.</li>
6773    </ul>
6774    <p><strong>Validity:&nbsp;<br></strong></p>
6775    <blockquote>
6776      <p>A string range X <em>sep</em> Y is valid iff len(X) ≥
6777      len(Y) &gt; 0, where len(X) is the length of X in code
6778      points.</p>
6779      <p><em>There may be additional, domain-specific requirements
6780      for validity of the expansion of the string range.</em></p>
6781    </blockquote>
6782    <p><strong>Interpretation:<br></strong></p>
6783    <ol>
6784      <li>Break X into P and S, where len(S) = len(Y)
6785        <ul>
6786          <li>Note that P will be an empty string if the lengths of
6787          X and Y are equal.</li>
6788        </ul>
6789      </li>
      <li>Form all combinations
      P+(s₀..y₀)+(s₁..y₁)+...+(sₙ..yₙ)
        <ul>
          <li>s₀ is the first code point in S, y₀ is the first code
          point in Y, and so on; each (sᵢ..yᵢ) stands for every code
          point from sᵢ through yᵢ, inclusive.</li>
6794        </ul>
6795      </li>
6796    </ol>
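    <p>The interpretation steps can be sketched in Python as follows. The name <code>expand_string_range</code> is hypothetical, the separator is assumed to have been removed already, and domain-specific validity requirements are not checked.</p>

```python
from itertools import product


def expand_string_range(x: str, y: str) -> list[str]:
    """Expand the string range X sep Y, with the separator already removed.

    Python str indexing is by code point, so len() here is the len() of the
    validity rule. A position where s_i > y_i contributes no code points, so
    the result is then empty; a domain may choose to treat that as an error.
    """
    if not len(x) >= len(y) > 0:
        raise ValueError("invalid string range: need len(X) >= len(Y) > 0")
    split = len(x) - len(y)
    prefix, suffix = x[:split], x[split:]  # P and S, with len(S) == len(Y)
    # For each position i, every code point from s_i through y_i, inclusive.
    choices = [[chr(cp) for cp in range(ord(s), ord(t) + 1)]
               for s, t in zip(suffix, y)]
    return [prefix + "".join(combo) for combo in product(*choices)]
```

    <p>For instance, <code>expand_string_range("ab", "cd")</code> returns the nine strings of the third example below, and <code>expand_string_range("\U0001F466\U0001F3FB", "\U0001F3FF")</code> returns the five strings of the last example.</p>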
6797    <p><strong>Examples:</strong></p>
6798    <table>
6799      <tbody>
6800        <tr>
6801          <td>ab-ad</td>
6802          <td>→</td>
6803          <td>ab ac ad</td>
6804        </tr>
6805        <tr>
6806          <td>ab-d</td>
6807          <td>→</td>
6808          <td>ab ac ad</td>
6809        </tr>
6810        <tr>
6811          <td>ab-cd</td>
6812          <td>→</td>
6813          <td>ab ac ad bb bc bd cb cc cd</td>
6814        </tr>
6815        <tr>
          <td>👦🏻-👦🏿</td>
          <td>→</td>
          <td>👦🏻 👦🏼 👦🏽 👦🏾 👦🏿</td>
        </tr>
        <tr>
          <td>👦🏻-🏿</td>
          <td>→</td>
          <td>👦🏻 👦🏼 👦🏽 👦🏾 👦🏿</td>
6824        </tr>
6825      </tbody>
6826    </table><br>
6827    <h3><a name="Identity_Elements" href="#Identity_Elements" id=
6828    "Identity_Elements">5.4 Identity Elements</a></h3>
6829    <p class="dtd">&lt;!ELEMENT identity (alias | (version,
6830    generation?, language, script?, territory?, variant?, special*)
6831    ) &gt;</p>
6832    <p>The identity element contains information identifying the
6833    target locale for this data, and general information about the
6834    version of this data.</p>
6835    <p class="element2">&lt;version number="<u>$</u>Revision: 1.227
6836    <u>$</u>"&gt;</p>
6837    <p>The version element provides, in an attribute, the version
6838    of this file.&nbsp; The contents of the element can contain
6839    textual notes about the changes between this version and the
6840    last. For example:</p>
6841    <blockquote>
6842      <pre>&lt;version number="<span style=
6843      "color: blue">1.1</span>"&gt;<span style=
6844      "color: blue">Various notes and changes in version 1.1</span>&lt;/version&gt;</pre>
6845      <p>This is not to be confused with the version attribute on
      the ldml element, which tracks the DTD version.</p>
6847    </blockquote>
6848    <p class="element2">&lt;generation date="<u>$</u>Date:
6849    2007/07/17 23:41:16 <u>$</u>" /&gt;</p>
6850    <p>The generation element is now deprecated. It was used to
6851    contain the last modified date for the data. This could be in
6852    two formats: ISO 8601 format, or CVS format (illustrated by the
6853    example above).</p>
6854    <p class="element2">&lt;language type="<span style=
6855    "color: blue">en</span>"/&gt;</p>
6856    <p>The language code is the primary part of the specification
6857    of the locale id, with values as described above.</p>
6858    <p class="element2">&lt;script type="<span style=
6859    "color: blue">Latn</span>" /&gt;</p>
6860    <p>The script code may be used in the identification of written
6861    languages, with values described above.</p>
6862    <p class="element2">&lt;territory type="<span style=
6863    "color: blue">US</span>"/&gt;</p>
6864    <p>The territory code is a common part of the specification of
6865    the locale id, with values as described above.</p>
6866    <p class="element2">&lt;variant type="<span class=
6867    "attributeValue">NYNORSK</span>"/&gt;</p>
6868    <p>The variant code is the tertiary part of the specification
6869    of the locale id, with values as described above.</p>
6870    <p>When combined according to the rules described in
6871    <i><a href="#Unicode_Language_and_Locale_Identifiers">Section
6872    3, Unicode Language and Locale Identifiers</a></i>, the
6873    language element, along with any of the optional script,
6874    territory, and variant elements, must identify a known, stable
6875    locale identifier. Otherwise, it is an error.</p>
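    <p>A minimal Python sketch of combining the identity elements. It assumes the "_" separator used in CLDR file names and the order language, script, territory, variant. The function name is hypothetical, and the check that the result is a known, stable locale identifier is left out.</p>

```python
import xml.etree.ElementTree as ET


def locale_id_from_identity(identity_xml: str) -> str:
    """Join the type attributes of language, script, territory and variant."""
    identity = ET.fromstring(identity_xml)
    language = identity.find("language")
    if language is None:
        raise ValueError("identity must contain a language element")
    parts = [language.get("type")]
    for tag in ("script", "territory"):
        elem = identity.find(tag)
        if elem is not None:
            parts.append(elem.get("type"))
    parts.extend(v.get("type") for v in identity.findall("variant"))
    # Assumption: '_' separator as in CLDR file names; no check that the
    # resulting ID is a known, stable locale identifier.
    return "_".join(parts)
```

    <p>For an identity with language <code>en</code>, script <code>Latn</code>, and territory <code>US</code>, as in the element descriptions above, it returns <code>en_Latn_US</code>.</p>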
6876    <h3><a name="Valid_Attribute_Values" href=
6877    "#Valid_Attribute_Values" id="Valid_Attribute_Values">5.5 Valid
6878    Attribute Values</a></h3>
6879	      <p>The <a href="#DTD_Annotations">DTD Annotations</a> in Section 5.7 are used to determine whether elements, attributes, or attribute values are valid (or deprecated).</p>
6880
6881    <h3><a name="Canonical_Form" href="#Canonical_Form" id=
6882    "Canonical_Form">5.6 Canonical Form</a></h3>
6883    <p>The following are restrictions on the format of LDML files
6884    to allow for easier parsing and comparison of files.</p>
6885    <p>Peer elements have consistent order. That is, if the DTD or
6886    this specification requires the following order in an element
6887    <strong>foo</strong>:</p>
6888    <pre>&lt;foo&gt;
6889  &lt;pattern&gt;
6890  &lt;somethingElse&gt;
6891&lt;/foo&gt;</pre>
6892    <p>It can never require the reverse order in a different
6893    element <strong>bar</strong>.</p>
6894    <pre>&lt;bar&gt;
6895  &lt;somethingElse&gt;
6896  &lt;pattern&gt;
6897&lt;/bar&gt;</pre>
6898    <p>Note that there was one case that had to be corrected in
6899    order to make this true. For that reason, pattern occurs twice
6900    under currency:</p>
6901    <pre class="dtd">
6902    &lt;!ELEMENT currency (alias | (pattern*, displayName?, symbol?, pattern*,
6903decimal?, group?, special*)) &gt;</pre>
6904    <p><a href="https://www.w3.org/TR/REC-xml/">XML</a> files can
6905    have a wide variation in textual form, while representing
    precisely the same data. Putting the LDML files in the
    repository into a canonical form allows the simple diff tools
    used widely (including in CVS) to detect differences when
    vetting changes, without those tools being confused by
    insignificant textual variation. This is not a requirement on
    other uses of LDML; it is simply a way to manage repository
    data more easily.</p>
6912    <h4><a name="Content" href="#Content" id="Content">5.6.1
6913    Content</a></h4>
6914    <ol>
6915      <li>All start elements are on their own line, indented by
6916      <i>depth</i> tabs.</li>
6917      <li>All end elements (except for leaf nodes) are on their own
6918      line, indented by <i>depth</i> tabs.</li>
6919      <li>Any leaf node with empty content is in the form
6920      &lt;foo/&gt;.</li>
6921      <li>There are no blank lines except within comments or
6922      content.</li>
      <li>Within a start element, single spaces separate the element
      name and the attributes. There are no extra spaces within elements.
6925        <ul>
6926          <li><code>&lt;version number="1.2"/&gt;</code>, not
6927          <code>&lt;version&nbsp; number = "1.2" /&gt;</code></li>
6928          <li><code>&lt;/identity&gt;</code>, not
6929          <code>&lt;/identity &gt;</code></li>
6930        </ul>
6931      </li>
6932      <li>All attribute values use double quote ("), not single
6933      (').</li>
6934      <li>There are no CDATA sections, and no escapes except those
6935      absolutely required.
6936        <ul>
6937          <li>no &amp;apos; since it is not necessary</li>
6938          <li>no '&amp;#x61;', it would be just 'a'</li>
6939        </ul>
6940      </li>
6941      <li>All attributes with defaulted values are suppressed.</li>
6942      <li>The draft and alt="proposed.*" attributes are only on
6943      leaf elements.</li>
      <li>Time zone IDs (tzids) are canonicalized as follows:
        <ol>
          <li type="a">All tzids in zone.tab as of CLDR 1.1 (2004.06.08)
          are canonical.</li>
6948          <li>After that point, the first time a tzid is
6949          introduced, that is the canonical form.</li>
6950        </ol>
6951        <p>That is, new IDs are added, but existing ones keep the
6952        original form. The <i>TZ</i> timezone database keeps a set
6953        of equivalences in the "backward" file. These are used to
6954        map other tzids to the canonical form. For example, when
6955        <code>America/Argentina/Catamarca</code> was introduced as
6956        the new name for the previous
        <code>America/Catamarca</code>, a link was added in the
6958        backward file.</p>
6959        <p><code>Link America/Argentina/Catamarca
6960        America/Catamarca</code></p>
6961      </li>
6962    </ol>
6963    <p><i>Example:</i></p>
    <pre>&lt;ldml draft="unconfirmed"&gt;
6965        &lt;identity&gt;
6966                &lt;version number="1.2"/&gt;
6967                &lt;language type="en"/&gt;
6968                &lt;territory type="AS"/&gt;
6969        &lt;/identity&gt;
6970        &lt;numbers&gt;
6971                &lt;currencyFormats&gt;
6972                        &lt;currencyFormatLength&gt;
6973                                &lt;currencyFormat&gt;
6974                                        &lt;pattern&gt;¤#,##0.00;(¤#,##0.00)&lt;/pattern&gt;
6975                                &lt;/currencyFormat&gt;
6976                        &lt;/currencyFormatLength&gt;
6977                &lt;/currencyFormats&gt;
6978        &lt;/numbers&gt;
6979&lt;/ldml&gt;</pre>
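    <p>The tzid rule in the list above can be sketched in Python. TZ <code>Link</code> lines put tzids into equivalence classes (here with a small union-find), and each class is mapped to its one member that CLDR already treats as canonical. The set passed as <code>cldr_canonical</code> is a hypothetical input, and this is a sketch under these assumptions rather than CLDR's own tooling.</p>

```python
from collections import defaultdict


def canonical_tzid_map(link_lines, cldr_canonical):
    """Map every tzid linked by TZ 'Link' lines to the CLDR canonical form.

    cldr_canonical is a hypothetical input: the set of tzids already
    canonical in CLDR (the form first introduced).
    """
    parent = {}  # union-find forest over tzids

    def find(tzid):
        parent.setdefault(tzid, tzid)
        while parent[tzid] != tzid:
            parent[tzid] = parent[parent[tzid]]  # path halving
            tzid = parent[tzid]
        return tzid

    for line in link_lines:
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) == 3 and fields[0] == "Link":
            parent[find(fields[1])] = find(fields[2])  # same equivalence class

    classes = defaultdict(set)
    for tzid in list(parent):
        classes[find(tzid)].add(tzid)

    mapping = {}
    for members in classes.values():
        canonical = members & cldr_canonical
        if len(canonical) == 1:  # skip classes with no (or ambiguous) CLDR form
            (target,) = canonical
            for tzid in members:
                mapping[tzid] = target
    return mapping
```

    <p>With the Catamarca link above and <code>{"America/Catamarca"}</code> as the canonical set, both America/Argentina/Catamarca and America/Catamarca map to America/Catamarca.</p>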
6980    <h4><a name="Ordering" href="#Ordering" id="Ordering">5.6.2
6981    Ordering</a></h4>
6982    <p>An element is ordered first by the element name, and then if
6983    the element names are identical, by the sorted set of
6984    attribute-value pairs. For the latter, compare the first pair
    in each (with the pairs sorted by attribute). If not identical,
6986    go to the second pair, and so on.</p>
6987    <p>Elements and attributes are ordered according to their order
6988    in the respective DTDs. Attribute value comparison is a bit
6989    more complicated, and may depend on the attribute and type.
6990    This is currently done with specific ordering tables.</p>
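    <p>The ordering rule can be sketched as a Python sort key. The tables <code>ELEMENT_ORDER</code> and <code>ATTRIBUTE_ORDER</code> are hypothetical excerpts standing in for DTD order. Attribute values are compared as plain strings, which simplifies the specific ordering tables mentioned above (plain string order would, for example, put "10" before "9").</p>

```python
# Hypothetical excerpts of DTD order tables; real ones come from the DTDs.
ELEMENT_ORDER = {"eraNames": 0, "eraAbbr": 1, "eraNarrow": 2}
ATTRIBUTE_ORDER = {"type": 0, "alt": 1, "draft": 2}


def _rank(table: dict, name: str) -> tuple[int, str]:
    # Names missing from the table sort after known ones, then by name.
    return (table.get(name, len(table)), name)


def sort_key(element: str, attributes: dict[str, str]):
    """Order by element name, then by the attribute-value pairs in attribute order."""
    pairs = sorted(attributes.items(), key=lambda kv: _rank(ATTRIBUTE_ORDER, kv[0]))
    # Simplification: values compare as plain strings, not via value ordering tables.
    return (_rank(ELEMENT_ORDER, element),
            [(_rank(ATTRIBUTE_ORDER, attr), value) for attr, value in pairs])
```

    <p>Because the pairs are compared as lists, the first pair decides unless it is identical, and an element whose pairs are a prefix of another's sorts first.</p>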
6991    <p>Any future additions to the DTD must be structured so as to
6992    allow compatibility with this ordering. See also <a href=
6993    "#Valid_Attribute_Values">Section 5.5 Valid Attribute
6994    Values.</a></p>
6995    <h4><a name="Comments" href="#Comments" id="Comments">5.6.3
6996    Comments</a></h4>
6997    <ol>
6998      <li>Comments are of the form &lt;!-- <i>stuff</i>
6999      --&gt;.</li>
7000      <li>They are logically attached to a node. There are 4 kinds:
7001        <ol>
          <li>Inline comments always appear after a leaf node, at the
          end of the same line. They are a single line.</li>
7004          <li>Preblock comments always precede the attachment node,
7005          and are indented on the same level.</li>
7006          <li>Postblock comments always follow the attachment node,
7007          and are indented on the same level.</li>
7008          <li>Final comment, after &lt;/ldml&gt;</li>
7009        </ol>
7010      </li>
7011      <li>Multiline comments (except the final comment) have each
7012      line after the first indented to one deeper level.</li>
7013    </ol>
7014    <p><b>Examples:</b></p>
7015    <pre>&lt;eraAbbr&gt;
7016        &lt;era type="0"&gt;BC&lt;/era&gt; &lt;!-- might add alternate BDE in the future --&gt;
7017...
7018&lt;timeZoneNames&gt;
7019        &lt;!-- Note: zones that do not use daylight time need further work --&gt;
7020        &lt;zone type="America/Los_Angeles"&gt;
7021        ...
7022        &lt;!-- Note: the following is known to be sparse,
7023                and needs to be improved in the future --&gt;
7024        &lt;zone type="Asia/Jerusalem"&gt;</pre>
7025    <h3><a name="DTD_Annotations" href="#DTD_Annotations" id=
7026    "DTD_Annotations">5.7 DTD Annotations</a></h3>
7027    <p>The information in a standard DTD is insufficient for use in
7028    CLDR. To make up for that, DTD annotations are added. These are
7029    of the form<br>
7030    &lt;!--@...--&gt;<br>
7031    and are included below the !ELEMENT or !ATTLIST line that they
7032    apply to. The current annotations are:</p>
7033    <table>
7034      <tr>
7035        <th>Type</th>
7036        <th>Description</th>
7037      </tr>
7038      <tr>
7039        <td>&lt;!--@VALUE--&gt;</td>
7040        <td>The attribute is not distinguishing, and is treated
7041        like an element value</td>
7042      </tr>
7043      <tr>
7044        <td>&lt;!--@METADATA--&gt;</td>
7045        <td>The attribute is a “comment” on the data, like the
7046        draft status. It is not typically used in
7047        implementations.</td>
7048      </tr>
7049      <tr>
7050        <td>&lt;!--@ORDERED--&gt;</td>
7051        <td>The element's children are ordered, and do not
7052        inherit.</td>
7053      </tr>
7054      <tr>
7055        <td>&lt;!--@DEPRECATED--&gt;</td>
7056        <td>The element or attribute is deprecated, and should not
7057        be used.</td>
7058      </tr>
7059      <tr>
7060        <td>&lt;!--@DEPRECATED: attribute-value1,
7061        attribute-value2--&gt;</td>
7062        <td>The attribute values are deprecated, and should not be
7063        used. Spaces between tokens are not significant.</td>
7064      </tr>
7065      <tr>
7066        <td>&lt;!--@MATCH:{attribute value constraint}--&gt;</td>
7067        <td>Requires the attribute value to match the constraint.</td>
7068      </tr>
7069    </table>
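    <p>A minimal Python sketch of reading these annotations. It assumes one declaration or annotation per line, and that each annotation belongs to the nearest !ELEMENT or !ATTLIST declaration above it. The function name and the shape of its result are hypothetical.</p>

```python
import re

# A declaration line, e.g. "!ATTLIST currency type ..." (the leading angle
# bracket is not needed for matching).
DECLARATION = re.compile(r"!(ELEMENT|ATTLIST)\s+(\S+)(?:\s+(\S+))?")
# An annotation comment body, e.g. "--@MATCH:validity/currency--".
ANNOTATION = re.compile(r"--@([A-Z]+)(?::(.*?))?--")


def read_annotations(dtd_text: str) -> list[tuple[str, str, str]]:
    """Return (target, annotation, argument) for each annotation in a DTD.

    target is "element" for !ELEMENT and "element/attribute" for !ATTLIST;
    each annotation applies to the closest declaration above it.
    """
    found = []
    target = None
    for line in dtd_text.splitlines():
        decl = DECLARATION.search(line)
        if decl:
            kind, element, attribute = decl.groups()
            target = element if kind == "ELEMENT" else f"{element}/{attribute}"
            continue
        note = ANNOTATION.search(line)
        if note and target is not None:
            found.append((target, note.group(1), (note.group(2) or "").strip()))
    return found
```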
7070    <p>There is additional information in the
7071    attributeValueValidity.xml file that is used internally for
    testing. For example, the following line indicates that the
    'type' attribute of the 'currency' element in the ldml DTD must
    have one of the bcp47 type values for the 'cu' key.</p>
7075    <p class='example'>&lt;attributeValues dtds='ldml'
7076    elements='currency'
7077    attributes='type'&gt;$_bcp47_cu&lt;/attributeValues&gt;</p>
    <p>The element values may be literals, regular expressions, or
    variables (some of which, such as the one above, are set
    programmatically according to other CLDR data). However, the
    information at this point does not cover all attribute values,
    is used only for testing, and should not be used in
    implementations, since the structure may change without
    notice.</p>
    <h4><a href="#match_expressions" name="match_expressions">5.7.1 Attribute Value Constraints</a></h4>
    <p>The following are constraints on the attribute values. Note: in future versions, the format may change, and/or the constraints may be tightened.</p>
7086    <table class='simple'>
7087      <tbody>
7088        <tr>
7089          <th>Constraint</th>
7090          <th colspan="2">Comments</th>
7091        </tr>
7092        <tr>
7093          <td>any</td>
7094          <td colspan="2">any string value</td>
7095        </tr>
7096        <tr>
7097          <td>any/TODO</td>
7098          <td colspan="2">placeholder for future constraints</td>
7099        </tr>
7100        <tr>
7101          <td>bcp47/anykey</td>
7102          <td colspan="2">any bcp47 key or tkey</td>
7103        </tr>
7104        <tr>
7105          <td>bcp47/anyvalue</td>
7106          <td colspan="2">any bcp47 value (type) or tvalue</td>
7107        </tr>
7108        <tr>
7109          <td>literal/{literal values}</td>
          <td colspan="2">comma-separated list of values</td>
7111        </tr>
7112        <tr>
7113          <td>regex/{regex expression}</td>
          <td colspan="2">valid regular expression</td>
7115        </tr>
7116        <tr>
7117          <td>bcp47/{key or tkey}</td>
7118          <td colspan="2">matches possible values for that key or tkey</td>
7119        </tr>
7120        <tr>
7121          <td>metazone</td>
7122          <td colspan="2">valid metazone</td>
7123        </tr>
7124        <tr>
          <td>range/{start_number}~{end_number}</td>
          <td colspan="2">number between start and end, inclusive</td>
7127        </tr>
7128        <tr>
7129          <td>time/{time or date or date-time pattern}</td>
          <td colspan="2">e.g., HH:mm</td>
7131        </tr>
7132        <tr>
7133          <td>unicodeset/{unicodeset pattern}</td>
7134          <td colspan="2">valid unicodeset</td>
7135        </tr>
7136        <tr>
7137          <td rowspan="4">validity/{field}</td>
7138          <td colspan="2">currency, language, locale, region, script, subdivision, short-unit, unit, variant</td>
7139        </tr>
7140        <tr>
7141          <td colspan="2">The field can be qualified by particular enums, such as:</td>
7142        </tr>
7143        <tr>
7144          <td>validity/unit/regular deprecated</td>
7145          <td>matches only <em>deprecated</em> and <em>regular</em></td>
7146        </tr>
7147        <tr>
7148          <td>validity/unit/!deprecated</td>
7149          <td>matches all but <em>deprecated</em></td>
7150        </tr>
7151        <tr>
7152          <td>version</td>
          <td colspan="2">version with 1 to 4 numeric fields, such as 35.3.9</td>
7154        </tr>
7155        <tr>
7156          <td>set/{match}</td>
7157          <td colspan="2">set of elements that match {match}</td>
7158        </tr>
7159        <tr>
7160          <td>or/{match1}XX{match2}…</td>
          <td colspan="2">matches at least one of {match1}, {match2}, etc.</td>
7162        </tr>
7163      </tbody>
7164    </table><br>
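    <p>As a sketch only, a few of these constraint forms could be checked in Python as shown here. The hypothetical <code>matches</code> function assumes that regex constraints must match the whole value and that range bounds are decimal numbers. The bcp47, metazone, time, unicodeset, validity, set, and or forms need other CLDR data or syntax and are not covered.</p>

```python
import re


def matches(constraint: str, value: str) -> bool:
    """Check value against a few of the constraint forms in the table above."""
    kind, _, arg = constraint.partition("/")
    if kind == "any":  # covers "any" and "any/TODO"
        return True
    if kind == "literal":  # comma-separated list of allowed values
        return value in {v.strip() for v in arg.split(",")}
    if kind == "regex":  # assumption: the whole value must match
        return re.fullmatch(arg, value) is not None
    if kind == "range":  # start~end, inclusive
        start, _, end = arg.partition("~")
        try:
            number = float(value)
        except ValueError:
            return False
        return float(start) <= number <= float(end)
    if kind == "version":  # 1 to 4 dot-separated numeric fields
        return re.fullmatch(r"\d+(\.\d+){0,3}", value) is not None
    raise NotImplementedError(f"constraint kind {kind!r} is not sketched here")
```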
7165    <h2><a name="Property_Data" href="#Property_Data" id=
7166    "Property_Data">6 Property Data</a></h2>
7167    <p>Some data in CLDR does not use an XML format, but rather a
7168    semicolon-delimited format derived from that of the Unicode
7169    Character Database. That is because the data is more likely to
7170    be parsed by implementations that already parse UCD data. Those
7171    files are present in the common/properties directory.</p>
7172    <p>Each file has a header that explains the format and usage of
7173    the data.</p>
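    <p>A minimal Python sketch of reading such a semicolon-delimited file. It assumes, as in UCD files, that '#' starts a comment and never occurs inside a field, and that fields are trimmed of surrounding spaces. Each file's header remains the authority on the meaning of its fields.</p>

```python
def parse_ucd_style(text: str) -> list[list[str]]:
    """Split UCD-style lines into trimmed, semicolon-separated fields.

    Assumes '#' always starts a comment and never occurs inside a field.
    """
    rows = []
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()
        if data:  # skip blank and comment-only lines
            rows.append([field.strip() for field in data.split(";")])
    return rows
```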
7174    <h3><a name="Script_Metadata" href="#Script_Metadata" id=
7175    "Script_Metadata">6.1 Script Metadata</a></h3>
7176    <p><code>scriptMetadata.txt</code></p>
7177    <p>This file provides general information about scripts that
7178    may be useful to implementations processing text. The
7179    information is the best currently available, and may change
    between versions of CLDR. The format is similar to that of Unicode
    Character Database property files, and is documented in the
7182    header of the data file.</p>
7183    <h3><a name="Extended_Pictographic" href=
7184    "#Extended_Pictographic" id="Extended_Pictographic">6.2
7185    Extended Pictographic</a></h3>
7186    <p><code>ExtendedPictographic.txt</code></p>
7187    <p>This file was used to define the ExtendedPictographic data
7188    used for “future-proofing” emoji behavior, especially in
7189    segmentation. As of Emoji version 11.0, the set of
7190    Extended_Pictographic is incorporated into the emoji data files
7191    found at <a href=
7192    "https://unicode.org/Public/emoji/">unicode.org/Public/emoji/</a>.</p>
7193    <h3><a name="Labels.txt" href="#Labels.txt" id="Labels.txt">6.3
7194    Labels.txt</a></h3>
7195    <p><code>labels.txt</code></p>
7196    <p>This file provides general information about associations of
7197    labels to characters that may be useful to implementations of
7198    character-picking applications. The information is the best
7199    currently available, and may change between versions of CLDR.
    The format is similar to that of Unicode Character Database
    property files, and is documented in the header of the data file.</p>
7202    <p>Initially, the contents are focused on emoji, but may be
7203    expanded in the future to other types of characters. Note that
7204    a character may have multiple labels.</p>
7205    <h3><a name="Segmentation_Tests" href="#Segmentation_Tests">6.4
7206      Segmentation Tests</a></h3>
    <p>CLDR provides a tailoring of the <a href="https://unicode.org/reports/tr29/">Grapheme Cluster Break (gcb)</a> algorithm to avoid splitting Indic aksaras. The corresponding test files are located in common/properties/segments/, along with a readme.txt that provides more details. There are also specific test files for the supported Indic scripts in the unittest directory.</p>
7208    <h2><a name="Format_Parse_Issues" href="#Format_Parse_Issues"
7209    id="Format_Parse_Issues">7 Issues in Formatting and
7210    Parsing</a></h2>
7211    <h3><a name="Lenient_Parsing" href="#Lenient_Parsing" id=
7212    "Lenient_Parsing">7.1 Lenient Parsing</a></h3>
7213    <h4><a name="Motivation" href="#Motivation" id=
7214    "Motivation">7.1.1 Motivation</a></h4>
7215    <p>User input is frequently messy. Attempting to parse it by
7216    matching it exactly against a pattern is likely to be
7217    unsuccessful, even when the meaning of the input is clear to a
7218    human being. For example, for a date pattern of "MM/dd/yy", the
7219    input "June 1, 2006" will fail.</p>
7220    <p>The goal of lenient parsing is to accept user input whenever
7221    it is possible to decipher what the user intended. Doing so
7222    requires using patterns as data to guide the parsing process,
7223    rather than an exact template that must be matched. This
7224    informative section suggests some heuristics that may be useful
7225    for lenient parsing of dates, times, and numbers.</p>
7226    <h4><a name="Loose_Matching" href="#Loose_Matching" id=
7227    "Loose_Matching">7.1.2 Loose Matching</a></h4>
7228    <p>Loose matching ignores attributes of the strings being
7229    compared that are not important to matching. It involves the
7230    following steps:</p>
7231    <ul>
7232      <li>Remove "." from currency symbols and other fields used
7233      for matching, and also from the input string unless:
7234        <ul>
7235          <li>"." is in the decimal set, and</li>
7236          <li>its position in the input string is immediately
7237          before a decimal digit</li>
7238        </ul>
7239      </li>
7240      <li>Ignore all format characters: in particular, ignore any
7241      RLM, LRM or ALM used to control BIDI formatting.</li>
      <li>Ignore all characters in [:Zs:] unless they occur between
      letters. (In the heuristics below, even those between letters
      are ignored except to delimit fields.)</li>
      <li>Map all characters in [:Dash:] to U+002D
      HYPHEN-MINUS.</li>
7247      <li>Use the data in the &lt;character-fallback&gt; element to
7248      map equivalent characters (for example, curly to straight
7249      apostrophes). Other apostrophe-like characters should also be
7250      treated as equivalent, especially if the character actually
7251      used in a format may be unavailable on some keyboards. For
7252      example:
7253        <ul>
7254          <li>U+02BB MODIFIER LETTER TURNED COMMA (ʻ) might be
7255          typed instead as U+2018 LEFT SINGLE QUOTATION MARK
7256          (‘).</li>
7257          <li>U+02BC MODIFIER LETTER APOSTROPHE (ʼ) might be typed
7258          instead as U+2019 RIGHT SINGLE QUOTATION MARK (’), U+0027
7259          APOSTROPHE, etc.</li>
7260          <li>U+05F3 HEBREW PUNCTUATION GERESH (‎׳) might be typed
7261          instead as U+0027 APOSTROPHE.</li>
7262        </ul>
7263      </li>
      <li>Apply mappings particular to the domain (that is, for dates
      or for numbers, discussed in more detail below).</li>
      <li>Apply case folding (possibly including language-specific
      mappings such as Turkish i).</li>
7268      <li>Normalize to NFKC; thus <i>no-break space</i> will map to
7269      <i>space</i>; half-width <i>katakana</i> will map to
7270      full-width.</li>
7271    </ul>
7272    <p>Loose matching involves (logically) applying the above
7273    transform to both the input text and to each of the field
7274    elements used in matching, before applying the specific
7275    heuristics below. For example, if the input number text is " -
7276    NA f. 1,000.00", then it is mapped to "-naf1,000.00" before
7277    processing. The currency signs are also transformed, so "NA f."
7278    is converted to "naf" for purposes of matching. As with other
7279    Unicode algorithms, this is a logical statement of the process;
7280    actual implementations can optimize, such as by applying the
7281    transform incrementally during matching.</p>
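    <p>The following Python sketch applies a simplified version of the
    loose-matching steps above to a single string. It is illustrative
    only: <code>loose_key</code>, <code>DASHES</code>, and
    <code>APOSTROPHES</code> are hypothetical names; <code>DASHES</code>
    is a small subset of [:Dash:], <code>APOSTROPHES</code> stands in for
    &lt;character-fallback&gt; data, all [:Zs:] characters are dropped,
    and no language-specific case folding or domain-specific mapping is
    performed.</p>

```python
import unicodedata

# Small illustrative subset of [:Dash:]; a real implementation would use
# the full Unicode Dash property (for example, via ICU).
DASHES = set("\u2010\u2011\u2012\u2013\u2014\u2015\u2212\ufe63\uff0d")

# Hypothetical apostrophe equivalences standing in for <character-fallback>
# data plus the apostrophe-like characters listed above.
APOSTROPHES = {"\u2018": "'", "\u2019": "'", "\u02bb": "'", "\u02bc": "'", "\u05f3": "'"}


def loose_key(text, decimal_set=frozenset(".")):
    """Return a simplified loose-matching key for text."""
    out = []
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == ".":
            # Keep "." only if it is a decimal separator right before a digit.
            if "." in decimal_set and nxt.isdecimal():
                out.append(ch)
            continue
        category = unicodedata.category(ch)
        if category == "Cf":
            continue  # format characters, including LRM, RLM, and ALM
        if category == "Zs":
            continue  # simplification: drop all spaces, even between letters
        if ch in DASHES:
            ch = "-"  # map dashes to U+002D HYPHEN-MINUS
        out.append(APOSTROPHES.get(ch, ch))
    # Case folding, then NFKC (e.g. half-width katakana becomes full-width).
    return unicodedata.normalize("NFKC", "".join(out).casefold())
```

    <p>Under these assumptions, <code>loose_key(" - NA f. 1,000.00")</code>
    yields <code>"-naf1,000.00"</code>, and
    <code>loose_key("NA f.", decimal_set=frozenset())</code> yields
    <code>"naf"</code>, matching the example above. Passing an empty
    decimal set removes every ".", as is done for currency symbols.</p>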
7282    <h3><a name="Invalid_Patterns" href="#Invalid_Patterns" id=
7283    "Invalid_Patterns">7.2 Handling Invalid Patterns</a></h3>
7284    <p>Processes sometimes encounter invalid number or date
7285    patterns, such as a number pattern with “¤¤¤¤¤” (valid pattern
7286    character but invalid length in current CLDR), a date pattern
7287    with “nn” (invalid pattern character in current CLDR), or a
7288    date pattern with “MMMMMM” (invalid length in current CLDR).
7289    The recommended behavior for handling such an invalid pattern
7290    field is:</p>
7291    <ul>
7292      <li>For a field using a currently-invalid length for a valid
7293      pattern character:
7294        <ul>
7295          <li>In <strong>formatting,</strong> emit U+FFFD
7296          REPLACEMENT CHARACTER for the invalid field.</li>
7297          <li>In <strong>parsing,</strong> the field may be parsed
7298          as if it had a valid length.</li>
7299        </ul>
7300      </li>
7301      <li>For a pattern that contains a currently-invalid pattern
7302      character (applies only to date patterns, for which A-Za-z
7303      are reserved as pattern characters but not all defined as
7304      valid):
7305        <ul>
7306          <li>Produce an error (set an error code or throw an
7307          exception) when an attempt is made to create a formatter
7308          with such a pattern or to apply such a pattern to an
7309          existing formatter.</li>
7310        </ul>
7311      </li>
7312    </ul>
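    <p>The recommended handling can be sketched in Python for date
    patterns as follows. This is a hypothetical illustration, not a
    reference implementation: <code>VALID_LENGTHS</code>,
    <code>compile_pattern</code>, <code>format_fields</code>, and the
    <code>render</code> callback are invented names; the table lists only
    a few pattern letters (any letter missing from it is treated as
    undefined by this sketch); quoting is ignored; and parsing is not
    shown.</p>

```python
import re

# Illustrative subset of date pattern letters and their valid field lengths
# (an assumption for this sketch; the date field symbol table is normative).
VALID_LENGTHS = {"M": range(1, 6), "d": range(1, 3), "H": range(1, 3), "m": range(1, 3)}

# A run of one repeated ASCII letter, or a run of non-letters (literal text).
FIELD = re.compile(r"([A-Za-z])\1*|[^A-Za-z]+")


class InvalidPatternError(ValueError):
    """Raised for a reserved pattern letter that is not defined as valid."""


def compile_pattern(pattern):
    """Split a date pattern into literals and (letter, length) fields."""
    fields = []
    for m in FIELD.finditer(pattern):
        run = m.group(0)
        if m.group(1) is None:
            fields.append(run)  # literal text (quoting not handled)
        elif run[0] not in VALID_LENGTHS:
            # Invalid pattern character: fail when the formatter is created.
            raise InvalidPatternError(f"invalid pattern character {run[0]!r}")
        else:
            fields.append((run[0], len(run)))
    return fields


def format_fields(fields, render):
    """Format compiled fields; render(letter, length) formats a valid field."""
    parts = []
    for field in fields:
        if isinstance(field, str):
            parts.append(field)
        elif field[1] in VALID_LENGTHS[field[0]]:
            parts.append(render(*field))
        else:
            parts.append("\ufffd")  # valid letter, invalid length
    return "".join(parts)
```

    <p>With this sketch, formatting "MMMMMM/dd" emits U+FFFD for the
    month field, while <code>compile_pattern("nn/dd")</code> raises
    <code>InvalidPatternError</code> before any formatting takes
    place.</p>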
7313    <h2><a name="Deprecated_Structure" href="#Deprecated_Structure"
7314    id="Deprecated_Structure">Annex A Deprecated Structure</a></h2>
7315    <p>The <a href="#DTD_Annotations">DTD Annotations</a> in Section 5.7 are used to determine whether elements, attributes, or attribute values are deprecated.</p>
    <p>Deprecated structure is still valid LDML, but it is strongly
    discouraged and is no longer used in CLDR.</p>
7318    <p>The remainder of this section describes selected cases of
7319    deprecated structure that were present in previous versions of
7320    CLDR.</p>
7321    <h3><a name="Fallback_Elements" href="#Fallback_Elements" id=
7322    "Fallback_Elements">A.1 Element fallback</a></h3>
7323    <p class="dtd">&lt;!ELEMENT fallback (#PCDATA) &gt;</p>
    <p>The fallback element is deprecated. Implementations should
    instead use the information in <em><a href=
7326    "#LanguageMatching">Section 4.4 Language Matching</a></em> for
7327    doing language fallback.</p>
7328    <h3><a name="BCP47_Keyword_Mapping" href=
7329    "#BCP47_Keyword_Mapping" id="BCP47_Keyword_Mapping">A.2 BCP 47
7330    Keyword Mapping</a></h3>
7331    <p><b>Note:</b> <i>This structure is deprecated and replaced
7332    with <a href="#Unicode_Locale_Extension_Data_Files">Section
7333    3.6.4 U Extension Data Files</a>.</i></p>
7334    <p class="dtd">&lt;!ELEMENT bcp47KeywordMappings ( mapKeys?,
7335    mapTypes* ) &gt;<br>
7336    &lt;!ELEMENT mapKeys ( keyMap* ) &gt;<br>
7337    &lt;!ELEMENT keyMap EMPTY &gt;<br>
7338    &lt;!ATTLIST keyMap type NMTOKEN #REQUIRED &gt;<br>
7339    &lt;!ATTLIST keyMap bcp47 NMTOKEN #REQUIRED &gt;<br>
7340    &lt;!ELEMENT mapTypes ( typeMap* ) &gt;<br>
7341    &lt;!ATTLIST mapTypes type NMTOKEN #REQUIRED &gt;<br>
7342    &lt;!ELEMENT typeMap EMPTY &gt;<br>
7343    &lt;!ATTLIST typeMap type CDATA #REQUIRED &gt;<br>
7344    &lt;!ATTLIST typeMap bcp47 NMTOKEN #REQUIRED &gt;<br></p>
7345    <p>This section defines mappings between old Unicode locale
7346    identifier key/type values and their BCP 47 'u' extension
7347    subtag representations. The 'u' extension syntax described in
7348    <a href="#u_Extension">Section 3.6 Unicode BCP 47 U
7349    Extension</a> restricts a key to two ASCII alphanumerics and a
7350    type to three to eight ASCII alphanumerics. A key or a type
7351    which does not meet that syntax requirement is converted
7352    according to the mapping data defined by the mapKeys or
7353    mapTypes elements. For example, a keyword "collation=phonebook"
7354    is converted to BCP 47 'u' extension subtags "co-phonebk" by
7355    the mapping data below:</p>
7356    <pre>    &lt;mapKeys&gt;
7357        ...
7358        &lt;keyMap type="collation" bcp47="co"/&gt;
7359        ...
7360    &lt;/mapKeys&gt;
7361    &lt;mapTypes type="collation"&gt;
7362        ...
7363        &lt;typeMap type="phonebook" bcp47="phonebk"/&gt;
7364        ...
7365    &lt;/mapTypes&gt;
7366        </pre>
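    <p>A minimal Python sketch of this conversion rule, assuming the
    mapping data has already been loaded into dictionaries. The names
    <code>KEY_MAP</code>, <code>TYPE_MAP</code>, and
    <code>to_u_subtags</code> are hypothetical, only the collation
    entries from the example are included, and a non-conforming key or
    type with no mapping raises <code>KeyError</code>.</p>

```python
import re

# keyMap and typeMap entries from the example above (illustrative subset).
KEY_MAP = {"collation": "co"}
TYPE_MAP = {"collation": {"phonebook": "phonebk"}}

# 'u' extension syntax: a key is 2 alphanumerics, a type is 3 to 8.
KEY_SYNTAX = re.compile(r"[0-9A-Za-z]{2}")
TYPE_SYNTAX = re.compile(r"[0-9A-Za-z]{3,8}")


def to_u_subtags(key, type_):
    """Convert an old key=type keyword into 'u' extension subtags."""
    # Values that already meet the syntax are used as-is; others are mapped.
    bcp_key = key if KEY_SYNTAX.fullmatch(key) else KEY_MAP[key]
    if TYPE_SYNTAX.fullmatch(type_):
        bcp_type = type_
    else:
        # mapTypes is keyed by the old key name, as in the XML above.
        bcp_type = TYPE_MAP[key][type_]
    return f"{bcp_key}-{bcp_type}"
```

    <p>Here <code>to_u_subtags("collation", "phonebook")</code> returns
    <code>"co-phonebk"</code>: "collation" is longer than two characters
    and "phonebook" is longer than eight, so both are mapped.</p>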
7367    <h3><a name="Choice_Patterns" href="#Choice_Patterns" id=
7368    "Choice_Patterns">A.3 Choice Patterns</a></h3>
7369    <p><b>Note:</b> <i>This structure is deprecated and replaced
7370    with count attributes.</i></p>
7371    <p>A choice pattern is a string that chooses among a number of
7372    strings, based on numeric value. It has the following form:</p>
    <p>&lt;choice_pattern&gt; = &lt;choice&gt; ( '|' &lt;choice&gt;
    )*<br>
    &lt;choice&gt; =
    &lt;number&gt;&lt;relation&gt;&lt;string&gt;<br>
    &lt;number&gt; = ('+' | '-')? ('∞' | [0-9]+ ('.'
    [0-9]+)?)<br>
    &lt;relation&gt; = '&lt;' | '≤'</p>
    <p>The interpretation of a choice pattern is as follows: given a
    number N, the pattern is scanned from right to left, and for each
    choice the condition &lt;number&gt; &lt;relation&gt; N is
    evaluated. The result is the string of the first choice whose
    condition holds. If no choice matches, the first string is used.
    For example:</p>
7386    <table border="1" cellpadding="0" cellspacing="0">
7387      <tr>
7388        <td width="33%">Pattern</td>
7389        <td width="33%">N</td>
7390        <td width="34%">Result</td>
7391      </tr>
7392      <tr>
7393        <td width="33%" rowspan="4">0≤Rf|1≤Ru|1&lt;Re</td>
        <td width="33%">-∞, -3, -1, -0.000001</td>
7396        <td width="34%">Rf (defaulted to first string)</td>
7397      </tr>
7398      <tr>
7399        <td width="33%">0, 0.01, 0.9999</td>
7400        <td width="34%">Rf</td>
7401      </tr>
7402      <tr>
7403        <td width="33%">1</td>
7404        <td width="34%">Ru</td>
7405      </tr>
7406      <tr>
        <td width="33%">1.00001, 5, 99, ∞</td>
7409        <td width="34%">Re</td>
7410      </tr>
7411    </table>
7412    <p>Quoting is done using ' characters, as in date or number
7413    formats.</p>
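    <p>A minimal Python sketch of the evaluation rule above, assuming
    that quoting is not used and that N is a float. The names
    <code>parse_choice_pattern</code> and <code>choose</code> are
    hypothetical. It reproduces the results in the table.</p>

```python
import math
import re

# One choice: optional sign, a number or infinity, a relation, then the string.
CHOICE = re.compile(r"([+-]?)(∞|[0-9]+(?:\.[0-9]+)?)([<≤])(.*)", re.S)


def parse_choice_pattern(pattern):
    """Split a choice pattern into (limit, relation, string) triples.

    Quoting with ' is not handled in this sketch.
    """
    choices = []
    for part in pattern.split("|"):
        m = CHOICE.fullmatch(part)
        if m is None:
            raise ValueError(f"bad choice: {part!r}")
        sign, number, relation, text = m.groups()
        limit = math.inf if number == "∞" else float(number)
        if sign == "-":
            limit = -limit
        choices.append((limit, relation, text))
    return choices


def choose(pattern, n):
    """Scan right to left; return the string of the first matching choice."""
    choices = parse_choice_pattern(pattern)
    for limit, relation, text in reversed(choices):
        if (limit < n) if relation == "<" else (limit <= n):
            return text
    return choices[0][2]  # no match: default to the first string
```

    <p>For instance, <code>choose("0≤Rf|1≤Ru|1&lt;Re", 1)</code> returns
    <code>"Ru"</code>, and any negative N falls through to the first
    string, <code>"Rf"</code>.</p>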
7414    <h3><a name="Element_default" href="#Element_default" id=
7415    "Element_default">A.4 Element default</a></h3>
    <p><b>Note:</b> <i>This structure is deprecated.</i> Use the
    replacement structures instead, for example:</p>
7418    <ul>
7419      <li>For &lt;collations&gt;, now use the
7420      &lt;defaultCollation&gt; element.</li>
7421      <li>For &lt;calendars&gt;, the default calendar type for a
7422      locale is now specified by <i><a href=
7423      "tr35-dates.html#Calendar_Preference_Data">Calendar
7424      Preference Data</a></i>.</li>
7425    </ul>
7426    <p>In some cases, a number of elements are present. The default
7427    element can be used to indicate which of them is the default,
    in the absence of other information. The value of the choice
    attribute must match the value of the type attribute of the
    selected item.</p>
7431    <pre>&lt;timeFormats&gt;
7432  &lt;default choice="<span style="color: red">medium</span>" /&gt;
7433  &lt;timeFormatLength type="<span style=
7434"color: blue">full</span>"&gt;
7435    &lt;timeFormat type="<span style=
7436"color: blue">standard</span>"&gt;
7437      &lt;pattern type="<span style=
7438"color: blue">standard</span>"&gt;<span style=
7439"color: blue">h:mm:ss a z</span>&lt;/pattern&gt;
7440    &lt;/timeFormat&gt;
7441  &lt;/timeFormatLength&gt;
7442  &lt;timeFormatLength type="<span style=
7443"color: blue">long</span>"&gt;
7444    &lt;timeFormat type="<span style=
7445"color: blue">standard</span>"&gt;
7446      &lt;pattern type="<span style=
7447"color: blue">standard</span>"&gt;<span style=
7448"color: blue">h:mm:ss a z</span>&lt;/pattern&gt;
7449    &lt;/timeFormat&gt;
7450  &lt;/timeFormatLength&gt;
7451  &lt;timeFormatLength type="<span style=
7452"color: red">medium</span>"&gt;
7453    &lt;timeFormat type="<span style=
7454"color: blue">standard</span>"&gt;
7455      &lt;pattern type="<span style=
7456"color: blue">standard</span>"&gt;<span style=
7457"color: blue">h:mm:ss a</span>&lt;/pattern&gt;
7458    &lt;/timeFormat&gt;
7459  &lt;/timeFormatLength&gt;
7460...</pre>
7461    <p>Like all other elements, the &lt;default&gt; element is
7462    inherited. Thus, it can also refer to inherited resources. For
7463    example, suppose that the above resources are present in fr,
7464    and that in fr_BE we have the following:</p>
7465    <pre>&lt;timeFormats&gt;
7466  &lt;default choice="<span style="color: red">long</span>"/&gt;
7467&lt;/timeFormats&gt;</pre>
7468    <p>In that case, the default time format for fr_BE would be the
7469    inherited "long" resource from fr. Now suppose that we had in
7470    fr_CA:</p>
7471    <pre>  &lt;timeFormatLength type="<span style=
7472    "color: red">medium</span>"&gt;
7473    &lt;timeFormat type="<span style=
7474"color: blue">standard</span>"&gt;
7475      &lt;pattern type="<span style=
7476"color: blue">standard</span>"&gt;<span style=
7477"color: blue">...</span>&lt;/pattern&gt;
7478    &lt;/timeFormat&gt;
7479  &lt;/timeFormatLength&gt;
7480    </pre>
7481    <p>In this case, the &lt;default&gt; is inherited from fr, and
7482    has the value "medium". It thus refers to this new "medium"
7483    pattern in this resource bundle.</p>
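    <p>The two-step resolution in these examples can be sketched in
    Python. This assumes a hypothetical flattened view of the resources
    (<code>DATA</code> maps each locale to path/value pairs, and
    <code>PARENT</code> gives each locale's parent); real CLDR
    inheritance involves more than this. The fr_CA pattern is elided in
    the example above, so the sketch uses the placeholder string
    <code>"FR_CA_MEDIUM"</code>.</p>

```python
# Hypothetical flattened resources for the fr, fr_BE, and fr_CA examples.
PARENT = {"fr_BE": "fr", "fr_CA": "fr", "fr": "root"}
DATA = {
    "fr": {
        "timeFormats/default": "medium",
        "timeFormats/full": "h:mm:ss a z",
        "timeFormats/long": "h:mm:ss a z",
        "timeFormats/medium": "h:mm:ss a",
    },
    "fr_BE": {"timeFormats/default": "long"},
    "fr_CA": {"timeFormats/medium": "FR_CA_MEDIUM"},  # placeholder for "..."
}


def lookup(locale, path):
    """Walk the parent chain until some locale supplies the path."""
    while locale is not None:
        value = DATA.get(locale, {}).get(path)
        if value is not None:
            return value
        locale = PARENT.get(locale)
    raise KeyError(path)


def default_time_format(locale):
    # The default choice and the pattern it names are each inherited
    # independently, so both lookups start at the requested locale.
    choice = lookup(locale, "timeFormats/default")
    return lookup(locale, f"timeFormats/{choice}")
```

    <p>With this data, fr_BE resolves to the inherited "long" pattern
    from fr, while fr_CA inherits the choice "medium" from fr but
    resolves it to its own "medium" pattern.</p>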
7484    <h3><a name="Deprecated_Common_Attributes" href=
7485    "#Deprecated_Common_Attributes" id=
7486    "Deprecated_Common_Attributes">A.5 Deprecated Common
7487    Attributes</a></h3>
7488    <h4><a name="Attribute_standard" href="#Attribute_standard" id=
7489    "Attribute_standard">A.5.1 Attribute standard</a></h4>
7490    <p class="element2"><b>Note:</b> This attribute is deprecated.
7491    Instead, use a reference element with the attribute
7492    standard="true".</p>
7493    <p>The value of this attribute is a list of strings
7494    representing standards: international, national, organization,
7495    or vendor standards. The presence of this attribute indicates
7496    that the data in this element is compliant with the indicated
7497    standards. Where possible, for uniqueness, the string should be
7498    a URL that represents that standard. The strings are separated
7499    by commas; leading or trailing spaces on each string are not
7500    significant. Examples:</p>
7501    <p><code>&lt;collation standard="<span style="color: blue">MSA
7502    200:2002</span>"&gt;<br>
7503    ...<br>
    &lt;dateFormatStyle
    standard="https://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=26780&amp;amp;ICS1=1&amp;amp;ICS2=140&amp;amp;ICS3=30"&gt;</code></p>
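    <p>Splitting such a value follows directly from the rule above. A
    minimal Python sketch, with <code>parse_standard_attribute</code> as
    a hypothetical name: split on commas and strip leading and trailing
    spaces, keeping any internal spaces (as in "MSA 200:2002").</p>

```python
def parse_standard_attribute(value):
    """Split a deprecated standard="..." value into its standards.

    Leading and trailing spaces on each entry are not significant.
    """
    return [part.strip() for part in value.split(",")]
```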
7506    <h4><a name="Attribute_draft_nonLeaf" href=
7507    "#Attribute_draft_nonLeaf" id="Attribute_draft_nonLeaf">A.5.2
7508    Attribute draft in non-leaf elements</a></h4>
    <p>The draft attribute is deprecated except in leaf elements
    (elements that do not have any subelements).</p>
7511    <h3><a name="Element_base" href="#Element_base" id=
7512    "Element_base">A.6 Element base</a></h3>
7513    <p><b>Note:</b> <i>This element is deprecated.</i> Use the
7514    collation &lt;import&gt; element instead.</p>
    <p>The optional base element, <code>&lt;base&gt;<span style=
    "color: blue">...</span>&lt;/base&gt;</code>, contains an
    alias element that points to another data source that defines a
    <i>base</i> collation. If present, it indicates that the
    settings and rules in the collation are modifications applied
    <i>on top of</i> the respective elements in the base collation.
    That is, any successive settings, where present, override what
    is in the base, as described in <a href=
    "tr35-collation.html#Setting_Options">Setting Options</a>. Any
    successive rules are concatenated to the end of the rules in
    the base. The results of multiple rules applying to the same
    characters are covered in <a href=
    "tr35-collation.html#Orderings">Orderings</a>.</p>
7528    <h3><a name="Element_rules" href="#Element_rules" id=
7529    "Element_rules">A.7 Element rules</a></h3>
7530    <p><b>Note:</b> <i>The XML collation syntax is deprecated; this
7531    includes the &lt;rules&gt; element and its subelements, except
7532    that the &lt;import&gt; element has been moved up to be a
7533    subelement of &lt;collation&gt;.</i> Use the basic collation
7534    syntax with the <a href="tr35-collation.html#Rules">&lt;cr&gt;
7535    element</a> instead.</p>
7536    <p class="dtd">&lt;!ELEMENT rules (alias | ( ( reset | import
7537    ), ( reset | import | p | pc | s | sc | t | tc | i | ic | x)*
7538    )) &gt;</p>
7539    <h3><a name="Deprecated_subelements_of_dates" href=
7540    "#Deprecated_subelements_of_dates" id=
7541    "Deprecated_subelements_of_dates">A.8 Deprecated subelements of
7542    &lt;dates&gt;</a></h3>
7543    <ul>
7544      <li>&lt;localizedPatternChars&gt;</li>
7545      <li>&lt;dateRangePattern&gt;, replaced by
7546      &lt;intervalFormats&gt;.</li>
7547    </ul>
7548    <h3><a name="Deprecated_subelements_of_calendars" href=
7549    "#Deprecated_subelements_of_calendars" id=
7550    "Deprecated_subelements_of_calendars">A.9 Deprecated
7551    subelements of &lt;calendars&gt;</a></h3>
7552    <ul>
7553      <li>&lt;monthNames&gt; and &lt;monthAbbr&gt;; month name
7554      forms are specified in the &lt;months&gt; element. The older
7555      monthNames, monthAbbr are equivalent to: using the months
7556      element with the context type="<span style=
7557      "color: blue">format</span>" and the width type="<span style=
7558      "color: blue">wide</span>" (for ...Names) and
7559      type="<span style="color: blue">narrow</span>" (for ...Abbr),
7560      respectively.</li>
7561      <li>&lt;dayNames&gt; and &lt;dayAbbr&gt;; weekday name forms
7562      are specified in the &lt;days&gt; element. The older
7563      dayNames, dayAbbr are equivalent to: using the days element
7564      with the context type="<span style=
7565      "color: blue">format</span>" and the width type="<span style=
7566      "color: blue">wide</span>" (for ...Names) and
7567      type="<span style="color: blue">narrow</span>" (for ...Abbr),
7568      respectively.</li>
7569      <li><a name="week" href="#week" id="week">&lt;week&gt;</a> is
7570      deprecated in the main LDML files, because the data is more
7571      appropriately organized as connected to territories, not to
7572      linguistic data. Use the supplemental &lt;weekData&gt;
7573      element instead.</li>
      <li>&lt;am&gt; and &lt;pm&gt;; these are now included as part
      of the &lt;dayPeriods&gt; element.</li>
      <li>&lt;fields&gt; is deprecated as a subelement of
      &lt;calendars&gt;; instead, a &lt;fields&gt; element should be
      located just under a &lt;dates&gt; element. See <a href=
      "tr35-dates.html#Calendar_Fields">Calendar Fields</a>.</li>
7580    </ul>
7581    <h3><a name="Deprecated_subelements_of_timeZoneNames" href=
7582    "#Deprecated_subelements_of_timeZoneNames" id=
7583    "Deprecated_subelements_of_timeZoneNames">A.10 Deprecated
7584    subelements of &lt;timeZoneNames&gt;</a></h3>
7585    <ul>
      <li>&lt;hoursFormat&gt;, e.g. "{0}/{1}" for "-0800/-0700"</li>
7587      <li><a name="fallbackRegionFormat" href=
7588      "#fallbackRegionFormat" id=
7589      "fallbackRegionFormat">&lt;fallbackRegionFormat&gt;</a>
7590      (deprecated), e.g. "{0}&nbsp;Time ({1})" for "United States
7591      Time (New York)"</li>
7592      <li>&lt;abbreviationFallback&gt;</li>
7593      <li>&lt;preferenceOrdering&gt;, a preference ordering among
7594      modern zones; use metazones instead.</li>
      <li>&lt;singleCountries&gt;; use <a href=
      "tr35-dates.html#Primary_Zones">Primary Zones</a> instead.</li>
7597    </ul>
7598    <h3><a name="Deprecated_subelements_of_zone_metazone" href=
7599    "#Deprecated_subelements_of_zone_metazone" id=
7600    "Deprecated_subelements_of_zone_metazone">A.11 Deprecated
7601    subelements of &lt;zone&gt; and &lt;metazone&gt;</a></h3>
7602    <ul>
7603      <li>&lt;commonlyUsed&gt;, formerly used to indicate whether a
7604      zone was commonly used in the locale.</li>
7605    </ul>
7606    <h3><a name=
7607    "Renamed_attribute_values_for_contextTransformUsage" href=
7608    "#Renamed_attribute_values_for_contextTransformUsage" id=
7609    "Renamed_attribute_values_for_contextTransformUsage">A.12
7610    Renamed attribute values for &lt;contextTransformUsage&gt;
7611    element</a></h3>
7612    <p>The &lt;contextTransformUsage&gt; element was introduced in
7613    CLDR 21. The values for its <em>type</em> attribute are
7614    documented in <a href=
7615    "tr35-general.html#contextTransformUsage_type_attribute_values">
7616    &lt;contextTransformUsage&gt; type attribute values</a>. In
7617    CLDR 25, some of these values were renamed from their previous
7618    values for improved clarity:</p>
7619    <ul>
7620      <li>"type" was renamed to "keyValue"</li>
7621      <li>"displayName" was renamed to "currencyName"</li>
7622      <li>"displayName-count" was renamed to
7623      "currencyName-count"</li>
7624      <li>"tense" was renamed to "relative"</li>
7625    </ul>
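    <p>When migrating older data, the renaming can be applied with a
    simple lookup table. A minimal Python sketch, where
    <code>RENAMED_USAGE_TYPES</code> and <code>migrate_usage_type</code>
    are hypothetical names; values not in the list above pass through
    unchanged.</p>

```python
# <contextTransformUsage> type values renamed in CLDR 25, old to new.
RENAMED_USAGE_TYPES = {
    "type": "keyValue",
    "displayName": "currencyName",
    "displayName-count": "currencyName-count",
    "tense": "relative",
}


def migrate_usage_type(value):
    """Return the current name for a possibly pre-CLDR-25 usage type."""
    return RENAMED_USAGE_TYPES.get(value, value)
```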
7626    <h3><a name="Deprecated_subelements_of_segmentations" href=
7627    "#Deprecated_subelements_of_segmentations" id=
7628    "Deprecated_subelements_of_segmentations">A.13 Deprecated
7629    subelements of &lt;segmentations&gt;</a></h3>
7630    <ul>
      <li>&lt;exceptions&gt; and &lt;exception&gt; were deprecated
7632      and replaced with &lt;suppressions&gt; and
7633      &lt;suppression&gt;.</li>
7634    </ul>
7635    <h3><a name="Element_cp" href="#Element_cp" id=
7636    "Element_cp">A.14 Element cp</a></h3>
7637    <p>The cp element was used to escape characters that cannot be
7638    represented in XML, even with NCRs. These escapes were only
7639    allowed in certain elements, according to the DTD.</p>
    <p>However, this mechanism was clumsy, and has been replaced by
    specialized syntax.</p>
7642    <table>
7643      <tr>
7644        <th>Code Point</th>
7645        <th>XML Example</th>
7646      </tr>
7647      <tr>
7648        <td><code>U+0000</code></td>
7649        <td><code>&lt;cp hex="0"&gt;</code></td>
7650      </tr>
7651    </table>
7653    <h3><a name="validSubLocales" href="#validSubLocales" id=
7654    "validSubLocales">A.15 Attribute validSubLocales</a></h3>
7655    <p>The attribute <i>validSubLocales</i> allowed sublocales in a
7656    given tree to be treated as though a file for them were present
    when there was not one. It had an effect only on locales that
    inherit from the current file and for which no file is present.</p>
    <p><b>Example 1.</b> Suppose that a particular LDML tree has no
    region locales for German: for example, there is a de.xml file,
    but no de_AT.xml, de_CH.xml, or de_DE.xml file. Then no elements
    are valid for any of those region locales. To mark one of those
    locales as having valid elements, we introduce an empty file,
    such as the following.</p>
7666    <p><code>&lt;ldml version="1.1"&gt;<br>
7667    &nbsp;&lt;identity&gt;<br>
7668    &nbsp; &lt;version number="1.1" /&gt;<br>
7669    &nbsp; &lt;language type="de" /&gt;<br>
7670    &nbsp; &lt;territory type="AT" /&gt;<br>
7671    &nbsp;&lt;/identity&gt;<br>
7672    &lt;/ldml&gt;</code></p>
    <p>With the <i>validSubLocales</i> attribute, instead of adding
    empty files for de_AT.xml, de_CH.xml, and de_DE.xml, we could add
    to the parent locale file (de.xml) a list of the child locales
    that should behave as if files were present.</p>
7677    <p><code>&lt;ldml version="1.1" validSubLocales="de_AT de_CH
7678    de_DE"&gt;<br>
7679    &nbsp;&lt;identity&gt;<br>
7680    &nbsp; &lt;version number="1.1" /&gt;<br>
7681    &nbsp; &lt;language type="de" /&gt;<br>
7682    &nbsp;&lt;/identity&gt;<br>
7683    ...<br>
7684    &lt;/ldml&gt;</code></p>
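    <p>The effect of the attribute can be sketched as a validity check
    in Python. This is a hypothetical illustration:
    <code>is_valid_sublocale</code> and <code>parent_id</code> are
    invented names, parents are computed by simple truncation (ignoring
    any explicit parent-locale data), and the list is read from the
    nearest ancestor that has a file, since that is the file the missing
    locale inherits from.</p>

```python
def parent_id(locale):
    """Truncation parent, e.g. de_AT gives de (simplified)."""
    return locale.rsplit("_", 1)[0] if "_" in locale else "root"


def is_valid_sublocale(locale, files, valid_sub_locales):
    """files: locale IDs that have files; valid_sub_locales: ID to set."""
    if locale in files:
        return True
    # Find the nearest ancestor that has a file and consult its list.
    ancestor = parent_id(locale)
    while ancestor not in files and ancestor != "root":
        ancestor = parent_id(ancestor)
    return locale in valid_sub_locales.get(ancestor, set())
```

    <p>With <code>files = {"root", "de"}</code> and
    <code>{"de": {"de_AT", "de_CH", "de_DE"}}</code> as the lists,
    de_AT is treated as valid while de_LU is not.</p>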
7685    <p>Now that the <i>validSubLocales</i> attribute has been
7686    deprecated, it is recommended to simply add empty files to
    specify which sublocales are valid. This convention is used
    throughout CLDR.</p>
7689    <h3><a name="postCodeElements" href="#postCodeElements" id=
7690    "postCodeElements">A.16 Elements postalCodeData,
7691    postCodeRegex</a></h3>
7692    <p>The postal code validation data has been deprecated. Please
7693    see other services that are kept up to date, such as:</p>
7694    <ul>
7695      <li><a href=
7696      "https://i18napis.appspot.com/address/data/US">https://i18napis.appspot.com/address/data/US</a></li>
7697      <li><a href=
7698      "https://i18napis.appspot.com/address/data/CH">https://i18napis.appspot.com/address/data/CH</a></li>
7699      <li>...</li>
7700    </ul>
    <p>See <a href="tr35-info.html#Postal_Code_Validation">Postal
    Code Validation</a>.</p>
7703    <h3><a name="telephoneCodeData" href="#telephoneCodeData" id=
7704    "telephoneCodeData">A.17 Element telephoneCodeData</a></h3>
7705    <p>The element &lt;telephoneCodeData&gt; and its subelements
7706    have been deprecated and the data removed.</p>
7707    <hr>
7708    <h2><a name="Links_to_Other_Parts" href="#Links_to_Other_Parts"
7709    id="Links_to_Other_Parts">Annex B Links to Other Parts</a></h2>
7710    <p>The LDML specification is split into several <a href=
7711    "#Parts">parts</a> by topic, with one HTML document per part.
7712    The following tables provide redirects for links to specific
7713    topics. Please update your links and bookmarks.</p>
7714    <p>Part 1 Links: Core (this document): No redirects needed.</p>
7715    <table cellspacing="0" cellpadding="2" border="1" width="100%">
7716      <caption>
7717        <a href="#Part_2_Links" name="Part_2_Links" id=
7718        "Part_2_Links">Part 2 Links</a>: <a href=
7719        "tr35-general.html">General</a> (display names &amp;
7720        transforms, etc.)
7721      </caption>
7722      <tr>
7723        <th>Old section</th>
7724        <th>Section in new part</th>
7725      </tr>
7726      <tr>
7727        <td>5.4 <a name="Display_Name_Elements" href=
7728        "#Display_Name_Elements" id="Display_Name_Elements">Display
7729        Name Elements</a></td>
7730        <td>1 <a href=
7731        "tr35-general.html#Display_Name_Elements">Display Name
7732        Elements</a></td>
7733      </tr>
7734      <tr>
7735        <td>5.5 <a name="Layout_Elements" href="#Layout_Elements"
7736        id="Layout_Elements">Layout Elements</a></td>
7737        <td>2 <a href="tr35-general.html#Layout_Elements">Layout
7738        Elements</a></td>
7739      </tr>
7740      <tr>
7741        <td>5.6 <a name="Character_Elements" href=
7742        "#Character_Elements" id="Character_Elements">Character
7743        Elements</a></td>
7744        <td>3 <a href=
7745        "tr35-general.html#Character_Elements">Character
7746        Elements</a></td>
7747      </tr>
7748      <tr>
7749        <td>5.6.1 <a name="ExemplarSyntax" href="#ExemplarSyntax"
7750        id="ExemplarSyntax">Exemplar Syntax</a></td>
7751        <td>3.1 <a href="tr35-general.html#ExemplarSyntax">Exemplar
7752        Syntax</a></td>
7753      </tr>
7754      <tr>
7755        <td>5.6.2 Restrictions</td>
7756        <td>3.1 <a href="tr35-general.html#ExemplarSyntax">Exemplar
7757        Syntax</a></td>
7758      </tr>
7759      <tr>
7760        <td>5.6.3 Mapping</td>
7761        <td>3.2 <a href=
7762        "tr35-general.html#Character_Mapping">Mapping</a></td>
7763      </tr>
7764      <tr>
7765        <td>5.6.4 <a name="IndexLabels" href="#IndexLabels" id=
7766        "IndexLabels">Index Labels</a></td>
7767        <td>3.3 <a href="tr35-general.html#IndexLabels">Index
7768        Labels</a></td>
7769      </tr>
7770      <tr>
7771        <td>5.6.5 Ellipsis</td>
7772        <td>3.4 <a href=
7773        "tr35-general.html#Ellipsis">Ellipsis</a></td>
7774      </tr>
7775      <tr>
7776        <td>5.6.6 More Information</td>
7777        <td>3.5 <a href=
7778        "tr35-general.html#Character_More_Info">More
7779        Information</a></td>
7780      </tr>
7781      <tr>
7782        <td>5.7 <a name="Delimiter_Elements" href=
7783        "#Delimiter_Elements" id="Delimiter_Elements">Delimiter
7784        Elements</a></td>
7785        <td>4 <a href=
7786        "tr35-general.html#Delimiter_Elements">Delimiter
7787        Elements</a></td>
7788      </tr>
7789      <tr>
7790        <td>C.6 <a name="Measurement_System_Data" href=
7791        "#Measurement_System_Data" id=
7792        "Measurement_System_Data">Measurement System Data</a></td>
7793        <td>5 <a href=
7794        "tr35-general.html#Measurement_System_Data">Measurement
7795        System Data</a></td>
7796      </tr>
7797      <tr>
7798        <td>5.8 <a name="Measurement_Elements" href=
7799        "#Measurement_Elements" id=
7800        "Measurement_Elements">Measurement Elements
7801        (deprecated)</a></td>
7802        <td>5.1 <a href=
7803        "tr35-general.html#Measurement_Elements">Measurement
7804        Elements (deprecated)</a></td>
7805      </tr>
7806      <tr>
7807        <td>5.11 <a name="Unit_Elements" href="#Unit_Elements" id=
7808        "Unit_Elements">Unit Elements</a></td>
7809        <td>6 <a href="tr35-general.html#Unit_Elements">Unit
7810        Elements</a></td>
7811      </tr>
7812      <tr>
7813        <td>5.12 <a name="POSIX_Elements" href="#POSIX_Elements"
7814        id="POSIX_Elements">POSIX Elements</a></td>
7815        <td>7 <a href="tr35-general.html#POSIX_Elements">POSIX
7816        Elements</a></td>
7817      </tr>
7818      <tr>
7819        <td>5.13 <a name="Reference_Elements" href=
7820        "#Reference_Elements" id="Reference_Elements">Reference
7821        Element</a></td>
7822        <td>8 <a href=
7823        "tr35-general.html#Reference_Elements">Reference
7824        Element</a></td>
7825      </tr>
7826      <tr>
7827        <td>5.15 <a name="Segmentations" href="#Segmentations" id=
7828        "Segmentations">Segmentations</a></td>
7829        <td>9 <a href=
7830        "tr35-general.html#Segmentations">Segmentations</a></td>
7831      </tr>
7832      <tr>
7833        <td>5.15.1 <a name="Segmentation_Inheritance" href=
7834        "#Segmentation_Inheritance" id=
7835        "Segmentation_Inheritance">Segmentation
7836        Inheritance</a></td>
7837        <td>9.1 <a href=
7838        "tr35-general.html#Segmentation_Inheritance">Segmentation
7839        Inheritance</a></td>
7840      </tr>
7841      <tr>
7842        <td>5.16 <a name="Transforms" href="#Transforms" id=
7843        "Transforms">Transforms</a></td>
7844        <td>10 <a href=
7845        "tr35-general.html#Transforms">Transforms</a></td>
7846      </tr>
7847      <tr>
7848        <td>N <a name="Transform_Rules" href="#Transform_Rules" id=
7849        "Transform_Rules">Transform Rules</a></td>
7850        <td>10.3 <a href=
7851        "tr35-general.html#Transform_Rules_Syntax">Transform Rules
7852        Syntax</a></td>
7853      </tr>
7854      <tr>
7855        <td>5.18 <a name="ListPatterns" href="#ListPatterns" id=
7856        "ListPatterns">List Patterns</a></td>
7857        <td>11 <a href="tr35-general.html#ListPatterns">List
7858        Patterns</a></td>
7859      </tr>
7860      <tr>
7861        <td>C.20 <a name="List_Gender" href="#List_Gender" id=
7862        "List_Gender">Gender of Lists</a></td>
7863        <td>11.1 <a href="tr35-general.html#List_Gender">Gender of
7864        Lists</a></td>
7865      </tr>
7866      <tr>
7867        <td>5.19 <a name="Context_Transform_Elements" href=
7868        "#Context_Transform_Elements" id=
7869        "Context_Transform_Elements">ContextTransform
7870        Elements</a></td>
7871        <td>12 <a href=
7872        "tr35-general.html#Context_Transform_Elements">ContextTransform
7873        Elements</a></td>
7874      </tr>
7879    </table>
7880    <table cellspacing="0" cellpadding="2" border="1" width="100%">
7881      <caption>
7882        <a href="#Part_3_Links" name="Part_3_Links" id=
7883        "Part_3_Links">Part 3 Links</a>: <a href=
7884        "tr35-numbers.html">Numbers</a> (number &amp; currency
7885        formatting)
7886      </caption>
7887      <tr>
7888        <th>Old section</th>
7889        <th>Section in new part</th>
7890      </tr>
7891      <tr>
7892        <td>C.13 <a name="Numbering_Systems" href=
7893        "#Numbering_Systems" id="Numbering_Systems">Numbering
7894        Systems</a></td>
7895        <td>1 <a href=
7896        "tr35-numbers.html#Numbering_Systems">Numbering
7897        Systems</a></td>
7898      </tr>
7899      <tr>
7900        <td>5.10 <a name="Number_Elements" href="#Number_Elements"
7901        id="Number_Elements">Number Elements</a></td>
7902        <td>2 <a href="tr35-numbers.html#Number_Elements">Number
7903        Elements</a></td>
7904      </tr>
7905      <tr>
7906        <td>5.10.1 <a name="Number_Symbols" href="#Number_Symbols"
7907        id="Number_Symbols">Number Symbols</a></td>
7908        <td>2.3 <a href="tr35-numbers.html#Number_Symbols">Number
7909        Symbols</a></td>
7910      </tr>
7911      <tr>
7912        <td>G <a name="Number_Format_Patterns" href=
7913        "#Number_Format_Patterns" id=
7914        "Number_Format_Patterns">Number Format Patterns</a></td>
7915        <td>3 <a href=
7916        "tr35-numbers.html#Number_Format_Patterns">Number Format
7917        Patterns</a></td>
7918      </tr>
7919      <tr>
7920        <td>5.10.2 <a name="Currencies" href="#Currencies" id=
7921        "Currencies">Currencies</a></td>
7922        <td>4 <a href=
7923        "tr35-numbers.html#Currencies">Currencies</a></td>
7924      </tr>
7925      <tr>
7926        <td>C.1 <a name="Supplemental_Currency_Data" href=
7927        "#Supplemental_Currency_Data" id=
7928        "Supplemental_Currency_Data">Supplemental Currency
7929        Data</a></td>
7930        <td>4.1 <a href=
7931        "tr35-numbers.html#Supplemental_Currency_Data">Supplemental
7932        Currency Data</a></td>
7933      </tr>
7934      <tr>
7935        <td>C.11 <a name="Language_Plural_Rules" href=
7936        "#Language_Plural_Rules" id=
7937        "Language_Plural_Rules">Language Plural Rules</a></td>
7938        <td>5 <a href=
7939        "tr35-numbers.html#Language_Plural_Rules">Language Plural
7940        Rules</a></td>
7941      </tr>
7942      <tr>
7943        <td>5.17 <a name="Rule-Based_Number_Formatting" href=
7944        "#Rule-Based_Number_Formatting" id=
7945        "Rule-Based_Number_Formatting">Rule-Based Number
7946        Formatting</a></td>
7947        <td>6 <a href=
7948        "tr35-numbers.html#Rule-Based_Number_Formatting">Rule-Based
7949        Number Formatting</a></td>
7950      </tr>
7951    </table>
7952    <table cellspacing="0" cellpadding="2" border="1" width="100%">
7953      <caption>
7954        <a href="#Part_4_Links" name="Part_4_Links" id=
7955        "Part_4_Links">Part 4 Links</a>: <a href=
7956        "tr35-dates.html">Dates</a> (date, time, time zone
7957        formatting)
7958      </caption>
7959      <tr>
7960        <th>Old section</th>
7961        <th>Section in new part</th>
7962      </tr>
7963      <tr>
7964        <td><a name="Date_Elements" href="#Date_Elements" id=
7965        "Date_Elements">5.9 Date Elements</a></td>
7966        <td>1 <a href=
7967        "tr35-dates.html#Overview_Dates_Element_Supplemental">Overview:
7968        Dates Element, Supplemental Date and Calendar
7969        Information</a></td>
7970      </tr>
7971      <tr>
7972        <td><a name="Calendar_Elements" href="#Calendar_Elements"
7973        id="Calendar_Elements">5.9.1 Calendar Elements</a></td>
7974        <td>2 <a href="tr35-dates.html#Calendar_Elements">Calendar
7975        Elements</a></td>
7976      </tr>
7977      <tr>
7978        <td><a name="months_days_quarters_eras" href=
7979        "#months_days_quarters_eras" id=
7980        "months_days_quarters_eras">Elements months, days,
7981        quarters, eras</a></td>
7982        <td>2.1 <a href=
7983        "tr35-dates.html#months_days_quarters_eras">Elements
7984        months, days, quarters, eras</a></td>
7985      </tr>
7986      <tr>
7987        <td><a name="monthPatterns_cyclicNameSets" href=
7988        "#monthPatterns_cyclicNameSets" id=
7989        "monthPatterns_cyclicNameSets">Elements monthPatterns,
7990        cyclicNameSets</a></td>
7991        <td>2.2 <a href=
7992        "tr35-dates.html#monthPatterns_cyclicNameSets">Elements
7993        monthPatterns, cyclicNameSets</a></td>
7994      </tr>
7995      <tr>
7996        <td><a name="dayPeriods" href="#dayPeriods" id=
7997        "dayPeriods">Element dayPeriods</a></td>
7998        <td>2.3 <a href="tr35-dates.html#dayPeriods">Element
7999        dayPeriods</a></td>
8000      </tr>
8001      <tr>
8002        <td><a name="dateFormats" href="#dateFormats" id=
8003        "dateFormats">Element dateFormats</a></td>
8004        <td>2.4 <a href="tr35-dates.html#dateFormats">Element
8005        dateFormats</a></td>
8006      </tr>
8007      <tr>
8008        <td><a name="timeFormats" href="#timeFormats" id=
8009        "timeFormats">Element timeFormats</a></td>
8010        <td>2.5 <a href="tr35-dates.html#timeFormats">Element
8011        timeFormats</a></td>
8012      </tr>
8013      <tr>
8014        <td><a name="dateTimeFormats" href="#dateTimeFormats" id=
8015        "dateTimeFormats">Element dateTimeFormats</a></td>
8016        <td>2.6 <a href="tr35-dates.html#dateTimeFormats">Element
8017        dateTimeFormats</a></td>
8018      </tr>
8019      <tr>
8020        <td><a name="Calendar_Fields" href="#Calendar_Fields" id=
8021        "Calendar_Fields">5.9.2 Calendar Fields</a></td>
8022        <td>3 <a href="tr35-dates.html#Calendar_Fields">Calendar
8023        Fields</a></td>
8024      </tr>
8025      <tr>
8026        <td>5.9.3 <a name="Timezone_Names" href="#Timezone_Names"
8027        id="Timezone_Names">Time Zone Names</a></td>
8028        <td>5 <a href="tr35-dates.html#Time_Zone_Names">Time Zone
8029        Names</a></td>
8030      </tr>
8031      <tr>
8032        <td><a name="Supplemental_Calendar_Data" href=
8033        "#Supplemental_Calendar_Data" id=
8034        "Supplemental_Calendar_Data">C.5 Supplemental Calendar
8035        Data</a></td>
8036        <td>4 <a href=
8037        "tr35-dates.html#Supplemental_Calendar_Data">Supplemental
8038        Calendar Data</a></td>
8039      </tr>
8040      <tr>
8041        <td><a name="Supplemental_Timezone_Data" href=
8042        "#Supplemental_Timezone_Data" id=
8043        "Supplemental_Timezone_Data">C.7 Supplemental Time Zone
8044        Data</a></td>
8045        <td>6 <a href=
8046        "tr35-dates.html#Supplemental_Time_Zone_Data">Supplemental
8047        Time Zone Data</a></td>
8048      </tr>
8049      <tr>
8050        <td><a name="Calendar_Preference_Data" href=
8051        "#Calendar_Preference_Data" id=
8052        "Calendar_Preference_Data">C.15 Calendar Preference
8053        Data</a></td>
8054        <td>4.2 <a href=
8055        "tr35-dates.html#Calendar_Preference_Data">Calendar
8056        Preference Data</a></td>
8057      </tr>
8058      <tr>
8059        <td><a name="DayPeriodRules" href="#DayPeriodRules" id=
8060        "DayPeriodRules">C.17 DayPeriod Rules</a></td>
8061        <td>4.5 <a href="tr35-dates.html#Day_Period_Rules">Day
8062        Period Rules</a></td>
8063      </tr>
8064      <tr>
8065        <td><a name="Date_Format_Patterns" href=
8066        "#Date_Format_Patterns" id="Date_Format_Patterns">Appendix
8067        F: Date Format Patterns</a></td>
8068        <td>8 <a href="tr35-dates.html#Date_Format_Patterns">Date
8069        Format Patterns</a></td>
8070      </tr>
8071      <tr>
8072        <td><a name="Date_Field_Symbol_Table" href=
8073        "#Date_Field_Symbol_Table" id=
8074        "Date_Field_Symbol_Table">Date Field Symbol Table</a></td>
8075        <td><a href="tr35-dates.html#Date_Field_Symbol_Table">Date
8076        Field Symbol Table</a></td>
8077      </tr>
8078      <tr>
8079        <td><a name="Localized_Pattern_Characters" href=
8080        "#Localized_Pattern_Characters" id=
8081        "Localized_Pattern_Characters">F.1 Localized Pattern
8082        Characters (deprecated)</a></td>
8083        <td>8.1 <a href=
8084        "tr35-dates.html#Localized_Pattern_Characters">Localized
8085        Pattern Characters (deprecated)</a></td>
8086      </tr>
8087      <tr>
8088        <td><a name="Time_Zone_Fallback" href="#Time_Zone_Fallback"
8089        id="Time_Zone_Fallback">Appendix J: Time Zone Display
8090        Names</a></td>
8091        <td>7 <a href="tr35-dates.html#Using_Time_Zone_Names">Using
8092        Time Zone Names</a></td>
8093      </tr>
8094      <tr>
8095        <td><a name="fallbackFormat" href="#fallbackFormat" id=
8096        "fallbackFormat"><b>fallbackFormat</b>:</a></td>
8097        <td><a href=
8098        "tr35-dates.html#fallbackFormat"><b>fallbackFormat</b>:</a></td>
8099      </tr>
8100      <tr>
8101        <td>O.4 Parsing Dates and Times</td>
8102        <td>9 <a href="tr35-dates.html#Parsing_Dates_Times">Parsing
8103        Dates and Times</a></td>
8104      </tr>
8105    </table>
8106    <table cellspacing="0" cellpadding="2" border="1" width="100%">
8107      <caption>
8108        <a href="#Part_5_Links" name="Part_5_Links" id=
8109        "Part_5_Links">Part 5 Links</a>: <a href=
8110        "tr35-collation.html">Collation</a> (sorting, searching,
8111        grouping)
8112      </caption>
8113      <tr>
8114        <th>Old section</th>
8115        <th>Section in new part</th>
8116      </tr>
8117      <tr>
8118        <td>5.14 <a name="Collation_Elements" href=
8119        "#Collation_Elements" id="Collation_Elements">Collation
8120        Elements</a></td>
8121        <td>3 <a href=
8122        "tr35-collation.html#Collation_Tailorings">Collation
8123        Tailorings</a></td>
8124      </tr>
8125      <tr>
8126        <td>5.14.1 <a name="Collation_Version" href=
8127        "#Collation_Version" id=
8128        "Collation_Version">Version</a></td>
8129        <td>3.1 <a href=
8130        "tr35-collation.html#Collation_Version">Version</a></td>
8131      </tr>
8132      <tr>
8133        <td>5.14.2 <a name="Collation_Element" href=
8134        "#Collation_Element" id="Collation_Element">Collation
8135        Element</a></td>
8136        <td>3.2 <a href=
8137        "tr35-collation.html#Collation_Element">Collation
8138        Element</a></td>
8139      </tr>
8140      <tr>
8141        <td>5.14.3 <a name="Setting_Options" href=
8142        "#Setting_Options" id="Setting_Options">Setting
8143        Options</a></td>
8144        <td>3.3 <a href=
8145        "tr35-collation.html#Setting_Options">Setting
8146        Options</a></td>
8147      </tr>
8148      <tr>
8149        <td>Table <a name="Collation_Settings" href=
8150        "#Collation_Settings" id="Collation_Settings">Collation
8151        Settings</a></td>
8152        <td>Table <a href=
8153        "tr35-collation.html#Collation_Settings">Collation
8154        Settings</a></td>
8155      </tr>
8156      <tr>
8157        <td>5.14.4 <a name="Rules" href="#Rules" id=
8158        "Rules">Collation Rule Syntax</a></td>
8159        <td>3.4 <a href="tr35-collation.html#Rules">Collation Rule
8160        Syntax</a></td>
8161      </tr>
8162      <tr>
8163        <td>5.14.5 <a name="Orderings" href="#Orderings" id=
8164        "Orderings">Orderings</a></td>
8165        <td>3.5 <a href=
8166        "tr35-collation.html#Orderings">Orderings</a></td>
8167      </tr>
8168      <tr>
8169        <td>5.14.6 <a name="Contractions" href="#Contractions" id=
8170        "Contractions">Contractions</a></td>
8171        <td>3.6 <a href=
8172        "tr35-collation.html#Contractions">Contractions</a></td>
8173      </tr>
8174      <tr>
8175        <td>5.14.7 <a name="Expansions" href="#Expansions" id=
8176        "Expansions">Expansions</a></td>
8177        <td>3.7 <a href=
8178        "tr35-collation.html#Expansions">Expansions</a></td>
8179      </tr>
8180      <tr>
8181        <td>5.14.8 <a name="Context_Before" href="#Context_Before"
8182        id="Context_Before">Context Before</a></td>
8183        <td>3.8 <a href=
8184        "tr35-collation.html#Context_Before">Context
8185        Before</a></td>
8186      </tr>
8187      <tr>
8188        <td>5.14.9 <a name="Placing_Characters_Before_Others" href=
8189        "#Placing_Characters_Before_Others" id=
8190        "Placing_Characters_Before_Others">Placing Characters
8191        Before Others</a></td>
8192        <td>3.9 <a href=
8193        "tr35-collation.html#Placing_Characters_Before_Others">Placing
8194        Characters Before Others</a></td>
8195      </tr>
8196      <tr>
8197        <td>5.14.10 <a name="Logical_Reset_Positions" href=
8198        "#Logical_Reset_Positions" id=
8199        "Logical_Reset_Positions">Logical Reset Positions</a></td>
8200        <td>3.10 <a href=
8201        "tr35-collation.html#Logical_Reset_Positions">Logical Reset
8202        Positions</a></td>
8203      </tr>
8204      <tr>
8205        <td>5.14.11 <a name="Special_Purpose_Commands" href=
8206        "#Special_Purpose_Commands" id=
8207        "Special_Purpose_Commands">Special-Purpose
8208        Commands</a></td>
8209        <td>3.11 <a href=
8210        "tr35-collation.html#Special_Purpose_Commands">Special-Purpose
8211        Commands</a></td>
8212      </tr>
8213      <tr>
8214        <td>5.14.12 <a name="Script_Reordering" href=
8215        "#Script_Reordering" id="Script_Reordering">Collation
8216        Reordering</a></td>
8217        <td>3.12 <a href=
8218        "tr35-collation.html#Script_Reordering">Collation
8219        Reordering</a></td>
8220      </tr>
8221      <tr>
8222        <td>5.14.13 <a name="Case_Parameters" href=
8223        "#Case_Parameters" id="Case_Parameters">Case
8224        Parameters</a></td>
8225        <td>3.13 <a href="tr35-collation.html#Case_Parameters">Case
8226        Parameters</a></td>
8227      </tr>
8228      <tr>
8229        <td>Definition: <a name="UncasedExceptions" href=
8230        "#UncasedExceptions" id=
8231        "UncasedExceptions">UncasedExceptions</a></td>
8232        <td>removed: see 3.13 <a href=
8233        "tr35-collation.html#Case_Parameters">Case
8234        Parameters</a></td>
8235      </tr>
8236      <tr>
8237        <td>Definition: <a name="LowerExceptions" href=
8238        "#LowerExceptions" id=
8239        "LowerExceptions">LowerExceptions</a></td>
8240        <td>removed: see 3.13 <a href=
8241        "tr35-collation.html#Case_Parameters">Case
8242        Parameters</a></td>
8243      </tr>
8244      <tr>
8245        <td>Definition: <a name="UpperExceptions" href=
8246        "#UpperExceptions" id=
8247        "UpperExceptions">UpperExceptions</a></td>
8248        <td>removed: see 3.13 <a href=
8249        "tr35-collation.html#Case_Parameters">Case
8250        Parameters</a></td>
8251      </tr>
8252      <tr>
8253        <td>5.14.14 <a name="Visibility" href="#Visibility" id=
8254        "Visibility">Visibility</a></td>
8255        <td>3.14 <a href=
8256        "tr35-collation.html#Visibility">Visibility</a></td>
8257      </tr>
8258    </table>
8259    <table cellspacing="0" cellpadding="2" border="1" width="100%">
8260      <caption>
8261        <a href="#Part_6_Links" name="Part_6_Links" id=
8262        "Part_6_Links">Part 6 Links</a>: <a href=
8263        "tr35-info.html">Supplemental</a> (supplemental data)
8264      </caption>
8265      <tr>
8266        <th>Old section</th>
8267        <th>Section in new part</th>
8268      </tr>
8269      <tr>
8270        <td>C <a name="Supplemental_Data" href="#Supplemental_Data"
8271        id="Supplemental_Data">Supplemental Data</a></td>
8272        <td>Introduction <a href=
8273        "tr35-info.html#Supplemental_Data">Supplemental
8274        Data</a></td>
8275      </tr>
8276      <tr>
8277        <td>C.2 <a name="Supplemental_Territory_Containment" href=
8278        "#Supplemental_Territory_Containment" id=
8279        "Supplemental_Territory_Containment">Supplemental Territory
8280        Containment</a></td>
8281        <td>1.1 <a href=
8282        "tr35-info.html#Supplemental_Territory_Containment">Supplemental
8283        Territory Containment</a></td>
8284      </tr>
8285      <tr>
8286        <td>C.4 <a name="Supplemental_Territory_Information" href=
8287        "#Supplemental_Territory_Information" id=
8288        "Supplemental_Territory_Information">Supplemental Territory
8289        Information</a></td>
8290        <td>1.2 <a href=
8291        "tr35-info.html#Supplemental_Territory_Information">Supplemental
8292        Territory Information</a></td>
8293      </tr>
8294      <tr>
8295        <td>C.3 <a name="Supplemental_Language_Data" href=
8296        "#Supplemental_Language_Data" id=
8297        "Supplemental_Language_Data">Supplemental Language
8298        Data</a></td>
8299        <td>2 <a href=
8300        "tr35-info.html#Supplemental_Language_Data">Supplemental
8301        Language Data</a></td>
8302      </tr>
8303      <tr>
8304        <td>C.9 <a name="Supplemental_Code_Mapping" href=
8305        "#Supplemental_Code_Mapping" id=
8306        "Supplemental_Code_Mapping">Supplemental Code
8307        Mapping</a></td>
8308        <td>4 <a href=
8309        "tr35-info.html#Supplemental_Code_Mapping">Supplemental
8310        Code Mapping</a></td>
8311      </tr>
8312      <tr>
8313        <td>C.12 <a name="Telephone_Code_Data" href=
8314        "#Telephone_Code_Data" id="Telephone_Code_Data">Telephone
8315        Code Data</a></td>
8316        <td>5 <a href=
8317        "tr35-info.html#Telephone_Code_Data">Telephone Code
8318        Data</a></td>
8319      </tr>
8320      <tr>
8321        <td>C.14 <a name="Postal_Code_Validation" href=
8322        "#Postal_Code_Validation" id=
8323        "Postal_Code_Validation">Postal Code Validation</a></td>
8324        <td>6 <a href=
8325        "tr35-info.html#Postal_Code_Validation">Postal Code
8326        Validation</a></td>
8327      </tr>
8328      <tr>
8329        <td>C.8 <a name="Supplemental_Character_Fallback_Data"
8330        href="#Supplemental_Character_Fallback_Data" id=
8331        "Supplemental_Character_Fallback_Data">Supplemental
8332        Character Fallback Data</a></td>
8333        <td>7 <a href=
8334        "tr35-info.html#Supplemental_Character_Fallback_Data">Supplemental
8335        Character Fallback Data</a></td>
8336      </tr>
8337      <tr>
8338        <td>M <a name="Coverage_Levels" href="#Coverage_Levels" id=
8339        "Coverage_Levels">Coverage Levels</a></td>
8340        <td>8 <a href="tr35-info.html#Coverage_Levels">Coverage
8341        Levels</a></td>
8342      </tr>
8343      <tr>
8344        <td>5.20 <a name="Metadata_Elements" href=
8345        "tr35-info.html#Metadata_Elements" id=
8346        "Metadata_Elements">Metadata Elements</a></td>
8347        <td>10 <a href="tr35-info.html#Metadata_Elements">Locale
8348        Metadata Element</a></td>
8349      </tr>
8350      <tr>
8351        <td>P <a name="Appendix_Supplemental_Metadata" href=
8352        "tr35-info.html#Appendix_Supplemental_Metadata" id=
8353        "Appendix_Supplemental_Metadata">Supplemental
8354        Metadata</a><br>
8355        P.1 <a name="Supplemental_Alias_Information" href=
8356        "tr35-info.html#Supplemental_Alias_Information" id=
8357        "Supplemental_Alias_Information">Supplemental Alias
8358        Information</a><br>
8359        P.2 <a name="Supplemental_Deprecated_Information" href=
8360        "tr35-info.html#Supplemental_Deprecated_Information" id=
8361        "Supplemental_Deprecated_Information">Supplemental
8362        Deprecated Information</a><br>
8363        P.3 <a name="Default_Content" href=
8364        "tr35-info.html#Default_Content" id=
8365        "Default_Content">Default Content</a></td>
8366        <td>9 <a href=
8367        "tr35-info.html#Appendix_Supplemental_Metadata">Supplemental
8368        Metadata</a><br>
8369        9.1 <a href=
8370        "tr35-info.html#Supplemental_Alias_Information">Supplemental
8371        Alias Information</a><br>
8372        9.2 <a href=
8373        "tr35-info.html#Supplemental_Deprecated_Information">Supplemental
8374        Deprecated Information</a><br>
8375        9.3 <a href="tr35-info.html#Default_Content">Default
8376        Content</a></td>
8377      </tr>
8378    </table>
8379    <table cellspacing="0" cellpadding="2" border="1" width="100%">
8380      <caption>
8381        <a href="#Part_7_Links" name="Part_7_Links" id=
8382        "Part_7_Links">Part 7 Links</a>: <a href=
8383        "tr35-keyboards.html">Keyboards</a> (keyboard mappings)
8384      </caption>
8385      <tr>
8386        <th>Old section</th>
8387        <th>Section in new part</th>
8388      </tr>
8389      <tr>
8390        <td>S <a name="Keyboards" href="#Keyboards" id=
8391        "Keyboards">Keyboards</a></td>
8392        <td>1 <a href=
8393        "tr35-keyboards.html#Keyboards">Keyboards</a></td>
8394      </tr>
8395      <tr>
8396        <td>S <a name="Goals_and_Nongoals" href=
8397        "#Goals_and_Nongoals" id="Goals_and_Nongoals">Goals and
8398        Nongoals</a></td>
8399        <td><a href="tr35-keyboards.html#Goals_and_Nongoals">Goals
8400        and Nongoals</a></td>
8401      </tr>
8402      <tr>
8403        <td>S <a name="File_and_Dir_Structure" href=
8404        "#File_and_Dir_Structure" id="File_and_Dir_Structure">File
8405        and Directory Structure</a></td>
8406        <td><a href=
8407        "tr35-keyboards.html#File_and_Dir_Structure">File and
8408        Directory Structure</a></td>
8409      </tr>
8410      <tr>
8411        <td>S <a name="Element_Heirarchy_Layout_File" href=
8412        "#Element_Heirarchy_Layout_File" id=
8413        "Element_Heirarchy_Layout_File">Element Hierarchy - Layout
8414        File</a></td>
8415        <td><a href=
8416        "tr35-keyboards.html#Element_Heirarchy_Layout_File">Element
8417        Hierarchy - Layout File</a></td>
8418      </tr>
8419      <tr>
8420        <td>S <a name="Element_Heirarchy_Platform_File" href=
8421        "#Element_Heirarchy_Platform_File" id=
8422        "Element_Heirarchy_Platform_File">Element Hierarchy -
8423        Platform File</a></td>
8424        <td><a href=
8425        "tr35-keyboards.html#Element_Heirarchy_Platform_File">Element
8426        Hierarchy - Platform File</a></td>
8427      </tr>
8428      <tr>
8429        <td>S <a name="Invariants" href="#Invariants" id=
8430        "Invariants">Invariants</a></td>
8431        <td><a href=
8432        "tr35-keyboards.html#Invariants">Invariants</a></td>
8433      </tr>
8434      <tr>
8435        <td>S <a name="Data_Sources" href="#Data_Sources" id=
8436        "Data_Sources">Data Sources</a></td>
8437        <td><a href="tr35-keyboards.html#Data_Sources">Data
8438        Sources</a></td>
8439      </tr>
8440      <tr>
8441        <td>S <a name="Keyboard_IDs" href="#Keyboard_IDs" id=
8442        "Keyboard_IDs">Keyboard IDs</a></td>
8443        <td><a href="tr35-keyboards.html#Keyboard_IDs">Keyboard
8444        IDs</a></td>
8445      </tr>
8446      <tr>
8447        <td>S <a name="Platform_Behaviors_in_Edge_Cases" href=
8448        "#Platform_Behaviors_in_Edge_Cases" id=
8449        "Platform_Behaviors_in_Edge_Cases">Platform Behaviors in
8450        Edge Cases</a></td>
8451        <td><a href=
8452        "tr35-keyboards.html#Platform_Behaviors_in_Edge_Cases">Platform
8453        Behaviors in Edge Cases</a></td>
8454      </tr>
8455      <tr>
8456        <td>S <a name="Element_Keyboard" href="#Element_Keyboard"
8457        id="Element_Keyboard">Element: keyboard</a></td>
8458        <td><a href="tr35-keyboards.html#Element_Keyboard">Element:
8459        keyboard</a></td>
8460      </tr>
8461      <tr>
8462        <td>S <a name="Element_version" href="#Element_version" id=
8463        "Element_version">Element: version</a></td>
8464        <td><a href="tr35-keyboards.html#Element_version">Element:
8465        version</a></td>
8466      </tr>
8467      <tr>
8468        <td>S <a name="Element_generation" href=
8469        "#Element_generation" id="Element_generation">Element:
8470        generation</a></td>
8471        <td><a href=
8472        "tr35-keyboards.html#Element_generation">Element:
8473        generation</a></td>
8474      </tr>
8475      <tr>
8476        <td>S <a name="Element_names" href="#Element_names" id=
8477        "Element_names">Element: names</a></td>
8478        <td><a href="tr35-keyboards.html#Element_names">Element:
8479        names</a></td>
8480      </tr>
8481      <tr>
8482        <td>S <a name="Element_name" href="#Element_name" id=
8483        "Element_name">Element: name</a></td>
8484        <td><a href="tr35-keyboards.html#Element_name">Element:
8485        name</a></td>
8486      </tr>
8487      <tr>
8488        <td>S <a name="Element_settings" href="#Element_settings"
8489        id="Element_settings">Element: settings</a></td>
8490        <td><a href="tr35-keyboards.html#Element_settings">Element:
8491        settings</a></td>
8492      </tr>
8493      <tr>
8494        <td>S <a name="Element_keyMap" href="#Element_keyMap" id=
8495        "Element_keyMap">Element: keyMap</a></td>
8496        <td><a href="tr35-keyboards.html#Element_keyMap">Element:
8497        keyMap</a></td>
8498      </tr>
8499      <tr>
8500        <td>S <a name="Element_map" href="#Element_map" id=
8501        "Element_map">Element: map</a></td>
8502        <td><a href="tr35-keyboards.html#Element_map">Element:
8503        map</a></td>
8504      </tr>
8505      <tr>
8506        <td>S <a name="Element_transforms" href=
8507        "#Element_transforms" id="Element_transforms">Element:
8508        transforms</a></td>
8509        <td><a href=
8510        "tr35-keyboards.html#Element_transforms">Element:
8511        transforms</a></td>
8512      </tr>
8513      <tr>
8514        <td>S <a name="Element_transform" href="#Element_transform"
8515        id="Element_transform">Element: transform</a></td>
8516        <td><a href=
8517        "tr35-keyboards.html#Element_transform">Element:
8518        transform</a></td>
8519      </tr>
8520      <tr>
8521        <td>S <a name="Element_platform" href="#Element_platform"
8522        id="Element_platform">Element: platform</a></td>
8523        <td><a href="tr35-keyboards.html#Element_platform">Element:
8524        platform</a></td>
8525      </tr>
8526      <tr>
8527        <td>S <a name="Element_hardwareMap" href=
8528        "#Element_hardwareMap" id="Element_hardwareMap">Element:
8529        hardwareMap</a></td>
8530        <td><a href=
8531        "tr35-keyboards.html#Element_hardwareMap">Element:
8532        hardwareMap</a></td>
8533      </tr>
8534      <tr>
8535        <td>S <a name="Principles_for_Keyboard_Ids" href=
8536        "#Principles_for_Keyboard_Ids" id=
8537        "Principles_for_Keyboard_Ids">Principles for Keyboard
8538        Ids</a></td>
8539        <td><a href=
8540        "tr35-keyboards.html#Principles_for_Keyboard_Ids">Principles
8541        for Keyboard Ids</a></td>
8542      </tr>
8543    </table>
8544    <hr>
8545
8546	  <h2><a href="#LocaleId_Canonicalization" name="LocaleId_Canonicalization">Annex C. LocaleId Canonicalization</a></h2>
  <p>The languageAlias, scriptAlias, territoryAlias, and variantAlias elements are used as rules to transform an input <em>source localeId</em>. The first step is to transform the <em>languageId</em> portion of the localeId.</p>
  <blockquote>Note: the following discussion uses '-' as the separator, and so do the examples of XML alias data, even though for compatibility reasons that alias data actually uses '_'. The same processing can be applied to identifiers that keep the '_' separator, <em>mutatis mutandis</em>. CLDR also uses &ldquo;territory&rdquo; and &ldquo;region&rdquo; interchangeably.</blockquote>
8551	  <h3 >Definitions</h3>
8552	  <h4 >1. Multimap interpretation</h4>
8553		  <p>Interpret each languageId as a multimap from a <em>fieldId</em> (language, script, region, variants) to a <strong>set</strong> of field values.</p>
8554	  <p><em>Examples:</em></p>
8556		  <table class='simple'>
8557		    <tbody>
8558		      <tr>
8559		        <td colspan="1" rowspan="2"><p> </p>
8560		          <p><strong>Source</strong></p></td>
8561		        <td colspan="4" rowspan="1"><p><strong>Fields</strong></p></td>
8562	          </tr>
8563		      <tr>
8564		        <td colspan="1" rowspan="1"><p><strong>Language</strong></p></td>
8565		        <td colspan="1" rowspan="1"><p><strong>Script</strong></p></td>
8566		        <td colspan="1" rowspan="1"><p><strong>Region</strong></p></td>
8567		        <td colspan="1" rowspan="1"><p><strong>Variants</strong></p></td>
8568	          </tr>
8569		      <tr>
8570		        <td colspan="1" rowspan="1"><p>en-GB</p></td>
8571		        <td colspan="1" rowspan="1"><p>{en}</p></td>
8572		        <td colspan="1" rowspan="1"><p>{}</p></td>
8573		        <td colspan="1" rowspan="1"><p>{GB}</p></td>
8574		        <td colspan="1" rowspan="1"><p>{}</p></td>
8575	          </tr>
8576		      <tr>
8577		        <td colspan="1" rowspan="1"><p>und-GB</p></td>
8578		        <td colspan="1" rowspan="1"><p>{}</p></td>
8579		        <td colspan="1" rowspan="1"><p>{}</p></td>
8580		        <td colspan="1" rowspan="1"><p>{GB}</p></td>
8581		        <td colspan="1" rowspan="1"><p>{}</p></td>
8582	          </tr>
8583		      <tr>
8584		        <td colspan="1" rowspan="1"><p>ja-Latn-YU-hepburn-heploc</p></td>
8585		        <td colspan="1" rowspan="1"><p>{ja}</p></td>
8586		        <td colspan="1" rowspan="1"><p>{Latn}</p></td>
8587		        <td colspan="1" rowspan="1"><p>{YU}</p></td>
8588		        <td colspan="1" rowspan="1"><p>{hepburn, heploc}</p></td>
8589	          </tr>
8590	        </tbody>
8591      </table>
8593		  <ul>
    <li>The last example can be written in an abbreviated format that skips empty sets: {L={ja}, S={Latn}, R={YU}, V={hepburn, heploc}}.</li>
    <li>&ldquo;und&rdquo; is a special language code that is treated as an empty set.</li>
    <li>Only the Variants field can contain more than one item; each of the other fields is either empty or contains exactly one item.</li>
8597      </ul>
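  <p>The multimap interpretation can be illustrated with a short Python sketch. It is not a normative or reference implementation: the names <code>FIELDS</code>, <code>to_multimap</code>, and <code>abbreviate</code> are hypothetical, the input is assumed to be a well-formed languageId using the '-' separator, and subtags are classified by position and shape only (an optional four-letter script, then an optional two-letter or three-digit region, with every remaining subtag treated as a variant).</p>
<pre>
```python
FIELDS = ("L", "S", "R", "V")  # language, script, region, variants


def to_multimap(language_id):
    """Interpret a '-'-separated languageId as a map from field to a set of values."""
    subtags = language_id.split("-")
    fields = {f: set() for f in FIELDS}
    # "und" is a special language code that is treated as an empty set.
    if subtags[0] != "und":
        fields["L"].add(subtags[0])
    rest = subtags[1:]
    # Optional script subtag: exactly four letters.
    if rest and len(rest[0]) == 4 and rest[0].isalpha():
        fields["S"].add(rest.pop(0))
    # Optional region subtag: two letters or three digits.
    if rest and ((len(rest[0]) == 2 and rest[0].isalpha())
                 or (len(rest[0]) == 3 and rest[0].isdigit())):
        fields["R"].add(rest.pop(0))
    # All remaining subtags are variants; a set ignores their order.
    fields["V"].update(rest)
    return fields


def abbreviate(fields):
    """Render the abbreviated {L=..., S=..., R=..., V=...} form, skipping empty sets."""
    parts = [f + "={" + ", ".join(sorted(fields[f])) + "}" for f in FIELDS if fields[f]]
    return "{" + ", ".join(parts) + "}"


print(abbreviate(to_multimap("ja-Latn-YU-hepburn-heploc")))
# prints {L={ja}, S={Latn}, R={YU}, V={hepburn, heploc}}
print(abbreviate(to_multimap("und-GB")))
# prints {R={GB}}
```
</pre>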
8598	  <h4 >2. Alias elements</h4>
8599		  <p>For the languageAlias elements, the <em>type</em> and <em>replacements</em> are languageIds.</p>
  <p>For the scriptAlias, territoryAlias (territory is also called region), and variantAlias elements, the type and replacements are interpreted as languageIds <em>after</em> prefixing each value with &ldquo;und-&rdquo;. Thus</p>
8601		  <code>&lt;territoryAlias type="AN" replacement="CW SX BQ" reason="deprecated"/&gt;</code>
8602		  <p>is interpreted as:</p>
8603		  <code>&lt;territoryAlias type="und-AN" replacement="und-CW und-SX und-BQ" reason="deprecated"/&gt;</code>
  <p>Note that territoryAlias is the only alias element that may have multiple replacement values, separated by spaces in the text (such as replacement="und-CW und-SX und-BQ"); the other alias elements only ever have a single replacement value.</p>
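  <p>The interpretation of alias elements might be sketched as follows, assuming the element name and the raw <em>type</em> and <em>replacement</em> attribute strings have already been read from the XML (no XML parsing is shown). The function name <code>alias_to_language_ids</code> is hypothetical; the sketch also converts the '_' separator used by the actual alias data to '-', and returns the replacements as a list of separate languageIds.</p>
<pre>
```python
def alias_to_language_ids(element, alias_type, replacement):
    """Return (type languageId, list of replacement languageIds) for an alias element."""
    # The XML alias data uses '_' as the separator; normalize to '-'.
    alias_type = alias_type.replace("_", "-")
    # Only territoryAlias has several space-separated replacement values,
    # but splitting on whitespace is harmless for the other elements.
    replacements = replacement.replace("_", "-").split()
    if element == "languageAlias":
        # languageAlias type and replacement are already languageIds.
        return alias_type, replacements
    if element in ("scriptAlias", "territoryAlias", "variantAlias"):
        # These are interpreted after prefixing each value with "und-".
        return "und-" + alias_type, ["und-" + value for value in replacements]
    raise ValueError("unexpected alias element: " + element)


print(alias_to_language_ids("territoryAlias", "AN", "CW SX BQ"))
# prints ('und-AN', ['und-CW', 'und-SX', 'und-BQ'])
```
</pre>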
8606	  <h4 >3. Matches</h4>
  <p>A rule matches a source if and only if, for every field, the <em>source</em> field ⊇ the <em>type</em> field (that is, the source field contains every value in the type field).</p>
8608		  <blockquote>
8609		  <p><em>Examples:</em></p>
  <p>source=&ldquo;ja-heploc-hepburn&rdquo; and type=&ldquo;und-hepburn&rdquo;</p>
8611		  <table class='simple'>
8612		    <tbody>
8613		      <tr>
8614		        <td colspan="1" rowspan="1"><p>{ja} ⊇ {} </p></td>
8615		        <td colspan="1" rowspan="1"><p>success, und = {}</p></td>
8616	          </tr>
8617		      <tr>
8618		        <td colspan="1" rowspan="1"><p>{hepburn, heploc} ⊇ {hepburn}</p></td>
8619		        <td colspan="1" rowspan="1"><p><strong>success</strong></p></td>
8620	          </tr>
8621	        </tbody>
8622		    </table>
  <p>so the rule matches the source. (Note that the order of variants is immaterial to matching.)</p>
8624		  <p>&nbsp;</p>
8625		  <p> </p>
  <p>source=&ldquo;ja-hepburn&rdquo; and type=&ldquo;und-hepburn-heploc&rdquo;</p>
8627		  <table class='simple'>
8628		    <tbody>
8629		      <tr>
8630		        <td colspan="1" rowspan="1"><p>{ja} ⊇ {} </p></td>
8631		        <td colspan="1" rowspan="1"><p>success, und = {}</p></td>
8632	          </tr>
8633		      <tr>
8634		        <td colspan="1" rowspan="1"><p>{hepburn} ⊉ {hepburn, heploc}</p></td>
8635		        <td colspan="1" rowspan="1"><p><strong>failure</strong></p></td>
8636	          </tr>
8637	        </tbody>
8638		    </table>
8639		  <p>so the rule does not match the source.</p></blockquote>
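<p>The matching condition reduces to a superset test on each field. This Python sketch, with hypothetical helper names <code>multimap</code> and <code>matches</code>, checks the two examples above.</p>

```python
FIELDS = ("L", "S", "R", "V")


def multimap(L=(), S=(), R=(), V=()):
    """Build a multimap; omitted fields are empty sets (as with "und")."""
    return {"L": frozenset(L), "S": frozenset(S), "R": frozenset(R), "V": frozenset(V)}


def matches(source: dict, rule_type: dict) -> bool:
    """True iff every source field is a superset of the rule's type field."""
    return all(source[f] >= rule_type[f] for f in FIELDS)


# source="ja-heploc-hepburn", type="und-hepburn": matches
print(matches(multimap(L={"ja"}, V={"heploc", "hepburn"}), multimap(V={"hepburn"})))  # True
# source="ja-hepburn", type="und-hepburn-heploc": does not match
print(matches(multimap(L={"ja"}, V={"hepburn"}), multimap(V={"hepburn", "heploc"})))  # False
```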
8640	  <h4 >4. Replacement</h4>
  <p>A matching rule is used to transform each field of the source as follows:</p>
8642		  <ul>
8643		    <li>if type.field ≠ {}
8644		      <ul>
8645		        <li>source.field = (source.field - type.field) ∪ replacement.field</li>
8646	          </ul>
8647		    </li>
8648
8649		    <li>else if source.field = {} and replacement.field ≠ {}
8650		      <ul>
8651		        <li>source.field = replacement.field</li>
8652	          </ul>
8653		    </li>
8654      </ul>
8655		  <p><em>Example:</em></p>
  <blockquote><p>source=&ldquo;ja-Latn-fonipa-hepburn-heploc&rdquo;</p>
  <p>rule=&ldquo;&lt;languageAlias type="und-hepburn-heploc" replacement="und-alalc97"/&gt;&rdquo;</p>
  <p>result=&ldquo;ja-Latn-alalc97-fonipa&rdquo; <em>// note that the CLDR canonical order of variants is alphabetical</em></p></blockquote>
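<p>A Python sketch of the replacement step, applied to the hepburn/heploc example, might look like this. The helper names are hypothetical, and the Territory Exception is deliberately not handled here.</p>

```python
FIELDS = ("L", "S", "R", "V")


def multimap(L=(), S=(), R=(), V=()):
    return {"L": frozenset(L), "S": frozenset(S), "R": frozenset(R), "V": frozenset(V)}


def apply_rule(source, rule_type, replacement):
    """Transform each field of a matching source (Territory Exception not handled)."""
    result = {}
    for f in FIELDS:
        if rule_type[f]:
            result[f] = (source[f] - rule_type[f]) | replacement[f]
        elif not source[f] and replacement[f]:
            result[f] = replacement[f]
        else:
            result[f] = source[f]  # field left unchanged
    return result


def to_language_id(m):
    """Serialize a multimap; an empty language prints as "und", variants sorted."""
    subtags = [next(iter(m["L"]), "und")]
    subtags += sorted(m["S"]) + sorted(m["R"]) + sorted(m["V"])
    return "-".join(subtags)


source = multimap(L={"ja"}, S={"Latn"}, V={"fonipa", "hepburn", "heploc"})
result = apply_rule(source, multimap(V={"hepburn", "heploc"}), multimap(V={"alalc97"}))
print(to_language_id(result))  # ja-Latn-alalc97-fonipa
```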
8661	  <h5 >Territory Exception</h5>
  <p>If the field is territory and replacement.field has more than one value, look up the most likely territory for the base language code (and script, if there is one), using the likely subtags data. If that territory is in the list of replacements, use it; otherwise, use the first territory in the list.</p>
  <p><em>Example:</em></p>
  <blockquote><p>source=&ldquo;nl-AN&rdquo;</p>
    <p>rule=&ldquo;&lt;territoryAlias type="und-AN" replacement="und-CW und-SX und-BQ" reason="deprecated"/&gt;&rdquo;</p>
    <p>The most likely territory for &ldquo;nl&rdquo; is NL, which is not in the list of replacements, so the first one (CW) is used.</p>
    <p>result=&ldquo;nl-CW&rdquo;</p>
  </blockquote>
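<p>A hedged Python sketch of the Territory Exception follows. The dictionary <code>LIKELY</code> is a hypothetical stand-in for the likely subtags data, and the lookup is simplified (the real likely subtags algorithm is more involved). With a likely territory of NL for nl, the replacement list CW SX BQ for AN yields CW.</p>

```python
from typing import Dict, List, Optional


def likely_territory(likely: Dict[str, str], language: str,
                     script: Optional[str] = None) -> Optional[str]:
    """Simplified lookup: try "language-Script" first, then "language"."""
    if script is not None and f"{language}-{script}" in likely:
        return likely[f"{language}-{script}"]
    return likely.get(language)


def choose_territory(replacements: List[str], likely: Optional[str]) -> str:
    """Territory Exception: use the likely territory if listed, else the first one."""
    return likely if likely in replacements else replacements[0]


# Hypothetical excerpt standing in for the likely subtags data.
LIKELY = {"nl": "NL"}

replacements = ["CW", "SX", "BQ"]  # from the territoryAlias for AN
print(choose_territory(replacements, likely_territory(LIKELY, "nl")))  # CW
```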
8670	  <h4>5. Canonicalizing Syntax</h4>
8671		<p>To canonicalize the syntax of <em>source</em>: </p>
8672		<ul>
8673		  <li>Initial Script Subtag
8674		    <ul>
      <li>If the first subtag has 4 letters, prepend &quot;und-&quot; to the source.</li>
      <li>Note: Identifiers that begin with a script subtag are only for specialized use.</li>
8677	        </ul>
8678	      </li>
8679		  <li>Casing
8680		    <ul>
      <li>Put any script subtag into title case (e.g., Hant).</li>
      <li>Put any region subtag into uppercase (e.g., DE).</li>
      <li>Put all other subtags into lowercase (e.g., en, fonipa).</li>
8684	        </ul>
8685	      </li>
8686		  <li>Order
8687		    <ul>
      <li>Put any variants into alphabetical order (e.g., en-fonipa-scouse, not en-scouse-fonipa).</li>
      <li>Put any extensions into alphabetical order by their singleton (e.g., en-t-xxx-u-yyy, not en-u-yyy-t-xxx).</li>
      <li>Put all attributes into alphabetical order.</li>
      <li>Put all keywords and tfields into alphabetical order by their keys, within their respective extensions.</li>
      <li>Remove any type or tfield value of &quot;true&quot;.</li>
8693	        </ul>
8694		  </li>
8695		  <li>Separator
8696		    <ul>
      <li>Replace each '_' with '-'.</li>
8698	        </ul>
8699		  </li>
8700	  </ul>
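<p>The syntax steps above can be sketched in Python as follows, assuming a bare languageId with no extensions (so the attribute, keyword, and tfield steps are omitted). The function name <code>canonicalize_syntax</code> is hypothetical.</p>

```python
import re


def canonicalize_syntax(language_id: str) -> str:
    """Sketch of Definition 5 for a bare languageId (extensions not handled)."""
    subtags = language_id.replace("_", "-").split("-")    # Separator
    if re.fullmatch(r"[A-Za-z]{4}", subtags[0]):
        subtags.insert(0, "und")                          # Initial Script Subtag
    result = [subtags[0].lower()]
    rest = subtags[1:]
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        result.append(rest.pop(0).title())                # script: title case
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        result.append(rest.pop(0).upper())                # region: uppercase
    result += sorted(v.lower() for v in rest)             # variants: lowercase, sorted
    return "-".join(result)


print(canonicalize_syntax("EN_scouse_FONIPA"))  # en-fonipa-scouse
print(canonicalize_syntax("hant_tw"))           # und-Hant-TW
```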
8701	  <h3 >Preprocessing</h3>
8702	  <p>The data from supplementalMetadata is (logically) preprocessed as follows.</p>
8703		  <ol start="1">
    <li>Load the rules from supplementalMetadata.xml, replacing each '_' with '-' and adding the &ldquo;und-&rdquo; prefix as described in <em>Definition 2. Alias Elements</em>.</li>
8705		    <li>Capture all languageAlias rules where the <em>type</em> is an invalid languageId into a set of <strong>BCP47 LegacyRules</strong>. Example:
8706		      <ol>
8707		        <li>&lt;languageAlias type="i-mingo" replacement="see-x-i-mingo" reason="legacy"/&gt;</li>
8708	          </ol>
8709		    </li>
    <li>Discard all rules where the <em>type</em> is an invalid languageId. For example:
8711<ol>
8712          <li>&lt;languageAlias type="i-mingo" replacement="see-x-i-mingo" reason="legacy"/&gt;</li>
8713		        <li>&lt;territoryAlias type="und-AAA" replacement="und-AA" reason="overlong"/&gt;</li>
8714	          </ol>
8715	        </li>
    <li>Order the set of rules by:
      <ol>
        <li>the size of the union of all field value sets, largest first;</li>
        <li>then alphabetically by field.</li>
      </ol>
8721
8722		    <li>Order the set of rules by
8723		      <ol>
8724		        <li>the size of the union of all field value sets, with largest size first</li>
8725		        <li>and then alphabetically by field.</li>
8726	          </ol>
8727	        </li>
8728
    <li>The result is the set of <strong>Alias Rules</strong>.</li>
8730      </ol>
8731		  <p> </p>
8732	  <p>So using the examples above, we get the following order:</p>
8733		  <table class='simple'>
8734		    <tbody>
8735		      <tr>
        <td colspan="1" rowspan="1"><p><strong>Rule type</strong></p></td>
        <td colspan="1" rowspan="1"><p><strong>Size of union</strong></p></td>
        <td colspan="1" rowspan="1"><p><strong>Alphabetical tie-break</strong></p></td>
8739	          </tr>
8740		      <tr>
8741		        <td colspan="1" rowspan="1"><p>{V={hepburn, heploc}}</p></td>
8742		        <td colspan="1" rowspan="1"><p>2</p></td>
8743		        <td colspan="1" rowspan="1"><p>n/a</p></td>
8744	          </tr>
8745		      <tr>
8746		        <td colspan="1" rowspan="1"><p>{L={en}, R={GB}}</p></td>
8747		        <td colspan="1" rowspan="1"><p>2</p></td>
8748		        <td colspan="1" rowspan="2"><p>en &lt; fr</p></td>
8749	          </tr>
8750		      <tr>
8751		        <td colspan="1" rowspan="1"><p>{L={fr}, R={CA}}</p></td>
8752		        <td colspan="1" rowspan="1"><p>2</p></td>
8753	          </tr>
8754		      <tr>
8755		        <td colspan="1" rowspan="1"><p>{R={CA}}</p></td>
8756		        <td colspan="1" rowspan="1"><p>1</p></td>
8757		        <td colspan="1" rowspan="1"><p>n/a</p></td>
8758	          </tr>
8759	        </tbody>
8760      </table>
8761		  <p> </p>
  <blockquote><strong>Note: </strong>The secondary sort order in Preprocessing step 5.2 serves only to ensure deterministic results when two rules of the same size could apply.</blockquote>
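<p>The following Python sketch reproduces the order in the table. The tie-break is an assumption about what &ldquo;alphabetically by field&rdquo; means: fields are compared in L, S, R, V order, each as its sorted values joined by '-', so an empty field sorts first. The helper names are hypothetical.</p>

```python
FIELDS = ("L", "S", "R", "V")


def multimap(L=(), S=(), R=(), V=()):
    return {"L": frozenset(L), "S": frozenset(S), "R": frozenset(R), "V": frozenset(V)}


def rule_order_key(rule_type: dict) -> tuple:
    """Largest union size first, then an alphabetical tie-break.

    Assumption: "alphabetically by field" compares fields in L, S, R, V order,
    each as its sorted values joined by "-", so an empty field sorts first.
    """
    size = len(frozenset().union(*(rule_type[f] for f in FIELDS)))
    alpha = tuple("-".join(sorted(rule_type[f])) for f in FIELDS)
    return (-size, alpha)


rules = [
    multimap(R={"CA"}),
    multimap(L={"fr"}, R={"CA"}),
    multimap(V={"hepburn", "heploc"}),
    multimap(L={"en"}, R={"GB"}),
]
for rule in sorted(rules, key=rule_order_key):
    print({f: sorted(v) for f, v in rule.items() if v})
# {'V': ['hepburn', 'heploc']}
# {'L': ['en'], 'R': ['GB']}
# {'L': ['fr'], 'R': ['CA']}
# {'R': ['CA']}
```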
8763	  <h3 >Processing LanguageIds</h3>
8764	  <p>To canonicalize a given <em>source</em>:</p>
8765		  <ol start="1">
8766		    <li>Canonicalize the syntax of <em>source</em> as per <em>Definition 5. Canonicalizing Syntax</em>.</li>
8767            <li>Where the <em>source</em> could be an arbitrary BCP 47 language tag, first process as follows:
8768<ol>
          <li>If the source is identical to one of the types in the BCP47 LegacyRules, replace the entire source with the replacement value.</li>
        <li>Else if there is an extlang subtag, apply Step 3 of <a href="https://tools.ietf.org/html/bcp47#section-4.5">https://tools.ietf.org/html/bcp47#section-4.5</a> to remove the extlang subtag (possibly adjusting the language subtag).
8771		          <ol>
            <li>However, do not apply any of the other canonicalization steps in that section.</li>
8773	              </ol>
8774	            </li>
        <li>Else if the first subtag is "x", prefix the source with "und-".</li>
        <li><strong>Note: </strong>There are currently no valid 4-letter primary language subtags. Although it is extremely unlikely that BCP 47 will ever register any, if it does, <i>languageAlias</i> mappings will be supplied for them, mapping them to defined CLDR language subtags (from the idStatus=&quot;reserved&quot; set).</li>
8777	          </ol>
8778		    </li>
8779		    <li>Find the first matching rule in <strong>Alias Rules</strong> (from <strong>Preprocessing</strong>)
8780<ol>
8781          <li>If there are none, return <em>source</em></li>
8782	          </ol>
8783	        </li>
    <li>Transform <em>source</em> according to that rule.</li>
    <li>Repeat from step 3.</li>
8786      </ol>
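<p>Steps 3 through 5 can be sketched in Python as below. This is a simplified illustration, not a reference implementation: it assumes the source has already been through steps 1 and 2 and has no extensions, the rules are already in Preprocessing order, and the Territory Exception is not handled. The helper names are hypothetical, and <code>max_steps</code> is a safety guard that is not part of the algorithm.</p>

```python
import re

FIELDS = ("L", "S", "R", "V")


def parse(language_id):
    """Canonical-syntax languageId (no extensions) to multimap."""
    subtags = language_id.split("-")
    m = {f: set() for f in FIELDS}
    if subtags[0] != "und":
        m["L"].add(subtags[0])
    rest = subtags[1:]
    if rest and re.fullmatch(r"[A-Za-z]{4}", rest[0]):
        m["S"].add(rest.pop(0))
    if rest and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", rest[0]):
        m["R"].add(rest.pop(0))
    m["V"].update(rest)
    return {f: frozenset(v) for f, v in m.items()}


def serialize(m):
    """Multimap to languageId; empty language becomes "und", variants sorted."""
    return "-".join([next(iter(m["L"]), "und")]
                    + sorted(m["S"]) + sorted(m["R"]) + sorted(m["V"]))


def matches(source, rule_type):
    return all(source[f] >= rule_type[f] for f in FIELDS)


def apply_rule(source, rule_type, replacement):
    out = {}
    for f in FIELDS:
        if rule_type[f]:
            out[f] = (source[f] - rule_type[f]) | replacement[f]
        elif not source[f] and replacement[f]:
            out[f] = replacement[f]
        else:
            out[f] = source[f]
    return out


def canonicalize(source, alias_rules, max_steps=100):
    """Apply the first matching (type, replacement) rule until none matches."""
    current = parse(source)
    for _ in range(max_steps):
        rule = next(((t, r) for t, r in alias_rules if matches(current, t)), None)
        if rule is None:
            return serialize(current)
        current = apply_rule(current, *rule)
    raise RuntimeError("alias rules did not converge")


rules = [(parse("und-hepburn-heploc"), parse("und-alalc97"))]
print(canonicalize("ja-Latn-fonipa-hepburn-heploc", rules))  # ja-Latn-alalc97-fonipa
```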
  <h3>Processing LocaleIds</h3>
8788	  <p>The canonicalization of localeIds is done by first canonicalizing the languageId portion, then handling extensions in the following way:</p>
8789		  <ol start="1">
    <li>Replace any <em>tlang</em> languageId value with its canonical form.</li>
    <li>Use the bcp47 data to replace keys, types, tfields, and tvalues with their canonical forms. See <strong>Section 3.6.4 U Extension Data Files</strong> and <strong>Section 3.7.1 T Extension Data Files</strong>. The values to match are in the alias attribute, and the canonical replacement is in the name attribute. For example:
8792		      <ol>
        <li>Given the following bcp47 data:<br>
          <code>&lt;key name="ms"…&gt;…&lt;type name="uksystem" … alias="imperial" … /&gt;…&lt;/key&gt;</code></li>
        <li>the following transformation results:<br>
          <code>en-u-ms-imperial ⇒ en-u-ms-uksystem</code></li>
8797	          </ol>
8798		    </li>
8799
8800		    <li>If there is an 'sd' or 'rg' key, replace any subdivision alias in its value in the same way, using subdivisionAlias data.</li>
8801      </ol>
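<p>For -u- types, the alias replacement in step 2 can be sketched in Python as follows. The table <code>U_TYPE_ALIASES</code> is a hypothetical excerpt built from the ms example above; the sketch assumes syntax-canonical input and handles only single-subtag types, leaving keys, attributes, tfields, tvalues, and private use untouched.</p>

```python
# Hypothetical excerpt of the bcp47 data: key -> {alias: canonical name}.
# In the real data these come from the alias and name attributes of each type element.
U_TYPE_ALIASES = {"ms": {"imperial": "uksystem"}}


def canonicalize_u_types(locale_id: str) -> str:
    """Replace aliased -u- type subtags with their canonical names (simplified)."""
    subtags = locale_id.split("-")
    out = []
    in_u = False
    key = None
    for i, subtag in enumerate(subtags):
        if subtag == "x":               # private use: copy the rest unchanged
            out.extend(subtags[i:])
            break
        if len(subtag) == 1:            # a singleton starts a new extension
            in_u = subtag == "u"
            key = None
        elif in_u and len(subtag) == 2:
            key = subtag                # a -u- key
        elif in_u and key is not None:  # a type subtag for the current key
            subtag = U_TYPE_ALIASES.get(key, {}).get(subtag, subtag)
        out.append(subtag)
    return "-".join(out)


print(canonicalize_u_types("en-u-ms-imperial"))  # en-u-ms-uksystem
```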
  <h3>Optimizations</h3>
  <p>The above algorithm is a logical statement of the process and is not intended to be used directly as production code. Production code can use many optimizations for efficiency while achieving the same result. For example, the Alias Rules can be further preprocessed to avoid repeated looping, so that only one rule lookup is needed per subtag. Similarly, the small number of <strong>Territory Exceptions</strong> can be preprocessed to avoid the likely subtags processing at runtime.</p>
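<p>As one possible illustration of the Territory Exception optimization, the Python sketch below precomputes the few (language, deprecated territory) pairs whose choice differs from the first replacement, so that no likely subtags lookup is needed at runtime. The names are hypothetical, and <code>LIKELY</code> is a simplified, language-only stand-in for the likely subtags data.</p>

```python
from typing import Dict, List, Tuple


def precompute_overrides(territory_aliases: Dict[str, List[str]],
                         likely: Dict[str, str]) -> Dict[Tuple[str, str], str]:
    """Record only pairs whose chosen territory is not the first replacement."""
    overrides = {}
    for old, replacements in territory_aliases.items():
        if len(replacements) < 2:
            continue  # single replacement: not a Territory Exception
        for language, territory in likely.items():
            if territory in replacements[1:]:
                overrides[(language, old)] = territory
    return overrides


def pick_territory(overrides, territory_aliases, language, old):
    """Runtime lookup: fall back to the first replacement when no override exists."""
    return overrides.get((language, old), territory_aliases[old][0])


ALIASES = {"AN": ["CW", "SX", "BQ"]}
LIKELY = {"nl": "NL"}  # hypothetical excerpt of likely subtags data
overrides = precompute_overrides(ALIASES, LIKELY)
print(pick_territory(overrides, ALIASES, "nl", "AN"))  # CW
```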
8804	    <p>&nbsp;</p>
8805
8806	  <hr>
8807    <h2><a name="References" href="#References" id=
8808    "References">References</a></h2>
8809    <table cellpadding="4" cellspacing="0" class="noborder" border=
8810    "0">
8811      <tr>
8812        <th class="noborder" width="148">Ancillary Information</th>
8813        <td class="noborder" width="730"><i>To properly localize,
8814        parse, and format data requires ancillary information,
8815        which is not expressed in Locale Data Markup Language. Some
8816        of the formats for values used in Locale Data Markup
8817        Language are constructed according to external
8818        specifications. The sources for this data and/or formats
8819        include the following:<br>
8820        &nbsp;</i></td>
8821      </tr>
8822      <tr>
8823        <td class="noborder" width="148">[<a name="Bugs" href=
8824        "#Bugs" id="Bugs">Bugs</a>]</td>
8825        <td class="noborder" width="730">CLDR Bug Reporting
8826        form<br>
8827        <a href=
8828        "http://cldr.unicode.org/index/bug-reports">http://cldr.unicode.org/index/bug-reports</a></td>
8829      </tr>
8830      <tr>
8831        <td class="noborder" width="148">[<a name="Charts" href=
8832        "#Charts" id="Charts">Charts</a>]</td>
8833        <td class="noborder" width="730">The online code charts can
8834        be found at <a href=
8835        "https://unicode.org/charts/">https://unicode.org/charts/</a>
8836        An index to character names with links to the corresponding
8837        chart is found at <a href=
8838        "https://unicode.org/charts/charindex.html">https://unicode.org/charts/charindex.html</a></td>
8839      </tr>
8840      <tr>
8841        <td class="noborder" width="148">[<a name="DUCET" href=
8842        "#DUCET" id="DUCET">DUCET</a>]</td>
8843        <td class="noborder" width="730">The Default Unicode
8844        Collation Element Table (DUCET)<br>
8845        For the base-level collation, of which all the collation
8846        tables in this document are tailorings.<br>
8847        <a href=
8848        "https://unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table">
8849        https://unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table</a></td>
8850      </tr>
8851      <tr>
8852        <td class="noborder" width="148">[<a name="FAQ" href="#FAQ"
8853        id="FAQ">FAQ</a>]</td>
8854        <td class="noborder" valign="top" width="730">Unicode
8855        Frequently Asked Questions<br>
8856        <a href=
8857        "https://unicode.org/faq/">https://unicode.org/faq/<br></a>
8858        <i>For answers to common questions on technical
8859        issues.</i></td>
8860      </tr>
8861      <tr>
8862        <td class="noborder" width="148">[<a name="FCD" href="#FCD"
8863        id="FCD">FCD</a>]</td>
8864        <td class="noborder" width="730">As defined in UTN #5
8865        Canonical Equivalences in Applications<br>
8866        <a href=
8867        "https://unicode.org/notes/tn5/">https://unicode.org/notes/tn5/</a></td>
8868      </tr>
8869      <tr>
8870        <td class="noborder" width="148">[<a name="Glossary" href=
8871        "#Glossary" id="Glossary">Glossary</a>]</td>
8872        <td class="noborder" width="730">Unicode Glossary<a href=
8873        "https://unicode.org/glossary/"><br>
8874        https://unicode.org/glossary/<br></a> <i>For explanations of
8875        terminology used in this and other documents.</i></td>
8876      </tr>
8877      <tr>
8878        <td class="noborder" width="148">[<a name="JavaChoice"
8879        href="#JavaChoice" id="JavaChoice">JavaChoice</a>]</td>
8880        <td class="noborder" width="730">Java ChoiceFormat<br>
8881        <a href=
8882        "https://docs.oracle.com/javase/7/docs/api/java/text/ChoiceFormat.html">
8883        https://docs.oracle.com/javase/7/docs/api/java/text/ChoiceFormat.html</a></td>
8884      </tr>
8885      <tr>
8886        <td class="noborder" width="148">[<a name="Olson" href=
8887        "#Olson" id="Olson">Olson</a>]</td>
8888        <td class="noborder" width="730">The <i>TZ</i>ID Database
8889        (aka Olson timezone database)<br>
8890        Time zone and daylight savings information.<br>
8891        <a href=
8892        "https://www.iana.org/time-zones">https://www.iana.org/time-zones</a><br>
8893
8894        For archived data, see&nbsp;<br>
8895        <a href=
8896        "ftp://ftp.iana.org/tz/releases/">ftp://ftp.iana.org/tz/releases/</a></td>
8897      </tr>
8898      <tr>
8899        <td class="noborder" width="148">[<a name="Reports" href=
8900        "#Reports" id="Reports">Reports</a>]</td>
8901        <td class="noborder" width="730">Unicode Technical
8902        Reports<br>
8903        <a href=
8904        "https://unicode.org/reports/">https://unicode.org/reports/<br>
8905        </a> <i>For information on the status and development
8906        process for technical reports, and for a list of technical
8907        reports.</i></td>
8908      </tr>
8909      <tr>
8910        <td class="noborder" width="148">[<a name="Unicode" href=
8911        "#Unicode" id="Unicode">Unicode</a>]</td>
8912        <td class="noborder" width="730">The Unicode Consortium, <i>The Unicode Standard, Version 13.0.0</i><br>
8913        (Mountain View, CA: The Unicode Consortium, 2020. ISBN 978-1-936213-26-9)<br>
8914        <a href="https://www.unicode.org/versions/Unicode13.0.0/">https://www.unicode.org/versions/Unicode13.0.0/</a>
8915      </td>
8916      </tr>
8917      <tr>
8918        <td class="noborder" width="148">[<a name="Versions" href=
8919        "#Versions" id="Versions">Versions</a>]</td>
8920        <td class="noborder" width="730">Versions of the Unicode
8921        Standard<br>
8922        <a href=
8923        "https://www.unicode.org/versions/">https://www.unicode.org/versions/</a><br>
8924
8925        <i>For information on version numbering, and citing and
8926        referencing the Unicode Standard, the Unicode Character
8927        Database, and Unicode Technical Reports.</i></td>
8928      </tr>
8929      <tr>
8930        <td class="noborder" width="148">[<a name="XPath" href=
8931        "#XPath" id="XPath">XPath</a>]</td>
8932        <td class="noborder" width="730"><a href=
8933        "https://www.w3.org/TR/xpath/">https://www.w3.org/TR/xpath/</a></td>
8934      </tr>
8935      <tr>
8936        <th class="noborder" width="148">Other Standards</th>
8937        <td class="noborder" width="730"><i>Various standards
8938        define codes that are used as keys or values in Locale Data
8939        Markup Language. These include:</i></td>
8940      </tr>
8941      <tr>
8942        <td class="noborder">[<a name="BCP47" href="#BCP47" id=
8943        "BCP47">BCP47</a>]</td>
8944        <td class="noborder">
8945          <a href=
8946          "https://www.rfc-editor.org/rfc/bcp/bcp47.txt">https://www.rfc-editor.org/rfc/bcp/bcp47.txt</a>
8947          <p>The Registry<br>
8948          <a href=
8949          "https://www.iana.org/assignments/language-subtag-registry">
8950          https://www.iana.org/assignments/language-subtag-registry</a></p>
8951        </td>
8952      </tr>
8953      <tr>
8954        <td class="noborder" width="148">[<a name="ISO639" href=
8955        "#ISO639" id="ISO639">ISO639</a>]</td>
8956        <td class="noborder" width="730">ISO Language Codes<br>
8957        <a href=
8958        "https://www.loc.gov/standards/iso639-2/">https://www.loc.gov/standards/iso639-2/</a><br>
8959
8960        Actual List<br>
8961        <a href=
8962        "https://www.loc.gov/standards/iso639-2/langcodes.html">https://www.loc.gov/standards/iso639-2/langcodes.html</a></td>
8963      </tr>
8964      <tr>
8965        <td class="noborder" width="148">[<a name="ISO1000" href=
8966        "#ISO1000" id="ISO1000">ISO1000</a>]</td>
8967        <td class="noborder" width="730">ISO 1000: SI units and
8968        recommendations for the use of their multiples and of
8969        certain other units, International Organization for
8970        Standardization, 1992.<br>
8971        <a href=
8972        "https://www.iso.org/iso/catalogue_detail?csnumber=5448">https://www.iso.org/iso/catalogue_detail?csnumber=5448</a></td>
8973      </tr>
8974      <tr>
8975        <td class="noborder" width="148">[<a name="ISO3166" href=
8976        "#ISO3166" id="ISO3166">ISO3166</a>]</td>
8977        <td class="noborder" width="730">ISO Region Codes<br>
8978        <a href=
8979        "https://www.iso.org/iso/country_codes">https://www.iso.org/iso/country_codes</a><br>
8980
8981        Actual List<br>
8982        <a href=
8983        "https://www.iso.org/obp/ui/#search">https://www.iso.org/obp/ui/#search</a></td>
8984      </tr>
8985      <tr>
8986        <td class="noborder" width="148">[<a name="ISO4217" href=
8987        "#ISO4217" id="ISO4217">ISO4217</a>]</td>
8988        <td class="noborder" width="730">
8989          ISO Currency Codes<br>
8990          <a href=
8991          "https://www.iso.org/iso/home/standards/currency_codes.htm">
8992          https://www.iso.org/iso/home/standards/currency_codes.htm</a>
8993          <p><i>(Note that as of this point, there are significant
8994          problems with this list. The supplemental data file
8995          contains the best compendium of currency information
8996          available.)</i></p>
8997        </td>
8998      </tr>
8999      <tr>
9000        <td class="noborder" width="148">[<a name="ISO8601" href=
9001        "#ISO8601" id="ISO8601">ISO8601</a>]</td>
9002        <td class="noborder" width="730">ISO Date and Time
9003        Format<br>
9004        <a href=
9005        "https://www.iso.org/iso/iso8601">https://www.iso.org/iso/iso8601</a></td>
9006      </tr>
9007      <tr>
9008        <td class="noborder" width="148">[<a name="ISO15924" href=
9009        "#ISO15924" id="ISO15924">ISO15924</a>]</td>
9010        <td class="noborder" width="730">ISO Script Codes<br>
9011        <a href=
9012        "https://www.unicode.org/iso15924/index.html">https://www.unicode.org/iso15924/index.html</a><br>
9013
9014        Actual List<br>
9015        <a href=
9016        "https://www.unicode.org/iso15924/codelists.html">https://www.unicode.org/iso15924/codelists.html</a></td>
9017      </tr>
9018      <tr>
9019        <td class="noborder" width="148">[<a name="LOCODE" href=
9020        "#LOCODE" id="LOCODE">LOCODE</a>]</td>
9021        <td class="noborder" width="730">United Nations Code for
9022        Trade and Transport Locations, commonly known as
9023        "UN/LOCODE"<br>
9024        <a href=
9025        "https://www.unece.org/cefact/locode/welcome.html">https://www.unece.org/cefact/locode/welcome.html</a><br>
9026
      Download at: <a href=
      "https://www.unece.org/cefact/codesfortrade/codes_index.htm">https://www.unece.org/cefact/codesfortrade/codes_index.htm</a></td>
9029      </tr>
9030      <tr>
9031        <td class="noborder" width="148">[<a name="RFC6067" href=
9032        "#RFC6067" id="RFC6067">RFC6067</a>]</td>
9033        <td class="noborder" width="730">BCP 47 Extension U<br>
9034        <a href=
9035        "https://www.ietf.org/rfc/rfc6067.txt">https://www.ietf.org/rfc/rfc6067.txt</a></td>
9036      </tr>
9037      <tr>
9038        <td class="noborder" width="148">[<a name="RFC6497" href=
9039        "#RFC6497" id="RFC6497">RFC6497</a>]</td>
9040        <td class="noborder" width="730">BCP 47 Extension T -
9041        Transformed Content<br>
9042        <a href=
9043        "https://www.ietf.org/rfc/rfc6497.txt">https://www.ietf.org/rfc/rfc6497.txt</a></td>
9044      </tr>
9045      <tr>
9046        <td class="noborder" width="148">[<a name="UNM49" href=
9047        "#UNM49" id="UNM49">UNM49</a>]</td>
9048        <td class="noborder" width="730">
9049          UN M.49: UN Statistics Division
9050          <p>Country or area &amp; region codes<br>
9051          <a href=
9052          "https://unstats.un.org/unsd/methods/m49/m49.htm">https://unstats.un.org/unsd/methods/m49/m49.htm</a></p>
9053          <p>Composition of macro geographical (continental)
9054          regions, geographical sub-regions, and selected economic
9055          and other groupings<br>
9056          <a href=
9057          "https://unstats.un.org/unsd/methods/m49/m49regin.htm">https://unstats.un.org/unsd/methods/m49/m49regin.htm</a></p>
9058        </td>
9059      </tr>
9060      <tr>
9061        <td class="noborder" width="148">[<a name="XMLSchema" href=
9062        "#XMLSchema" id="XMLSchema">XML Schema</a>]</td>
9063        <td class="noborder" width="730">W3C XML Schema<br>
9064        <a href=
9065        "https://www.w3.org/XML/Schema">https://www.w3.org/XML/Schema</a></td>
9066      </tr>
9067      <tr>
9068        <th class="noborder" width="148">General</th>
9069        <td class="noborder" width="730"><i>The following are
9070        general references from the text:</i></td>
9071      </tr>
9072      <tr>
9073        <td class="noborder" width="148">[<a name="ByType" href=
9074        "#ByType" id="ByType">ByType</a>]</td>
9075        <td class="noborder" width="730">CLDR Comparison Charts<br>
9076        <a href=
9077        "https://www.unicode.org/cldr/comparison_charts.html">https://www.unicode.org/cldr/comparison_charts.html</a></td>
9078      </tr>
9079      <tr>
9080        <td class="noborder" width="148">[<a name="Calendars" href=
9081        "#Calendars" id="Calendars">Calendars</a>]</td>
9082        <td class="noborder" width="730">Calendrical Calculations:
9083        The Millennium Edition by Edward M. Reingold, Nachum
9084        Dershowitz; Cambridge University Press; Book and CD-ROM
9085        edition (July 1, 2001); ISBN: 0521777526. Note that the
9086        algorithms given in this book are copyrighted.</td>
9087      </tr>
9088      <tr>
9089        <td class="noborder" width="148">[<a name="Comparisons"
9090        href="#Comparisons" id="Comparisons">Comparisons</a>]</td>
9091        <td class="noborder" width="730">Comparisons between locale
9092        data from different sources<br>
9093        <a href=
9094        "https://unicode-org.github.io/cldr-staging/charts/38/supplemental/dtd_deltas.html">https://unicode-org.github.io/cldr-staging/charts/38/supplemental/dtd_deltas.html</a></td>
9095      </tr>
9096      <tr>
9097        <td class="noborder" width="148">[<a name="CurrencyInfo"
9098        href="#CurrencyInfo" id=
9099        "CurrencyInfo">CurrencyInfo</a>]</td>
9100        <td class="noborder" width="730">UNECE Currency Data<br>
9101        <a href=
9102        "https://www.currency-iso.org/en/home/tables.html">https://www.currency-iso.org/en/home/tables.html</a></td>
9103      </tr>
9104      <tr>
9105        <td class="noborder" width="148">[<a name="DataFormats"
9106        href="#DataFormats" id="DataFormats">DataFormats</a>]</td>
9107        <td class="noborder" width="730">CLDR Translation
9108        Guidelines<br>
9109        <a href=
9110        "http://cldr.unicode.org/translation">http://cldr.unicode.org/translation</a></td>
9111      </tr>
9112      <tr>
9113        <td class="noborder" width="148">[<a name="LDML" href=
9114        "#LDML" id="LDML">Example</a>]</td>
9115        <td class="noborder" width="730">A sample in Locale Data
9116        Markup Language<br>
9117        <a href=
9118        "https://unicode.org/cldr/dtd/1.1/ldml-example.xml">https://unicode.org/cldr/dtd/1.1/ldml-example.xml</a></td>
9119      </tr>
9120      <tr>
9121        <td class="noborder" width="148">[<a name="ICUCollation"
9122        href="#ICUCollation" id=
9123        "ICUCollation">ICUCollation</a>]</td>
9124        <td class="noborder" width="730">ICU rule syntax<br>
9125        <a href=
9126        "https://unicode-org.github.io/icu/userguide/collation/customization/">
9127        https://unicode-org.github.io/icu/userguide/collation/customization/</a></td>
9128      </tr>
9129      <tr>
9130        <td class="noborder" width="148">[<a name="ICUTransforms"
9131        href="#ICUTransforms" id=
9132        "ICUTransforms">ICUTransforms</a>]</td>
9133        <td class="noborder" width="730">Transforms<br>
9134        <a href=
9135        "https://unicode-org.github.io/icu/userguide/transforms/">
9136        https://unicode-org.github.io/icu/userguide/transforms/</a><br>
9137
9138        Transforms Demo<br>
9139        <a href=
9140        "http://demo.icu-project.org/icu-bin/translit/">http://demo.icu-project.org/icu-bin/translit/</a></td>
9141      </tr>
9142      <tr>
9143        <td class="noborder" width="148">[<a name="ICUUnicodeSet"
9144        href="#ICUUnicodeSet" id=
9145        "ICUUnicodeSet">ICUUnicodeSet</a>]</td>
9146        <td class="noborder" width="730">ICU UnicodeSet<br>
9147        <a href=
9148        "https://unicode-org.github.io/icu/userguide/strings/unicodeset.html">https://unicode-org.github.io/icu/userguide/strings/unicodeset.html<br>
9149        </a> API<br>
9150        <a href=
9151        "https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/UnicodeSet.html">
9152        https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/UnicodeSet.html</a></td>
9153      </tr>
9154      <tr>
9155        <td class="noborder" width="148">[<a name="ITUE164" href=
9156        "#ITUE164" id="ITUE164">ITUE164</a>]</td>
9157        <td class="noborder" width="730">International
9158        Telecommunication Union: List Of ITU Recommendation E.164
9159        Assigned Country Codes<br>
9160        available at <a href=
9161        "https://www.itu.int/opb/publications.aspx?parent=T-SP&amp;view=T-SP2">
9162        https://www.itu.int/opb/publications.aspx?parent=T-SP&amp;view=T-SP2</a></td>
9163      </tr>
9164      <tr>
9165        <td class="noborder" width="148">[<a name="LocaleExplorer"
9166        href="#LocaleExplorer" id=
9167        "LocaleExplorer">LocaleExplorer</a>]</td>
9168        <td class="noborder" width="730">ICU Locale Explorer<br>
9169        <a href=
9170        "http://demo.icu-project.org/icu-bin/locexp">http://demo.icu-project.org/icu-bin/locexp</a></td>
9171      </tr>
9172      <tr>
9173        <td class="noborder" width="148">[<a name="localeProject"
9174        href="#localeProject" id=
9175        "localeProject">LocaleProject</a>]</td>
9176        <td class="noborder" width="730">Common Locale Data
9177        Repository Project<br>
9178        <a href=
9179        "https://unicode.org/cldr/">https://unicode.org/cldr/</a></td>
9180      </tr>
9181      <tr>
9182        <td class="noborder" width="148">[<a name="NamingGuideline"
9183        href="#NamingGuideline" id=
9184        "NamingGuideline">NamingGuideline</a>]</td>
9185        <td class="noborder" width="730">OpenI18N Locale Naming
9186        Guideline<br>
9187        formerly at
9188        https://www.openi18n.org/docs/text/LocNameGuide-V10.txt</td>
9189      </tr>
9190      <tr>
9191        <td class="noborder" width="148">[<a name="RBNF" href=
9192        "#RBNF" id="RBNF">RBNF</a>]</td>
9193        <td class="noborder" width="730">Rule-Based Number
9194        Format<br>
9195        <a href=
9196        "https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1RuleBasedNumberFormat.html">
9197        https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1RuleBasedNumberFormat.html</a></td>
9198      </tr>
9199      <tr>
9200        <td class="noborder" width="148">[<a name="RBBI" href=
9201        "#RBBI" id="RBBI">RBBI</a>]</td>
9202        <td class="noborder" width="730">Rule-Based Break
9203        Iterator<br>
9204        <a href=
9205        "https://unicode-org.github.io/icu/userguide/boundaryanalysis">
9206        https://unicode-org.github.io/icu/userguide/boundaryanalysis</a></td>
9207      </tr>
9208      <tr>
9209        <td class="noborder" width="148">[<a name="UCAChart" href=
9210        "#UCAChart" id="UCAChart">UCAChart</a>]</td>
9211        <td class="noborder" width="730">Collation Chart<a href=
9212        "https://unicode.org/charts/collation/"><br>
9213        https://unicode.org/charts/collation/</a></td>
9214      </tr>
9215      <tr>
9216        <td class="noborder" width="148">[<a name="UTCInfo" href=
9217        "#UTCInfo" id="UTCInfo">UTCInfo</a>]</td>
9218        <td class="noborder" width="730">NIST Time and Frequency
9219        Division Home Page<br>
9220        <a href="https://tf.nist.gov/">https://tf.nist.gov/<br></a>
9221        U.S. Naval Observatory: What is Universal Time?<br>
9222        <a href=
9223        "https://www.usno.navy.mil/USNO/time/master-clock/systems-of-time">https://www.usno.navy.mil/USNO/time/master-clock/systems-of-time</a></td>
9224      </tr>
9225      <tr>
9226        <td class="noborder" width="148">[<a name="WindowsCulture"
9227        href="#WindowsCulture" id=
9228        "WindowsCulture">WindowsCulture</a>]</td>
9229        <td class="noborder" width="730">Windows Culture Info
9230        (with&nbsp; mappings from [<a href=
9231        "#BCP47">BCP47</a>]-style codes to LCIDs)<br>
9232        <a href=
9233        "https://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo(vs.71).aspx">
      https://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo(vs.71).aspx</a></td>
      </tr>
    </table>
    <h2><a name="Acknowledgments" href="#Acknowledgments" id=
    "Acknowledgments">Acknowledgments</a></h2>
    <p>Special thanks to the following people for their continuing
    overall contributions to the CLDR project, and for their
    specific contributions in the areas listed below. These
    descriptions touch on only a few of the many contributions
    they have made.</p>
    <ul>
      <li>Mark Davis for creating the initial version of LDML; for
      adding to and maintaining this specification; and for his
      work on the LDML code and tests, much of the supplemental
      data and overall structure, transforms, and
      keyboards.</li>
      <li>John Emmons for the POSIX conversion tool and
      metazones.</li>
      <li>Deborah Goldsmith for her contributions to LDML
      architecture and this specification.</li>
      <li>Chris Hansten for coordinating and managing data
      submissions and vetting.</li>
      <li>Erkki Kolehmainen and his team for their work on
      Finnish.</li>
      <li>Steven R. Loomis for development of the survey tool and
      database management.</li>
      <li>Peter Nugent for his contributions to the POSIX tool and
      from Open Office, and for coordinating and managing data
      submissions and vetting.</li>
      <li>George Rhoten for his work on currencies.</li>
      <li>Roozbeh Pournader (روزبه پورنادر) for his work on South
      Asian countries.</li>
      <li>Ram Viswanadha (రఘురామ్ విశ్వనాధ) for all of his work on
      LDML code and data integration, and for coordinating and
      managing data submissions and vetting.</li>
      <li>Vladimir Weinstein (Владимир Вајнштајн) for his work on
      collation.</li>
      <li>Yoshito Umaoka (馬岡 由人) for his work on the timezone
      architecture.</li>
      <li>Rick McGowan for his work gathering language, script and
      region data.</li>
      <li>Xiaomei Ji (吉晓梅) for her work on time intervals and
      plural formatting.</li>
      <li>David Bertoni for his contributions to the conversion
      tools.</li>
      <li>Mike Tardif for reviewing this specification and for
      coordinating and vetting data submissions.</li>
      <li>Peter Edberg for work on this specification,
      monthPatterns, cyclicNameSets, contextTransforms and other
      items.</li>
      <li>Raymond Wainman and Cibu Johny for their work on
      keyboards.</li>
      <li>Jennifer Chye for her contributions to the conversion
      tools.</li>
      <li>Markus Scherer for a major rewrite of Part 5, Collation.</li>
      <li><a href="https://www.sffc.xyz/">Shane Carr</a> for his work on numbers and measurement units.</li>
      <li>Robin Leroy for his work on compact plurals: Part 3, Section 5, <a href="tr35-numbers.html#Language_Plural_Rules">Language Plural
      Rules</a>.</li>
    </ul>
    <p>Other contributors to CLDR are listed on the <a href=
    "https://www.unicode.org/cldr/">CLDR Project Page</a>.</p>


    <h2><a name="Modifications" href="#Modifications" id=
    "Modifications">Modifications</a></h2>

    <p><b>Revision 61</b></p>
	<ul>
	  <li><b>Reissued</b> for CLDR 38.</li>

	  <li><strong>Part 1: <a href="tr35.html#Contents">Core</a> (languages, locales, basic structure)</strong>
        <ul>
          <li><strong>Section 3.2.1 <a href="#Canonical_Unicode_Locale_Identifiers">Canonical Unicode Locale Identifiers</a></strong>: replaced the text with a reference to <strong>Annex C. <a href="#LocaleId_Canonicalization" >LocaleId Canonicalization</a></strong>.</li>
          <li><strong>Section 3.3.1 <a href=
    "#BCP_47_Language_Tag_Conversion" >BCP 47 Language Tag
    Conversion</a></strong>: replaced the text with a reference to <strong>Annex C. <a href="#LocaleId_Canonicalization" >LocaleId Canonicalization</a></strong>.</li>
          <li><strong>Section 3.6.1 <a href="#Key_And_Type_Definitions_" >Key And Type Definitions</a></strong>:
          added the new key “dx” for the <a href="#UnicodeDictionaryBreakExclusionIdentifier" >Unicode Dictionary Break Exclusion Identifier</a>.</li>
          <li><strong>Section 3.6.4 <a href="#Unicode_Locale_Extension_Data_Files" >U Extension Data Files</a></strong>:
          added a description of the <a href="#SCRIPT_CODE" >SCRIPT_CODE</a> value for the key “dx”.</li>
          <li><strong>Section 4.1.2 <a href="#Lateral_Inheritance">Lateral Inheritance</a></strong>: specified lateral inheritance in more detail, and added case and gender.</li>
          <li><strong>Annex C. <a href="#LocaleId_Canonicalization" >LocaleId Canonicalization</a></strong>
            <ul>
              <li>Added a new annex, replacing the text in <strong>Section 3.2.1 <a href="#Canonical_Unicode_Locale_Identifiers">Canonical Unicode Locale Identifiers</a></strong> and <strong>Section 3.3.1 <a href=
    "#BCP_47_Language_Tag_Conversion" >BCP 47 Language Tag
    Conversion</a></strong>.</li>
              <li>Cleaned up ambiguities in the previous specification of canonicalization. (This was done in concert with fixes to the alias data, so that the data works better with the specification.)</li>
            </ul>
          </li>
        </ul>
	  </li>
	  <li><strong>Part 2: <a href="tr35-general.html#Contents">General</a> (display names &amp; transforms, etc.)</strong>
        <ul>
          <li><strong>Section 6 <a href="tr35-general.html#Unit_Elements">Unit Elements</a></strong>
		    <ul>
		      <li>Added the new element compoundUnitPattern1.</li>
		      <li>Added the case attribute to compoundUnitPattern.</li>
		      <li>Provided a full description of compound unit components.</li>
		    </ul>
          </li>

          <li><strong>Section 14.2 <a href="tr35-general.html#Character_Labels">Annotations Character Labels</a></strong>
		    <ul>
		      <li>Added the new characterLabelPattern type attribute values subscript and superscript.</li>
		    </ul>
          </li>

          <li><strong>Section 16 <a href="tr35-general.html#Grammatical_Derivations">Grammatical Derivations</a></strong>: added new section.</li>
        </ul>
	  </li>
	  <li><strong>Part 3: <a href="tr35-numbers.html#Contents">Numbers</a> (number &amp; currency formatting)</strong>
        <ul>
	      <li><strong>Section 2.3 <a href="tr35-numbers.html#Number_Symbols">Number Symbols</a>:</strong>
	        added approximatelySign.</li>
	      <li><strong>Section 2.6 <a href="tr35-numbers.html#Minimal_Pairs">Minimal Pairs</a>:</strong> added case and
	        gender minimal pairs. Removed the alt/draft ATTLIST, since those attributes are documented elsewhere and only
	        obscure the text.</li>
	      <li><strong>Section 5 <a href="tr35-numbers.html#Language_Plural_Rules">Language Plural Rules</a>:</strong>
	        added the 'e' operand for use in certain compact number formatting.</li>
        </ul>
	  </li>
      <li><strong>Part 6: <a href="tr35-info.html#Contents">Supplemental</a> (supplemental data)</strong>
	    <ul>
	      <li><strong>Section 14 <a href="tr35-info.html#Unit_Preferences">Unit Preferences</a></strong>: defined the
	        userPreferences skeleton more precisely.</li>
        </ul>
	  </li>
      <li><strong>Throughout: </strong>Where possible, replaced “grandfathered” with “legacy” when referring to language tags or units.</li>
 </ul>


	      <p>&nbsp;</p>

       <p>Modifications in previous versions are listed in those
    respective versions. Click on <strong>Previous Version</strong>
    in the header until you get to the desired version.</p>
    <hr>
    <p class="copyright">Copyright © 2001–2020 Unicode, Inc. All
    Rights Reserved. The Unicode Consortium makes no expressed or
    implied warranty of any kind, and assumes no liability for
    errors or omissions. No liability is assumed for incidental and
    consequential damages in connection with or arising out of the
    use of the information or programs contained or accompanying
    this technical report. The Unicode <a href=
    "https://unicode.org/copyright.html">Terms of Use</a> apply.</p>
    <p class="copyright">Unicode and the Unicode logo are
    trademarks of Unicode, Inc., and are registered in some
    jurisdictions.</p>
  </div>
</body>
</html>
