1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
2"https://www.w3.org/TR/html4/loose.dtd">
3<html>
4<head>
5  <meta name="generator" content=
6  "HTML Tidy for HTML5 for Apple macOS version 5.6.0">
7  <meta http-equiv="Content-Type" content=
8  "text/html; charset=utf-8">
9  <meta http-equiv="Content-Language" content="en-us">
10  <link rel="stylesheet" href=
11  "../reports.css" type="text/css">
12  <title>UTS #35: Unicode LDML: Supplemental</title>
13  <style type="text/css">
14  <!--
15  .dtd {
16        font-family: monospace;
17        font-size: 90%;
18        background-color: #CCCCFF;
19        border-style: dotted;
20        border-width: 1px;
21  }
22
23  .xmlExample {
24        font-family: monospace;
25        font-size: 80%
26  }
27
28  .blockedInherited {
29        font-style: italic;
30        font-weight: bold;
31        border-style: dashed;
32        border-width: 1px;
33        background-color: #FF0000
34  }
35
36  .inherited {
37        font-weight: bold;
38        border-style: dashed;
39        border-width: 1px;
40        background-color: #00FF00
41  }
42
43  .element {
44        font-weight: bold;
45        color: red;
46  }
47
48  .attribute {
49        font-weight: bold;
50        color: maroon;
51  }
52
53  .attributeValue {
54        font-weight: bold;
55        color: blue;
56  }
57
58  li, p {
59        margin-top: 0.5em;
60        margin-bottom: 0.5em
61  }
62
63  h2, h3, h4, table {
64        margin-top: 1.5em;
65        margin-bottom: 0.5em;
66  }
67  -->
68  </style>
69</head>
70<body>
71  <table class="header" width="100%">
72    <tr>
73      <td class="icon"><a href="https://unicode.org"><img alt=
74      "[Unicode]" src="../logo60s2.gif"
75      width="34" height="33" style=
76      "vertical-align: middle; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px; border-top-width: 0px;"></a>&nbsp;
77      <a class="bar" href=
78      "https://www.unicode.org/reports/">Technical Reports</a></td>
79    </tr>
80    <tr>
81      <td class="gray">&nbsp;</td>
82    </tr>
83  </table>
84  <div class="body">
85    <h2 style="text-align: center">Unicode Technical Standard #35</h2>
86    <h1>Unicode Locale Data Markup Language (LDML)<br>
87    Part 6: Supplemental</h1>
88    <!-- At least the first row of this header table should be identical across the parts of this UTS. -->
89    <table border="1" cellpadding="2" cellspacing="0" class="wide">
90      <tr>
91        <td>Version</td>
92        <td>38</td>
93      </tr>
94      <tr>
95        <td>Editors</td>
96        <td>Steven Loomis (<a href=
97        "mailto:srl@icu-project.org">srl@icu-project.org</a>) and
98        <a href="tr35.html#Acknowledgments">other CLDR committee
99        members</a></td>
100      </tr>
101    </table>
102    <p>For the full header, summary, and status, see <a href=
103    "tr35.html">Part 1: Core</a></p>
104    <h3><i>Summary</i></h3>
105    <p>This document describes parts of an XML format
106    (<i>vocabulary</i>) for the exchange of structured locale data.
107    This format is used in the <a href=
108    "https://unicode.org/cldr/">Unicode Common Locale Data
109    Repository</a>.</p>
110    <p>This is a partial document, describing only those parts of
111    the LDML that are relevant for supplemental data. For the other
112    parts of the LDML see the <a href="tr35.html">main LDML
113    document</a> and the links above.</p>
114    <h3><i>Status</i></h3>
115
116    <!-- NOT YET APPROVED
117                <p>
118                                <i class="changed">This is a<b><font color="#ff3333">
119                                draft </font></b>document which may be updated, replaced, or superseded by
120                                other documents at any time. Publication does not imply endorsement
121                                by the Unicode Consortium. This is not a stable document; it is
122                                inappropriate to cite this document as other than a work in
123                                progress.
124                        </i>
125                </p>
126     END NOT YET APPROVED -->
127    <!-- APPROVED -->
128    <p><i>This document has been reviewed by Unicode members and
129    other interested parties, and has been approved for publication
130    by the Unicode Consortium. This is a stable document and may be
131    used as reference material or cited as a normative reference by
132    other specifications.</i></p>
133    <!-- END APPROVED -->
134
135    <blockquote>
136      <p><i><b>A Unicode Technical Standard (UTS)</b> is an
137      independent specification. Conformance to the Unicode
138      Standard does not imply conformance to any UTS.</i></p>
139    </blockquote>
140    <p><i>Please submit corrigenda and other comments with the CLDR
141    bug reporting form [<a href="tr35.html#Bugs">Bugs</a>]. Related
142    information that is useful in understanding this document is
143    found in the <a href="tr35.html#References">References</a>. For
144    the latest version of the Unicode Standard see [<a href=
145    "tr35.html#Unicode">Unicode</a>]. For a list of current Unicode
146    Technical Reports see [<a href=
147    "tr35.html#Reports">Reports</a>]. For more information about
148    versions of the Unicode Standard, see [<a href=
149    "tr35.html#Versions">Versions</a>].</i></p>
150    <!-- This section of Parts should be identical in all of the parts of this UTS. -->
151    <h2><a name="Parts" href="#Parts" id="Parts">Parts</a></h2>
152    <p>The LDML specification is divided into the following
153    parts:</p>
154    <ul class="toc">
155      <li>Part 1: <a href="tr35.html#Contents">Core</a> (languages,
156      locales, basic structure)</li>
157      <li>Part 2: <a href="tr35-general.html#Contents">General</a>
158      (display names &amp; transforms, etc.)</li>
159      <li>Part 3: <a href="tr35-numbers.html#Contents">Numbers</a>
160      (number &amp; currency formatting)</li>
161      <li>Part 4: <a href="tr35-dates.html#Contents">Dates</a>
162      (date, time, time zone formatting)</li>
163      <li>Part 5: <a href=
164      "tr35-collation.html#Contents">Collation</a> (sorting,
165      searching, grouping)</li>
166      <li>Part 6: <a href=
167      "tr35-info.html#Contents">Supplemental</a> (supplemental
168      data)</li>
169      <li>Part 7: <a href=
170      "tr35-keyboards.html#Contents">Keyboards</a> (keyboard
171      mappings)</li>
172    </ul>
173    <h2><a name="Contents" href="#Contents" id="Contents">Contents
174    of Part 6, Supplemental</a></h2>
175    <!-- START Generated TOC: CheckHtmlFiles -->
176    <ul class="toc">
177      <li>1 <a href="#Supplemental_Data">Introduction Supplemental
178      Data</a></li>
179      <li>2 <a href="#Territory_Data">Territory Data</a>
180        <ul class="toc">
181          <li>2.1 <a href=
182          "#Supplemental_Territory_Containment">Supplemental
183          Territory Containment</a></li>
184          <li>2.2 <a href="#Subdivision_Containment">Subdivision
185          Containment</a></li>
186          <li>2.3 <a href=
187          "#Supplemental_Territory_Information">Supplemental
188          Territory Information</a></li>
189          <li>2.4 <a href=
190          "#Territory_Based_Preferences">Territory-Based
191          Preferences</a>
192            <ul class="toc">
193              <li>2.4.1 <a href=
194              "#Preferred_Units_For_Usage">Preferred Units for
195              Specific Usages</a>
196                <ul class="toc">
197                  <li>Table: <a href=
198                  "#Unit_Preferences">Unit Preference
199                  Categories</a></li>
200                </ul>
201              </li>
202            </ul>
203          </li>
204          <li>2.5 <a href="#rgScope">&lt;rgScope&gt;: Scope of the
205          “rg” Locale Key</a></li>
206        </ul>
207      </li>
208      <li>3 <a href="#Supplemental_Language_Data">Supplemental
209      Language Data</a>
210        <ul class="toc">
211          <li>3.1 <a href=
212          "#Supplemental_Language_Grouping">Supplemental Language
213          Grouping</a></li>
214        </ul>
215      </li>
216      <li>4 <a href="#Supplemental_Code_Mapping">Supplemental Code
217      Mapping</a></li>
218      <li>5 <a href="#Telephone_Code_Data">Telephone Code Data</a>
219      (Deprecated)</li>
220      <li>6 <a href="#Postal_Code_Validation">Postal Code
221      Validation (Deprecated)</a></li>
222      <li>7 <a href=
223      "#Supplemental_Character_Fallback_Data">Supplemental
224      Character Fallback Data</a></li>
225      <li>8 <a href="#Coverage_Levels">Coverage Levels</a>
226        <ul class="toc">
227          <li>8.1 <a href=
228          "#Coverage_Level_Definitions">Definitions</a></li>
229          <li>8.2 <a href="#Coverage_Level_Data_Requirements">Data
230          Requirements</a></li>
231          <li>8.3 <a href="#Coverage_Level_Default_Values">Default
232          Values</a></li>
233        </ul>
234      </li>
235      <li>9 <a href="#Appendix_Supplemental_Metadata">Supplemental
236      Metadata</a>
237        <ul class="toc">
238          <li>9.1 <a href=
239          "#Supplemental_Alias_Information">Supplemental Alias
240          Information</a>
241            <ul class="toc">
242              <li>Table: <a href="#Alias_Attribute_Values">Alias
243              Attribute Values</a></li>
244            </ul>
245          </li>
246          <li>9.2 <a href=
247          "#Supplemental_Deprecated_Information">Supplemental
248          Deprecated Information (Deprecated)</a></li>
249          <li>9.3 <a href="#Default_Content">Default
250          Content</a></li>
251        </ul>
252      </li>
253      <li>10 <a href="#Metadata_Elements">Locale Metadata
254      Elements</a></li>
255      <li>11 <a href="#Version_Information">Version
256      Information</a></li>
257      <li>12 <a href="#Parent_Locales">Parent Locales</a></li>
258      <li>13 <a href="#Unit_Conversion" >Unit Conversion</a></li>
259      <li>14 <a href="#Unit_Preferences">Unit Preferences</a></li>
260    </ul>
261    <!-- END Generated TOC: CheckHtmlFiles -->
262    <h2>1 Introduction <a name="Supplemental_Data" href=
263    "#Supplemental_Data" id="Supplemental_Data">Supplemental
264    Data</a></h2>
265    <p>The following represents the format for additional
266    supplemental information. This is information that is important
267    for internationalization and proper use of CLDR, but is not
268    contained in the locale hierarchy. It is not localizable, nor
269    is it overridden by locale data. The current CLDR data can be
270    viewed in the <a href=
271    "https://unicode-org.github.io/cldr-staging/charts/38/supplemental/index.html">
272    Supplemental Charts</a>.</p>
273    <p class="dtd">
275     &lt;!ELEMENT supplementalData (version, generation?,
276    cldrVersion?, currencyData?, territoryContainment?,
277    subdivisionContainment?, languageData?, territoryInfo?,
278    postalCodeData?, calendarData?, calendarPreferenceData?,
279    weekData?, timeData?, measurementData?, unitPreferenceData?,
280    timezoneData?, characters?, transforms?, metadata?,
281    codeMappings?, parentLocales?, likelySubtags?, metazoneInfo?,
282    plurals?, telephoneCodeData?, numberingSystems?,
283    bcp47KeywordMappings?, gender?, references?, languageMatching?,
284    dayPeriodRuleSet*, metaZones?, primaryZones?, windowsZones?,
285    coverageLevels?, idValidity?, rgScope?) &gt;</p>
286    <p>The data in CLDR is presently split into multiple files:
287    supplementalData.xml, supplementalMetadata.xml, characters.xml,
288    likelySubtags.xml, ordinals.xml, plurals.xml,
289    telephoneCodeData.xml, genderList.xml, plus transforms (see
290    <i>Part 2 Section 10 <a href=
291    "tr35-general.html#Transforms">Transforms</a></i> and <i>Part 2
292    Section 10.3 <a href=
293    "tr35-general.html#Transform_Rules_Syntax">Transform Rule
294    Syntax</a></i>). The split is just for convenience: logically,
295    they are treated as though they were a single file. Future
296    versions of CLDR may split the data in a different fashion. Do
297    not depend on any specific XML filename or path for
298    supplemental data.</p>
    <p>Note that <a href="#Metadata_Elements">Section 10</a>
300    presents information about metadata that is maintained on a
301    per-locale basis. It is included in this section because it is
302    not intended to be used as part of the locale itself.</p>
303    <h2>2 <a name="Territory_Data" href="#Territory_Data" id=
304    "Territory_Data">Territory Data</a></h2>
305    <h3>2.1 <a name="Supplemental_Territory_Containment" href=
306    "#Supplemental_Territory_Containment" id=
307    "Supplemental_Territory_Containment">Supplemental Territory
308    Containment</a></h3>
309    <p class="dtd">&lt;!ELEMENT territoryContainment ( group* )
310    &gt;<br>
311    &lt;!ELEMENT group EMPTY &gt;<br>
312    &lt;!ATTLIST group type NMTOKEN #REQUIRED &gt;<br>
313    &lt;!ATTLIST group contains NMTOKENS #IMPLIED &gt;<br>
314    &lt;!ATTLIST group grouping ( true | false ) #IMPLIED &gt;<br>
    &lt;!ATTLIST group status ( deprecated | grouping ) #IMPLIED
    &gt;</p>
    <p>The following data describes groupings of countries
    (regions). The data is based on [<a href=
    "tr35.html#UNM49">UNM49</a>]. There is one special code,
    <code>QO</code>, which is used for outlying areas of Oceania
    that are typically uninhabited. The territory containment forms
    a tree with the following levels:</p>
323    <p align="center">World</p>
324    <p align="center">Continent</p>
325    <p align="center">Subcontinent</p>
326    <p align="center">Country</p>
    <p>Excluding groupings, in this tree:</p>
    <ul>
      <li>All non-overlapping regions form a strict tree rooted at
      World.</li>
      <li>All leaf nodes (countries) are at depth 4. Some of
      these “country” regions are actually parts of other
      countries, such as Hong Kong (part of China). Such
      relationships are not part of the containment data.</li>
    </ul>
336    <p>For a chart showing the relationships (plus the included
337    timezones), see the <a href=
338    "https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_containment_un_m_49.html">
339    Territory Containment Chart</a>. The XML structure has the
340    following form.</p>
341    <pre>&lt;territoryContainment&gt;</pre>
342    <blockquote>
343      <pre>
&lt;group type="001" contains="002 009 019 142 150"/&gt; &lt;!--World --&gt;
&lt;group type="011" contains="BF BJ CI CV GH GM GN GW LR ML MR NE NG SH SL SN TG"/&gt; &lt;!--Western Africa --&gt;
&lt;group type="013" contains="BZ CR GT HN MX NI PA SV"/&gt; &lt;!--Central America --&gt;
&lt;group type="014" contains="BI DJ ER ET KE KM MG MU MW MZ RE RW SC SO TZ UG YT ZM ZW"/&gt; &lt;!--Eastern Africa --&gt;
&lt;group type="142" contains="030 035 062 145"/&gt; &lt;!--Asia --&gt;
&lt;group type="145" contains="AE AM AZ BH CY GE IL IQ JO KW LB OM PS QA SA SY TR YE"/&gt; &lt;!--Western Asia --&gt;
&lt;group type="015" contains="DZ EG EH LY MA SD TN"/&gt; &lt;!--Northern Africa --&gt;
...</pre>
352    </blockquote>
353    <p>There are groupings that don't follow this regular
354    structure, such as:</p>
355    <pre>
356    &lt;group type="003" contains="013 021 029" grouping="true"/&gt; &lt;!--North America --&gt;</pre>
357    <p>These are marked with the attribute <span class=
358    "attribute">grouping</span>="<span class=
359    "attributeValue">true</span>".</p>
360    <p>When groupings have been deprecated but kept around for
361    backwards compatibility, they are marked with the attribute
362    <span class="attribute">status</span>="<span class=
363    "attributeValue">deprecated</span>", like this:</p>
364    <pre>
365    &lt;group type="029" contains="AN" status="deprecated"/&gt; &lt;!--Caribbean --&gt;</pre>
366    <p>When the containment relationship itself is a grouping, it
367    is marked with the attribute <span class=
368    "attribute">status</span>="<span class=
369    "attributeValue">grouping</span>", like this:</p>
370    <pre>
371    &lt;group type="150" contains="EU" status="grouping"/&gt; &lt;!--Europe --&gt;</pre>
372    <p>That is, the type value isn’t a grouping, but if you filter
373    out groupings you can drop this containment. In the example
374    above, EU is a grouping, and contained in 150.</p>
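    <p>The filtering described above can be sketched in Python. This is a
    minimal illustration, not a reference implementation: the function names
    <code>load_containment</code> and <code>ancestors</code> and the
    <code>strict</code> parameter are hypothetical, and the sample is a
    shortened copy of the examples in this section. It assumes that a strict
    tree is obtained by skipping every group marked
    <code>grouping="true"</code>, <code>status="grouping"</code>, or
    <code>status="deprecated"</code>.</p>

```python
import xml.etree.ElementTree as ET

# Shortened sample assembled from the examples in this section.
SAMPLE = """
<territoryContainment>
  <group type="001" contains="002 009 019 142 150"/>
  <group type="142" contains="030 035 062 145"/>
  <group type="145" contains="AE AM AZ"/>
  <group type="003" contains="013 021 029" grouping="true"/>
  <group type="029" contains="AN" status="deprecated"/>
  <group type="150" contains="EU" status="grouping"/>
</territoryContainment>
"""


def load_containment(xml_text, strict=True):
    """Return {parent: [children]} built from the <group> elements.

    With strict=True (an assumption of this sketch), groups marked as
    groupings or as deprecated are skipped, leaving the regular tree.
    """
    tree = {}
    for group in ET.fromstring(xml_text).iter("group"):
        marked = (group.get("grouping") == "true"
                  or group.get("status") in ("grouping", "deprecated"))
        if strict and marked:
            continue
        children = group.get("contains", "").split()
        # The same type may appear on several <group> elements; merge them.
        tree.setdefault(group.get("type"), []).extend(children)
    return tree


def ancestors(tree, code):
    """List the containers of a code, nearest first, up to the root."""
    parent_of = {child: parent
                 for parent, children in tree.items()
                 for child in children}
    chain = []
    while code in parent_of:
        code = parent_of[code]
        chain.append(code)
    return chain


strict_tree = load_containment(SAMPLE)
print(ancestors(strict_tree, "AE"))
```

    <p>Calling <code>load_containment(SAMPLE, strict=False)</code> keeps the
    marked groups, so the containment of EU in 150 is retained.</p>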
375    <h3>2.2 <a name="Subdivision_Containment" href=
376    "#Subdivision_Containment" id=
377    "Subdivision_Containment">Subdivision Containment</a></h3>
378    <p class="dtd">&lt;!ELEMENT subdivisionContainment ( subgroup*
379    ) &gt;<br>
380    <br>
381    &lt;!ELEMENT subgroup EMPTY &gt;<br>
382    &lt;!ATTLIST subgroup type NMTOKEN #REQUIRED &gt;<br>
383    &lt;!ATTLIST subgroup contains NMTOKENS #IMPLIED &gt;</p>
384    <p>The subdivision containment data is similar to the territory
385    containment. It is based on ISO 3166-2 data, but may diverge
386    from it in the future.</p>
387    <p class="xmlExample">&lt;subgroup type="BD" contains="bda bdb
388    bdc bdd bde bdf bdg bdh"/&gt;<br>
389    &lt;subgroup type="bda" contains="bd02 bd06 bd07 bd25 bd50
390    bd51"/&gt;</p>
391    <p>The <strong>type</strong> is a <code><a href=
392    "tr35.html#unicode_region_subtag">unicode_region_subtag</a></code>
393    (territory) identifier for the top level of containment, or a
394    <code><a href=
395    "tr35.html#unicode_subdivision_subtag">unicode_subdivision_id</a></code>
396    for lower levels of containment when there are multiple levels.
397    The <strong>contains</strong> value is a space-delimited list
398    of one or more <code><a href=
399    "tr35.html#unicode_subdivision_subtag">unicode_subdivision_id</a></code>
400    values. In the example above, subdivision bda contains other
401    subdivisions bd02, bd06, bd07, bd25, bd50, bd51.</p>
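    <p>As an illustration of multi-level containment, the following Python
    sketch expands a code to the subdivisions beneath it that contain nothing
    further. The function names are hypothetical, and the sample contains only
    the two <code>subgroup</code> elements shown above, so codes such as
    <code>bdb</code> appear as leaves here only because their own
    <code>subgroup</code> entries are not included in the sample.</p>

```python
import xml.etree.ElementTree as ET

# Only the two subgroup elements quoted in this section.
SAMPLE = """
<subdivisionContainment>
  <subgroup type="BD" contains="bda bdb bdc bdd bde bdf bdg bdh"/>
  <subgroup type="bda" contains="bd02 bd06 bd07 bd25 bd50 bd51"/>
</subdivisionContainment>
"""


def load_subgroups(xml_text):
    """Map each type value to its space-delimited list of contained ids."""
    root = ET.fromstring(xml_text)
    return {sg.get("type"): sg.get("contains", "").split()
            for sg in root.iter("subgroup")}


def leaf_subdivisions(subgroups, code):
    """Recursively expand a code to the ids that have no subgroup entry."""
    children = subgroups.get(code)
    if not children:
        return [code]
    leaves = []
    for child in children:
        leaves.extend(leaf_subdivisions(subgroups, child))
    return leaves


subgroups = load_subgroups(SAMPLE)
print(leaf_subdivisions(subgroups, "bda"))
print(len(leaf_subdivisions(subgroups, "BD")))
```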
402    <p>Note: Formerly (in CLDR 28 through 30):</p>
403    <ul>
404      <li>The <strong>type</strong> attribute could only contain a
405      <code>unicode_region_subtag</code>;</li>
406      <li>The <strong>contains</strong> attribute contained
407      <code>unicode_subdivision_suffix</code> values; these are not
408      unique across multiple territories, so...</li>
      <li>For lower containment levels, a now-deprecated
      <strong>subtype</strong> attribute was used to specify the parent
      <code>unicode_subdivision_suffix</code>.</li>
    </ul>
418    <h3>2.3 <a name="Supplemental_Territory_Information" href=
419    "#Supplemental_Territory_Information" id=
420    "Supplemental_Territory_Information">Supplemental Territory
421    Information</a></h3>
422    <p class="dtd">&lt;!ELEMENT territory ( languagePopulation* )
423    &gt;<br>
424    &lt;!ATTLIST territory type NMTOKEN #REQUIRED &gt;<br>
425    &lt;!ATTLIST territory gdp NMTOKEN #REQUIRED &gt;<br>
426    &lt;!ATTLIST territory literacyPercent NMTOKEN #REQUIRED
427    &gt;<br>
428    &lt;!ATTLIST territory population NMTOKEN #REQUIRED &gt;<br>
429    <br>
430    &lt;!ELEMENT languagePopulation EMPTY &gt;<br>
431    &lt;!ATTLIST languagePopulation type NMTOKEN #REQUIRED &gt;<br>
432    &lt;!ATTLIST languagePopulation literacyPercent NMTOKEN
433    #IMPLIED &gt;<br>
434    &lt;!ATTLIST languagePopulation writingPercent NMTOKEN #IMPLIED
435    &gt;<br>
436    &lt;!ATTLIST languagePopulation populationPercent NMTOKEN
437    #REQUIRED &gt;<br>
438    &lt;!ATTLIST languagePopulation officialStatus
439    (de_facto_official | official | official_regional |
440    official_minority) #IMPLIED &gt;</p>
441    <p>This data provides testing information for language and
442    territory populations. The main goal is to provide approximate
443    figures for the literate, functional population for each
444    language in each territory: that is, the population that is
445    able to read and write each language, and is comfortable enough
446    to use it with computers. For a chart of this data, see
447    <a href='https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_language_information.html'>
448    Territory-Language Information</a>.</p>
449    <p><em>Example</em></p>
450    <pre style='font-size: 70%'>
&lt;territory type="AO" gdp="175500000000" literacyPercent="70.4" population="19088100"&gt; &lt;!--Angola--&gt;
    &lt;languagePopulation type="pt" populationPercent="67" officialStatus="official"/&gt; &lt;!--Portuguese--&gt;
    &lt;languagePopulation type="umb" populationPercent="29"/&gt; &lt;!--Umbundu--&gt;
    &lt;languagePopulation type="kmb" writingPercent="10" populationPercent="25" references="R1034"/&gt; &lt;!--Kimbundu--&gt;
    &lt;languagePopulation type="ln" populationPercent="0.67" references="R1010"/&gt; &lt;!--Lingala--&gt;
&lt;/territory&gt;</pre>
457    <p>Note that reliable information is difficult to obtain; the
458    information in CLDR is an estimate culled from different
459    sources, including the World Bank, CIA Factbook, and others.
460    The GDP and country literacy figures are taken from the World
    Bank where available, otherwise supplemented by CIA Factbook data
462    and other sources. The GDP figures are “PPP (constant 2000
463    international $)”. Much of the per-language data is taken from
464    the Ethnologue, but is supplemented and processed using many
465    other sources, including per-country census data. (The focus of
466    the Ethnologue is native speakers, which includes people who
467    are not literate, and excludes people who are functional
468    second-language users.) Some references are marked in the XML
    files, with attributes such as <code>references="R1010"</code>.</p>
471    <p>The percentages may add up to more than 100% due to
472    multilingual populations, or may be less than 100% due to
473    illiteracy or because the data has not yet been gathered or
474    processed. Languages with smaller populations might not be
475    included.</p>
476    <p>The following describes the meaning of some of these
477    terms—as used in CLDR—in more detail.</p>
478    <p><a name="literacy_percent" href="#literacy_percent" id=
479    "literacy_percent">literacy percent for the
480    territory</a>&nbsp;— an estimate of the percentage of the
481    country’s population that is functionally literate.</p>
    <p><a name="language_population_percent" href=
    "#language_population_percent" id=
    "language_population_percent">language population
    percent</a>&nbsp;— an estimate of the percentage of the
    country’s population that is functional in that language,
    including both first and second language speakers. The level of
    fluency is that necessary to use a UI on a computer, smartphone,
    or similar device, rather than complete fluency.</p>
490    <p><a name="literacy_percent_for_langPop" href=
491    "#literacy_percent_for_langPop" id=
492    "literacy_percent_for_langPop">literacy percent for language
493    population</a>&nbsp;— Within the set of people who are
494    functional in the corresponding language (as specified by
495    <a href="#language_population_percent">language population
496    percent</a>), this is an estimate of the percentage of those
497    people who are functionally literate in that language, that is,
498    who are <em>capable</em> of reading or writing in that
499    language, even if they do not regularly use it for reading or
500    writing. If not specified, this defaults to the <a href=
501    "#literacy_percent">literacy percent for the territory</a>.</p>
502    <p><a name="writing_percent" href="#writing_percent" id=
503    "writing_percent">writing percent</a> — Within the set of
504    people who are functional in the corresponding language (as
505    specified by <a href="#language_population_percent">language
506    population percent</a>), this is an estimate of the percentage
507    of those people who regularly read or write a significant
508    amount in that language. Ideally, the regularity would be
    measured as “7-day actives”. If it is known that the language
    is not widely or commonly written, but there are no solid
    figures, the value is typically given as 1% to 5%.</p>
512    <p>For a language such as Swiss German, which is typically not
513    written, even though nearly the whole native Germanophone
514    population&nbsp;<em>could</em> write in Swiss German, the
515    <a href="#literacy_percent_for_langPop">literacy percent for
516    language population</a> is high, but the <a href=
517    "#writing_percent">writing percent</a> is low.</p>
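    <p>The way these percentages combine can be shown with a short Python
    sketch that uses the Angola example above. It is an illustration under
    stated assumptions, not a prescribed algorithm: the function name
    <code>language_estimates</code> and the result keys are hypothetical. The
    language population percent is applied to the territory population, and
    the literacy and writing percents are then applied to that functional
    population. A missing <code>literacyPercent</code> falls back to the
    territory value as described above; a missing <code>writingPercent</code>
    is reported as <code>None</code> because no default is defined for
    it.</p>

```python
import xml.etree.ElementTree as ET

# The Angola example from this section, without the trailing comments.
SAMPLE = """
<territory type="AO" gdp="175500000000" literacyPercent="70.4" population="19088100">
  <languagePopulation type="pt" populationPercent="67" officialStatus="official"/>
  <languagePopulation type="umb" populationPercent="29"/>
  <languagePopulation type="kmb" writingPercent="10" populationPercent="25" references="R1034"/>
  <languagePopulation type="ln" populationPercent="0.67" references="R1010"/>
</territory>
"""


def language_estimates(xml_text):
    """Estimate head counts per language for one <territory> element."""
    territory = ET.fromstring(xml_text)
    population = float(territory.get("population"))
    territory_literacy = float(territory.get("literacyPercent"))
    result = {}
    for lp in territory.iter("languagePopulation"):
        # People functional in the language (first and second language).
        functional = population * float(lp.get("populationPercent")) / 100
        # literacyPercent defaults to the territory-wide literacy percent.
        literacy = float(lp.get("literacyPercent", territory_literacy))
        writing = lp.get("writingPercent")
        result[lp.get("type")] = {
            "functional": functional,
            "literate": functional * literacy / 100,
            "writing": None if writing is None
                       else functional * float(writing) / 100,
        }
    return result


estimates = language_estimates(SAMPLE)
for code, values in estimates.items():
    print(code, round(values["functional"]), round(values["literate"]))
```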
518    <p><a name="official_language" href="#official_language" id=
519    "official_language">official language</a>&nbsp;— as used in
520    CLDR, a language that can generally be used in all
521    communications with a central government. That is, people can
522    expect that essentially all communication from the government
523    is available in that language (ballots, information pamphlets,
524    legal documents, …) and that they can use that language in any
525    communication to the central government (petitions, forms,
526    filing lawsuits,…).</p>
527    <p>Official languages for a country in this sense are not
528    necessarily the same as those with official legal status in the
529    country. For example, Irish is declared to be an official
530    language in Ireland, but English has no such formal status in
531    the United States. Languages such as the latter are
532    called&nbsp;<em>de facto</em>&nbsp;official languages. As
533    another example, German has legal status in Italy, but cannot
534    be used in all communications with the central government, and
535    is thus not an official language <em>of Italy</em> for CLDR
536    purposes. It is, however, an&nbsp;<em>official regional
537    language</em>. Other languages are declared to be official, but
538    can’t actually be used for all communication with any major
539    governmental entity in the country. There is no intention to
540    mark such nominally official languages as “official” in the
541    CLDR data.</p>
542    <p><a name="official_regional_language" href=
543    "#official_regional_language" id=
544    "official_regional_language">official regional
545    language</a>&nbsp;— a language that is official (<em>de
546    jure</em> or <em>de facto</em>) in a major region within a
547    country, but does not qualify as an official language of the
548    country as a whole. For example, it can be used in an official
549    petition to a provincial government, but not the central
550    government. The term “major” is meant to distinguish from
551    smaller-scale usage, such as for a town or village.</p>
552    <h3>2.4 <a name="Territory_Based_Preferences" href=
553    "#Territory_Based_Preferences" id=
554    "Territory_Based_Preferences">Territory-Based
555    Preferences</a></h3>
556    <p>The default preference for several locale items is based
557    solely on a <a href=
558    "tr35.html#unicode_region_subtag">unicode_region_subtag</a>,
559    which may either be specified as part of a <a href=
560    "tr35.html#unicode_language_id">unicode_language_id</a>,
561    inferred from other locale ID elements using the <a href=
562    "tr35.html#Likely_Subtags">Likely Subtags</a> mechanism, or
563    provided explicitly using an “rg” <a href=
564    "tr35.html#RegionOverride">Region Override</a> locale key. For
565    more information on this process see <a href=
566    "tr35.html#Locale_Inheritance">Locale Inheritance and
567    Matching</a>. The specific items that are handled in this way
568    are:</p>
569    <ul>
570      <li>Default calendar (see <a href=
571      "tr35-dates.html#Calendar_Preference_Data">Calendar
572      Preference Data</a>)</li>
573      <li>Default week conventions (first day of week and weekend
574      days; see <a href="tr35-dates.html#Week_Data">Week
575      Data</a>)</li>
576      <li>Default hour cycle (see <a href=
577      "tr35-dates.html#Time_Data">Time Data</a>)</li>
578      <li>Default currency (see <a href=
579      "tr35-numbers.html#Supplemental_Currency_Data">Supplemental
580      Currency Data</a>)</li>
581      <li>Default measurement system and paper size (see <a href=
582      "tr35-general.html#Measurement_System_Data">Measurement
583      System Data</a>)</li>
584      <li>Default units for specific usage (see <a href=
585      "#Preferred_Units_For_Usage">Preferred Units for Specific
586      Usages</a>, below)</li>
587    </ul>
588    <h4>2.4.1 <a name="Preferred_Units_For_Usage" href=
589    "#Preferred_Units_For_Usage" id=
590    "Preferred_Units_For_Usage">Preferred Units for Specific
591    Usages</a></h4>
592    <p><em>For information about preferred units and unit conversion, see Section 13 <a href="#Unit_Conversion" >Unit Conversion</a> and Section 14 <a href="#Unit_Preferences" >Unit Preferences</a>.</em></p>
593    <h3>2.5 <a name="rgScope" href="#rgScope" id=
594    "rgScope">&lt;rgScope&gt;: Scope of the “rg” Locale
595    Key</a></h3>
596    <p>The supplemental &lt;rgScope&gt; element specifies the data
597    paths for which the region used for data lookup is determined
598    by the value of any “rg” key present in the locale identifier
599    (see <a href="tr35.html#RegionOverride">Region Override</a>).
600    If no “rg” key is present, the region used for lookup is
601    determined as usual: from the unicode_region_subtag if present,
602    else inferred from the unicode_language_subtag. The DTD
603    structure is as follows:</p>
604    <p class="dtd">&lt;!ELEMENT rgScope ( rgPath* ) &gt;<br>
605    <br>
606    &lt;!ELEMENT rgPath EMPTY &gt;<br>
607    &lt;!ATTLIST rgPath path CDATA #REQUIRED &gt;<br></p>
608    <p>The &lt;rgScope&gt; element contains a list of
609    &lt;rgPath&gt; elements, each of which specifies a datapath for
610    which any “rg” key determines the region for lookup. For
611    example:</p>
612    <pre>
613   &lt;rgScope&gt;
614      &lt;rgPath path="//supplementalData/currencyData/fractions/info[@iso4217='#'][@digits='*'][@rounding='*'][@cashDigits='*'][@cashRounding='*']" draft="provisional" /&gt;
615      &lt;rgPath path="//supplementalData/currencyData/fractions/info[@iso4217='#'][@digits='*'][@rounding='*'][@cashRounding='*']" draft="provisional" /&gt;
616      &lt;rgPath path="//supplementalData/currencyData/fractions/info[@iso4217='#'][@digits='*'][@rounding='*']" draft="provisional" /&gt;
617      &lt;rgPath path="//supplementalData/calendarPreferenceData/calendarPreference[@territories='#'][@ordering='*']" draft="provisional" /&gt;
618      ...
619      &lt;rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*'][@scope='*']/unitPreference[@regions='#'][@alt='*']" draft="provisional" /&gt;
620      &lt;rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*'][@scope='*']/unitPreference[@regions='#']" draft="provisional" /&gt;
621      &lt;rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*']/unitPreference[@regions='#'][@alt='*']" draft="provisional" /&gt;
622      &lt;rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*']/unitPreference[@regions='#']" draft="provisional" /&gt;
623   &lt;/rgScope&gt;
624</pre>
625    <p>The exact format of the path is provisional in CLDR 29, but
626    is currently as follows:</p>
627    <ul>
628      <li>An attribute value of '*' indicates that the path applies
629      regardless of the value of the attribute.</li>
630      <li>Each path must have exactly one attribute whose value is
631      marked here as '#'; in actual data items with this path, the
632      corresponding value is a list of region codes. It is the
633      region codes in this list that are compared with the region
634      specified by the “rg” key to determine which data item to use
635      for this path.</li>
636    </ul>
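<p>The two rules above can be sketched in code. The following is a minimal illustration only, not a CLDR API: the helper names <code>rg_path_to_regex</code> and <code>item_applies</code> are hypothetical, the sample data path and its attribute values are invented for the example, and the region from the “rg” key is assumed to have already been extracted as an upper-case region code.</p>

```python
import re

def rg_path_to_regex(rg_path: str) -> re.Pattern:
    """Turn an rgPath pattern into a regex.

    '*' matches any attribute value; '#' captures the
    space-separated list of region codes (hypothetical helper).
    """
    # Splitting on a capturing group keeps the '*' / '#' markers
    # at the odd indexes of the result.
    parts = re.split(r"'([*#])'", rg_path)
    out = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            out.append(re.escape(part))            # literal path text
        elif part == "*":
            out.append(r"'[^']*'")                 # any attribute value
        else:
            out.append(r"'(?P<regions>[^']*)'")    # the region list
    return re.compile("".join(out))

def item_applies(rg_path: str, data_path: str, rg_region: str) -> bool:
    """True if data_path has the shape of rg_path and lists rg_region."""
    m = rg_path_to_regex(rg_path).fullmatch(data_path)
    return bool(m) and rg_region in m.group("regions").split()

RG_PATH = ("//supplementalData/calendarPreferenceData/"
           "calendarPreference[@territories='#'][@ordering='*']")
# Invented data item, for illustration only.
DATA_PATH = ("//supplementalData/calendarPreferenceData/"
             "calendarPreference[@territories='AE BH'][@ordering='gregorian islamic']")

print(item_applies(RG_PATH, DATA_PATH, "AE"))
print(item_applies(RG_PATH, DATA_PATH, "US"))
```

<p>Because each path has exactly one '#', a single named group is enough to recover the region list.</p>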
637    <h2>3 <a name="Supplemental_Language_Data" href=
638    "#Supplemental_Language_Data" id=
639    "Supplemental_Language_Data">Supplemental Language
640    Data</a></h2>
641    <p class="dtd">&lt;!ELEMENT languageData ( language* ) &gt;<br>
642    &lt;!ELEMENT language EMPTY &gt;<br>
643    &lt;!ATTLIST language type NMTOKEN #REQUIRED &gt;<br>
644    &lt;!ATTLIST language scripts NMTOKENS #IMPLIED &gt;<br>
645    &lt;!ATTLIST language territories NMTOKENS #IMPLIED &gt;<br>
646    &lt;!ATTLIST language variants NMTOKENS #IMPLIED &gt;<br>
647    &lt;!ATTLIST language alt NMTOKENS #IMPLIED &gt;<br>
648    &nbsp;</p>
649    <p>The language data is used for consistency checking and
650    testing. It provides a list of which languages are used with
651    which scripts and in which countries. To a large extent,
652    however, the territory list has been superseded by the data in
653    <em>Section 2.2 <a href=
654    "#Supplemental_Territory_Information">Supplemental Territory
655    Information</a></em>.</p>
656    <pre>   &lt;languageData&gt;
657                &lt;language type="af" scripts="Latn" territories="ZA"/&gt;
658                &lt;language type="am" scripts="Ethi" territories="ET"/&gt;
659                &lt;language type="ar" scripts="Arab" territories="AE BH DZ EG IN IQ JO KW LB
660LY MA OM PS QA SA SD SY TN YE"/&gt;
661                     ...</pre>
662    <p>If the language is not a modern language, or the script is
663    not a modern script, or the language is not a major language of
664    the territory, then the alt attribute is set to secondary.</p>
665    <pre>
666    &lt;language type="fr" scripts="Latn" territories="IT US" alt="secondary" /&gt;
667                     ...</pre>
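<p>As an illustration of how the primary and secondary entries might be consumed, here is a minimal sketch that reads <code>&lt;language&gt;</code> elements and collects the territories for a language. The function name <code>territories_for</code> and the <code>include_secondary</code> flag are hypothetical, and the sample XML is a well-formed excerpt assembled from the examples above.</p>

```python
import xml.etree.ElementTree as ET

SAMPLE = """<languageData>
  <language type="af" scripts="Latn" territories="ZA"/>
  <language type="am" scripts="Ethi" territories="ET"/>
  <language type="fr" scripts="Latn" territories="IT US" alt="secondary"/>
</languageData>"""

def territories_for(xml_text, language, include_secondary=False):
    """Collect territories listed for a language (hypothetical helper).

    Entries marked alt="secondary" are skipped unless requested.
    """
    result = []
    for el in ET.fromstring(xml_text).iter("language"):
        if el.get("type") != language:
            continue
        if el.get("alt") == "secondary" and not include_secondary:
            continue
        # territories is #IMPLIED, so it may be absent.
        result.extend((el.get("territories") or "").split())
    return result

print(territories_for(SAMPLE, "af"))
print(territories_for(SAMPLE, "fr", include_secondary=True))
```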
668    <h2>3.1 <a name="Supplemental_Language_Grouping" href=
669    "#Supplemental_Language_Grouping" id=
670    "Supplemental_Language_Grouping">Supplemental Language
671    Grouping</a></h2>
672    <p class="dtd">&lt;!ELEMENT languageGroups ( languageGroup* ) &gt;<br>
673    &lt;!ELEMENT languageGroup ( #PCDATA ) &gt;<br>
674    &lt;!ATTLIST languageGroup parent NMTOKEN #REQUIRED &gt;</p>
675    <p>The language groups supply language containment. For
676    example, the following indicates that fiu is the Unicode
677    language code for a language group that contains chm, et, fi,
678    etc.</p><code>&lt;languageGroup
679    parent="<strong>fiu</strong>"&gt;chm et <strong>fi</strong> fit
680    fkv hu izh kca koi krl kv liv mdf mns mrj myv smi udm vep vot
681    vro&lt;/languageGroup&gt;</code>
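<p>A containment lookup over this data can be sketched as follows. This is an illustration under assumptions: the helper names <code>parent_map</code> and <code>ancestors</code> are hypothetical, and the sample wraps the single <code>languageGroup</code> element shown above in a <code>languageGroups</code> root.</p>

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<languageGroups>'
    '<languageGroup parent="fiu">chm et fi fit fkv hu izh kca koi krl kv '
    'liv mdf mns mrj myv smi udm vep vot vro</languageGroup>'
    '</languageGroups>'
)

def parent_map(xml_text):
    """Map each child language code to the code of its group."""
    parents = {}
    for group in ET.fromstring(xml_text).iter("languageGroup"):
        for child in (group.text or "").split():
            parents[child] = group.get("parent")
    return parents

def ancestors(code, parents):
    """Walk from a code up through its containing groups."""
    chain = []
    seen = {code}
    while code in parents:
        code = parents[code]
        if code in seen:      # guard against cyclic data
            break
        seen.add(code)
        chain.append(code)
    return chain

PARENTS = parent_map(SAMPLE)
print(ancestors("fi", PARENTS))
```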
682    <p>The vast majority of the languageGroup data is extracted
683    from Wikidata, but may be overridden in some cases. The
684    Wikidata information is more fine-grained, but makes use of
685    language groups that don't have ISO or Unicode language codes.
686    Those language groups are omitted from the data. For example,
687    Wikidata has the following child-parent chain, of which only the
688    first and last elements are present in the language groups.</p>
689    <table>
690      <tr>
691        <td>Name</td>
692        <td>Wikidata Code</td>
693        <td>Language Code</td>
694      </tr>
695      <tr>
696        <td>Finnish</td>
697        <td><a href=
698        "https://www.wikidata.org/wiki/Q1412">Q1412</a></td>
699        <td>fi</td>
700      </tr>
701      <tr>
702        <td>Finnic languages</td>
703        <td><a href=
704        "https://www.wikidata.org/wiki/Q33328">Q33328</a></td>
705      </tr>
706      <tr>
707        <td>Finno-Samic languages</td>
708        <td><a href=
709        "https://www.wikidata.org/wiki/Q163652">Q163652</a></td>
710      </tr>
711      <tr>
712        <td>Finno-Volgaic languages</td>
713        <td><a href=
714        "https://www.wikidata.org/wiki/Q161236">Q161236</a></td>
715      </tr>
716      <tr>
717        <td>Finno-Permic languages</td>
718        <td><a href=
719        "https://www.wikidata.org/wiki/Q161240">Q161240</a></td>
720      </tr>
721      <tr>
722        <td>Finno-Ugric languages</td>
723        <td><a href=
724        "https://www.wikidata.org/wiki/Q79890">Q79890</a></td>
725        <td>fiu</td>
726      </tr>
727    </table><br>
728    <h2>4 <a name="Supplemental_Code_Mapping" href=
729    "#Supplemental_Code_Mapping" id=
730    "Supplemental_Code_Mapping">Supplemental Code Mapping</a></h2>
731    <p class="dtd">&lt;!ELEMENT codeMappings (languageCodes*,
732    territoryCodes*, currencyCodes*) &gt;</p>
733    <p class="dtd">&lt;!ELEMENT languageCodes EMPTY &gt;<br>
734    &lt;!ATTLIST languageCodes type NMTOKEN #REQUIRED&gt;<br>
735    &lt;!ATTLIST languageCodes alpha3 NMTOKEN #REQUIRED&gt;</p>
736    <p class="dtd">&lt;!ELEMENT territoryCodes EMPTY &gt;<br>
737    &lt;!ATTLIST territoryCodes type NMTOKEN #REQUIRED&gt;<br>
738    &lt;!ATTLIST territoryCodes numeric NMTOKEN #REQUIRED&gt;<br>
739    &lt;!ATTLIST territoryCodes alpha3 NMTOKEN #REQUIRED&gt;<br>
740    &lt;!ATTLIST territoryCodes fips10 NMTOKEN #IMPLIED&gt;<br>
741    &lt;!ATTLIST territoryCodes internet NMTOKENS #IMPLIED&gt;
742    [deprecated]</p>
743    <p class="dtd">&lt;!ELEMENT currencyCodes EMPTY &gt;<br>
744    &lt;!ATTLIST currencyCodes type NMTOKEN #REQUIRED&gt;<br>
745    &lt;!ATTLIST currencyCodes numeric NMTOKEN #REQUIRED&gt;</p>
746    <p>The code mapping information provides mappings between the
747    subtags used in the CLDR locale IDs (from BCP 47) and other
748    coding systems or related information. Language code mappings
749    are provided only for codes that have two letters in BCP 47,
750    giving their ISO three-letter equivalents. The territory codes
751    provide mappings to numeric (UN M.49 [<a href=
752    "tr35.html#UNM49">UNM49</a>] codes, equivalent to ISO numeric
753    codes), ISO three-letter codes, FIPS 10 codes, and the internet
754    top-level domain codes.</p>
755    <p>The alphabetic codes are only provided where different from
756    the type. For example:</p>
757    <pre>
758    &lt;territoryCodes type="AA" numeric="958" alpha3="AAA"/&gt;
759&lt;territoryCodes type="AD" numeric="020" alpha3="AND" fips10="AN"/&gt;
760&lt;territoryCodes type="AE" numeric="784" alpha3="ARE"/&gt;
761...
762&lt;territoryCodes type="GB" numeric="826" alpha3="GBR" fips10="UK"/&gt;
763...
764&lt;territoryCodes type="QU" numeric="967" alpha3="QUU" internet="EU"/&gt;
765...
766&lt;territoryCodes type="XK" numeric="983" alpha3="XKK"/&gt;
767...</pre>
768    <p>Where there is no corresponding code, sometimes private use
769    codes are used, such as the numeric code for XK.</p>
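<p>The following minimal sketch shows one way such mappings might be loaded and queried. The helper names <code>load_territory_codes</code> and <code>region_for_alpha3</code> are hypothetical; the sample is a subset of the example above wrapped in a <code>codeMappings</code> root. Numeric codes are kept as strings so that leading zeros (as in "020") are preserved.</p>

```python
import xml.etree.ElementTree as ET

SAMPLE = """<codeMappings>
  <territoryCodes type="AD" numeric="020" alpha3="AND" fips10="AN"/>
  <territoryCodes type="GB" numeric="826" alpha3="GBR" fips10="UK"/>
  <territoryCodes type="XK" numeric="983" alpha3="XKK"/>
</codeMappings>"""

def load_territory_codes(xml_text):
    """Index territoryCodes elements by their CLDR region code."""
    table = {}
    for el in ET.fromstring(xml_text).iter("territoryCodes"):
        table[el.get("type")] = dict(el.attrib)
    return table

def region_for_alpha3(table, alpha3):
    """Reverse lookup: ISO three-letter code to CLDR region code."""
    for region, attrs in table.items():
        if attrs.get("alpha3") == alpha3:
            return region
    return None

TABLE = load_territory_codes(SAMPLE)
print(TABLE["GB"]["numeric"], TABLE["GB"].get("fips10"))
print(region_for_alpha3(TABLE, "AND"))
```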
770    <p>The currencyCodes are mappings from three-letter currency
771    codes to numeric values (ISO 4217 <a href=
772    "https://www.currency-iso.org/en/home/tables/table-a1.html">Current
773    currency &amp; funds code list</a>). The mapping currently
774    covers only current codes and does not include historic
775    currencies. For example:</p>
776    <pre>
777&lt;currencyCodes type="AED" numeric="784"/&gt;
778&lt;currencyCodes type="AFN" numeric="971"/&gt;
779...
780&lt;currencyCodes type="EUR" numeric="978"/&gt;
781...
782&lt;currencyCodes type="ZAR" numeric="710"/&gt;
783&lt;currencyCodes type="ZMW" numeric="967"/&gt;
784</pre>
785    <h2>5 <a name="Telephone_Code_Data" href="#Telephone_Code_Data"
786    id="Telephone_Code_Data">Telephone Code Data</a>
787    (Deprecated)</h2>
788    <p>Deprecated in CLDR v34, and data removed.</p>
789    <p class="dtd">&lt;!ELEMENT telephoneCodeData (
790    codesByTerritory* ) &gt;<br>
791    <br>
792    &lt;!ELEMENT codesByTerritory ( telephoneCountryCode+ )
793    &gt;<br>
794    &lt;!ATTLIST codesByTerritory territory NMTOKEN #REQUIRED
795    &gt;<br>
796    <br>
797    &lt;!ELEMENT telephoneCountryCode EMPTY &gt;<br>
798    &lt;!ATTLIST telephoneCountryCode code NMTOKEN #REQUIRED
799    &gt;<br>
800    &lt;!ATTLIST telephoneCountryCode from NMTOKEN #IMPLIED
801    &gt;<br>
802    &lt;!ATTLIST telephoneCountryCode to NMTOKEN #IMPLIED &gt;</p>
803    <p>This data specifies the mapping between ITU telephone
804    country codes [<a href="tr35.html#ITUE164">ITUE164</a>] and
805    CLDR-style territory codes (ISO 3166 2-letter codes or
806    non-corresponding UN M.49 [<a href="tr35.html#UNM49">UNM49</a>]
807    3-digit codes). There are several things to note:</p>
808    <ul>
809      <li>A given telephone country code may map to multiple CLDR
810      territory codes; for example, +1 (North American Numbering
811      Plan) covers the US and Canada, as well as many islands in the
812      Caribbean and some in the Pacific.</li>
813      <li>Some telephone country codes are for global services (for
814      example, some satellite services), and thus correspond to
815      territory code 001.</li>
816      <li>The mappings change over time (territories move from one
817      telephone code to another). These changes are usually planned
818      several years in advance, and there may be a period during
819      which either telephone code can be used to reach the
820      territory. While the CLDR telephone code data is not intended
821      to include past changes, it is intended to incorporate known
822      information on planned future changes, using "from" and "to"
823      date attributes to indicate when mappings are valid.</li>
824    </ul>
825    <p>A subset of the telephone code data might look like the
826    following (showing a past mapping change to illustrate the from
827    and to attributes):</p>
828    <pre>&lt;codesByTerritory territory="001"&gt;
829        &lt;telephoneCountryCode code="800"/&gt; &lt;!-- International Freephone Service --&gt;
830        &lt;telephoneCountryCode code="808"/&gt; &lt;!-- International Shared Cost Services (ISCS) --&gt;
831        &lt;telephoneCountryCode code="870"/&gt; &lt;!-- Inmarsat Single Number Access Service (SNAC) --&gt;
832&lt;/codesByTerritory&gt;
833&lt;codesByTerritory territory="AS"&gt; &lt;!-- American Samoa --&gt;
834        &lt;telephoneCountryCode code="1" from="2004-10-02"/&gt; &lt;!-- +1 684 in North America Numbering Plan --&gt;
835        &lt;telephoneCountryCode code="684" to="2005-04-02"/&gt; &lt;!-- +684 now a spare code --&gt;
836&lt;/codesByTerritory&gt;
837&lt;codesByTerritory territory="CA"&gt;
838        &lt;telephoneCountryCode code="1"/&gt; &lt;!-- North America Numbering Plan --&gt;
839&lt;/codesByTerritory&gt;</pre>
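<p>To illustrate how the from and to attributes bound a mapping, here is a minimal sketch using the American Samoa entries above. The function names are hypothetical, and treating both end dates as inclusive is an assumption of this sketch rather than something the data format states.</p>

```python
from datetime import date

# (code, from, to) triples copied from the AS example above.
AS_CODES = [
    ("1", "2004-10-02", None),
    ("684", None, "2005-04-02"),
]

def valid_on(day, start=None, end=None):
    """True if day lies within [start, end]; missing bounds are open.

    Inclusive bounds are an assumption of this sketch.
    """
    if start is not None and day < date.fromisoformat(start):
        return False
    if end is not None and day > date.fromisoformat(end):
        return False
    return True

def codes_on(day, entries=AS_CODES):
    """Telephone country codes that map to the territory on a given day."""
    return [code for code, start, end in entries if valid_on(day, start, end)]

print(codes_on(date(2003, 1, 1)))
print(codes_on(date(2005, 1, 1)))
print(codes_on(date(2006, 1, 1)))
```

<p>During the overlap period both codes are returned, which matches the note above that either code may reach the territory for a time.</p>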
840    <h2>6 <a name="Postal_Code_Validation" href=
841    "#Postal_Code_Validation" id="Postal_Code_Validation">Postal
842    Code Validation (Deprecated)</a></h2>
843    <p>Deprecated in v27. Please see other services that are kept
844    up to date, such as:</p>
845    <ul>
846      <li><a href=
847      "https://i18napis.appspot.com/address/data/US">https://i18napis.appspot.com/address/data/US</a></li>
848      <li><a href=
849      "https://i18napis.appspot.com/address/data/CH">https://i18napis.appspot.com/address/data/CH</a></li>
850      <li>...<br></li>
851    </ul>
852    <p class="dtd">&lt;!ELEMENT postalCodeData (postCodeRegex*)
853    &gt;<br>
854    &lt;!ELEMENT postCodeRegex (#PCDATA) &gt;<br>
855    &lt;!ATTLIST postCodeRegex territoryId NMTOKEN
856    #REQUIRED&gt;<br></p>
857    <p>The Postal Code regex information can be used to validate
858    postal codes used in different countries. In some cases, the
859    regex is quite simple, such as for Germany:</p>
860    <pre>
861    &lt;postCodeRegex territoryId="DE" &gt;\d{5}&lt;/postCodeRegex&gt;</pre>
862    <p>The US code is slightly more complicated, since there is an
863    optional portion:</p>
864    <pre>
865    &lt;postCodeRegex territoryId="US" &gt;\d{5}([ \-]\d{4})?&lt;/postCodeRegex&gt;</pre>
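<p>A minimal sketch of validation with these two expressions follows. The table and the function name <code>is_valid_postcode</code> are hypothetical; the sketch assumes that the whole input must match the regex and, via <code>re.ASCII</code>, that only ASCII digits are accepted.</p>

```python
import re

# Regexes copied from the DE and US examples above.
POSTCODE_REGEX = {
    "DE": r"\d{5}",
    "US": r"\d{5}([ \-]\d{4})?",
}

def is_valid_postcode(territory, code):
    """Return True/False, or None when no regex is known for the territory."""
    pattern = POSTCODE_REGEX.get(territory)
    if pattern is None:
        return None
    # fullmatch: the entire string must be a postal code (assumption).
    return re.fullmatch(pattern, code, flags=re.ASCII) is not None

print(is_valid_postcode("DE", "10115"))
print(is_valid_postcode("US", "94043-1351"))
print(is_valid_postcode("US", "940431351"))
```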
866    <p>The most complicated currently is the UK.</p>
867    <h2>7 <a name="Supplemental_Character_Fallback_Data" href=
868    "#Supplemental_Character_Fallback_Data" id=
869    "Supplemental_Character_Fallback_Data">Supplemental Character
870    Fallback Data</a></h2>
871    <p class="dtd">&lt;!ELEMENT characters ( character-fallback*)
872    &gt;<br>
873    <br>
874    &lt;!ELEMENT character-fallback ( character* ) &gt;<br>
875    &lt;!ELEMENT character (substitute*) &gt;<br>
876    &lt;!ATTLIST character value CDATA #REQUIRED &gt;<br>
877    <br>
878    &lt;!ELEMENT substitute (#PCDATA) &gt;</p>
879    <p>The characters element provides a way for non-Unicode
880    systems, or systems that only support a subset of Unicode
881    characters, to transform CLDR data. It gives a list of
882    characters with alternative values that can be used if the main
883    value is not available. For example:</p>
884    <pre>&lt;characters&gt;
885       &lt;character-fallback&gt;
886        &lt;character value = "ß"&gt;
887                &lt;substitute&gt;ss&lt;/substitute&gt;
888        &lt;/character&gt;
889        &lt;character value = "Ø"&gt;
890                &lt;substitute&gt;Ö&lt;/substitute&gt;
891                &lt;substitute&gt;O&lt;/substitute&gt;
892        &lt;/character&gt;
893        &lt;character value = "<span style=
894"font-size: 150%">₧</span>"&gt;
895                &lt;substitute&gt;Pts&lt;/substitute&gt;
896        &lt;/character&gt;
897        &lt;character value = "<span style=
898"font-size: 150%">₣</span>"&gt;
899                &lt;substitute&gt;Fr.&lt;/substitute&gt;
900        &lt;/character&gt;
901       &lt;/character-fallback&gt;
902&lt;/characters&gt;</pre>
903    <p>The ordering of the substitute elements indicates the
904    preference among them.</p><p>That is, this data provides
905    recommended fallbacks for use when a charset or supported
906    repertoire does not contain a desired character. There may be
907    more than one possible fallback: the recommended usage is that
908    when a character <i>value</i> is not in the desired repertoire,
909    the following candidates are tried in order, and the first one
910    that is wholly in the desired repertoire is used.</p>
911    <ul>
912      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
913      <code>toNFC</code>(<i>value</i>)</li>
914      <li style="margin-top: 0.5em; margin-bottom: 0.5em">other
915      canonically equivalent sequences, if there are any</li>
916      <li style="margin-top: 0.5em; margin-bottom: 0.5em">the
917      explicit <i>substitutes</i> value (in order)</li>
918      <li style="margin-top: 0.5em; margin-bottom: 0.5em">
919      <code>toNFKC</code>(<i>value</i>)</li>
920    </ul>
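<p>The process listed above can be sketched as follows. This is an illustration under assumptions: the function name <code>fallback</code> is hypothetical, the repertoire is modelled as a set of characters, and "other canonically equivalent sequences" is approximated by trying only the NFD form.</p>

```python
import unicodedata

# Substitutes copied from the example above.
SUBSTITUTES = {
    "ß": ["ss"],
    "Ø": ["Ö", "O"],
    "₧": ["Pts"],
    "₣": ["Fr."],
}

def fallback(value, supported):
    """Return the first candidate wholly inside `supported`, else None."""
    candidates = [
        unicodedata.normalize("NFC", value),
        # Stand-in for "other canonically equivalent sequences":
        # this sketch only tries the NFD form.
        unicodedata.normalize("NFD", value),
    ]
    candidates += SUBSTITUTES.get(value, [])     # explicit substitutes, in order
    candidates.append(unicodedata.normalize("NFKC", value))
    for candidate in candidates:
        if all(ch in supported for ch in candidate):
            return candidate
    return None

ASCII = {chr(i) for i in range(0x20, 0x7F)}

print(fallback("ß", ASCII))
print(fallback("Ø", ASCII))
print(fallback("Ø", ASCII | {"Ö"}))
```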
921    <h2>8 <a name="Coverage_Levels" href="#Coverage_Levels" id=
922    "Coverage_Levels">Coverage Levels</a></h2>
923    <p>The following describes the coverage levels used for the
924    current version of CLDR. This list will change between releases
925    of CLDR. Each level adds to what is in the lower level.</p>
926    <table border="1" cellpadding="0" cellspacing="1">
927      <!-- nocaption -->
928      <tr>
929        <th nowrap>
930          <div align="right">
931            Level
932          </div>
933        </th>
934        <th colspan="2">Description</th>
935      </tr>
936      <tr>
937        <td nowrap>
938          <div align="right">
939            0
940          </div>
941        </td>
942        <td>undetermined</td>
943        <td>Does not meet any of the following levels.</td>
944      </tr>
945      <tr>
946        <td nowrap>
947          <div align="right">
948            10
949          </div>
950        </td>
951        <td>core</td>
952        <td>The CLDR "core" data, which is defined as the basic
953        information about the language and writing system that is
954        required before other information can be added using the
955        CLDR survey tool. See <a href=
956        "http://cldr.unicode.org/index/cldr-spec/minimaldata">http://cldr.unicode.org/index/cldr-spec/minimaldata</a></td>
957      </tr>
958      <tr>
959        <td nowrap>
960          <div align="right">
961            40
962          </div>
963        </td>
964        <td>basic</td>
965        <td>The minimum amount of locale data deemed necessary to
966        create a "viable" locale in CLDR. Contains names for the
967        languages, scripts, and territories associated with the
968        language, numbering systems used in those languages, date
969        and number formats, plus a few key values such as the
970        values in Section 3.1 <a href=
971        "tr35.html#Unknown_or_Invalid_Identifiers">Unknown or
972        Invalid Identifiers</a>. Also contains data associated with
973        the most prominent languages and countries.</td>
974      </tr>
975      <tr>
976        <td nowrap>
977          <div align="right">
978            60
979          </div>
980        </td>
981        <td>moderate</td>
982        <td>Contains more types of data and more language and
983        territory names than the basic level. If the language is
984        associated with an EU country, then the moderate level
985        attempts to complete the data as it pertains to all EU
986        member countries.</td>
987      </tr>
988      <tr>
989        <td nowrap>
990          <div align="right">
991            80
992          </div>
993        </td>
994        <td>modern</td>
995        <td>Contains all fields in normal modern use, including all
996        country names, and currencies in use.</td>
997      </tr>
998      <tr>
999        <td nowrap>
1000          <div align="right">
1001            100
1002          </div>
1003        </td>
1004        <td>comprehensive</td>
1005        <td>Contains complete localizations (or valid inheritance)
1006        for every possible field.</td>
1007      </tr>
1008    </table>
1009    <p>Levels 40 through 80 are based on the definitions and
1010    specifications listed in <strong>8.1-8.4</strong>. However,
1011    these principles are continually being refined by the CLDR
1012    technical committee, and so do not completely reflect the data
1013    that is actually used for coverage determination, which is
1014    under the XPath
1015    <strong>//supplementalData/coverageLevels</strong>. For a view
1016    of the trunk version of this data, see
1017    <a href=
1018    "https://github.com/unicode-org/cldr/releases/tag/latest/common/supplemental/coverageLevels.xml">
1019    coverageLevels.xml</a>. (As described in the <a href=
1020    "tr35-info.html#Supplemental_Data">introduction to Supplemental
1021    Data</a>, the specific XML filename may change.)</p>
1022    <p class="dtd">&lt;!ELEMENT coverageLevels (
1023    approvalRequirements, coverageVariable*, coverageLevel* )
1024    &gt;<br>
1025    &lt;!ELEMENT coverageLevel EMPTY &gt;<br>
1026    &lt;!ATTLIST coverageLevel inLanguage CDATA #IMPLIED &gt;<br>
1027    &lt;!ATTLIST coverageLevel inScript CDATA #IMPLIED &gt;<br>
1028    &lt;!ATTLIST coverageLevel inTerritory CDATA #IMPLIED &gt;<br>
1029    &lt;!ATTLIST coverageLevel value CDATA #REQUIRED &gt;<br>
1030    &lt;!ATTLIST coverageLevel match CDATA #REQUIRED &gt;</p>
1031    <p>Here is an example coverageLevel line:</p>
1032    <pre>&lt;coverageLevel<br>    value="30"
1033      inLanguage="(de|fi)" <br>    match="localeDisplayNames/types/type[@type='phonebook'][@key='collation']"/&gt;</pre>
1034    <p>The coverageLevel elements are read in order, and the first
1035    match results in a coverage level value. The element matches
1036    based on the <span class="attribute">inLanguage</span>,
1037    <span class="attribute">inScript</span>, <span class=
1038    "attribute">inTerritory</span>, and <span class=
1039    "attribute">match</span> attribute values, which are regular
1040    expressions. In the example above, a match occurs
1041    if the language is de or fi, and if the path is a locale
1042    display name for collation=phonebook.</p>
1043    <p>The <span class="attribute">match</span> attribute value
1044    logically has "//ldml/" prefixed before it is applied. In
1045    addition, the "[@" is automatically quoted. Otherwise standard
1046    Perl/Java style regular expression syntax is used.</p>
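<p>The matching rules described above can be sketched as follows. This is a simplified illustration, not the CLDR implementation: the function names are hypothetical, "automatically quoted" is interpreted as escaping the "[@" sequence, and each regular expression is assumed to have to match the whole value.</p>

```python
import re

# One entry per coverageLevel element, in document order.
LEVELS = [
    {
        "value": 30,
        "inLanguage": "(de|fi)",
        "match": "localeDisplayNames/types/type[@type='phonebook'][@key='collation']",
    },
]

def compile_match(match):
    """Prefix //ldml/ and escape '[@' so it is matched literally."""
    return re.compile("//ldml/" + match.replace("[@", r"\[@"))

def coverage_value(path, levels, language="", script="", territory=""):
    """Return the value of the first matching coverageLevel, else None."""
    locale_parts = {
        "inLanguage": language,
        "inScript": script,
        "inTerritory": territory,
    }
    for level in levels:
        # An absent in* attribute places no restriction on that field.
        if any(attr in level and not re.fullmatch(level[attr], actual)
               for attr, actual in locale_parts.items()):
            continue
        if compile_match(level["match"]).fullmatch(path):
            return level["value"]
    return None

PATH = "//ldml/localeDisplayNames/types/type[@type='phonebook'][@key='collation']"
print(coverage_value(PATH, LEVELS, language="de"))
print(coverage_value(PATH, LEVELS, language="fr"))
```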
1047    <p class="dtd">&lt;!ELEMENT coverageVariable EMPTY &gt;<br>
1048    &lt;!ATTLIST coverageVariable key CDATA #REQUIRED &gt;<br>
1049    &lt;!ATTLIST coverageVariable value CDATA #REQUIRED &gt;</p>
1050    <p>The coverageVariable element allows us to create variables
1051    for certain regular expressions that are used frequently in the
1052    coverageLevel definitions above. Each coverage variable must
1053    contain a key / value pair of attributes, which can then be
1054    substituted into a coverageLevel definition
1055    above.</p>
1056    <p>Here is an example coverageLevel line using
1057    coverageVariable substitution:</p>
1058    <pre>
1059    &lt;coverageVariable key="%dayTypes" value="(sun|mon|tue|wed|thu|fri|sat)"/&gt;<br>
1060&lt;coverageVariable key="%wideAbbr" value="(wide|abbreviated)"/&gt;<br>
1061&lt;coverageLevel value="20" match="dates/calendars/calendar[@type='gregorian']/days/dayContext[@type='format']/dayWidth[@type='%wideAbbr']/day[@type='%dayTypes']"/&gt;</pre>
1062    <p>In this example, the coverage variables %dayTypes and
1063    %wideAbbr are used to substitute their respective values into
1064    the match expression. This allows us to reuse the same variable
1065    for other coverageLevel matches that use the same regular
1066    expression fragment.</p>
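<p>The substitution can be sketched as a plain textual replacement performed before the match expression is compiled. This is an assumption about the mechanism, shown for illustration; the helper names <code>expand</code> and <code>to_regex</code> are hypothetical.</p>

```python
import re

VARIABLES = {
    "%dayTypes": "(sun|mon|tue|wed|thu|fri|sat)",
    "%wideAbbr": "(wide|abbreviated)",
}

MATCH = ("dates/calendars/calendar[@type='gregorian']/days/"
         "dayContext[@type='format']/dayWidth[@type='%wideAbbr']/"
         "day[@type='%dayTypes']")

def expand(match, variables):
    """Replace each variable key with its value.

    Longer keys go first so that a key that is a prefix of
    another key cannot clobber it.
    """
    for key in sorted(variables, key=len, reverse=True):
        match = match.replace(key, variables[key])
    return match

def to_regex(match, variables):
    """Expand variables, then prefix //ldml/ and escape '[@'."""
    expanded = expand(match, variables)
    return re.compile("//ldml/" + expanded.replace("[@", r"\[@"))

REGEX = to_regex(MATCH, VARIABLES)
WIDE_MON = ("//ldml/dates/calendars/calendar[@type='gregorian']/days/"
            "dayContext[@type='format']/dayWidth[@type='wide']/day[@type='mon']")
print(bool(REGEX.fullmatch(WIDE_MON)))
```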
1067    <p class="dtd"><br>
1068    &lt;!ELEMENT approvalRequirements ( approvalRequirement* )
1069    &gt;<br>
1070    &lt;!ELEMENT approvalRequirement EMPTY &gt;<br>
1071    &lt;!ATTLIST approvalRequirement votes CDATA #REQUIRED&gt;<br>
1072    &lt;!ATTLIST approvalRequirement locales CDATA
1073    #REQUIRED&gt;<br>
1074    &lt;!ATTLIST approvalRequirement paths CDATA
1075    #REQUIRED&gt;<br></p>
1076    <p>The approvalRequirements element specifies the number of
1077    survey tool votes required for approval, based on
1078    locale, or path, or both. Certain locales require a higher
1079    voting threshold (usually 8 votes instead of 4), in order to
1080    promote greater stability in the data. Furthermore, certain
1081    very high-visibility fields, such as number
1082    formats, require a CLDR Technical Committee member's vote for
1083    approval.</p>
1084    <p>Here is an example of the approvalRequirements section.</p>
1085    <pre>
1086    &lt;approvalRequirements&gt;<br>   &lt;!--  "high bar" items --&gt;
1087                &lt;approvalRequirement votes="20" locales="*" paths="//ldml/numbers/symbols[^/]++/(decimal|group)"/&gt;
1088                &lt;!--  established locales - http://cldr.unicode.org/index/process#TOC-Draft-Status-of-Optimal-Field-Value --&gt;
1089                &lt;approvalRequirement votes="8" locales="ar ca cs da de el es fi fr he hi hr hu it ja ko nb nl pl pt pt_PT ro ru sk sl sr sv th tr uk vi zh zh_Hant" paths=""/&gt;
1090                &lt;!--  all other items --&gt;
1091                &lt;approvalRequirement votes="4" locales="*" paths=""/&gt;<br>&lt;/approvalRequirements&gt;              </pre>
1092    <p>This section specifies that a TC vote (20 votes) is required
1093    for decimal and grouping separators. Furthermore it specifies
1094    that any field in the established locales list (i.e. ar, ca,
1095    cs, etc.) requires 8 votes, and that all other locales require
1096    4 votes only.</p>
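<p>The lookup implied by this example can be sketched as follows. Several points are assumptions of the sketch rather than statements of the specification: requirements are tried in order and the first match wins, locales="*" matches any locale, an empty paths value matches any path, and the possessive quantifier <code>++</code> in the original regex is written as <code>+</code> so that the example runs on Python versions without possessive quantifiers.</p>

```python
import re

# (votes, locales, paths) triples in document order.
REQUIREMENTS = [
    (20, "*", r"//ldml/numbers/symbols[^/]+/(decimal|group)"),
    (8, "ar ca cs da de el es fi fr he hi hr hu it ja ko nb nl pl pt "
        "pt_PT ro ru sk sl sr sv th tr uk vi zh zh_Hant", ""),
    (4, "*", ""),
]

def required_votes(locale, path, requirements=REQUIREMENTS):
    """Votes needed for a locale/path pair; first matching entry wins."""
    for votes, locales, paths in requirements:
        locale_ok = locales == "*" or locale in locales.split()
        path_ok = paths == "" or re.fullmatch(paths, path) is not None
        if locale_ok and path_ok:
            return votes
    return None

DECIMAL = "//ldml/numbers/symbols[@numberSystem='latn']/decimal"
LANG_NAME = "//ldml/localeDisplayNames/languages/language[@type='fr']"

print(required_votes("de", DECIMAL))
print(required_votes("de", LANG_NAME))
print(required_votes("af", LANG_NAME))
```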
1097    <p>For more information on the CLDR voting process, see
1098    <a href="http://cldr.unicode.org/index/process">http://cldr.unicode.org/index/process</a>.</p>
1099    <h3>8.1 <a name="Coverage_Level_Definitions" href=
1100    "#Coverage_Level_Definitions" id=
1101    "Coverage_Level_Definitions">Definitions</a></h3>
1102    <ul>
1103      <li><i>Target-Language</i> is the language under
1104      consideration.</li>
1105      <li><i>Target-Territories</i> is the list of territories
1106      found by looking up <i>Target-Language</i> in the
1107      &lt;languageData&gt; elements in <a href=
1108      "tr35-info.html#Supplemental_Language_Data">Supplemental
1109      Language Data</a>.</li>
1110      <li>
1111        <i>Language-List</i> is <i>Target-Language</i>, plus
1112        <ul>
1113          <li><b>basic:</b> Chinese, English, French, German,
1114          Italian, Japanese, Portuguese, Russian, Spanish, Unknown
1115          (de, en, es, fr, it, ja, pt, ru, zh, und)</li>
1116          <li><b>moderate:</b> basic + Arabic, Hindi, Korean,
1117          Indonesian, Dutch, Bengali, Turkish, Thai, Polish (ar,
1118          hi, ko, id, nl, bn, tr, th, pl). If an EU language, add
1119          the remaining official EU languages, currently: Danish,
1120          Greek, Finnish, Swedish, Czech, Estonian, Latvian,
1121          Lithuanian, Hungarian, Maltese, Slovak, Slovene (da, el,
1122          fi, sv, cs, et, lv, lt, hu, mt, sk, sl)</li>
1123          <li><b>modern:</b> all languages that are official or
1124          major commercial languages of modern territories</li>
1125        </ul>
1126      </li>
1127      <li><i>Target-Scripts</i> is the list of scripts in which
1128      <i>Target-Language</i> can be customarily written (found by
1129      looking up <i>Target-Language</i> in the &lt;languageData&gt;
1130      elements in <a href=
1131      "tr35-info.html#Supplemental_Language_Data">Supplemental
1132      Language Data</a>), plus Unknown (Zzzz).</li>
1133      <li>
1134        <i>Script-List</i> is the <i>Target-Scripts</i> plus the
1135        major scripts used for multiple languages
1136        <ul>
1137          <li>Latin, Simplified Chinese, Traditional Chinese,
1138          Cyrillic, Arabic (Latn, Hans, Hant, Cyrl, Arab)</li>
1139        </ul>
1140      </li>
1141      <li>
1142        <i>Territory-List</i> is the list of territories formed by
1143        taking the <i>Target-Territories</i> and adding:
1144        <ul>
1145          <li><b>basic:</b> Brazil, China, France, Germany, India,
1146          Italy, Japan, Russia, United Kingdom, United States,
1147          Unknown (BR, CN, DE, GB, FR, IN, IT, JP, RU, US, ZZ)</li>
1148          <li><b>moderate:</b> basic + Spain, Canada, Korea,
1149          Mexico, Australia, Netherlands, Switzerland, Belgium,
1150          Sweden, Turkey, Austria, Indonesia, Saudi Arabia, Norway,
1151          Denmark, Poland, South Africa, Greece, Finland, Ireland,
1152          Portugal, Thailand, Hong Kong SAR China, Taiwan (ES, CA,
1153          KR, MX, AU, NL, CH, BE, SE, TR, AT, ID, SA, NO, DK, PL, ZA,
1154          GR, FI, IE, PT, TH, HK, TW). If an EU language, add the
1155          remaining EU member countries: Luxembourg, Czech Republic,
1156          Hungary, Estonia, Lithuania, Latvia, Slovenia, Slovakia,
1157          Malta (LU, CZ, HU, EE, LT, LV, SI, SK, MT).</li>
1158          <li><b>modern:</b> all current ISO 3166 territories, plus
1159          the UN M.49 [<a href="tr35.html#UNM49">UNM49</a>] regions
1160          in <a href=
1161          "tr35-info.html#Supplemental_Territory_Containment">Supplemental
1162          Territory Containment</a>.</li>
1163        </ul>
1164      </li>
1165      <li><i>Currency-List</i> is the list of current official
1166      currencies used in any of the territories in
1167      <i>Territory-List</i>, found by looking at the region
1168      elements in <a href=
1169      "tr35-info.html#Supplemental_Territory_Containment">Supplemental
1170      Territory Containment</a>, plus Unknown (XXX).</li>
1171      <li><i>Calendar-List</i> is the set of calendars in customary
1172      use in any of <i>Target-Territories</i>, plus Gregorian.</li>
1173      <li><em>Number-System-List</em> is the set of number systems
1174      in customary use in the language.</li>
1175    </ul>
1176    <h3>8.2 <a name="Coverage_Level_Data_Requirements" href=
1177    "#Coverage_Level_Data_Requirements" id=
1178    "Coverage_Level_Data_Requirements">Data Requirements</a></h3>
1179    <p>The required data to qualify for the level is then the
1180    following.</p>
1181    <ol>
1182      <li>localeDisplayNames
1183        <ol>
1184          <li><i>languages:</i> localized names for all languages
1185          in <i>Language-List.</i></li>
1186          <li><i>scripts:</i> localized names for all scripts in
1187          <i>Script-List</i>.</li>
1188          <li><i>territories:</i> localized names for all
1189          territories in <i>Territory-List</i>.</li>
1190          <li><i>variants, keys, types:</i> localized names for any
1191          in use in <i>Target-Territories</i>; for example, a
1192          translation for PHONEBOOK in a German locale.</li>
1193        </ol>
1194      </li>
1195      <li>dates: all of the following for each calendar in
1196      <i>Calendar-List</i>.
1197        <ol>
1198          <li>calendars: localized names</li>
1199          <li>month names, day names, era names, and quarter names
1200            <ul>
1201              <li>context=format and width=narrow, wide, &amp;
1202              abbreviated</li>
1203              <li>plus context=standAlone and width=narrow, wide,
1204              &amp; abbreviated, <i>if the grammatical forms of
1205              these are different than for context=format.</i></li>
1206            </ul>
1207          </li>
1208          <li>week: minDays, firstDay, weekendStart, weekendEnd
1209            <ul>
1210              <li>if some of these vary in territories in
1211              <i>Territory-List</i>, include territory locales for
1212              those that do.</li>
1213            </ul>
1214          </li>
1215          <li>am, pm, eraNames, eraAbbr</li>
1216          <li>dateFormat, timeFormat: full, long, medium,
1217          short</li>
1218          <li>
1219            <p>intervalFormatFallback</p>
1220          </li>
1221        </ol>
1222      </li>
1223      <li>numbers: symbols, decimalFormats, scientificFormats,
1224      percentFormats, currencyFormats for each number system in
1225      <em>Number-System-List</em>.</li>
1226      <li>currencies: displayNames and symbol for all currencies in
1227      <i>Currency-List</i>, for all plural forms</li>
1228      <li>transforms: (moderate and above) transliteration between
1229      Latin and each other script in <i>Target-Scripts.</i></li>
1230    </ol>
1231    <h3>8.3 <a name="Coverage_Level_Default_Values" href=
1232    "#Coverage_Level_Default_Values" id=
1233    "Coverage_Level_Default_Values">Default Values</a></h3>
1234    <p>Items should <i>only</i> be included if they are not the
1235    same as the default, which is:</p>
1236    <ul>
1237      <li>what is in root, if there is something defined
1238      there.</li>
1239      <li>for timezone IDs: the name computed according to
1240      <i><a href="tr35.html#Time_Zone_Fallback">Appendix J: Time
1241      Zone Display Names</a></i></li>
1242      <li>for collation sequence, the UCA DUCET (Default Unicode
1243      Collation Element Table), as modified by CLDR.
1244        <ul>
1245          <li>however, in that case the locale must be added to the
1246          validSubLocale list in <a href=
1247          "https://github.com/unicode-org/cldr/blob/master/common/collation/root.xml">collation/root.xml</a>.</li>
1248        </ul>
1249      </li>
1250      <li>for currency symbol, language, territory, script names,
1251      variants, keys, types, the internal code identifiers, for
1252      example,
1253        <ul>
1254          <li>currencies: EUR, USD, JPY, ...</li>
1255          <li>languages: en, ja, ru, ...</li>
1256          <li>territories: GB, JP, FR, ...</li>
1257          <li>scripts: Latn, Thai, ...</li>
1258          <li>variants: PHONEBOOK,...</li>
1259        </ul>
1260      </li>
1261    </ul><!-- end section 8 -->
1262    <!-- begin section 9 supplemental metadata -->
1263    <h2>9 <a name="Appendix_Supplemental_Metadata" href=
1264    "#Appendix_Supplemental_Metadata" id=
1265    "Appendix_Supplemental_Metadata">Supplemental Metadata</a></h2>
1266    <p>Note that this section discusses the
1267    <code>&lt;metadata&gt;</code> element within the
1268    <code>&lt;supplementalData&gt;</code> element. For the
1269    per-locale metadata used in tests and the Survey Tool, see
1270    <a href="#Metadata_Elements">10: Locale Metadata
1271    Element</a>.</p>
1272    <p>The supplemental metadata contains information about the
1273    CLDR file itself, used to test validity and provide information
1274    for locale inheritance. A number of these elements are
1275    described in:</p>
1276    <ul class="toc">
1277      <li style="margin-top: 0.5em; margin-bottom: 0.5em">Appendix
1278      I: <a href="tr35.html#Inheritance_and_Validity">Inheritance
1279      and Validity</a></li>
1280      <li style="margin-top: 0.5em; margin-bottom: 0.5em">Appendix
1281      K: <a href="tr35.html#Valid_Attribute_Values">Valid Attribute
1282      Values</a></li>
1283      <li style="margin-top: 0.5em; margin-bottom: 0.5em">Appendix
1284      L: <a href="tr35.html#Canonical_Form">Canonical Form</a></li>
1285      <li style="margin-top: 0.5em; margin-bottom: 0.5em">Appendix
1286      M: <a href="#Coverage_Levels">Coverage Levels</a></li>
1287    </ul>
1288    <h3>9.1 <a name="Supplemental_Alias_Information" href=
1289    "#Supplemental_Alias_Information" id=
1290    "Supplemental_Alias_Information">Supplemental Alias
1291    Information</a></h3>
1292    <p class="dtd">&lt;!ELEMENT alias
1293    (languageAlias*,scriptAlias*,territoryAlias*,subdivisionAlias*,variantAlias*,zoneAlias*)
1294    &gt;<br>
1295    <br>
1296    <em>The following are common attributes for subelements of
1297    &lt;alias&gt;:</em><br>
1298    &lt;!ELEMENT *Alias EMPTY &gt;<br>
1299    &lt;!ATTLIST *Alias type NMTOKEN #IMPLIED &gt;<br>
1300    &lt;!ATTLIST *Alias replacement NMTOKEN #IMPLIED &gt;<br>
1301    &lt;!ATTLIST *Alias reason ( deprecated | overlong )
1302    #IMPLIED&gt;<br>
1303    <br>
1304    <em>The languageAlias has additional reasons</em><br>
1305    &lt;!ATTLIST languageAlias reason ( deprecated | overlong |
1306    macrolanguage | legacy | bibliographic ) #IMPLIED&gt;</p>
1307    <p>This element provides information as to parts of locale IDs
1308    that should be substituted when accessing CLDR data. This
1309    logical substitution should be done both to the locale ID and
1310    to any lookup for display names of languages, territories, and
1311    so on. The replacement for the language and territory types is
1312    more complicated: see <em>Part 1: <a href=
1313    "tr35.html#Contents">Core</a>, Section 3.3.1 <a href=
1314    "tr35.html#BCP_47_Language_Tag_Conversion">BCP 47 Language Tag
1315    Conversion</a></em> for details.</p>
1316    <pre>&lt;alias&gt;
1317  &lt;languageAlias type="in" replacement="id"/&gt;
1318  &lt;languageAlias type="sh" replacement="sr"/&gt;
1319  &lt;languageAlias type="sh_YU" replacement="sr_Latn_YU"/&gt;
1320...
1321  &lt;territoryAlias type="BU" replacement="MM"/&gt;
1322...
1323&lt;/alias&gt;</pre>
1324    <p>Attribute values for the *Alias values include the
1325    following:</p>
1326    <table>
1327      <caption>
1328        <a name="Alias_Attribute_Values" href=
1329        "#Alias_Attribute_Values" id="Alias_Attribute_Values">Alias
1330        Attribute Values</a>
1331      </caption>
1332      <tr>
1333        <th scope="col">Attribute</th>
1334        <th scope="col">Value</th>
1335        <th scope="col">Description</th>
1336      </tr>
1337      <tr>
1338        <td>type</td>
1339        <td>NMTOKEN</td>
1340        <td>The code to be replaced</td>
1341      </tr>
1342      <tr>
1343        <td>replacement</td>
1344        <td>NMTOKEN</td>
1345        <td>The code(s) to replace it, space-delimited.</td>
1346      </tr>
1347      <tr>
1348        <td rowspan="5">reason</td>
1349        <td>deprecated</td>
1350        <td>The code in type is deprecated, such as 'iw' by 'he',
1351        or 'CS' by 'RS ME'.</td>
1352      </tr>
1353      <tr>
1354        <td>overlong</td>
1355        <td>The code in type is too long, such as 'eng' by 'en' or
1356        'USA' or '840' by 'US'</td>
1357      </tr>
1358      <tr>
1359        <td>macrolanguage</td>
1360        <td>The code in type is an encompassed language that is
1361        replaced by a macrolanguage, such as '<a href=
1362        "https://www.sil.org/iso639-3/documentation.asp?id=arb">arb</a>'
1363        by 'ar'.</td>
1364      </tr>
1365      <tr>
1366        <td>legacy</td>
1367        <td>The code in type is a legacy code that is replaced by
1368        another code for compatibility with established legacy
1369        usage, such as 'sh' by 'sr_Latn'</td>
1370      </tr>
1371      <tr>
1372        <td>bibliographic</td>
1373        <td>The code in type is a <a href=
1374        "https://www.loc.gov/standards/iso639-2/langhome.html">bibliographic
1375        code</a>, which is replaced by a terminology code, such as
1376        'alb' by 'sq'.</td>
1377      </tr>
1378    </table>
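    <p>As a rough illustration of the simple case, the following
    Python sketch loads <code>&lt;alias&gt;</code> data into lookup
    tables and substitutes a single code. The function names
    <code>load_aliases</code> and <code>replace_code</code> and the
    sample XML are assumptions made for this example. The sketch
    takes only the first replacement when several are listed, and it
    does not implement the fuller language and territory handling
    described in BCP 47 Language Tag Conversion.</p>

```python
import xml.etree.ElementTree as ET

# Sample data in the shape of the <alias> element (values taken from the examples above).
ALIAS_XML = """
<alias>
  <languageAlias type="in" replacement="id"/>
  <languageAlias type="iw" replacement="he"/>
  <territoryAlias type="BU" replacement="MM"/>
  <territoryAlias type="CS" replacement="RS ME"/>
</alias>
"""


def load_aliases(xml_text):
    """Build {element name: {type: [replacement codes]}} from an <alias> element."""
    table = {}
    for element in ET.fromstring(xml_text):
        replacements = element.get("replacement", "").split()
        table.setdefault(element.tag, {})[element.get("type")] = replacements
    return table


def replace_code(table, kind, code):
    """Return the first replacement for a code, or the code itself if it has no alias."""
    replacements = table.get(kind, {}).get(code)
    return replacements[0] if replacements else code


ALIASES = load_aliases(ALIAS_XML)
print(replace_code(ALIASES, "languageAlias", "in"))
print(replace_code(ALIASES, "territoryAlias", "BU"))
```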
1379    <h3>9.2 <a name="Supplemental_Deprecated_Information" href=
1380    "#Supplemental_Deprecated_Information" id=
1381    "Supplemental_Deprecated_Information">Supplemental Deprecated
1382    Information (Deprecated)</a></h3>
1383    <pre class="dtd">
1384    &lt;!ELEMENT deprecated ( deprecatedItems* ) &gt;
1385&lt;!ATTLIST deprecated draft ( approved | contributed | provisional | unconfirmed | true | false ) #IMPLIED &gt; &lt;!-- true and false are deprecated. --&gt;
1386
1387&lt;!ELEMENT deprecatedItems EMPTY &gt;
1388&lt;!ATTLIST deprecatedItems type ( standard | supplemental | ldml | supplementalData | ldmlBCP47 ) #IMPLIED &gt; &lt;!-- standard | supplemental are deprecated --&gt;
1389&lt;!ATTLIST deprecatedItems elements NMTOKENS #IMPLIED &gt;
1390&lt;!ATTLIST deprecatedItems attributes NMTOKENS #IMPLIED &gt;
1391&lt;!ATTLIST deprecatedItems values CDATA #IMPLIED &gt;</pre>
1392    <p>The deprecated items element was used to indicate elements,
1393    attributes, and attribute values that are deprecated. This
1394    means that the items are valid, but that their usage is
1395    strongly discouraged. This element and its subelements have
1396    been deprecated in favor of <a href=
1397    "tr35.html#DTD_Annotations">DTD Annotations</a>.</p>
1398    <p>Where particular values are deprecated (such as territory
1399    codes like SU for Soviet Union), the names for such codes may
1400    be removed from the common/main translated data after some
1401    period of time. However, typically supplemental information for
1402    deprecated codes is retained, such as containment, likely
1403    subtags, older currency codes usage, etc. The English name may
1404    also be retained, for debugging purposes.</p>
1405    <h3>9.3 <a name="Default_Content" href="#Default_Content" id=
1406    "Default_Content">Default Content</a></h3>
1407    <pre class="dtd">&lt;!ELEMENT defaultContent EMPTY &gt;
1408               &lt;!ATTLIST defaultContent locales NMTOKENS #IMPLIED &gt;</pre>
1409    <p>In CLDR, locales without territory information (or where
1410    needed, script information) provide data appropriate for what
1411    is called the <i>default content locale</i>. For example, the
1412    <i>en</i> locale contains data appropriate for <i>en-US</i>,
1413    while the <i>zh</i> locale contains content for
1414    <i>zh-Hans-CN</i>, and the <i>zh-Hant</i> locale contains
1415    content for <i>zh-Hant-TW</i>. The default content locales
1416    themselves thus inherit all of their contents, and are
1417    empty.</p>
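    <p>The relationship can be sketched as follows, assuming the
    <code>locales</code> attribute has already been split into a
    list of underscore-separated locale IDs. The list shown and the
    helper names are illustrative assumptions, not the actual
    contents of the CLDR data, and the sketch assumes that each
    locale has at most one default content child.</p>

```python
# Illustrative subset of a defaultContent "locales" list (assumed values).
DEFAULT_CONTENT = ["en_US", "zh_Hans", "zh_Hans_CN", "zh_Hant_TW"]


def parent(locale):
    """Return the locale with its last subtag removed ('' for a bare language)."""
    return locale.rpartition("_")[0]


def default_content_locale(locale, default_content=DEFAULT_CONTENT):
    """Follow default-content children from a locale down to the most specific one."""
    current = locale
    while True:
        children = [c for c in default_content if parent(c) == current]
        if not children:
            return current
        current = children[0]


for base in ("en", "zh", "zh_Hant"):
    print(base, "holds the data for", default_content_locale(base))
```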
1418    <p>The choice of content is typically based on the largest
1419    literate population of the possible choices. Thus if an
1420    implementation only provides the base language (such as
1421    <i>en</i>), it will still get a complete and consistent set of
1422    data appropriate for a locale which is reasonably likely to be
1423    the one meant. Where other information is available, such as
1424    independent country information, that information can always be
1425    used to pick a different locale (such as <i>en-CA</i> for a
1426    website targeted at Canadian users).</p>
1427    <p>If an implementation is to use a different default locale,
1428    then the data needs to be <i>pivoted</i>; all of the data from
1429    the CLDR for the current default locale pushed out to the
1430    locales that inherit from it, then the new default content
1431    locale's data moved into the base. There are tools in CLDR to
1432    perform this operation.</p>
1433    <p>For the relationship between <span>Inheritance,
1434    DefaultContent, LikelySubtags, and LocaleMatching, see
1435    <strong><em>Section 4.2.6 <a href=
1436    "tr35.html#Inheritance_vs_Related">Inheritance vs Related
1437    Information</a></em></strong>.</span></p>
1438    <!-- end section 9 supp metadata -->
1439    <!-- begin section 10 the metadata element -->
1440    <h2>10 <a name="Metadata_Elements" href="#Metadata_Elements"
1441    id="Metadata_Elements">Locale Metadata
1442    Element</a></h2>
1443    <p>Note: This section refers to the per-locale
1444    <code>&lt;metadata&gt;</code> element, containing metadata
1445    about a particular locale. This is in contrast to the <a href=
1446    "#Appendix_Supplemental_Metadata"><em>Supplemental</em>
1447    Metadata</a>, which is in the supplemental tree and is not
1448    specific to a locale.</p>
1449    <p class="dtd">&lt;!ELEMENT metadata ( alias | ( casingData?,
1450    special* ) ) &gt;<br>
1451    &lt;!ELEMENT casingData ( alias | ( casingItem*, special* ) )
1452    &gt;<br>
1453    &lt;!ELEMENT casingItem ( #PCDATA ) &gt;<br>
1454    &lt;!ATTLIST casingItem type CDATA #REQUIRED &gt;<br>
1455    &lt;!ATTLIST casingItem override (true | false) #IMPLIED
1456    &gt;<br>
1457    &lt;!ATTLIST casingItem forceError (true | false) #IMPLIED
1458    &gt;<br></p>
1459    <p>The &lt;metadata&gt; element contains metadata about the
1460    locale for use by the Survey Tool or other tools in checking
1461    locale data; this data is not intended for export as part of
1462    the locale itself.</p>
1463    <p>The &lt;casingItem&gt; element specifies the capitalization
1464    intended for the majority of the data in a given category within
1465    the locale. The purpose is so that warnings can be issued to
1466    translators that anything deviating from that capitalization
1467    should be carefully reviewed. Its type attribute has one of the
1468    values used for the &lt;contextTransformUsage&gt; element
1469    above, with the exception of the special value "all"; its value
1470    is one of the following:</p>
1471    <ul>
1472      <li>lowercase</li>
1473      <li>titlecase</li>
1474    </ul>
1475    <p>The &lt;casingItem&gt; data is generated by a tool based on
1476    the data available in CLDR. In cases where the generated casing
1477    information is incorrect and needs to be manually edited, the
1478    override attribute is set to "true" so that the tool will not
1479    override the manual edits. When the casing information is known
1480    to be both correct and something that should apply to all
1481    elements of the specified type in a given locale, the forceError
1482    attribute may be set to "true" to force an error instead of a
1483    warning for items that do not match the casing information.</p>
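    <p>A checking tool might apply this information roughly as in
    the sketch below. The function name <code>check_casing</code> is
    hypothetical, and judging the casing from the first letter alone
    is an assumption of this example rather than a rule stated
    here.</p>

```python
def check_casing(value, casing, force_error=False):
    """Compare a value with the expected casing ('lowercase' or 'titlecase').

    Returns 'ok', 'warning', or 'error'. Only the first letter is examined,
    which is a simplifying assumption of this sketch.
    """
    first = next((ch for ch in value if ch.isalpha()), "")
    if not first or not (first.islower() or first.isupper()):
        return "ok"  # no cased letter to compare
    matches = first.islower() if casing == "lowercase" else first.isupper()
    if matches:
        return "ok"
    # forceError turns the usual warning into an error.
    return "error" if force_error else "warning"


print(check_casing("Janvier", "lowercase"))
print(check_casing("Janvier", "lowercase", force_error=True))
```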
1484    <!-- end section Info-A metadta element -->
1485    <!-- begin section 11 Version Information -->
1486    <h2>11 <a name="Version_Information" href=
1487    "#Version_Information" id="Version_Information">Version
1488    Information</a></h2>
1489    <p class="dtd">&lt;!ELEMENT version EMPTY &gt;<br>
1490    &lt;!ATTLIST version cldrVersion CDATA #FIXED "27" &gt;<br>
1491    &lt;!ATTLIST version unicodeVersion CDATA #FIXED "7.0.0"
1492    &gt;<br></p>
1493    <p>The cldrVersion attribute defines the CLDR version
1494    for this data, as published on <a href=
1495    "http://cldr.unicode.org/index/downloads">CLDR
1496    Releases/Downloads</a>.</p>
1497    <p>The unicodeVersion attribute defines the version of
1498    the Unicode standard that is used to interpret data.
1499    Specifically, some data elements such as exemplar characters
1500    are expressed in terms of UnicodeSets. Since UnicodeSets can be
1501    expressed in terms of Unicode properties, their meaning depends
1502    on the Unicode version from which property values are
1503    derived.</p><!-- end section Version Information metadta element -->
1504    <h2>12 <a name="Parent_Locales" href="#Parent_Locales" id=
1505    "Parent_Locales">Parent Locales</a></h2>
1506    <p>The parentLocales data is supplemental data, but is
1507    described in detail in the <a href=
1508    "tr35.html#Parent_Locales">core specification section
1509    4.1.3.</a></p>
1510    <h2>13 <a href="#Unit_Conversion" name="Unit_Conversion">Unit Conversion</a></h2>
1511
1512
1513<p>
1514The unit conversion data (<a href="https://github.com/unicode-org/cldr/blob/master/common/supplemental/units.xml">units.xml</a>) provides the data for converting all of the CLDR unit identifiers to base units, and back. That allows conversion between any two convertible units, such as two units of length. For any two convertible units (such as acre and dunum), the first can be converted to the base unit (square-meter), then that base unit can be converted to the second unit.
1515</p>
1516<p class="dtd">
1517&lt;!ELEMENT unitConstants ( unitConstant* ) >
1518</p>
1519<p class="dtd">
1520&lt;!ELEMENT unitConstant EMPTY >
1521</p>
1522<p class="dtd">
1523&lt;!ATTLIST unitConstant constant NMTOKEN #REQUIRED >
1524</p>
1525<p class="dtd">
1526&lt;!ATTLIST unitConstant value CDATA #REQUIRED >
1527</p>
1528<p class="dtd">
1529&lt;!ATTLIST unitConstant status NMTOKEN #IMPLIED >
1530</p>
1531<h2>Constants</h2>
1532
1533
1534<p>
1535The data uses a small set of constants for readability, such as:
1536</p>
1537		  <blockquote>
1538<p>
1539&lt;unitConstant constant=<em>"ft_to_m"</em> value=<em>"0.3048"</em>/>
1540</p>
1541<p>
1542&lt;unitConstant constant=<em>"ft2_to_m2"</em> value=<em>"ft_to_m*ft_to_m"</em>/>
1543</p>
1544</blockquote>
1545<p>
1546The order of the elements in the file is significant.
1547</p>
1551<p>
1552Each constant can have a value based on a simple expression using numbers, previously defined constants, and the operators * and /. Parentheses are not allowed. The operator * binds more tightly than /, which may be unexpected. Thus a * b / c * d is interpreted as (a * b) / (c * d). A consequence is that a * b / c * d = a * b / c / d. In the value, the numbers represent rational values, so 0.3048 is interpreted as exactly 3048 / 10000.
1553</p>
1554<p>
1555In the above case, ft2_to_m2 is a conversion constant for going from square feet to square meters. The expression evaluates to 0.09290304. Where the constants cannot be expressed as rationals, or where their interpretation is fluid, that is marked with a status value:
1556</p>
1557<blockquote>
1558&lt;unitConstant constant=<em>"PI"</em> value=<em>"411557987 / 131002976"</em> status=<em>'approximate'</em>/>
1559</blockquote>
1560<p>
1561In such cases, software may decide to use different values for accuracy.
1562</p>
1563<p>
1564An implementation need not use rationals directly for conversion; it could use doubles, for example, if only double accuracy is needed.
1565</p>
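<p>
The evaluation rules above can be sketched in Python with exact rationals. The function names <code>evaluate</code> and <code>load_constants</code> are assumptions made for this example, and the sketch performs no error handling for malformed expressions.
</p>

```python
from fractions import Fraction


def evaluate(expression, constants):
    """Evaluate a unitConstant-style expression as an exact rational.

    '*' binds more tightly than '/', so "a * b / c * d" is (a * b) / (c * d).
    """
    def product(text):
        result = Fraction(1)
        for token in text.split("*"):
            token = token.strip()
            # A token is either a previously defined constant or a number.
            result *= constants[token] if token in constants else Fraction(token)
        return result

    numerator, *divisors = expression.split("/")
    value = product(numerator)
    for divisor in divisors:
        value /= product(divisor)
    return value


def load_constants(definitions):
    """Evaluate (name, expression) pairs in file order; later ones may use earlier ones."""
    constants = {}
    for name, expression in definitions:
        constants[name] = evaluate(expression, constants)
    return constants


CONSTANTS = load_constants([
    ("ft_to_m", "0.3048"),
    ("ft2_to_m2", "ft_to_m*ft_to_m"),
    ("PI", "411557987 / 131002976"),
])
print(CONSTANTS["ft2_to_m2"], float(CONSTANTS["ft2_to_m2"]))
```

<p>
Because the constants are evaluated in order, a definition that refers to a later constant fails in this sketch, which is one reason the order of the elements matters.
</p>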
1566<h2>Conversion Data</h2>
1567
1568
1569<p class="dtd">
1570&lt;!ELEMENT convertUnits ( convertUnit* ) >
1571</p>
1572<p class="dtd">
1573&lt;!ELEMENT convertUnit EMPTY >
1574</p>
1575<p class="dtd">
1576&lt;!ATTLIST convertUnit source NMTOKEN #REQUIRED >
1577</p>
1578<p class="dtd">
1579&lt;!ATTLIST convertUnit baseUnit NMTOKEN #REQUIRED >
1580</p>
1581<p class="dtd">
1582&lt;!ATTLIST convertUnit factor CDATA #IMPLIED >
1583</p>
1584<p class="dtd">
1585&lt;!ATTLIST convertUnit offset CDATA #IMPLIED >
1586</p>
1590<p>
1591The data is expressed as conversions to the base unit. The information can also be used for the conversion back.
1592</p>
1593<p>
1594Examples:
1595</p>
1596	<blockquote>
1597<p>
1598&lt;convertUnit source=<em>'carat'</em> baseUnit=<em>'kilogram'</em> factor=<em>'0.0002'</em>/>
1599</p>
1600<p>
1601&lt;convertUnit source=<em>'gram'</em> baseUnit=<em>'kilogram'</em> factor=<em>'0.001'</em>/>
1602</p>
1603<p>
1604&lt;convertUnit source=<em>'ounce'</em> baseUnit=<em>'kilogram'</em> factor=<em>'lb_to_kg/16'</em> systems=<em>"ussystem uksystem"</em>/>
1605</p>
1606<p>
1607&lt;convertUnit source=<em>'fahrenheit'</em> baseUnit=<em>'kelvin'</em> factor=<em>'5/9'</em> offset=<em>'2298.35/9'</em> systems=<em>"ussystem uksystem"</em>/>
1608</p>
1609</blockquote>
1610<p>
1611For example, to convert from 3 carats to kilograms, the factor 0.0002 is used, resulting in 0.0006 kilograms. To convert between carats and ounces, first the carats are converted to kilograms, then the kilograms to ounces (by reversing the mapping).
1612</p>
1613<p>
1614The factor and offset use the same structure as the value in unitConstant; in particular, * binds more tightly than /.
1615</p>
1616<p>
1617The conversion may also require an offset, such as the following:
1618</p>
1619<blockquote>
1620&lt;convertUnit source=<em>'fahrenheit'</em> baseUnit=<em>'kelvin'</em> factor=<em>'5/9'</em> offset=<em>'2298.35/9'</em> systems=<em>"ussystem uksystem"</em>/>
1621</blockquote>
1622<p>
1623The factor and offset can be simple expressions, just like the values in the unitConstants.
1624</p>
1625<p>
1626Where a factor is not present, the value is 1; where an offset is not present, the value is 0. The systems attribute indicates where the value is not metric; currently the attribute values just include the <em>ussystem</em> and <em>uksystem</em> systems. The term <em>metric</em> is used in a broad sense, and includes units that are simple multiples of metric units, such as pound-metric (= ½ kilogram).
1627</p>
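<p>
A minimal sketch of applying a factor and offset in both directions follows. The table is a hand-written stand-in for a few <code>convertUnit</code> elements, the function names are assumptions, and the value 0.45359237 for <code>lb_to_kg</code> is assumed here from the definition of the international pound rather than read from the data file.
</p>

```python
from fractions import Fraction

# Stand-in for a few convertUnit elements; a missing factor means 1, a missing offset means 0.
CONVERT_UNITS = {
    "carat": {"baseUnit": "kilogram", "factor": Fraction("0.0002")},
    "gram": {"baseUnit": "kilogram", "factor": Fraction("0.001")},
    # lb_to_kg/16, with lb_to_kg assumed to be 0.45359237
    "ounce": {"baseUnit": "kilogram", "factor": Fraction("0.45359237") / 16},
    "kilogram": {"baseUnit": "kilogram"},
    "fahrenheit": {"baseUnit": "kelvin", "factor": Fraction(5, 9),
                   "offset": Fraction("2298.35") / 9},
    "kelvin": {"baseUnit": "kelvin"},
}


def to_base(value, unit):
    """Convert a value in the given unit to its base unit."""
    entry = CONVERT_UNITS[unit]
    return value * entry.get("factor", 1) + entry.get("offset", 0)


def from_base(value, unit):
    """Reverse the mapping: convert a base-unit value to the given unit."""
    entry = CONVERT_UNITS[unit]
    return (value - entry.get("offset", 0)) / entry.get("factor", 1)


def convert(value, source, target):
    """Convert between two units that share a base unit."""
    if CONVERT_UNITS[source]["baseUnit"] != CONVERT_UNITS[target]["baseUnit"]:
        raise ValueError(f"{source} and {target} are not convertible")
    return from_base(to_base(Fraction(value), source), target)


print(convert(3, "carat", "kilogram"))
print(float(convert(32, "fahrenheit", "kelvin")))
```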
1628<p>
1629For complex units, such as <em>pound-force-per-square-inch</em>, the conversions are computed by combining the conversions of each of the simple units: <em>pound-force</em> and <em>inch</em>. Because the conversions in convertUnit are reversible, the computation can go from complex source unit to complex base unit to complex target unit.
1630</p>
1631<p>
1632Here is an example:
1633</p>
1634	<blockquote>
1635<p><strong>
163650 foot-per-minute ⟹ X mile-per-hour</strong> </p>
1637<p>
1638	⟹ source: 1 foot
1639</p>
1640<p>
1641	⟹ factor: 381 / 1250 = 0.3048 meter
1642</p>
1643<p>
1644	⟹ source: 1 minute
1645</p>
1646<p>
1647	⟹ factor: 60 second
1648</p>
1649<p>
1650 ⟹ intermediate: 127 / 500 = 0.254 meter-per-second
1651</p>
1652<p>
1653 ⟹ mile-per-hour
1654</p>
1655<p>
1656	⟹ source: 1 mile
1657</p>
1658<p>
1659	⟹ factor: 201168 / 125 = 1609.344 meter
1660</p>
1661<p>
1662	⟹ source: 1 hour
1663</p>
1664<p>
1665	⟹ factor: 3600 second
1666</p>
1667<p>
1668 ⟹ target: 25 / 44 ≅ 0.5681818 mile-per-hour
1669</p>
1670</blockquote>
1671<p>
1672<strong>Reciprocals. </strong>When you convert a complex unit to another complex unit, you typically convert the source to a complex base unit (like <em>meter-per-cubic-meter</em>), then convert the latter backwards to the desired target. However, there may not be a matching conversion from that complex base unit to the desired target unit. That is the case for converting from <em>mile-per-gallon</em> (used in the US) to <em>liter-per-100-kilometer</em> (used in Europe and elsewhere). When that happens, the reciprocal of the complex base unit is used, as in the following example:
1673</p>
1674	<blockquote>
1675<p><strong>
167650 mile-per-gallon ⟹ X liter-per-100-kilometer
1677</strong></p>
1678<p>
1679	⟹ source: 1 mile
1680</p>
1681<p>
1682	⟹ factor: 201168 / 125 = 1609.344 meter
1683</p>
1684<p>
1685	⟹ source: 1 gallon
1686</p>
1687<p>
1688	⟹ factor: 473176473 / 125000000000 ≅ 0.003785412 cubic-meter
1689</p>
1690<p>
1691 ⟹ intermediate: 2400000000000 / 112903 ≅ 2.125719E7 meter-per-cubic-meter
1692</p>
1693<p>
1694 ⟹ liter-per-100-kilometer
1695</p>
1696<p>
1697	⟹ source: 1 liter
1698</p>
1699<p>
1700	⟹ factor: 1 / 1000 = 0.001 cubic-meter
1701</p>
1702<p>
1703	⟹ source: 1 100-kilometer
1704</p>
1705<p>
1706	⟹ factor: 100000 meter
1707</p>
1708<p>
1709<strong> ⟹ 1/intermediate: 112903 / 2400000000000 ≅ 4.704292E-8 cubic-meter-per-meter</strong>
1710</p>
1711<p>
1712 ⟹ target: 112903 / 24000 ≅ 4.704292 liter-per-100-kilometer
1713</p>
1714</blockquote>
1715<p>
1716This applies to more than just these cases: one can convert from any unit to related reciprocals as in the following example:
1717</p>
1718	<blockquote>
1719<p><strong>
172050 foot-per-minute ⟹ X hour-per-mile</strong> </p>
1721<p>
1722	⟹ source: 1 foot
1723</p>
1724<p>
1725	⟹ factor: 381 / 1250 = 0.3048 meter
1726</p>
1727<p>
1728	⟹ source: 1 minute
1729</p>
1730<p>
1731	⟹ factor: 60 second
1732</p>
1733<p>
1734 ⟹ intermediate: 127 / 500 = 0.254 meter-per-second
1735</p>
1736<p>
1737 ⟹ hour-per-mile
1738</p>
1739<p>
1740	⟹ source: 1 hour
1741</p>
1742<p>
1743	⟹ factor: 3600 second
1744</p>
1745<p>
1746	⟹ source: 1 mile
1747</p>
1748<p>
1749	⟹ factor: 201168 / 125 = 1609.344 meter
1750</p>
1751<p>
1752<strong> ⟹ 1/intermediate: 500 / 127 ≅ 3.937008 second-per-meter</strong>
1753</p>
1754<p>
1755 ⟹ target: 44 / 25 = 1.76 hour-per-mile
1756</p>
1757</blockquote>
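<p>
The three worked examples above can be reproduced with the following sketch. It handles only identifiers of the form <em>x-per-y</em>, treats <em>100-kilometer</em> as an opaque table key, and ignores offsets, powers, and prefixes; these restrictions and the function names are assumptions of the example, not part of the data format.
</p>

```python
from fractions import Fraction

# (factor to base unit, base unit) for the single units used in the examples above.
SIMPLE = {
    "foot": (Fraction("0.3048"), "meter"),
    "mile": (Fraction("1609.344"), "meter"),
    "100-kilometer": (Fraction(100000), "meter"),
    "minute": (Fraction(60), "second"),
    "hour": (Fraction(3600), "second"),
    "gallon": (Fraction(473176473, 125000000000), "cubic-meter"),
    "liter": (Fraction(1, 1000), "cubic-meter"),
}


def to_base(unit):
    """Return (factor, (numerator base, denominator base)) for an 'x-per-y' unit."""
    numerator, _, denominator = unit.partition("-per-")
    num_factor, num_base = SIMPLE[numerator]
    den_factor, den_base = SIMPLE[denominator]
    return num_factor / den_factor, (num_base, den_base)


def convert(value, source, target):
    """Convert via the complex base unit, using its reciprocal when needed."""
    source_factor, source_base = to_base(source)
    target_factor, target_base = to_base(target)
    intermediate = value * source_factor
    if source_base == target_base:
        return intermediate / target_factor
    if source_base == target_base[::-1]:
        # No direct match: use the reciprocal of the complex base unit.
        return (1 / intermediate) / target_factor
    raise ValueError(f"{source} and {target} are not convertible")


print(convert(50, "foot-per-minute", "mile-per-hour"))
print(convert(50, "mile-per-gallon", "liter-per-100-kilometer"))
print(convert(50, "foot-per-minute", "hour-per-mile"))
```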
1758<h3>Exceptional Cases</h3>
1759
1760
1761<h4>Identities</h4>
1762
1763
1764<p>
1765For completeness, identity mappings are also provided for the base units themselves, such as:
1766</p>
1767<blockquote>
1768        &lt;convertUnit source=<em>'meter'</em> baseUnit=<em>'meter'</em>/>
1769</blockquote>
1770<h4>Aliases</h4>
1771
1772<p>
1773In a few instances the old identifiers are deprecated in favor of regular syntax. Implementations should handle both on input:
1774</p>
1775<blockquote>
1776<p>
1777&lt;unitAlias type=<em>"meter-per-second-squared"</em> replacement=<em>"meter-per-square-second"</em> reason=<em>"deprecated"</em>/>
1778</p>
1779<p>
1780&lt;unitAlias type=<em>"liter-per-100kilometers"</em> replacement=<em>"liter-per-100-kilometer"</em> reason=<em>"deprecated"</em>/>
1781</p>
1782<p>
1783&lt;unitAlias type=<em>"pound-foot"</em> replacement=<em>"pound-force-foot"</em> reason=<em>"deprecated"</em>/>
1784</p>
1785<p>
1786&lt;unitAlias type=<em>"pound-per-square-inch"</em> replacement=<em>"pound-force-per-square-inch"</em> reason=<em>"deprecated"</em>/>
1787</p>
1788</blockquote>
1789<p>
1790These use the standard alias elements in XML, and are also included in the <a href="https://github.com/unicode-org/cldr/blob/master/common/supplemental/units.xml">units.xml</a> file.
1791</p>
1792<h4>“Duplicate” Units</h4>
1793
1794
1795<p>
1796Some CLDR units are provided simply because they have different names in some languages. For example, year and year-person, or foodcalorie and kilocalorie. One CLDR unit (temperature-generic) is not convertible; it is used only for translation (where the exact unit would be understood from context).
1797</p>
1798<h4>Discarding Offsets</h4>
1799
1800
1801<p>
1802The temperature units are special. When they represent a scale, they have an offset. But where they represent an amount, such as in complex units, they do not. So celsius-per-second is the same as kelvin-per-second.
1803</p>
1804<h3>Unresolved Units</h3>
1805
1806
1807<p>
1808Some SI units contain the same units in the numerator and denominator, so those cannot be resolved. For example, if cubic-meter-per-meter were always resolved, then <em>consumption</em> (like “liter-per-kilometer”) could not be distinguished from <em>area</em> (square-meter).
1809</p>
1810<p>
1811However, in conversion, it may be necessary to resolve them in order to find a match. For example, kilowatt-hour maps to the base unit kilogram-square-meter-second-per-cubic-second, but that needs to be resolved to kilogram-square-meter-per-square-second in order to be matched against an <em>energy</em>.
1812</p>
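<p>
Resolving can be sketched as cancelling the powers that a unit has in both the numerator and the denominator. The helper names below are assumptions, and the sketch recognizes only the <em>square-</em> and <em>cubic-</em> power prefixes and identifiers whose numerator does not cancel completely.
</p>

```python
POWER_OF = {"square": 2, "cubic": 3}
PREFIX_FOR = {1: "", 2: "square-", 3: "cubic-"}


def parse_product(part):
    """Return {unit: power} for a hyphen-joined product such as 'square-meter-second'."""
    powers = {}
    pending = 1
    for token in part.split("-"):
        if token in POWER_OF:
            pending = POWER_OF[token]  # applies to the next unit token
        else:
            powers[token] = powers.get(token, 0) + pending
            pending = 1
    return powers


def format_product(powers):
    """Rebuild an identifier from {unit: power}, dropping units with power 0."""
    return "-".join(PREFIX_FOR[p] + unit for unit, p in powers.items() if p > 0)


def resolve(identifier):
    """Cancel units that appear in both the numerator and the denominator."""
    numerator, _, denominator = identifier.partition("-per-")
    num = parse_product(numerator)
    den = parse_product(denominator) if denominator else {}
    for unit in num:
        common = min(num[unit], den.get(unit, 0))
        if common:
            num[unit] -= common
            den[unit] -= common
    top, bottom = format_product(num), format_product(den)
    return f"{top}-per-{bottom}" if bottom else top


print(resolve("kilogram-square-meter-second-per-cubic-second"))
print(resolve("cubic-meter-per-meter"))
```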
1813<h2>Quantities and Base Units</h2>
1814
1815
1816<p class="dtd">
1817&lt;!ELEMENT unitQuantities ( unitQuantity* ) >
1818</p>
1819<p class="dtd">
1820&lt;!ELEMENT unitQuantity EMPTY >
1821</p>
1822<p class="dtd">
1823&lt;!ATTLIST unitQuantity baseUnit NMTOKEN #REQUIRED >
1824</p>
1825<p class="dtd">
1826&lt;!ATTLIST unitQuantity quantity NMTOKENS #REQUIRED >
1827</p>
1828<p class="dtd">
1829&lt;!ATTLIST unitQuantity status NMTOKEN #IMPLIED >
1830</p>
1831<p>
1832Conversion is supported between comparable units. Those can be simple units, such as length, or more complex ‘derived’ units that are built up from <em>base units</em>. The &lt;unitQuantities> element provides information on the base units used for conversion. It also supplies information about their <em>quantity</em>: mass, length, time, etc., and whether they are simple or not. </p>
1833<p>Examples: </p>
1834	<blockquote>
1835<p>
1836&lt;unitQuantity baseUnit=<em>'kilogram'</em> quantity=<em>'mass'</em> status=<em>'simple'</em>/>
1837</p>
1838<p>
1839&lt;unitQuantity baseUnit=<em>'meter-per-second'</em> quantity=<em>'speed'</em>/>
1840</p>
1841</blockquote>
1842<p>
1843The order of the elements in the file is significant, since it is used in
1844
1845<a href="#Unit_Identifier_Normalization" >Unit Identifier Normalization</a>.</p>
1846<p>
1847  The quantity values themselves are informative. There may be more than one quantity value for a base unit (reflecting that <em>force per area</em> can be referenced as either <em>pressure</em> or <em>stress</em>, for example). The quantity for a complex unit that has a reciprocal is formed by prepending “inverse-” to the quantity, such as <em>inverse-consumption</em>.
1848</p>
1849<p>
1850The base units for the quantities and the quantities themselves are based on <a href="https://www.nist.gov/pml/special-publication-811">NIST Special Publication 811</a> and the earlier <a href="https://www.govinfo.gov/content/pkg/GOVPUB-C13-f10c2ff9e7af2091314396a2d53213e4/pdf/GOVPUB-C13-f10c2ff9e7af2091314396a2d53213e4.pdf">NIST Special Publication 1038</a>. In some cases, a different unit is chosen for the base. For example, a <em>revolution</em> (360°) is chosen for the base unit for angles instead of the SI <em>radian</em>, and <em>item</em> instead of the SI <em>mole</em>. Additional base units are added where necessary, such as <em>bit</em> and <em>pixel</em>.
1851</p>
1852<p>
1853This data is not necessary for conversion, but is needed for
1854
1855  <a href="#Unit_Identifier_Normalization" >Unit Identifier Normalization</a>. Some of the unitQuantity elements are not needed to convert CLDR units, but are included for completeness. Example:
1856</p>
1857
1858<blockquote>
1859        &lt;unitQuantity baseUnit=<em>'ampere-per-square-meter'</em> quantity=<em>'current-density'</em>/>
1860</blockquote>
1861<h3>UnitType vs Quantity</h3>
1862
1863
1864<p>
1865The unitType (as in “length-meter”) is not the same as the quantity. It is often broader: for example, the unitType <em>electric</em> corresponds to the quantities <em>electric-current, electric-resistance, </em>and<em> voltage</em>. The unitType itself is also informative, and can be dropped from a long unit identifier to get a still-unique short unit identifier.
1866</p>
1867<h3><a href="#Unit_Identifier_Normalization" name="Unit_Identifier_Normalization">Unit Identifier Normalization</a></h3>
1868
1869
1870<p>
1871There are many possible ways to construct complex units. For comparison of unit identifiers, an implementation can normalize in the following way:
1872</p>
1873		  <ol>
1874
1875<li>Convert all but the first -per- to simple multiplication. The result then has the format of /numerator ( -per- denominator)?/ <ul>
1876
1877 <li>foot-per-second-per-second ⇒  foot-per-second-second
1878</ul>
1879</li>
1880  <li>Within each of the numerator and denominator:</li>
1881  <li>Convert multiple instances of a unit into the appropriate power.
1882  <ul>
1883    <li>foot-per-second-second ⇒ foot-per-square-second
1884    </li>
1885    <li>kilogram-meter-kilogram ⇒ meter-square-kilogram
1886    </li>
1887  </ul>
1888  <li>For each single unit, disregarding prefixes and powers, find its base unit using &lt;convertUnit>, then get the order of that base unit among the unitQuantity elements in the <a href="https://github.com/unicode-org/cldr/blob/master/common/supplemental/units.xml">units.xml</a>. Then sort the single units by that order.
1889      <ul>
1890      <li>meter-square-kilogram ⇒ square-kilogram-meter
1891      </li>
1892      <li>meter-square-gram ⇒ square-gram-meter
1893      </li>
1894    </ul>
1895    </li>
1896  <li>If two single units have the same simple unit but different SI prefixes, such as "kilometer-meter", sort the higher-power SI prefixes first.  </li>
1897  <li>Within private-use single units, sort by the simple unit alphabetically.</li>
1898</ol>
1899
1900<p>
1901  The examples in #4 are due to the following ordering of the unitQuantity elements:
1902</p><ol>
1903
1904<li>&lt;unitQuantity baseUnit=<em>'candela'</em> quantity=<em>'luminous-intensity'</em> status=<em>'simple'</em>/>
1905<li>&lt;unitQuantity baseUnit=<em>'kilogram'</em> quantity=<em>'mass'</em> status=<em>'simple'</em>/>
1906<li>&lt;unitQuantity baseUnit=<em>'meter'</em> quantity=<em>'length'</em> status=<em>'simple'</em>/>
1907<li>…</li></ol>
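<p>
The first steps of the normalization can be sketched as follows. The lookup tables are small hand-written stand-ins for the <code>convertUnit</code> and <code>unitQuantity</code> data, the position of <em>second</em> in the ordering is an assumption, and the sketch omits the rules for SI prefixes on the same simple unit, private-use units, and powers above three.
</p>

```python
POWER_OF = {"square": 2, "cubic": 3}
PREFIX_FOR = {1: "", 2: "square-", 3: "cubic-"}

# Base unit of each single unit (stand-in for the convertUnit data).
BASE_UNIT = {"foot": "meter", "meter": "meter", "gram": "kilogram",
             "kilogram": "kilogram", "second": "second"}

# Order of base units among the unitQuantity elements. The first three follow
# the ordering listed above; the position of "second" is assumed for this sketch.
QUANTITY_ORDER = ["candela", "kilogram", "meter", "second"]


def normalize_product(part):
    """Merge repeated units into powers, then sort by base-unit order."""
    powers = {}
    pending = 1
    for token in part.split("-"):
        if token in POWER_OF:
            pending = POWER_OF[token]
        else:
            powers[token] = powers.get(token, 0) + pending
            pending = 1
    ordered = sorted(powers, key=lambda unit: QUANTITY_ORDER.index(BASE_UNIT[unit]))
    return "-".join(PREFIX_FOR[powers[unit]] + unit for unit in ordered)


def normalize(identifier):
    """Normalize a unit identifier for comparison."""
    # All but the first -per- become simple multiplication in the denominator.
    numerator, *denominators = identifier.split("-per-")
    result = normalize_product(numerator)
    if denominators:
        result += "-per-" + normalize_product("-".join(denominators))
    return result


print(normalize("foot-per-second-per-second"))
print(normalize("kilogram-meter-kilogram"))
```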
1908
1909<h2>Mixed Units</h2>
1910
1911
1912<p>
1913Mixed units, or unit sequences, are units with the same base unit  which are listed in sequence.  Common examples are feet and inches, meters and centimeters, and hours, minutes, and seconds.  Mixed unit identifiers are expressed using the "-and-" infix, as in "foot-and-inch", "meter-and-centimeter", and "hour-and-minute-and-second".
1914</p>
1915<p>
1916Scalar values for mixed units are expressed in the largest unit, according to the sort order discussed above in "Normalization".  For example, numbers for "foot-and-inch" are expressed in feet.
1917</p>
1918<p>
1919Mixed units are expected to be rendered in the order of the tokens in the identifier.  For example, the value 1.25 with the identifier "foot-and-inch" should be rendered as "1 foot and 3 inches", and the value 1.25 with the identifier "inch-and-foot" should be rendered as "3 inches and 1 foot". <strong>NOTE:  </strong>the correct application of this may require adding locales to the regions attribute set.
1920</p>
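<p>
Splitting a scalar value into the parts of a mixed unit can be sketched as below. The function name <code>split_mixed</code> is an assumption, as is the use of the conversion factor to decide which unit is largest; the sketch handles only non-negative values and leaves the actual formatting to the caller.
</p>

```python
from fractions import Fraction

# Size of each unit in the shared base unit (meter); both values are exact by definition.
TO_BASE = {"foot": Fraction("0.3048"), "inch": Fraction("0.0254")}


def split_mixed(value, identifier):
    """Split a value given in the largest unit into (amount, unit) pairs.

    The pairs are returned in the order of the tokens in the identifier.
    """
    units = identifier.split("-and-")
    by_size = sorted(units, key=lambda unit: TO_BASE[unit], reverse=True)
    remaining = Fraction(value) * TO_BASE[by_size[0]]  # value expressed in the base unit
    amounts = {}
    for unit in by_size[:-1]:
        whole = remaining // TO_BASE[unit]  # whole number of this unit
        amounts[unit] = whole
        remaining -= whole * TO_BASE[unit]
    # The smallest unit takes whatever is left, possibly a fraction.
    amounts[by_size[-1]] = remaining / TO_BASE[by_size[-1]]
    return [(amounts[unit], unit) for unit in units]


print(split_mixed("1.25", "foot-and-inch"))
print(split_mixed("1.25", "inch-and-foot"))
```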
1921
1922<h2>Testing</h2>
1923
1924
1925<p>
1926The <a href="https://github.com/unicode-org/cldr/blob/master/common/testData/units/unitsTest.txt">unitsTest.txt</a> file supplies a list of all the CLDR units with conversions, for testing implementations. Instructions for use are supplied in the header of the file.
1927</p>
1928    <h2>14 <a href="#Unit_Preferences" name="Unit_Preferences">Unit Preferences</a></h2>
1929
1930
1931<p>
1932Different locales have different preferences for which unit or combination of units is used for a particular usage, such as measuring a person’s height. This is more fine-grained than merely a preference for metric versus US or UK measurement systems. For example, one locale may use meters alone, while another may use centimeters alone or a combination of meters and centimeters; a third may use inches alone, or (informally) a combination of feet and inches.
1933</p>
1934<p>
1935The CLDR data is intended to map from a particular usage — e.g. measuring the height of a person or the fuel consumption of an automobile — to the unit or combination of units typically used for that usage in a given region. Considerations for such a mapping include:
1936</p><ul>
1937
<li>The list of possible usages is large and open-ended. The intent here is to start with a small set for which there is an urgent need, and expand as necessary.
<li>Even for a given usage such as measuring a road distance, there are multiple ranges in use. For example, one set of units may be used for indicating the distance to the next city (kilometers or miles), while another may be used for indicating the distance to the next exit (meters, yards, or feet).
1940<li>There are also differences between more formal usage (official signage, medical records) and more informal usage (conversation, texting).
1941<li>For some usages, the measurement may be expressed using a sequence of units, such as “1 meter, 78 centimeters” or “12 stone, 2 pounds”.</li></ul>
1942
1943<p>
1944The DTD structure is as follows:
1945</p>
1946<p class="dtd">
1947&lt;!ELEMENT unitPreferenceData ( unitPreferences* ) >
1948</p>
1949<p class="dtd">
1950&lt;!ELEMENT unitPreferences ( unitPreference* ) >
1951</p>
1952<p class="dtd">
1953&lt;!ATTLIST unitPreferences category NMTOKEN #REQUIRED >
1954</p>
1955<p class="dtd">
1956&lt;!ATTLIST unitPreferences usage NMTOKENS #REQUIRED >
1957</p>
1958<p class="dtd">
1959&lt;!ELEMENT unitPreference ( #PCDATA ) >
1960</p>
1961<p class="dtd">
1962&lt;!ATTLIST unitPreference regions NMTOKENS #REQUIRED >
1963</p>
1964<p class="dtd">
1965&lt;!ATTLIST unitPreference geq NMTOKEN #IMPLIED >
1966</p>
1967<p class="dtd">
1968&lt;!ATTLIST unitPreference skeleton CDATA #IMPLIED >
1969</p>
1970
1971<table>
1972  <tr>
1973   <td>category
1974   </td>
   <td>A unit quantity, such as “area” or “length”. See Section 13, Unit Conversion.
1976   </td>
1977  </tr>
1978  <tr>
1979   <td>usage
1980   </td>
1981   <td>A type of usage, such as person-height.
1982   </td>
1983  </tr>
1984  <tr>
1985   <td>regions
1986   </td>
1987   <td>One or more region identifiers (macroregions or regions), subdivision identifiers, or language identifiers, such as 001, US, usca, and de-CH.
1988   </td>
1989  </tr>
1990  <tr>
1991   <td>geq
1992   </td>
   <td>A threshold value, in a unit determined by the unitPreference element value. The unitPreference element is only used for values greater than or equal to this value (and lower than any higher threshold).
<p>
The value must be non-negative. For negative measures (e.g. -3 meters), use the absolute value to pick the unit.
1996   </td>
1997  </tr>
1998  <tr>
1999   <td>skeleton
2000   </td>
   <td>A skeleton in the ICU number format syntax that can be used to format the unit.
2002   </td>
2003  </tr>
2004</table>
2005
2006<p><strong>Note:</strong> As of CLDR 37, the &lt;unitPreference&gt; geq attribute replaces
2007the now-deprecated &lt;unitPreferences&gt; scope attribute.</p>
2008
2009<p>
2010Example:
2011</p>
2012		  <blockquote>
2013<p>
2014        &lt;unitPreferences category=<em>"length"</em> usage=<em>"default"</em>>
2015</p>
2016			  <blockquote>
2017<p>
2018            &lt;unitPreference regions=<em>"001"</em>>kilometer&lt;/unitPreference>
2019</p>
2020<p>
2021            &lt;unitPreference regions=<em>"001"</em>>meter&lt;/unitPreference>
2022</p>
2023<p>
2024            &lt;unitPreference regions=<em>"001"</em>>centimeter&lt;/unitPreference>
2025</p>
2026<p>
2027            &lt;unitPreference regions=<em>"US GB"</em>>mile&lt;/unitPreference>
2028</p>
2029<p>
2030            &lt;unitPreference regions=<em>"US GB"</em>>foot&lt;/unitPreference>
2031</p>
2032<p>
2033            &lt;unitPreference regions=<em>"US GB"</em>>inch&lt;/unitPreference>
2034</p>
2035		    </blockquote>
2036<p>
2037        &lt;/unitPreferences>
2038</p>
2039</blockquote>
2040<p>
The above information says that for default usage, people in the US and GB use mile, foot, and inch, whereas people in the rest of the world (001) use kilometer, meter, and centimeter.
2042Take another example:</p>
2043		  <blockquote>
2044<p>
2045        &lt;unitPreferences category=<em>"length"</em> usage=<em>"road"</em>>
2046</p>
2047			  <blockquote>
2048<p>
2049            &lt;unitPreference regions=<em>"001"</em> geq=<em>"0.9"</em>>kilometer&lt;/unitPreference>
2050</p>
2051<p>
2052            &lt;unitPreference regions=<em>"001"</em> geq=<em>"300.0"</em> skeleton=<em>"precision-increment/50"</em>>meter&lt;/unitPreference>
2053</p>
2054<p>
2055            &lt;unitPreference regions=<em>"001"</em> skeleton=<em>"precision-increment/10"</em>>meter&lt;/unitPreference>
2056</p>
2057<p>
2058            &lt;unitPreference regions=<em>"001"</em>>meter&lt;/unitPreference>
2059</p>
2060<p>
2061            &lt;unitPreference regions=<em>"US"</em> geq=<em>"0.5"</em>>mile&lt;/unitPreference>
2062</p>
2063<p>
2064            &lt;unitPreference regions=<em>"US"</em> geq=<em>"100.0"</em> skeleton=<em>"precision-increment/50"</em>>foot&lt;/unitPreference>
2065</p>
2066<p>
2067            &lt;unitPreference regions=<em>"US"</em> skeleton=<em>"precision-increment/10"</em>>foot&lt;/unitPreference>
2068</p>
2069<p>
2070            &lt;unitPreference regions=<em>"GB"</em> geq=<em>"0.5"</em>>mile&lt;/unitPreference>
2071</p>
2072<p>
2073            &lt;unitPreference regions=<em>"GB"</em> geq=<em>"100.0"</em> skeleton=<em>"precision-increment/50"</em>>yard&lt;/unitPreference>
2074</p>
2075<p>
2076            &lt;unitPreference regions=<em>"GB"</em>>yard&lt;/unitPreference>
2077</p>
2078<p>
            &lt;unitPreference regions=<em>"SE"</em> geq=<em>"0.1"</em>>mile-scandinavian&lt;/unitPreference>
2080</p>
2081		    </blockquote>
2082<p>
2083        &lt;/unitPreferences>
2084</p>
2085</blockquote>
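<p>As an illustration of the structure described by the DTD, the following Python sketch reads unitPreference elements into a lookup table keyed by category, usage, and region. The function <code>load_preferences</code> and the table layout are assumptions made for this example, not part of the specification; the sample data is a subset of the "road" example above.</p>

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<unitPreferenceData>
  <unitPreferences category="length" usage="road">
    <unitPreference regions="001" geq="0.9">kilometer</unitPreference>
    <unitPreference regions="001" geq="300.0" skeleton="precision-increment/50">meter</unitPreference>
    <unitPreference regions="001" skeleton="precision-increment/10">meter</unitPreference>
    <unitPreference regions="001">meter</unitPreference>
    <unitPreference regions="GB" geq="0.5">mile</unitPreference>
    <unitPreference regions="GB" geq="100.0" skeleton="precision-increment/50">yard</unitPreference>
    <unitPreference regions="GB">yard</unitPreference>
  </unitPreferences>
</unitPreferenceData>
"""

def load_preferences(xml_text):
    """Return {(category, usage, region): [entry, ...]} in document order."""
    table = {}
    root = ET.fromstring(xml_text)
    for prefs in root.iter("unitPreferences"):
        category = prefs.get("category")
        # usage and regions are NMTOKENS, so each may hold several tokens.
        for usage in prefs.get("usage").split():
            for pref in prefs.findall("unitPreference"):
                geq = pref.get("geq")
                entry = {
                    "unit": pref.text.strip(),
                    "geq": float(geq) if geq is not None else None,
                    "skeleton": pref.get("skeleton"),
                }
                for region in pref.get("regions").split():
                    table.setdefault((category, usage, region), []).append(entry)
    return table

table = load_preferences(SAMPLE)
```

<p>Keeping the entries in document order matters, because the order of the unitPreference elements is significant when a unit is selected.</p>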
2086<p>
2087The intended usage is to take the measure to be formatted, and the desired category, usage, and region and find the best match as follows.
2088</p>
2089<ul>
2090
2091<li>First, see if there is an exact match, producing a list of one or more unitPreference elements. For example, length/road/GB has a match above, giving
2092<blockquote>
2093<p>
2094            &lt;unitPreference regions=<em>"GB"</em> geq=<em>"0.5"</em>>mile&lt;/unitPreference>
2095</p>
2096<p>
2097            &lt;unitPreference regions=<em>"GB"</em> geq=<em>"100.0"</em> skeleton=<em>"precision-increment/50"</em>>yard&lt;/unitPreference>
2098</p>
2099<p>
2100            &lt;unitPreference regions=<em>"GB"</em>>yard&lt;/unitPreference>
2101</p>
2102  </blockquote>
2103</li>
2104		  <li>If there is no match for the category, then the data is not available.</li>
2105		  <li>Otherwise, given the category: <ul>
2106
 <li>If there is an exact match for the usage, but not for the region, try region="001".</li></ul>
2108
  <li>The specification allows for <a href="https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_containment_un_m_49.html">containment regions</a> and <a href="https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_subdivisions.html">region subdivisions</a>.
2110  <li>While in version 37 only 001 is used, in the future the data may contain others.
2111  <li>The fallback is: subdivision2 ⇒ subdivision1 ⇒ region/country ⇒ subcontinent ⇒ continent ⇒ world
2112  <li>Example:
2113<blockquote>
2114<table>
2115  <tr>
2116   <td>
2117<strong>Region/subdivision</strong>
2118   </td>
2119   <td><strong>Code</strong>
2120   </td>
2121  </tr>
2122  <tr>
2123   <td>Blackpool
2124   </td>
2125   <td>gbbpl
2126   </td>
2127  </tr>
2128  <tr>
2129   <td>England
2130   </td>
2131   <td>gbeng
2132   </td>
2133  </tr>
2134  <tr>
2135   <td>United Kingdom
2136   </td>
2137   <td>GB
2138   </td>
2139  </tr>
2140  <tr>
2141   <td>Northern Europe
2142   </td>
2143   <td>154
2144   </td>
2145  </tr>
2146  <tr>
2147   <td>Europe
2148   </td>
2149   <td>150
2150   </td>
2151  </tr>
2152  <tr>
2153   <td>World
2154   </td>
2155   <td>001
2156   </td>
2157  </tr>
2158</table>
2159</blockquote>
2160
2161 <li>If there is an exact match for the region, but not for the usage,   <ul>
2162
2163  <li>If the usage has multiple parts (eg land-agriculture-grain) drop the last part (eg land-agriculture)
2164  <li>Repeat dropping the last part and trying the result (eg land)
  <li>If you eliminate all of them, try usage="default".
 <li>If there is no exact match for either one, try usage="default", region="001". That will always match.</li> </ul>
2167	 </li> </ul>
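<p>The matching steps above can be sketched in Python as follows. This is one possible reading, not a normative algorithm: it assumes that usage fallback is tried in the outer loop and region fallback in the inner loop, and that the caller supplies the region fallback chain (for example from the containment and subdivision data). The names <code>usage_chain</code>, <code>find_preferences</code>, and <code>TABLE</code> are hypothetical.</p>

```python
# Hypothetical data: (category, usage, region) -> ordered list of units.
TABLE = {
    ("length", "default", "001"): ["kilometer", "meter", "centimeter"],
    ("length", "default", "GB"): ["mile", "foot", "inch"],
    ("length", "road", "001"): ["kilometer", "meter", "meter", "meter"],
    ("length", "road", "GB"): ["mile", "yard", "yard"],
}

def usage_chain(usage):
    """'a-b-c' -> ['a-b-c', 'a-b', 'a', 'default']."""
    parts = usage.split("-")
    chain = ["-".join(parts[:n]) for n in range(len(parts), 0, -1)]
    if chain[-1] != "default":
        chain.append("default")
    return chain

def find_preferences(table, category, usage, regions):
    """regions is the fallback chain, most specific first, ending in '001'."""
    if not any(key[0] == category for key in table):
        return None  # no data is available for this category
    for candidate_usage in usage_chain(usage):
        for region in regions:
            hit = table.get((category, candidate_usage, region))
            if hit is not None:
                return candidate_usage, region, hit
    return None

blackpool = ["gbbpl", "gbeng", "GB", "154", "150", "001"]
match = find_preferences(TABLE, "length", "road", blackpool)
```

<p>With the Blackpool chain from the table above, the lookup falls back from gbbpl through gbeng to GB, where the "road" usage has data.</p>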
2168
2169<p>
2170Once you have a list of unitPreference elements, find the applicable unitPreference. For a given category, usage, and set of regions (eg “US  GB”), the units are ordered from largest to smallest.
2171</p>
2172<ul>
2173<li>The geq item gives the value for the unit in the element value (or for the largest unit for mixed units). For example,<ul>
	<li>...geq=<em>"0.5"</em>>mile&lt;...  means 0.5 miles</li>
 <li>...geq=<em>"100.0"</em>>foot-and-inch&lt;...  means 100 feet</li></ul></li>
<li>If there is no geq attribute, then the implicit value is 1.0.</li>
<li>Implementations will probably convert the values into the base units, so that the comparison is fast. Thus the above would be converted internally to something like: <ul>

 <li>≥ 804.672 meters ⇒ mile</li>
	<li>≥ 30.48 meters ⇒ foot-and-inch</li></ul></li>
2181<li>Search for the first matching unitPreference for the input measure. If there is no match (eg &lt; 100 feet in the above example), take the last unitPreference.
2182That is, the last unitPreference is effectively geq=&quot;0&quot;</li>
2183</ul>
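<p>The threshold search can be sketched as follows, using the GB "road" entries from the example above. The function <code>pick_preference</code> and the table <code>TO_METERS</code> are hypothetical; the sketch assumes the input measure has already been converted to the base unit (meters) and that each entry is a (unit, geq, skeleton) tuple.</p>

```python
# Exact conversion factors to the base unit (meter).
TO_METERS = {"kilometer": 1000.0, "meter": 1.0, "mile": 1609.344, "yard": 0.9144, "foot": 0.3048}

# (unit, geq, skeleton) in document order, largest unit first.
GB_ROAD = [
    ("mile", 0.5, None),
    ("yard", 100.0, "precision-increment/50"),
    ("yard", None, None),
]

def pick_preference(prefs, meters):
    """Return (unit, skeleton) for a measure given in meters."""
    magnitude = abs(meters)  # negative measures are matched by absolute value
    for unit, geq, skeleton in prefs:
        # A missing geq means an implicit threshold of 1.0 in that unit.
        threshold = (1.0 if geq is None else geq) * TO_METERS[unit]
        if magnitude >= threshold:
            return unit, skeleton
    # No threshold was reached: the last entry acts as if it had geq="0".
    unit, _, skeleton = prefs[-1]
    return unit, skeleton
```

<p>A production implementation would probably precompute the thresholds in base units once, rather than multiplying on every lookup as this sketch does.</p>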
2184
2185<p>
2186Once a matching unitPreference element is found:
2187</p><ul>
2188
2189<li>The unit is the element value
2190<li>The skeleton (if there is one) supplies formatting information for the unit. API settings may allow that to be overridden.
2191  <ul>
2192    <li>The syntax and semantics for the skeleton value are defined by the <a href="https://unicode-org.github.io/icu/userguide/format_parse/numbers/skeletons.html">ICU Number Skeletons</a> document.</li>
2193  </ul>
<li>If the unit is mixed (eg foot-and-inch) the skeleton applies to the final subunit; the higher subunits are formatted as integers.
2195<li>If the skeleton is missing, the default is skeleton=&quot;<strong>precision-integer/@@*</strong>&quot;. However, the client can also override or tune the number formatting.</li></ul>
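<p>To make the effect of a skeleton such as "precision-increment/50" concrete, the following Python sketch rounds a value to the nearest multiple of the increment. It is only an approximation for illustration: it handles this single skeleton form, assumes half-even rounding, and does not use or reproduce the ICU number formatter. The function name <code>round_to_increment</code> is hypothetical.</p>

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_to_increment(value, skeleton):
    """Round value to the nearest multiple of N for 'precision-increment/N'."""
    prefix = "precision-increment/"
    if not skeleton.startswith(prefix):
        raise ValueError("only precision-increment skeletons are handled in this sketch")
    increment = Decimal(skeleton[len(prefix):])
    # Count whole increments, then scale back up.
    steps = (Decimal(str(value)) / increment).quantize(Decimal(1), rounding=ROUND_HALF_EVEN)
    return steps * increment

rounded_yards = round_to_increment(432, "precision-increment/50")
rounded_feet = round_to_increment(37, "precision-increment/10")
```

<p>For road distances this kind of rounding avoids false precision: a distance of 432 yards is shown as a multiple of 50 yards.</p>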
2196
2197<h3>Constraints</h3>
2198
2199<ul>
2200
2201<li>For a given category, there is always a “default” usage.
2202<li>For a given category, and usage: <ul>
2203
2204 <li>There is always a 001 region.
2205 <li>None of the sets of regions can overlap. That is, you can’t have “US” on one line and “US GB” on another.  You <em>can</em> have two lines with “US”, for different sizes of units.</li></ul></li>
2206<li>For a given category, usage, and region-set <ul>
2207	<li>The unitPreferences are in descending order.</li>
2208	</ul>
2209  </li>
2210</ul>
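<p>The first constraints lend themselves to a simple automated check. The sketch below is an assumption-laden illustration: <code>check_constraints</code> is a hypothetical helper that receives, for one category, a mapping from usage to the regions attribute values of its unitPreference lines. The ordering constraint is omitted because checking it requires unit conversion data.</p>

```python
from itertools import combinations

def check_constraints(usages):
    """usages: {usage: [regions attribute value of each unitPreference line]}."""
    problems = []
    if "default" not in usages:
        problems.append("missing 'default' usage")
    for usage, lines in usages.items():
        # Identical region sets may repeat; distinct sets must not share a region.
        region_sets = {frozenset(line.split()) for line in lines}
        if not any("001" in regions for regions in region_sets):
            problems.append(f"{usage}: no 001 region")
        for a, b in combinations(sorted(region_sets, key=sorted), 2):
            if a & b:
                problems.append(f"{usage}: overlapping region sets {sorted(a)} and {sorted(b)}")
    return problems

good = {
    "default": ["001", "001", "001", "US GB", "US GB", "US GB"],
    "road": ["001", "001", "US", "US", "GB"],
}
bad = {"road": ["US", "US GB"]}
```

<p>Here <code>good</code> passes all three checks, while <code>bad</code> lacks a default usage, lacks a 001 region, and has "US" overlapping with "US GB".</p>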
2211
2212<h3>Caveats</h3>
		  <p>The extended unit support is still under development. See the Known Issues on the release page for further information.</p>
2214
2215    <hr>
2216    <p class="copyright">Copyright © 2001–2020 Unicode, Inc. All
2217    Rights Reserved. The Unicode Consortium makes no expressed or
2218    implied warranty of any kind, and assumes no liability for
2219    errors or omissions. No liability is assumed for incidental and
2220    consequential damages in connection with or arising out of the
2221    use of the information or programs contained or accompanying
2222    this technical report. The Unicode <a href=
2223    "https://unicode.org/copyright.html">Terms of Use</a> apply.</p>
2224    <p class="copyright">Unicode and the Unicode logo are
2225    trademarks of Unicode, Inc., and are registered in some
2226    jurisdictions.</p>
2227  </div>
2228</body>
2229</html>
2230