1## Unicode Technical Standard #35
2
3# Unicode Locale Data Markup Language (LDML)<br/>Part 6: Supplemental
4
|Version|46         |
|-------|-----------|
|Editors|Steven Loomis (<a href="mailto:srloomis@unicode.org">srloomis@unicode.org</a>) and <a href="tr35.md#Acknowledgments">other CLDR committee members</a>|
8
9For the full header, summary, and status, see [Part 1: Core](tr35.md).
10
11### _Summary_
12
13This document describes parts of an XML format (_vocabulary_) for the exchange of structured locale data. This format is used in the [Unicode Common Locale Data Repository](https://www.unicode.org/cldr/).
14
15This is a partial document, describing only those parts of the LDML that are relevant for supplemental data. For the other parts of the LDML see the [main LDML document](tr35.md) and the links above.
16
17### _Status_
18
19<!-- _This is a draft document which may be updated, replaced, or superseded by other documents at any time.
20Publication does not imply endorsement by the Unicode Consortium.
21This is not a stable document; it is inappropriate to cite this document as other than a work in progress._ -->
22
23_This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium.
24This is a stable document and may be used as reference material or cited as a normative reference by other specifications._
25
26> _**A Unicode Technical Standard (UTS)** is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS._
27
28_Please submit corrigenda and other comments with the CLDR bug reporting form [[Bugs](https://cldr.unicode.org/index/bug-reports)].
29Related information that is useful in understanding this document is found in the [References](#References).
30For the latest version of the Unicode Standard see [[Unicode](https://www.unicode.org/versions/latest/)].
31For more information see [About Unicode Technical Reports](https://www.unicode.org/reports/about-reports.html) and the [Specifications FAQ](https://www.unicode.org/faq/specifications.html).
32Unicode Technical Reports are governed by the Unicode [Terms of Use](https://www.unicode.org/copyright.html)._
33
34## <a name="Parts" href="#Parts">Parts</a>
35
36The LDML specification is divided into the following parts:
37
38*   Part 1: [Core](tr35.md#Contents) (languages, locales, basic structure)
39*   Part 2: [General](tr35-general.md#Contents) (display names & transforms, etc.)
40*   Part 3: [Numbers](tr35-numbers.md#Contents) (number & currency formatting)
41*   Part 4: [Dates](tr35-dates.md#Contents) (date, time, time zone formatting)
42*   Part 5: [Collation](tr35-collation.md#Contents) (sorting, searching, grouping)
43*   Part 6: [Supplemental](tr35-info.md#Contents) (supplemental data)
44*   Part 7: [Keyboards](tr35-keyboards.md#Contents) (keyboard mappings)
45*   Part 8: [Person Names](tr35-personNames.md#Contents) (person names)
46*   Part 9: [MessageFormat](tr35-messageFormat.md#Contents) (message format)
47
48## <a name="Contents" href="#Contents">Contents of Part 6, Supplemental</a>
49
50* Introduction [Supplemental Data](#Supplemental_Data)
51* [Territory Data](#Territory_Data)
52  * [Supplemental Territory Containment](#Supplemental_Territory_Containment)
53  * [Subdivision Containment](#Subdivision_Containment)
54  * [Supplemental Territory Information](#Supplemental_Territory_Information)
55  * [Territory-Based Preferences](#Territory_Based_Preferences)
56    * [Preferred Units for Specific Usages](#Preferred_Units_For_Usage)
57  * [`<rgScope>`: Scope of the “rg” Locale Key](#rgScope)
58* [Supplemental Language Data](#Supplemental_Language_Data)
59* [Supplemental Language Grouping](#Supplemental_Language_Grouping)
60* [Supplemental Code Mapping](#Supplemental_Code_Mapping)
61* ~~[Telephone Code Data](#Telephone_Code_Data)~~ (Deprecated)
62* ~~[Postal Code Validation (Deprecated)](#Postal_Code_Validation)~~
63* [Supplemental Character Fallback Data](#Supplemental_Character_Fallback_Data)
64* [Coverage Levels](#Coverage_Levels)
65  * [Definitions](#Coverage_Level_Definitions)
66  * [Data Requirements](#Coverage_Level_Data_Requirements)
67  * [Default Values](#Coverage_Level_Default_Values)
68* [Supplemental Metadata](#Appendix_Supplemental_Metadata)
69  * [Supplemental Alias Information](#Supplemental_Alias_Information)
70    * Table: [Alias Attribute Values](#Alias_Attribute_Values)
71  * ~~[Supplemental Deprecated Information (Deprecated)](#Supplemental_Deprecated_Information)~~
72  * [Default Content](#Default_Content)
73* [Locale Metadata Elements](#Metadata_Elements)
74* [Version Information](#Version_Information)
75* [Parent Locales](#Parent_Locales)
76* [Unit Conversion](#Unit_Conversion)
77  * [Unit Parsing Data](#unit-parsing-data)
78  * [Unit Prefixes](#unit-prefixes)
79  * [Constants](#constants)
80  * [Conversion Data](#conversion-data)
81    * [Derived Unit System](#derived-unit-system)
82    * [Conversion Mechanisms](#conversion-mechanisms)
83    * [Exceptional Cases](#exceptional-cases)
84      * [Identities](#identities)
85      * [Aliases](#aliases)
86      * [“Duplicate” Units](#duplicate-units)
87      * [Discarding Offsets](#discarding-offsets)
88    * [Unresolved Units](#unresolved-units)
89* [Quantities and Base Units](#quantities-and-base-units)
90  * [UnitType vs Quantity](#unittype-vs-quantity)
91  * [Unit Identifier Normalization](#Unit_Identifier_Normalization)
92* [Mixed Units](#mixed-units)
93* [Testing](#testing)
94* [Unit Preferences](#Unit_Preferences)
95  * [Unit Preferences Overrides](#Unit_Preferences_Overrides)
96    * [Compute override units](#compute-override-units)
97    * [Compute  regions](#compute--regions)
98    * [Compute the category](#compute-the-category)
99  * [Unit Preferences Data](#Unit_Preferences_Data)
100    * [Examples:](#examples)
101    * [Compute the preferred output unit](#compute-the-preferred-output-unit)
102    * [Search the ranked units](#search-the-ranked-units)
103  * [Constraints](#constraints)
104    * [Examples](#examples)
105* [Unit APIs](#unit-apis)
106
107## Introduction <a name="Supplemental_Data" href="#Supplemental_Data">Supplemental Data</a>
108
109The following represents the format for additional supplemental information. This is information that is important for internationalization and proper use of CLDR, but is not contained in the locale hierarchy. It is not localizable, nor is it overridden by locale data. The current CLDR data can be viewed in the [Supplemental Charts](https://www.unicode.org/cldr/charts/46/supplemental/index.html).
110
111```xml
112<!ELEMENT supplementalData (version, generation?, cldrVersion?, currencyData?, territoryContainment?, subdivisionContainment?, languageData?, territoryInfo?, postalCodeData?, calendarData?, calendarPreferenceData?, weekData?, timeData?, measurementData?, unitPreferenceData?, timezoneData?, characters?, transforms?, metadata?, codeMappings?, parentLocales?, likelySubtags?, metazoneInfo?, plurals?, telephoneCodeData?, numberingSystems?, bcp47KeywordMappings?, gender?, references?, languageMatching?, dayPeriodRuleSet*, metaZones?, primaryZones?, windowsZones?, coverageLevels?, idValidity?, rgScope?) >
113```
114
115The data in CLDR is presently split into multiple files: supplementalData.xml, supplementalMetadata.xml, characters.xml, likelySubtags.xml, ordinals.xml, plurals.xml, telephoneCodeData.xml, genderList.xml, plus transforms (see _Part 2 [Transforms](tr35-general.md#Transforms)_ and _Part 2 [Transform Rule Syntax](tr35-general.md#Transform_Rules_Syntax)_). The split is just for convenience: logically, they are treated as though they were a single file. Future versions of CLDR may split the data in a different fashion. Do not depend on any specific XML filename or path for supplemental data.
116
117Note that [Chapter 10](#Metadata_Elements) presents information about metadata that is maintained on a per-locale basis. It is included in this section because it is not intended to be used as part of the locale itself.
118
119## <a name="Territory_Data" href="#Territory_Data">Territory Data</a>
120
121### <a name="Supplemental_Territory_Containment" href="#Supplemental_Territory_Containment">Supplemental Territory Containment</a>
122
```xml
<!ELEMENT territoryContainment ( group* ) >
<!ELEMENT group EMPTY >
<!ATTLIST group type NMTOKEN #REQUIRED >
<!ATTLIST group contains NMTOKENS #IMPLIED >
<!ATTLIST group grouping ( true | false ) #IMPLIED >
<!ATTLIST group status ( deprecated | grouping ) #IMPLIED >
```
131
The following data provides information that shows groupings of countries (regions). The data is based on [[UNM49](tr35.md#UNM49)]. There is one special code, `QO`, which is used for outlying areas of Oceania that are typically uninhabited. The territory containment forms a tree with the following levels:
133
134+ World
135  + Continent
136    + Subcontinent
137      + Country
138
139Excluding groupings, in this tree:
140
141*   All non-overlapping regions form a strict tree rooted at World.
142*   All leaf-nodes (country) are always at depth 4. Some of these “country” regions are actually parts of other countries, such as Hong Kong (part of China). Such relationships are not part of the containment data.
143
144For a chart showing the relationships (plus the included timezones), see the [Territory Containment Chart](https://www.unicode.org/cldr/charts/46/supplemental/territory_containment_un_m_49.html). The XML structure has the following form.
145
146```xml
147<territoryContainment>
148
149    <group type="001" contains="002 009 019 142 150"/> <!--World -->
150    <group type="011" contains="BF BJ CI CV GH GM GN GW LR ML MR NE NG SH SL SN TG"/> <!--Western Africa -->
151    <group type="013" contains="BZ CR GT HN MX NI PA SV"/> <!--Central America -->
152    <group type="014" contains="BI DJ ER ET KE KM MG MU MW MZ RE RW SC SO TZ UG YT ZM ZW"/> <!--Eastern Africa -->
153    <group type="142" contains="030 035 062 145"/> <!--Asia -->
154    <group type="145" contains="AE AM AZ BH CY GE IL IQ JO KW LB OM PS QA SA SY TR YE"/> <!--Western Asia -->
155    <group type="015" contains="DZ EG EH LY MA SD TN"/> <!--Northern Africa -->
156...
157```
158
159There are groupings that don't follow this regular structure, such as:
160
161```xml
162<group type="003" contains="013 021 029" grouping="true"/> <!--North America -->
163```
164
165These are marked with the attribute `grouping="true"`.
166
167When groupings have been deprecated but kept around for backwards compatibility, they are marked with the attribute `status="deprecated"`, like this:
168
169```xml
170<group type="029" contains="AN" status="deprecated"/> <!--Caribbean -->
171```
172
173When the containment relationship itself is a grouping, it is marked with the attribute `status="grouping"`, like this:
174
175```xml
176<group type="150" contains="EU" status="grouping"/> <!--Europe -->
177```
178
179That is, the type value isn’t a grouping, but if you filter out groupings you can drop this containment. In the example above, EU is a grouping, and contained in 150.
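
As a rough illustration of filtering out groupings, the following Python sketch reads a few `<group>` elements and keeps only the strict tree. The sample is abbreviated from the examples above (the Western Asia list is truncated), and the function names `strict_containment` and `ancestors` are illustrative assumptions, not part of CLDR or of any library.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample based on the examples above; the contents of 145 are truncated.
SAMPLE = """
<territoryContainment>
    <group type="001" contains="002 009 019 142 150"/>
    <group type="142" contains="030 035 062 145"/>
    <group type="145" contains="AE AM AZ"/>
    <group type="003" contains="013 021 029" grouping="true"/>
    <group type="029" contains="AN" status="deprecated"/>
    <group type="150" contains="EU" status="grouping"/>
</territoryContainment>
"""

def strict_containment(xml_text):
    """Return {parent: [children]}, skipping groupings and deprecated entries."""
    tree = {}
    for group in ET.fromstring(xml_text).iter("group"):
        if group.get("grouping") == "true":
            continue  # the type itself is a grouping, such as 003
        if group.get("status") in ("deprecated", "grouping"):
            continue  # deprecated, or the containment relationship is a grouping
        children = group.get("contains", "").split()
        tree.setdefault(group.get("type"), []).extend(children)
    return tree

def ancestors(tree, code):
    """Walk upward from a region code, returning its chain of containers."""
    parent_of = {child: parent
                 for parent, children in tree.items()
                 for child in children}
    chain = []
    while code in parent_of:
        code = parent_of[code]
        chain.append(code)
    return chain

containment = strict_containment(SAMPLE)
print(ancestors(containment, "AE"))
```

Because the strict tree gives every region at most one parent, a plain child-to-parent dictionary is enough for the upward walk. If groupings were kept, a region could have several parents and the dictionary would need to hold lists instead.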
180
181### <a name="Subdivision_Containment" href="#Subdivision_Containment">Subdivision Containment</a>
182
183```xml
184<!ELEMENT subdivisionContainment ( subgroup* ) >
185
186<!ELEMENT subgroup EMPTY >
187<!ATTLIST subgroup type NMTOKEN #REQUIRED >
188<!ATTLIST subgroup contains NMTOKENS #IMPLIED >
189```
190
191The subdivision containment data is similar to the territory containment. It is based on ISO 3166-2 data, but may diverge from it in the future.
192
193```xml
194<subgroup type="BD" contains="bda bdb bdc bdd bde bdf bdg bdh" />
195<subgroup type="bda" contains="bd02 bd06 bd07 bd25 bd50 bd51" />
196```
197
198The `type` is a [`unicode_region_subtag`](tr35.md#unicode_region_subtag) (territory) identifier for the top level of containment, or a [`unicode_subdivision_id`](tr35.md#unicode_subdivision_id) for lower levels of containment when there are multiple levels. The `contains` value is a space-delimited list of one or more [`unicode_subdivision_id`](tr35.md#unicode_subdivision_id) values. In the example above, subdivision bda contains other subdivisions bd02, bd06, bd07, bd25, bd50, bd51.
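
The multi-level containment can be walked recursively. The sketch below uses the two `<subgroup>` elements from the example above, wrapped in a `<subdivisionContainment>` root; the helper names `load_subgroups` and `descendants` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<subdivisionContainment>
    <subgroup type="BD" contains="bda bdb bdc bdd bde bdf bdg bdh" />
    <subgroup type="bda" contains="bd02 bd06 bd07 bd25 bd50 bd51" />
</subdivisionContainment>
"""

def load_subgroups(xml_text):
    """Map each container (region or subdivision id) to its direct children."""
    return {sg.get("type"): sg.get("contains", "").split()
            for sg in ET.fromstring(xml_text).iter("subgroup")}

def descendants(subgroups, code):
    """List every subdivision contained in code, at any depth (depth-first)."""
    result = []
    for child in subgroups.get(code, []):
        result.append(child)
        result.extend(descendants(subgroups, child))
    return result

subgroups = load_subgroups(SAMPLE)
print(descendants(subgroups, "BD"))
```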
199
Note: Formerly (in CLDR 28 through 30):

* The `type` attribute could only contain a `unicode_region_subtag`;
* The `contains` attribute contained `unicode_subdivision_suffix` values; these are not unique across multiple territories, so...
* For lower containment levels, a now-deprecated `subtype` attribute was used to specify the parent `unicode_subdivision_suffix`.
207
208### <a name="Supplemental_Territory_Information" href="#Supplemental_Territory_Information">Supplemental Territory Information</a>
209
210```xml
211<!ELEMENT territory ( languagePopulation* ) >
212<!ATTLIST territory type NMTOKEN #REQUIRED >
213<!ATTLIST territory gdp NMTOKEN #REQUIRED >
214<!ATTLIST territory literacyPercent NMTOKEN #REQUIRED >
215<!ATTLIST territory population NMTOKEN #REQUIRED >
216
217<!ELEMENT languagePopulation EMPTY >
218<!ATTLIST languagePopulation type NMTOKEN #REQUIRED >
219<!ATTLIST languagePopulation literacyPercent NMTOKEN #IMPLIED >
220<!ATTLIST languagePopulation writingPercent NMTOKEN #IMPLIED >
221<!ATTLIST languagePopulation populationPercent NMTOKEN #REQUIRED >
222<!ATTLIST languagePopulation officialStatus (de_facto_official | official | official_regional | official_minority) #IMPLIED >
223```
224
225This data provides testing information for language and territory populations. The main goal is to provide approximate figures for the literate, functional population for each language in each territory: that is, the population that is able to read and write each language, and is comfortable enough to use it with computers. For a chart of this data, see [Territory-Language Information](https://www.unicode.org/cldr/charts/46/supplemental/territory_language_information.html).
226
227_Example_
228
229```xml
230<territory type="AO" gdp="175500000000" literacyPercent="70.4" population="19088100"> <!--Angola-->
231    <languagePopulation type="pt" populationPercent="67" officialStatus="official"/> <!--Portuguese-->
232    <languagePopulation type="umb" populationPercent="29"/> <!--Umbundu-->
233    <languagePopulation type="kmb" writingPercent="10" populationPercent="25" references="R1034"/> <!--Kimbundu-->
234    <languagePopulation type="ln" populationPercent="0.67" references="R1010"/> <!--Lingala-->
235</territory>
236```
237
Note that reliable information is difficult to obtain; the information in CLDR is an estimate culled from different sources, including the World Bank, CIA Factbook, and others. The GDP and country literacy figures are taken from the World Bank where available, otherwise supplemented by CIA Factbook data and other sources. The GDP figures are “PPP (constant 2000 international $)”. Much of the per-language data is taken from the Ethnologue, but is supplemented and processed using many other sources, including per-country census data. (The focus of the Ethnologue is native speakers, which includes people who are not literate, and excludes people who are functional second-language users.) Some references are marked in the XML files, with attributes such as `references="R1010"`.
239
240The percentages may add up to more than 100% due to multilingual populations, or may be less than 100% due to illiteracy or because the data has not yet been gathered or processed. Languages with smaller populations might not be included.
241
242The following describes the meaning of some of these terms—as used in CLDR—in more detail.
243
244<a name="literacy_percent" href="#literacy_percent">literacy percent for the territory</a> — an estimate of the percentage of the country’s population that is functionally literate.
245
246<a name="language_population_percent" href="#language_population_percent">language population percent</a> — an estimate of the number of people who are functional in that language in that country, including both first and second language speakers. The level of fluency is that necessary to use a UI on a computer, smartphone, or similar devices, rather than complete fluency.
247
248<a name="literacy_percent_for_langPop" href="#literacy_percent_for_langPop">literacy percent for language population</a> — Within the set of people who are functional in the corresponding language (as specified by [language population percent](#language_population_percent)), this is an estimate of the percentage of those people who are functionally literate in that language, that is, who are _capable_ of reading or writing in that language, even if they do not regularly use it for reading or writing. If not specified, this defaults to the [literacy percent for the territory](#literacy_percent).
249
<a name="writing_percent" href="#writing_percent">writing percent</a> — Within the set of people who are functional in the corresponding language (as specified by [language population percent](#language_population_percent)), this is an estimate of the percentage of those people who regularly read or write a significant amount in that language. Ideally, the regularity would be measured as “7-day actives”. If it is known that the language is not widely or commonly written, but there are no solid figures, the value is typically given as 1%-5%.
251
252For a language such as Swiss German, which is typically not written, even though nearly the whole native Germanophone population _could_ write in Swiss German, the [literacy percent for language population](#literacy_percent_for_langPop) is high, but the [writing percent](#writing_percent) is low.
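
To make the relationship between these percentages concrete, the sketch below estimates the literate, functional population per language for the Angola example above. Multiplying the territory population by the language population percent and then by the literacy percent is one reading of the definitions, not a formula stated by this specification, and the function name `literate_population` is an assumption.

```python
import xml.etree.ElementTree as ET

# The Angola example from above, with the XML comments removed.
SAMPLE = """
<territory type="AO" gdp="175500000000" literacyPercent="70.4" population="19088100">
    <languagePopulation type="pt" populationPercent="67" officialStatus="official"/>
    <languagePopulation type="umb" populationPercent="29"/>
    <languagePopulation type="kmb" writingPercent="10" populationPercent="25" references="R1034"/>
    <languagePopulation type="ln" populationPercent="0.67" references="R1010"/>
</territory>
"""

def literate_population(territory):
    """Estimate, per language, how many people are functional and literate in it."""
    population = float(territory.get("population"))
    territory_literacy = float(territory.get("literacyPercent"))
    estimates = {}
    for lang_pop in territory.iter("languagePopulation"):
        speakers = population * float(lang_pop.get("populationPercent")) / 100
        # literacyPercent on the language defaults to the territory's value.
        literacy = float(lang_pop.get("literacyPercent", territory_literacy))
        estimates[lang_pop.get("type")] = round(speakers * literacy / 100)
    return estimates

estimates = literate_population(ET.fromstring(SAMPLE))
for language, count in estimates.items():
    print(language, count)
```

The `writingPercent` attribute is deliberately ignored here: it measures regular use in writing, which is a separate question from the capability that the literacy percent describes.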
253
254<a name="official_language" href="#official_language">official language</a> — as used in CLDR, a language that can generally be used in all communications with a central government. That is, people can expect that essentially all communication from the government is available in that language (ballots, information pamphlets, legal documents, …) and that they can use that language in any communication to the central government (petitions, forms, filing lawsuits, …).
255
256Official languages for a country in this sense are not necessarily the same as those with official legal status in the country. For example, Irish is declared to be an official language in Ireland, but English has no such formal status in the United States. Languages such as the latter are called _de facto_ official languages. As another example, German has legal status in Italy, but cannot be used in all communications with the central government, and is thus not an official language _of Italy_ for CLDR purposes. It is, however, an _official regional language_. Other languages are declared to be official, but can’t actually be used for all communication with any major governmental entity in the country. There is no intention to mark such nominally official languages as “official” in the CLDR data.
257
258<a name="official_regional_language" href="#official_regional_language">official regional language</a> — a language that is official (_de jure_ or _de facto_) in a major region within a country, but does not qualify as an official language of the country as a whole. For example, it can be used in an official petition to a provincial government, but not the central government. The term “major” is meant to distinguish from smaller-scale usage, such as for a town or village.
259
260### <a name="Territory_Based_Preferences" href="#Territory_Based_Preferences">Territory-Based Preferences</a>
261
262The default preference for several locale items is based solely on a [unicode_region_subtag](tr35.md#unicode_region_subtag), which may either be specified as part of a [unicode_language_id](tr35.md#unicode_language_id), inferred from other locale ID elements using the [Likely Subtags](tr35.md#Likely_Subtags) mechanism, or provided explicitly using an “rg” [Region Override](tr35.md#RegionOverride) locale key. For more information on this process see [Locale Inheritance and Matching](tr35.md#Locale_Inheritance). The specific items that are handled in this way are:
263
264* Default calendar (see [Calendar Preference Data](tr35-dates.md#Calendar_Preference_Data))
265* Default week conventions (first day of week and weekend days; see [Week Data](tr35-dates.md#Week_Data))
266* Default hour cycle (see [Time Data](tr35-dates.md#Time_Data))
267* Default currency (see [Supplemental Currency Data](tr35-numbers.md#Supplemental_Currency_Data))
268* Default measurement system and paper size (see [Measurement System Data](tr35-general.md#Measurement_System_Data))
269* Default units for specific usage (see [Preferred Units for Specific Usages](#Preferred_Units_For_Usage), below)
270
271The mu, ms, and rg keys also interact with the base locale and the unit preferences. For more information, see _[Unit Preferences](#Unit_Preferences)._
272
273#### <a name="Preferred_Units_For_Usage" href="#Preferred_Units_For_Usage">Preferred Units for Specific Usages</a>
274
The determination of preferred units depends on the locale identifier: the keys mu, ms, rg, the base locale (language, script, region) and the user preferences.
_For information about preferred units and unit conversion, see [Unit Conversion](#Unit_Conversion) and [Unit Preferences](#Unit_Preferences)._
277
278### <a name="rgScope" href="#rgScope">`<rgScope>`: Scope of the “rg” Locale Key</a>
279
280The supplemental `<rgScope>` element specifies the data paths for which the region used for data lookup is determined by the value of any “rg” key present in the locale identifier (see [Region Override](tr35.md#RegionOverride) and [Region Priority Inheritance](tr35.md#Region_Priority_Inheritance)). If no “rg” key is present, the region used for lookup is determined as usual: from the unicode_region_subtag if present, else inferred from the unicode_language_subtag. The DTD structure is as follows:
281
282```xml
283<!ELEMENT rgScope ( rgPath* ) >
284
285<!ELEMENT rgPath EMPTY >
286<!ATTLIST rgPath path CDATA #REQUIRED >
287```
288
289The `<rgScope>` element contains a list of `<rgPath>` elements, each of which specifies a datapath for which any “rg” key determines the region for lookup. For example:
290
291```xml
292<rgScope>
293    <rgPath path="//supplementalData/currencyData/fractions/info[@iso4217='#'][@digits='*'][@rounding='*'][@cashDigits='*'][@cashRounding='*']" draft="provisional" />
294    <rgPath path="//supplementalData/currencyData/fractions/info[@iso4217='#'][@digits='*'][@rounding='*'][@cashRounding='*']" draft="provisional" />
295    <rgPath path="//supplementalData/currencyData/fractions/info[@iso4217='#'][@digits='*'][@rounding='*']" draft="provisional" />
296    <rgPath path="//supplementalData/calendarPreferenceData/calendarPreference[@territories='#'][@ordering='*']" draft="provisional" />
297    ...
298    <rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*'][@scope='*']/unitPreference[@regions='#'][@alt='*']" draft="provisional" />
299    <rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*'][@scope='*']/unitPreference[@regions='#']" draft="provisional" />
300    <rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*']/unitPreference[@regions='#'][@alt='*']" draft="provisional" />
301    <rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*']/unitPreference[@regions='#']" draft="provisional" />
302</rgScope>
303```
304
305The exact format of the path is provisional in CLDR 29, but as currently shown:
306
307*   An attribute value of `'*'` indicates that the path applies regardless of the value of the attribute.
308*   Each path must have exactly one attribute whose value is marked here as `'#'`; in actual data items with this path, the corresponding value is a list of region codes. It is the region codes in this list that are compared with the region specified by the “rg” key to determine which data item to use for this path.
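
The two rules above can be sketched as follows. This assumes data items are available as attribute dictionaries and that the region list is space-separated; the items in `ITEMS` are hypothetical (they use the pseudo-region codes XA and XB, not actual CLDR values), and `region_attribute` and `select_item` are illustrative names.

```python
import re

RG_PATH = ("//supplementalData/calendarPreferenceData/"
           "calendarPreference[@territories='#'][@ordering='*']")

def region_attribute(rg_path):
    """Return the name of the single attribute marked '#' in an rgPath."""
    marked = re.findall(r"\[@(\w+)='#'\]", rg_path)
    if len(marked) != 1:
        raise ValueError("expected exactly one '#' attribute, found %d" % len(marked))
    return marked[0]

def select_item(items, rg_path, region):
    """Pick the first data item whose region list contains the given region."""
    attr = region_attribute(rg_path)
    for item in items:
        if region in item[attr].split():
            return item
    return None

# Hypothetical data items for illustration only.
ITEMS = [
    {"territories": "001", "ordering": "gregorian"},
    {"territories": "XA XB", "ordering": "calendar-a gregorian"},
]

print(region_attribute(RG_PATH))
print(select_item(ITEMS, RG_PATH, "XB"))
```

What happens when no item lists the region (for example, falling back to a world-wide entry) is outside the scope of this sketch, so `select_item` simply returns `None`.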
309
310## <a name="Supplemental_Language_Data" href="#Supplemental_Language_Data">Supplemental Language Data</a>
311
312```xml
313<!ELEMENT languageData ( language* ) >
314<!ELEMENT language EMPTY >
315<!ATTLIST language type NMTOKEN #REQUIRED >
316<!ATTLIST language scripts NMTOKENS #IMPLIED >
317<!ATTLIST language territories NMTOKENS #IMPLIED >
318<!ATTLIST language variants NMTOKENS #IMPLIED >
319<!ATTLIST language alt NMTOKENS #IMPLIED >
320```
321
The language data is used for consistency checking and testing. It provides a list of which languages are used with which scripts and in which countries. To a large extent, however, the territory list has been superseded by the data in _[Supplemental Territory Information](#Supplemental_Territory_Information)_.
323
324```xml
325<languageData>
326    <language type="af" scripts="Latn" territories="ZA" />
327    <language type="am" scripts="Ethi" territories="ET" />
328    <language type="ar" scripts="Arab" territories="AE BH DZ EG IN IQ JO KW LB LY MA OM PS QA SA SD SY TN YE" />
329    ...
330```
331
If the language is not a modern language, or the script is not a modern script, or the language is not a major language of the territory, then the `alt` attribute is set to `secondary`.
333
334```xml
335    <language type="fr" scripts="Latn" territories="IT US" alt="secondary" />
336    ...
337```
338
339## <a name="Supplemental_Language_Grouping" href="#Supplemental_Language_Grouping">Supplemental Language Grouping</a>
340
341```xml
342<!ELEMENT languageGroups ( languageGroup* ) >
343<!ELEMENT languageGroup ( #PCDATA ) >
344<!ATTLIST languageGroup parent NMTOKEN #REQUIRED >
345```
346
The language groups supply language containment. For example, the following indicates that fiu is the Unicode language code for a language group that contains chm, et, fi, etc.
348
349```xml
350<languageGroup parent="fiu">chm et fi fit fkv hu izh kca koi krl kv liv mdf mns mrj myv smi udm vep vot vro</languageGroup>
351```
352
The vast majority of the languageGroup data is extracted from Wikidata, but may be overridden in some cases. The Wikidata information is more fine-grained, but makes use of language groups that don't have ISO or Unicode language codes. Those language groups are omitted from the data. For example, Wikidata has the following child-parent chain, of which only the first and last elements are present in the language groups.
354
| Name                      | Wikidata Code                                    | Language Code |
| ------------------------- | ------------------------------------------------ | ------------- |
| Finnish                   | [Q1412](https://www.wikidata.org/wiki/Q1412)     | fi            |
| Finnic languages          | [Q33328](https://www.wikidata.org/wiki/Q33328)   |               |
| Finno-Samic languages     | [Q163652](https://www.wikidata.org/wiki/Q163652) |               |
| Finno-Volgaic languages   | [Q161236](https://www.wikidata.org/wiki/Q161236) |               |
| Finno-Permic languages    | [Q161240](https://www.wikidata.org/wiki/Q161240) |               |
| Finno-Ugric languages     | [Q79890](https://www.wikidata.org/wiki/Q79890)   | fiu           |
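
A consumer will often want the inverse view, from a language to its group. The following sketch inverts the `fiu` example from this section; the `<languageGroups>` wrapper matches the DTD above, while the function name `parent_map` is an assumption.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<languageGroups>
    <languageGroup parent="fiu">chm et fi fit fkv hu izh kca koi krl kv liv mdf mns mrj myv smi udm vep vot vro</languageGroup>
</languageGroups>
"""

def parent_map(xml_text):
    """Map each contained language code to the code of its language group."""
    parents = {}
    for group in ET.fromstring(xml_text).iter("languageGroup"):
        for child in (group.text or "").split():
            parents[child] = group.get("parent")
    return parents

parents = parent_map(SAMPLE)
print(parents["fi"])
```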
363
364## <a name="Supplemental_Code_Mapping" href="#Supplemental_Code_Mapping">Supplemental Code Mapping</a>
365
```xml
<!ELEMENT codeMappings (languageCodes*, territoryCodes*, currencyCodes*) >

<!ELEMENT languageCodes EMPTY >
<!ATTLIST languageCodes type NMTOKEN #REQUIRED>
<!ATTLIST languageCodes alpha3 NMTOKEN #REQUIRED>

<!ELEMENT territoryCodes EMPTY >
<!ATTLIST territoryCodes type NMTOKEN #REQUIRED>
<!ATTLIST territoryCodes numeric NMTOKEN #REQUIRED>
<!ATTLIST territoryCodes alpha3 NMTOKEN #REQUIRED>
<!ATTLIST territoryCodes fips10 NMTOKEN #IMPLIED>
<!ATTLIST territoryCodes internet NMTOKENS #IMPLIED> <!-- deprecated -->

<!ELEMENT currencyCodes EMPTY >
<!ATTLIST currencyCodes type NMTOKEN #REQUIRED>
<!ATTLIST currencyCodes numeric NMTOKEN #REQUIRED>
```
384
The code mapping information provides mappings between the subtags used in the CLDR locale IDs (from BCP 47) and other coding systems or related information. Language code mappings are provided only for codes that have two letters in BCP 47, and map each to its ISO three-letter equivalent. The territory codes provide mappings to numeric (UN M.49 [[UNM49](tr35.md#UNM49)] codes, equivalent to ISO numeric codes), ISO three-letter codes, FIPS 10 codes, and the internet top-level domain codes.
386
387The alphabetic codes are only provided where different from the type. For example:
388
389```xml
390<territoryCodes type="AA" numeric="958" alpha3="AAA" />
391<territoryCodes type="AD" numeric="020" alpha3="AND" fips10="AN" />
392<territoryCodes type="AE" numeric="784" alpha3="ARE" />
393...
394<territoryCodes type="GB" numeric="826" alpha3="GBR" fips10="UK" />
395...
396<territoryCodes type="QU" numeric="967" alpha3="QUU" internet="EU" />
397...
398<territoryCodes type="XK" numeric="983" alpha3="XKK" />
399...
400```
401
402Where there is no corresponding code, sometimes private use codes are used, such as the numeric code for XK.
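
As a small sketch of how the territory mappings might be consumed, the code below builds reverse lookup tables from a few of the `<territoryCodes>` elements shown above. The `<codeMappings>` wrapper follows the DTD; the function name `reverse_lookups` is an assumption.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<codeMappings>
    <territoryCodes type="AD" numeric="020" alpha3="AND" fips10="AN" />
    <territoryCodes type="GB" numeric="826" alpha3="GBR" fips10="UK" />
    <territoryCodes type="XK" numeric="983" alpha3="XKK" />
</codeMappings>
"""

def reverse_lookups(xml_text):
    """Build alpha3 -> type and numeric -> type tables from territoryCodes."""
    by_alpha3, by_numeric = {}, {}
    for element in ET.fromstring(xml_text).iter("territoryCodes"):
        region = element.get("type")
        by_alpha3[element.get("alpha3")] = region
        # Numeric codes stay strings so that leading zeros ("020") survive.
        by_numeric[element.get("numeric")] = region
    return by_alpha3, by_numeric

by_alpha3, by_numeric = reverse_lookups(SAMPLE)
print(by_alpha3["GBR"], by_numeric["020"])
```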
403
The currencyCodes are mappings from three-letter currency codes to numeric values (ISO 4217, see [Current currency & funds code list](https://www.six-group.com/en/products-services/financial-information/data-standards.html#scrollTo=maintenance-agency)). The mapping currently covers only current codes and does not include historic currencies. For example:
405
406```xml
407<currencyCodes type="AED" numeric="784" />
408<currencyCodes type="AFN" numeric="971" />
409...
410<currencyCodes type="EUR" numeric="978" />
411...
412<currencyCodes type="ZAR" numeric="710" />
413<currencyCodes type="ZMW" numeric="967" />
414```
415
416## ~~<a name="Telephone_Code_Data" href="#Telephone_Code_Data">Telephone Code Data</a>~~ (Deprecated)
417
418Deprecated in CLDR v34, and data removed.
419The data and structure for phone numbers changes quite often, so the recommended alternative is the open-source library [libphonenumber](https://github.com/google/libphonenumber#what-is-it).
420
421```xml
422<!ELEMENT telephoneCodeData ( codesByTerritory* ) >
423
424<!ELEMENT codesByTerritory ( telephoneCountryCode+ ) >
425<!ATTLIST codesByTerritory territory NMTOKEN #REQUIRED >
426
427<!ELEMENT telephoneCountryCode EMPTY >
428<!ATTLIST telephoneCountryCode code NMTOKEN #REQUIRED >
429<!ATTLIST telephoneCountryCode from NMTOKEN #IMPLIED >
430<!ATTLIST telephoneCountryCode to NMTOKEN #IMPLIED >
431```
432
433This data specifies the mapping between ITU telephone country codes [[ITUE164](tr35.md#ITUE164)] and CLDR-style territory codes (ISO 3166 2-letter codes or non-corresponding UN M.49 [[UNM49](tr35.md#UNM49)] 3-digit codes). There are several things to note:
434
* A given telephone country code may map to multiple CLDR territory codes; +1 (North America Numbering Plan) covers the US and Canada, as well as many islands in the Caribbean and some in the Pacific.
436* Some telephone country codes are for global services (for example, some satellite services), and thus correspond to territory code 001.
437* The mappings change over time (territories move from one telephone code to another). These changes are usually planned several years in advance, and there may be a period during which either telephone code can be used to reach the territory. While the CLDR telephone code data is not intended to include past changes, it is intended to incorporate known information on planned future changes, using `from` and `to` date attributes to indicate when mappings are valid.
438
439A subset of the telephone code data might look like the following (showing a past mapping change to illustrate the from and to attributes):
440
441```xml
442<codesByTerritory territory="001">
443    <telephoneCountryCode code="800"/> <!-- International Freephone Service -->
444    <telephoneCountryCode code="808"/> <!-- International Shared Cost Services (ISCS) -->
445    <telephoneCountryCode code="870"/> <!-- Inmarsat Single Number Access Service (SNAC) -->
446</codesByTerritory>
447<codesByTerritory territory="AS"> <!-- American Samoa -->
448    <telephoneCountryCode code="1" from="2004-10-02"/> <!-- +1 684 in North America Numbering Plan -->
449    <telephoneCountryCode code="684" to="2005-04-02"/> <!-- +684 now a spare code -->
450</codesByTerritory>
451<codesByTerritory territory="CA">
452    <telephoneCountryCode code="1"/> <!-- North America Numbering Plan -->
453</codesByTerritory>
454```
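
As an illustration of how the `from` and `to` attributes delimit validity, the following Python sketch selects the codes that apply to a territory on a given day. The list-of-dicts representation and the function name `codes_valid_on` are hypothetical, and treating both dates as inclusive is an assumption; the text above does not specify the boundary behavior.

```python
from datetime import date

# The American Samoa entries from the example above, as plain dicts.
AS_CODES = [
    {"code": "1", "from": "2004-10-02"},
    {"code": "684", "to": "2005-04-02"},
]


def codes_valid_on(entries, day):
    """Return the codes whose [from, to] range contains `day`.

    A missing `from` or `to` is treated as open-ended; both ends are
    assumed to be inclusive (an assumption of this sketch).
    """
    valid = []
    for entry in entries:
        start = date.fromisoformat(entry["from"]) if "from" in entry else date.min
        end = date.fromisoformat(entry["to"]) if "to" in entry else date.max
        if start <= day <= end:
            valid.append(entry["code"])
    return valid
```

During the overlap period both codes are returned, which corresponds to the period in which either telephone code can be used to reach the territory.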
455
456## ~~<a name="Postal_Code_Validation" href="#Postal_Code_Validation">Postal Code Validation (Deprecated)</a>~~
457
Deprecated in v27. Please see other services that are kept up to date, such as <https://github.com/google/libaddressinput>.
459
460```xml
461<!ELEMENT postalCodeData (postCodeRegex*) >
462<!ELEMENT postCodeRegex (#PCDATA) >
463<!ATTLIST postCodeRegex territoryId NMTOKEN #REQUIRED >
464```
465
466The Postal Code regex information can be used to validate postal codes used in different countries. In some cases, the regex is quite simple, such as for Germany:
467
468```xml
469<postCodeRegex territoryId="DE" >\d{5}</postCodeRegex>
470```
471
472The US code is slightly more complicated, since there is an optional portion:
473
474```xml
475<postCodeRegex territoryId="US" >\d{5}([ \-]\d{4})?</postCodeRegex>
476```
477
478The most complicated currently is the UK.
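
The following Python sketch shows one way such patterns could be applied. It uses only the two patterns quoted above; the table `POST_CODE_REGEX` and the function `is_valid_postal_code` are hypothetical names, and requiring the pattern to match the whole string is an assumption of this sketch.

```python
import re

# Patterns copied from the postCodeRegex examples above.
POST_CODE_REGEX = {
    "DE": r"\d{5}",
    "US": r"\d{5}([ \-]\d{4})?",
}


def is_valid_postal_code(territory, code):
    """Return True if `code` matches the whole pattern for `territory`."""
    pattern = POST_CODE_REGEX.get(territory)
    if pattern is None:
        # No data for this territory; rejecting is a choice made for this sketch.
        return False
    return re.fullmatch(pattern, code) is not None
```

For example, `is_valid_postal_code("US", "94043-1351")` accepts the optional four-digit portion, while a nine-digit string without a separator is rejected.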
479
480## <a name="Supplemental_Character_Fallback_Data" href="#Supplemental_Character_Fallback_Data">Supplemental Character Fallback Data</a>
481
482```xml
483<!ELEMENT characters ( character-fallback*) >
484
485<!ELEMENT character-fallback ( character* ) >
486<!ELEMENT character (substitute*) >
487<!ATTLIST character value CDATA #REQUIRED >
488
489<!ELEMENT substitute (#PCDATA) >
490```
491
492The `characters` element provides a way for non-Unicode systems, or systems that only support a subset of Unicode characters, to transform CLDR data. It gives a list of characters with alternative values that can be used if the main value is not available. For example:
493
494```xml
<characters>
    <character-fallback>
        <character value="ß">
            <substitute>ss</substitute>
        </character>
        <character value="Ø">
            <substitute>Ö</substitute>
            <substitute>O</substitute>
        </character>
        <character value="₧">
            <substitute>Pts</substitute>
        </character>
        <character value="₣">
            <substitute>Fr.</substitute>
        </character>
    </character-fallback>
</characters>
512```
513
514The ordering of the `substitute` elements indicates the preference among them.
515
516That is, this data provides recommended fallbacks for use when a charset or supported repertoire does not contain a desired character. There is more than one possible fallback: the recommended usage is that when a character _value_ is not in the desired repertoire the following process is used, whereby the first value that is wholly in the desired repertoire is used.
517
518* `toNFC`(_value_)
519* other canonically equivalent sequences, if there are any
520* the explicit _substitutes_ value (in order)
521* `toNFKC`(_value_)
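
The process above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the `SUBSTITUTES` table holds only the example data shown earlier, the function name `fallback` and the `in_repertoire` callback are hypothetical, and the NFD form is used as a stand-in for "other canonically equivalent sequences" (an implementation may need to try more).

```python
import unicodedata

# Substitutes copied from the example above, in preference order.
SUBSTITUTES = {
    "ß": ["ss"],
    "Ø": ["Ö", "O"],
    "₧": ["Pts"],
    "₣": ["Fr."],
}


def fallback(value, in_repertoire):
    """Return the first candidate wholly inside the repertoire, or None.

    `in_repertoire` is a caller-supplied predicate on single characters.
    """
    candidates = [unicodedata.normalize("NFC", value)]
    # Assumption: NFD stands in for other canonically equivalent sequences.
    candidates.append(unicodedata.normalize("NFD", value))
    candidates.extend(SUBSTITUTES.get(value, []))
    candidates.append(unicodedata.normalize("NFKC", value))
    for candidate in candidates:
        if all(in_repertoire(ch) for ch in candidate):
            return candidate
    return None
```

With an ASCII-only repertoire, `Ø` skips the first substitute `Ö` (which is not ASCII) and falls back to `O`; with a Latin-1 repertoire, `Ø` is returned unchanged because its NFC form is already supported.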
522
523## <a name="Coverage_Levels" href="#Coverage_Levels">Coverage Levels</a>
524
525The following describes the structure used to set coverage levels used for CLDR.
526That structure is used in CLDR tooling, and can also be used by consumers of CLDR data, such as described in [Data Size Reduction](tr35.md#Data_Size).
527
528The following lists the coverage levels. The qualifications for each level may change between releases of CLDR, and more detailed information for each level is on [Coverage Levels](https://cldr.unicode.org/index/cldr-spec/coverage-levels). Each level adds to what is in the lower level, so Basic includes all of Core, Moderate all of Basic, and so on.
529
530| Code  | Level         | Description    |
531| ----: | ------------- | -------------- |
532| 0     | undetermined  | Does not meet any of the following levels. |
533| 10    | core          | Core Locale — Has minimal data about the language and writing system that is required before other information can be added using the CLDR survey tool. |
534| 40    | basic         | Selectable Locale — Minimal locale data necessary for a "selectable" locale in a platform UI. Very basic number and datetime formatting, etc. |
535| 60    | moderate      | Document Content Locale — Minimal locale data for applications such as spreadsheets and word processors to support general document content internationalization: formatting number, datetime, currencies, sorting, plural handling, and so on. |
536| 80    | modern        | UI Locale — Contains all fields in normal modern use, including all CLDR locale names, country names, timezone names, currencies in use, and so on. |
537| 100   | comprehensive | Above modern level; typically more data than is needed in most implementations. |
538
539The Basic through Modern levels are based on the definitions and specifications listed below.
540
541```xml
542<!ELEMENT coverageLevels ( approvalRequirements, coverageVariable*, coverageLevel* ) >
543<!ELEMENT coverageLevel EMPTY >
544<!ATTLIST coverageLevel inLanguage CDATA #IMPLIED >
545<!ATTLIST coverageLevel inScript CDATA #IMPLIED >
546<!ATTLIST coverageLevel inTerritory CDATA #IMPLIED >
547<!ATTLIST coverageLevel value CDATA #REQUIRED >
548<!ATTLIST coverageLevel match CDATA #REQUIRED >
549```
550
For example, here is a `coverageLevel` element:
552
553```xml
554<coverageLevel
555    value="30"
556    inLanguage="(de|fi)"
557    match="localeDisplayNames/types/type[@type='phonebook'][@key='collation']"/>
558```
559
The `coverageLevel` elements are read in order, and the first match results in a coverage level value. The element matches based on the `inLanguage`, `inScript`, `inTerritory`, and `match` attribute values, which are regular expressions. In the example above, a match occurs if the language is de or fi, and if the path is a locale display name for `collation=phonebook`.
561
562The `match` attribute value logically has `//ldml/` prefixed before it is applied. In addition, the `[@` is automatically quoted. Otherwise standard Perl/Java style regular expression syntax is used.
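
The matching rules can be sketched in Python as follows. The dict representation of a rule and the names `RULES`, `CONDITIONS`, `to_regex`, and `coverage_value` are hypothetical, and requiring each regular expression to match its whole input is an assumption of this sketch. The test path uses the same quoting as the `match` attribute above.

```python
import re

# Rule copied from the example above, held as a plain dict for this sketch.
RULES = [
    {
        "value": 30,
        "inLanguage": "(de|fi)",
        "match": "localeDisplayNames/types/type[@type='phonebook'][@key='collation']",
    },
]

# Maps each optional rule attribute to the locale field it is tested against.
CONDITIONS = {"inLanguage": "language", "inScript": "script", "inTerritory": "territory"}


def to_regex(match):
    """Prefix //ldml/ and quote every '[@' so it is matched literally."""
    return re.compile("//ldml/" + match.replace("[@", r"\[@"))


def coverage_value(rules, locale, path):
    """Return the value of the first rule that matches, or None."""
    for rule in rules:
        locale_ok = all(
            re.fullmatch(rule[attr], locale.get(field, ""))
            for attr, field in CONDITIONS.items()
            if attr in rule
        )
        if locale_ok and to_regex(rule["match"]).fullmatch(path):
            return rule["value"]
    return None
```

Because rules are tried in order and the first match wins, more specific rules would need to precede more general ones in the list.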
563
564```xml
565<!ELEMENT coverageVariable EMPTY >
566<!ATTLIST coverageVariable key CDATA #REQUIRED >
567<!ATTLIST coverageVariable value CDATA #REQUIRED >
568```
569
The `coverageVariable` element defines variables for regular expression fragments that are used frequently in the `coverageLevel` definitions above. Each coverage variable must have a `key` / `value` pair of attributes; the value is substituted wherever the key appears in a `coverageLevel` definition.

For example, here is a `coverageLevel` element that uses `coverageVariable` substitution:
573
574```xml
<coverageVariable key="%dayTypes" value="(sun|mon|tue|wed|thu|fri|sat)"/>
<coverageVariable key="%wideAbbr" value="(wide|abbreviated)"/>
577<coverageLevel value="20" match="dates/calendars/calendar[@type='gregorian']/days/dayContext[@type='format']/dayWidth[@type='%wideAbbr']/day[@type='%dayTypes']"/>
578```
579
In this example, the coverage variables `%dayTypes` and `%wideAbbr` are used to substitute their respective values into the match expression. This allows the same variable to be reused in other `coverageLevel` matches that need the same regular expression fragment.
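
The substitution step can be sketched in Python as plain text replacement. The function name `expand` is hypothetical, and replacing longer keys first (so that a key which is a prefix of another key cannot clobber it) is a precaution of this sketch rather than something the text above specifies.

```python
# Variables and match expression copied from the example above.
VARIABLES = {
    "%dayTypes": "(sun|mon|tue|wed|thu|fri|sat)",
    "%wideAbbr": "(wide|abbreviated)",
}

MATCH = (
    "dates/calendars/calendar[@type='gregorian']/days/"
    "dayContext[@type='format']/dayWidth[@type='%wideAbbr']/day[@type='%dayTypes']"
)


def expand(match, variables):
    """Replace every variable key in `match` with its value."""
    # Longest keys first, so a key that prefixes another is not applied too early.
    for key in sorted(variables, key=len, reverse=True):
        match = match.replace(key, variables[key])
    return match
```

After expansion, the match expression contains `dayWidth[@type='(wide|abbreviated)']` and `day[@type='(sun|mon|tue|wed|thu|fri|sat)']`, and can then be handled like any other `match` value.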
581
582```xml
583<!ELEMENT approvalRequirements ( approvalRequirement* ) >
584<!ELEMENT approvalRequirement EMPTY >
585<!ATTLIST approvalRequirement votes CDATA #REQUIRED >
586<!ATTLIST approvalRequirement locales CDATA #REQUIRED >
587<!ATTLIST approvalRequirement paths CDATA #REQUIRED >
588```
589
The `approvalRequirements` element specifies the number of Survey Tool votes required for approval, based on locale, path, or both. Certain locales require a higher voting threshold (usually 8 votes instead of 4) in order to promote greater stability in the data. Furthermore, certain high-visibility fields, such as number formats, require a CLDR Technical Committee member's vote for approval.

`votes=` can be a numeric value, or it can be of the form `=vetter` where `vetter` is one of the `VoteResolver.Level` enumerated values.
It can also be `=LOWER_BAR` (8) or `=HIGH_BAR` (same as `=tc`), referring to the `VoteResolver` constants of the same names.
594
595Here is an example of the approvalRequirements section.
596
597```xml
598<approvalRequirements>
599    <!--  "high bar" items -->
600    <approvalRequirement votes="=HIGH_BAR" locales="*" paths="//ldml/numbers/symbols[^/]++/(decimal|group)"/>
601    <!--  established locales - https://cldr.unicode.org/index/process#h.rm00w9v03ia8 -->
602    <approvalRequirement votes="=LOWER_BAR" locales="ar ca cs da de el es fi fr he hi hr hu it ja ko nb nl pl pt pt_PT ro ru sk sl sr sv th tr uk vi zh zh_Hant" paths=""/>
603    <!--  all other items -->
604    <approvalRequirement votes="=vetter" locales="*" paths=""/>
605</approvalRequirements>
606```
607
608This section specifies that a TC vote (20 votes) is required for decimal and grouping separators. Furthermore it specifies that any field in the established locales list (i.e. ar, ca, cs, etc.) requires 8 votes, and that all other locales require 4 votes only.
609
For more information on the CLDR voting process, see [https://cldr.unicode.org/index/process](https://cldr.unicode.org/index/process).
611
612### <a name="Coverage_Level_Definitions" href="#Coverage_Level_Definitions">Definitions</a>
613This is a snapshot of the contents of certain variables. The actual definitions in the coverageLevels.xml file may vary from these descriptions.
614
615* _Target-Language_ is the language under consideration.
616* _Target-Territories_ is the list of territories found by looking up _Target-Language_ in the `<languageData>` elements in [Supplemental Language Data](tr35-info.md#Supplemental_Language_Data).
617* _Language-List_ is _Target-Language_, plus
618  * **moderate:** Chinese, English, French, German, Italian, Japanese, Portuguese, Russian, Spanish, Unknown; Arabic, Hindi, Korean, Indonesian, Dutch, Bengali, Turkish, Thai, Polish (de, en, es, fr, it, ja, pt, ru, zh, und, ar, hi, ko, in, nl, bn, tr, th, pl). If an EU language, add the remaining official EU languages.
619  * **modern:** all languages that are official or major commercial languages of modern territories
* _Target-Scripts_ is the list of scripts in which _Target-Language_ can be customarily written (found by looking up _Target-Language_ in the `<languageData>` elements in [Supplemental Language Data](tr35-info.md#Supplemental_Language_Data)), plus Unknown (Zzzz).
621* _Script-List_ is the _Target-Scripts_ plus the major scripts used for multiple languages
622  * Latin, Simplified Chinese, Traditional Chinese, Cyrillic, Arabic (Latn, Hans, Hant, Cyrl, Arab)
623* _Territory-List_ is the list of territories formed by taking the _Target-Territories_ and adding:
624  * **moderate:** Brazil, China, France, Germany, India, Italy, Japan, Russia, United Kingdom, United States, Unknown; Spain, Canada, Korea, Mexico, Australia, Netherlands, Switzerland, Belgium, Sweden, Turkey, Austria, Indonesia, Saudi Arabia, Norway, Denmark, Poland, South Africa, Greece, Finland, Ireland, Portugal, Thailand, Hong Kong SAR China, Taiwan (BR, CN, DE, GB, FR, IN, IT, JP, RU, US, ZZ, ES, BE, SE, TR, AT, ID, SA, NO, DK, PL, ZA, GR, FI, IE, PT, TH, HK, TW). If an EU language, add the remaining member EU countries.
625  * **modern:** all current ISO 3166 territories, plus the UN M.49 [[UNM49](tr35.md#UNM49)] regions in [Supplemental Territory Containment](tr35-info.md#Supplemental_Territory_Containment).
626* _Currency-List_ is the list of current official currencies used in any of the territories in _Territory-List_, found by looking at the `region` elements in [Supplemental Territory Containment](tr35-info.md#Supplemental_Territory_Containment), plus Unknown (XXX).
627* _Calendar-List_ is the set of calendars in customary use in any of _Target-Territories_, plus Gregorian.
628* _Number-System-List_ is the set of number systems in customary use in the language.
629
630### <a name="Coverage_Level_Data_Requirements" href="#Coverage_Level_Data_Requirements">Data Requirements</a>
631
632The required data to qualify for each level based on these definitions is then the following.
633
6341. localeDisplayNames
635   1. _languages:_ localized names for all languages in _Language-List._
636   2. _scripts:_ localized names for all scripts in _Script-List_.
637   3. _territories:_ localized names for all territories in _Territory-List_.
638   4. _variants, keys, types:_ localized names for any in use in _Target-Territories_; for example, a translation for PHONEBOOK in a German locale.
639
6402. dates: all of the following for each calendar in _Calendar-List_.
641   1. calendars: localized names
642   2. month names, day names, era names, and quarter names
643      * context=format and width=narrow, wide, & abbreviated
644      * plus context=standAlone and width=narrow, wide, & abbreviated, _if the grammatical forms of these are different than for context=format._
645   3. week: minDays, firstDay, weekendStart, weekendEnd
646      * if some of these vary in territories in _Territory-List_, include territory locales for those that do.
647   4. am, pm, eraNames, eraAbbr
648   5. dateFormat, timeFormat: full, long, medium, short
649   6. intervalFormatFallback
650
6513. numbers: symbols, decimalFormats, scientificFormats, percentFormats, currencyFormats for each number system in _Number-System-List_.
6524. currencies: displayNames and symbol for all currencies in _Currency-List_, for all plural forms
6535. transforms: (moderate and above) transliteration between Latin and each other script in _Target-Scripts._
654
655### <a name="Coverage_Level_Default_Values" href="#Coverage_Level_Default_Values">Default Values</a>
656
657Items should _only_ be included if they are not the same as the default, which is:
658
659* what is in root, if there is something defined there.
660* for timezone IDs: the name computed according to _[Appendix J: Time Zone Display Names](tr35.md#Time_Zone_Fallback)_
661* for collation sequence, the UCA DUCET (Default Unicode Collation Element Table), as modified by CLDR.
662  * however, in that case the locale must be added to the validSubLocale list in [collation/root.xml](https://github.com/unicode-org/cldr/blob/main/common/collation/root.xml).
663* for currency symbol, language, territory, script names, variants, keys, types, the internal code identifiers, for example,
664  * currencies: EUR, USD, JPY, ...
665  * languages: en, ja, ru, ...
666  * territories: GB, JP, FR, ...
667  * scripts: Latn, Thai, ...
668  * variants: PHONEBOOK, ...
669
670## <a name="Appendix_Supplemental_Metadata" href="#Appendix_Supplemental_Metadata">Supplemental Metadata</a>
671
672Note that this section discusses the `<metadata>` element within the `<supplementalData>` element. For the per-locale metadata used in tests and the Survey Tool, see [10: Locale Metadata Element](#Metadata_Elements).
673
The supplemental metadata contains information about the CLDR file itself, used to test validity and provide information for locale inheritance. A number of these elements are described in:
675
676* Appendix I: [Inheritance and Validity](tr35.md#Inheritance_and_Validity)
677* Appendix K: [Valid Attribute Values](tr35.md#Valid_Attribute_Values)
678* Appendix L: [Canonical Form](tr35.md#Canonical_Form)
679* Appendix M: [Coverage Levels](#Coverage_Levels)
680
681### <a name="Supplemental_Alias_Information" href="#Supplemental_Alias_Information">Supplemental Alias Information</a>
682
683```xml
684<!ELEMENT alias (languageAlias*,scriptAlias*,territoryAlias*,subdivisionAlias*,variantAlias*,zoneAlias*) >
685```
686
687_The following are common attributes for subelements of `<alias>`:_
688
689```xml
690<!ELEMENT *Alias EMPTY >
691<!ATTLIST *Alias type NMTOKEN #IMPLIED >
692<!ATTLIST *Alias replacement NMTOKEN #IMPLIED >
693<!ATTLIST *Alias reason ( deprecated | overlong ) #IMPLIED >
694```
695
696_The `languageAlias` has additional reasons_
697
698```xml
699<!ATTLIST languageAlias reason ( deprecated | overlong | macrolanguage | legacy | bibliographic ) #IMPLIED >
700```
701
702This element provides information as to parts of locale IDs that should be substituted when accessing CLDR data. This logical substitution should be done to both the locale id, and to any lookup for display names of languages, territories, and so on. The replacement for the language and territory types is more complicated: see _Part 1: [Core](tr35.md#Contents), [BCP 47 Language Tag Conversion](tr35.md#BCP_47_Language_Tag_Conversion)_ for details.
703
704```xml
705<alias>
    <languageAlias type="in" replacement="id"/>
    <languageAlias type="sh" replacement="sr"/>
    <languageAlias type="sh_YU" replacement="sr_Latn_YU"/>
    ...
    <territoryAlias type="BU" replacement="MM"/>
711    ...
712</alias>
713```
714
715Attribute values for the \*Alias values include the following:
716
717###### Table: <a name="Alias_Attribute_Values" href="#Alias_Attribute_Values">Alias Attribute Values</a>
718
719| Attribute   | Value         | Description |
720| ----------- | ------------- | ----------- |
721| type        | NMTOKEN       | The code to be replaced |
722| replacement | NMTOKEN       | The code(s) to replace it, space-delimited. |
723| reason      | deprecated    | The code in type is deprecated, such as 'iw' by 'he', or 'CS' by 'RS ME'. |
|             | overlong      | The code in type is too long, such as 'eng' by 'en' or 'USA' or '840' by 'US'. |
|             | macrolanguage | The code in type is an encompassed language that is replaced by a macrolanguage, such as '[arb](https://iso639-3.sil.org/code/arb)' by 'ar'. |
|             | legacy        | The code in type is a legacy code that is replaced by another code for compatibility with established legacy usage, such as 'sh' by 'sr_Latn'. |
727|             | bibliographic | The code in type is a [bibliographic code](https://www.loc.gov/standards/iso639-2/langhome.html), which is replaced by a terminology code, such as 'alb' by 'sq'. |
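
As a rough illustration of how the `type` and `replacement` attributes might be consumed, the following Python sketch loads alias elements into a lookup table. It is deliberately simplified: it performs a single direct lookup and does not implement the full replacement procedure for language and territory codes referenced above. The sample XML reuses codes mentioned in this section, and the names `ALIAS_XML`, `load_aliases`, and `replace_code` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample data built from codes mentioned in this section.
ALIAS_XML = """
<alias>
    <languageAlias type="in" replacement="id"/>
    <languageAlias type="iw" replacement="he"/>
    <territoryAlias type="BU" replacement="MM"/>
    <territoryAlias type="CS" replacement="RS ME"/>
</alias>
"""


def load_aliases(xml_text):
    """Map (element name, type) to the list of replacement codes."""
    table = {}
    for element in ET.fromstring(xml_text):
        # The replacement attribute is space-delimited and may hold several codes.
        table[(element.tag, element.get("type"))] = element.get("replacement", "").split()
    return table


def replace_code(table, kind, code):
    """Return the replacement codes for `code`, or [code] if there is no alias."""
    return table.get((kind, code), [code])
```

A territory such as 'CS' yields two replacements ('RS' and 'ME'); choosing between them requires additional information and is outside the scope of this sketch.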
728
729### ~~<a name="Supplemental_Deprecated_Information" href="#Supplemental_Deprecated_Information">Supplemental Deprecated Information (Deprecated)</a>~~
730
731```xml
732<!ELEMENT deprecated ( deprecatedItems* ) >
733<!ATTLIST deprecated draft ( approved | contributed | provisional | unconfirmed | true | false ) #IMPLIED > <!-- true and false are deprecated. -->
734
735<!ELEMENT deprecatedItems EMPTY >
736<!ATTLIST deprecatedItems type ( standard | supplemental | ldml | supplementalData | ldmlBCP47 ) #IMPLIED > <!-- standard | supplemental are deprecated -->
737<!ATTLIST deprecatedItems elements NMTOKENS #IMPLIED >
738<!ATTLIST deprecatedItems attributes NMTOKENS #IMPLIED >
739<!ATTLIST deprecatedItems values CDATA #IMPLIED >
740```
741
742The `deprecatedItems` element was used to indicate elements, attributes, and attribute values that are deprecated. This means that the items are valid, but that their usage is strongly discouraged. This element and its subelements have been deprecated in favor of [DTD Annotations](tr35.md#DTD_Annotations).
743
744Where particular values are deprecated (such as territory codes like SU for Soviet Union), the names for such codes may be removed from the common/main translated data after some period of time. However, typically supplemental information for deprecated codes is retained, such as containment, likely subtags, older currency codes usage, etc. The English name may also be retained, for debugging purposes.
745
746### <a name="Default_Content" href="#Default_Content">Default Content</a>
747
748```xml
749<!ELEMENT defaultContent EMPTY >
750<!ATTLIST defaultContent locales NMTOKENS #IMPLIED >
751```
752
753In CLDR, locales without territory information (or where needed, script information) provide data appropriate for what is called the _default content locale_. For example, the _en_ locale contains data appropriate for _en-US_, while the _zh_ locale contains content for _zh-Hans-CN_, and the _zh-Hant_ locale contains content for _zh-Hant-TW_. The default content locales themselves thus inherit all of their contents, and are empty.
754
755The choice of content is typically based on the largest literate population of the possible choices. Thus if an implementation only provides the base language (such as _en_), it will still get a complete and consistent set of data appropriate for a locale which is reasonably likely to be the one meant. Where other information is available, such as independent country information, that information can always be used to pick a different locale (such as _en-CA_ for a website targeted at Canadian users).
756
If an implementation is to use a different default locale, then the data needs to be _pivoted_: all of the data from CLDR for the current default locale is pushed out to the locales that inherit from it, and then the new default content locale's data is moved into the base. There are tools in CLDR to perform this operation.
758
759For the relationship between Inheritance, DefaultContent, LikelySubtags, and LocaleMatching, see **_[Inheritance vs Related Information](tr35.md#Inheritance_vs_Related)_**.
760
761## <a name="Metadata_Elements" href="#Metadata_Elements">Locale Metadata Elements</a>
762
763Note: This section refers to the per-locale `<metadata>` element, containing metadata about a particular locale. This is in contrast to the [_Supplemental_ Metadata](#Appendix_Supplemental_Metadata), which is in the supplemental tree and is not specific to a locale.
764
765```xml
766<!ELEMENT metadata ( alias | ( casingData?, special* ) ) >
767<!ELEMENT casingData ( alias | ( casingItem*, special* ) ) >
768<!ELEMENT casingItem ( #PCDATA ) >
769<!ATTLIST casingItem type CDATA #REQUIRED >
770<!ATTLIST casingItem override (true | false) #IMPLIED >
771<!ATTLIST casingItem forceError (true | false) #IMPLIED >
772```
773
774The `<metadata>` element contains metadata about the locale for use by the Survey Tool or other tools in checking locale data; this data is not intended for export as part of the locale itself.
775
The `<casingItem>` element specifies the capitalization intended for the majority of the data in a given category within the locale. The purpose is to allow warnings to be issued to translators that anything deviating from that capitalization should be carefully reviewed. Its `type` attribute has one of the values used for the `<contextTransformUsage>` element above, with the exception of the special value "all"; its value is one of the following:
777
778* lowercase
779* titlecase
780
The `<casingItem>` data is generated by a tool based on the data available in CLDR. In cases where the generated casing information is incorrect and needs to be manually edited, the `override` attribute is set to `true` so that the tool will not override the manual edits. When the casing information is known to be both correct and something that should apply to all elements of the specified type in a given locale, the `forceError` attribute may be set to `true` to force an error instead of a warning for items that do not match the casing information.
782
783## <a name="Version_Information" href="#Version_Information">Version Information</a>
784
785```xml
786<!ELEMENT version EMPTY >
787<!ATTLIST version cldrVersion CDATA #FIXED "27" >
788<!ATTLIST version unicodeVersion CDATA #FIXED "7.0.0" >
789```
790
791The `cldrVersion` attribute defines the CLDR version for this data, as published on [CLDR Releases/Downloads](https://cldr.unicode.org/index/downloads).
792
793The `unicodeVersion` attribute defines the version of the Unicode standard that is used to interpret data. Specifically, some data elements such as exemplar characters are expressed in terms of UnicodeSets. Since UnicodeSets can be expressed in terms of Unicode properties, their meaning depends on the Unicode version from which property values are derived.
794
795## <a name="Parent_Locales" href="#Parent_Locales">Parent Locales</a>
796
797The parentLocales data is supplemental data, but is described in detail in the [core specification section 4.1.3.](tr35.md#Parent_Locales)
798
799## <a name="Unit_Conversion" href="#Unit_Conversion">Unit Conversion</a>
800
The unit conversion data ([units.xml](https://github.com/unicode-org/cldr/blob/main/common/supplemental/units.xml)) provides the data for converting all of the CLDR unit identifiers to base units, and back. That allows conversion between any two convertible units, such as two units of length. For any two convertible units (such as acre and dunum) the first can be converted to the base unit (square-meter), then that base unit can be converted to the second unit.
802
803### Unit Parsing Data
804
```xml
<!ELEMENT unitIdComponents ( unitIdComponent* ) >

<!ELEMENT unitIdComponent EMPTY >
<!ATTLIST unitIdComponent type NMTOKEN #REQUIRED >
<!ATTLIST unitIdComponent values NMTOKENS #REQUIRED >
```
810
811These elements provide support for parsing unit identifiers, as described in [Unit Elements](tr35-general.md#Unit_Elements).
812Each of the values has tokens with specific functions, identified by the type.
813For example the following values can be suffixes in a simple_unit identifier such as `quart-imperial`.
814
```xml
<unitIdComponent type="suffix" values="force imperial luminosity mass metric person radius scandinavian troy unit us"/>
```
818
819### Unit Prefixes
820```xml
821<!ELEMENT unitPrefixes ( unitPrefix* ) >
822
823<!ELEMENT unitPrefix EMPTY >
824<!ATTLIST unitPrefix type NMTOKEN #REQUIRED >
825<!ATTLIST unitPrefix symbol NMTOKEN #REQUIRED >
826<!ATTLIST unitPrefix power10 NMTOKEN #IMPLIED >
827<!ATTLIST unitPrefix power2 NMTOKEN #IMPLIED >
828```
829
830This data lists the SI prefixes that can be applied to units (typically limited to prefixable units),
831such as the following:
832```xml
833<unitPrefixes>
834	<unitPrefix type='quecto' symbol='q' power10='-30'/>
835...
836	<unitPrefix type='micro' symbol='μ' power10='-6'/>
837...
838	<unitPrefix type='giga' symbol='G' power10='9'/>
839...
840	<unitPrefix type='quetta' symbol='Q' power10='30'/>
841	<unitPrefix type='kibi' symbol='Ki' power2='10'/>
842...
843	<unitPrefix type='yobi' symbol='Yi' power2='80'/>
844</unitPrefixes>
845```
846The information includes the SI prefix and symbol, and the power of 10 or power of 2
847(for binary prefixes, intended for use with digital units).
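
To make the meaning of `power10` and `power2` concrete, the following Python sketch computes the multiplier implied by a prefix. The `PREFIXES` table holds only some of the sample rows shown above, and the function name `prefix_factor` is hypothetical.

```python
from fractions import Fraction

# (attribute name, exponent) for some of the sample prefixes shown above.
PREFIXES = {
    "quecto": ("power10", -30),
    "micro": ("power10", -6),
    "giga": ("power10", 9),
    "kibi": ("power2", 10),
    "yobi": ("power2", 80),
}


def prefix_factor(prefix_type):
    """Return the exact multiplier for a prefix as a rational number."""
    attribute, exponent = PREFIXES[prefix_type]
    base = 10 if attribute == "power10" else 2
    return Fraction(base) ** exponent
```

Using `Fraction` keeps negative powers such as micro (1/1000000) exact rather than rounding them to floating point.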
848
Note that the translated short form of a unit prefix is not the same as the localized symbol.
The localized symbols may be the same for most Latin-script languages,
but depending on the customary use in a language they can be in a different script
or use different letters even in Latin-script languages. They are, however, the same in the root locale.

The newer prefixes (quecto-, ronto-, ronna-, quetta-) are not yet being translated,
because the appropriate translated versions have not yet been well established across languages.
856
857### Constants
858
859
860```xml
861<!ELEMENT unitConstants ( unitConstant* ) >
862
863<!ELEMENT unitConstant EMPTY >
864<!ATTLIST unitConstant constant NMTOKEN #REQUIRED >
865<!ATTLIST unitConstant value CDATA #REQUIRED >
866<!ATTLIST unitConstant status NMTOKEN #IMPLIED >
867<!ATTLIST unitConstant description CDATA #IMPLIED >
868```
869
870Many of the elements allow for a common @description attribute, to disambiguate the main attribute value or to explain the choice of other values. For example:
871```xml
872<unitConstant constant="glucose_molar_mass" value="180.1557"
873  description="derivation from the mean atomic weights according to STANDARD ATOMIC WEIGHTS 2019 on https://ciaaw.org/atomic-weights.htm"/>
874```
875
876The data uses a small set of constants for readability, such as:
877
878```xml
879<unitConstant constant="ft_to_m" value="0.3048" />
880<unitConstant constant="ft2_to_m2" value="ft_to_m*ft_to_m" />
881```
882The order of the elements in the file is significant.
883
884Each constant can have a value based on simple expressions using numbers, previous constants, plus the operators * and /. Parentheses are not allowed. The operator * binds more tightly than /, which may be unexpected. Thus a * b / c * d is interpreted as (a * b) / (c * d). A consequence of that is that a * b / c * d = a * b / c / d. In the value, the numbers represent rational values. So 0.3048 is interpreted as exactly 3048 / 10000.
885
In the above case, `ft2_to_m2` is a conversion constant for going from square feet to square meters. The expression evaluates to 0.09290304. Where the constants cannot be expressed as rationals, or where their interpretation is fluid, that is marked with a status value:
887
888```xml
889<unitConstant constant="PI" value="411557987 / 131002976" status='approximate' />
890```
891
892In such cases, software may decide to use different values for accuracy.
893
894An implementation need not use rationals directly for conversion; it could use doubles, for example, if only double accuracy is needed.
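
The expression rules in this section can be sketched in Python using exact rational arithmetic. The function names `evaluate` and `load_constants` are hypothetical, and the sketch assumes well-formed input (no parentheses, and only numbers or previously defined constant names as operands).

```python
from fractions import Fraction


def evaluate(expression, constants):
    """Evaluate an expression in which * binds more tightly than /."""

    def product(term):
        # A term is a run of operands joined by '*'.
        result = Fraction(1)
        for token in term.split("*"):
            token = token.strip()
            # Decimal literals such as 0.3048 are parsed as exact rationals.
            result *= constants[token] if token in constants else Fraction(token)
        return result

    # Splitting on '/' first makes every '*' group a single unit.
    first, *divisors = [product(term) for term in expression.split("/")]
    for divisor in divisors:
        first /= divisor
    return first


def load_constants(definitions):
    """Evaluate (name, value) pairs in file order; later values may use earlier names."""
    constants = {}
    for name, value in definitions:
        constants[name] = evaluate(value, constants)
    return constants
```

Evaluating the two constants shown earlier in order gives `ft2_to_m2` the exact value 0.09290304, and `2 * 3 / 4 * 5` evaluates to 6/20 (that is, 3/10) rather than 7.5.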
895
896### Conversion Data
897
898```xml
899<!ELEMENT convertUnits ( convertUnit* ) >
900
901<!ELEMENT convertUnit EMPTY >
902
903<!ATTLIST convertUnit source NMTOKEN #REQUIRED >
904
905<!ATTLIST convertUnit baseUnit NMTOKEN #REQUIRED >
906
907<!ATTLIST convertUnit factor CDATA #IMPLIED >
908
909<!ATTLIST convertUnit offset CDATA #IMPLIED >
910
911<!ATTLIST convertUnit special NMTOKEN #IMPLIED >
912
913<!ATTLIST convertUnit systems NMTOKENS #IMPLIED >
914
915<!ATTLIST convertUnit description CDATA #IMPLIED >
916```
917
The conversion data provides the data for converting all of the CLDR unit identifiers to base units, and back. That allows conversion between any two convertible units, such as two units of length. For any two convertible units (such as acre and dunum) the first can be converted to the base unit (square-meter), then that base unit can be converted to the second unit.
919
920The data is expressed as conversions to the base unit from the source unit. The information can also be used for the conversion back.
921
922Examples:
923
924```xml
925<convertUnit source='carat' baseUnit='kilogram' factor='0.0002'/>
926
927<convertUnit source='gram' baseUnit='kilogram' factor='0.001'/>
928
929<convertUnit source='ounce' baseUnit='kilogram' factor='lb_to_kg/16' systems="ussystem uksystem"/>
930
931<convertUnit source='fahrenheit' baseUnit='kelvin' factor='5/9' offset='2298.35/9' systems="ussystem uksystem"/>
932```
933
For example, to convert from 3 carats to kilograms, the factor 0.0002 is used, resulting in 0.0006. To convert between carats and ounces, first the carats are converted to kilograms, then the kilograms to ounces (by reversing the mapping).
935
936The factor and offset use the same structure as in the value in unitConstant; in particular, * binds more tightly than /.
937
938The conversion may also require an offset, such as the following:
939
940```xml
941<convertUnit source='fahrenheit' baseUnit='kelvin' factor='5/9' offset='2298.35/9' systems="ussystem uksystem"/>
942```
943
944The factor and offset can be simple expressions, just like the values in the unitConstants.
945
946Where a factor is not present, the value is 1; where an offset is not present, the value is 0.
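
The use of `factor` and `offset` can be sketched in Python as follows, assuming the usual reading that the base value is `value * factor + offset` and the reverse conversion inverts that. The `CONVERSIONS` table holds a few of the sample rows above with their expressions already evaluated; the `kelvin` row (a base unit with the default factor 1 and offset 0) and the function names are additions for illustration.

```python
from fractions import Fraction

# source unit -> (base unit, factor, offset)
CONVERSIONS = {
    "carat": ("kilogram", Fraction("0.0002"), Fraction(0)),
    "gram": ("kilogram", Fraction("0.001"), Fraction(0)),
    "fahrenheit": ("kelvin", Fraction(5, 9), Fraction("2298.35") / 9),
    # Illustrative row: the base unit itself, with default factor and offset.
    "kelvin": ("kelvin", Fraction(1), Fraction(0)),
}


def to_base(value, source):
    """Convert from the source unit to its base unit."""
    _, factor, offset = CONVERSIONS[source]
    return value * factor + offset


def from_base(value, target):
    """Reverse the mapping: convert from the base unit to the target unit."""
    _, factor, offset = CONVERSIONS[target]
    return (value - offset) / factor


def convert(value, source, target):
    """Convert between two units that share a base unit."""
    if CONVERSIONS[source][0] != CONVERSIONS[target][0]:
        raise ValueError(f"{source} and {target} are not convertible")
    return from_base(to_base(value, source), target)
```

Here `to_base(3, "carat")` gives 0.0006 kilograms as in the text above, and 32 degrees Fahrenheit maps to exactly 273.15 kelvin.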
947
Instead of using `factor` and possibly `offset`, the `convertUnit` element can specify a `special` conversion that cannot be described by factor and offset (and this attribute cannot be used in conjunction with factor and offset). For example:
949
950```xml
951<convertUnit source='beaufort' baseUnit='meter-per-second' special='beaufort' systems="metric_adjacent"/>
952```
953
954The only `special` conversion currently supported is for beaufort.
955
The `systems` attribute indicates the measurement system(s) or other characteristics of a set of units. Multiple values may be given; for example, a unit could be marked as systems="`si_acceptable` `metric_adjacent` `prefixable`".
957
The allowed attribute values are the following:
959
960Attribute Value   | Description
961------------      | -------------
962`si`              | The _International System of Units (SI)_ See [NIST Guide to the SI, Chapter 4: The Two Classes of SI Units and the SI Prefixes](https://www.nist.gov/pml/special-publication-811/nist-guide-si-chapter-4-two-classes-si-units-and-si-prefixes). Examples: meter, ampere.
963`si_acceptable`   | Units acceptable for use with the SI. See [NIST Guide to the SI, Chapter 5: Units Outside the SI](https://www.nist.gov/pml/special-publication-811/nist-guide-si-chapter-5-units-outside-si). Examples: hour, liter, knot, hectare.
964`metric`          | A superset of the _si_ units
965`metric_adjacent` | Units commonly accepted in some countries that follow the metric system. Examples: month, arc-second, pound-metric (= ½ kilogram), mile-scandinavian.
966`ussystem`        | The inch-pound system as used in the US, also called _US Customary Units_.
967`uksystem`        | The inch-pound system as used in the UK, also called _British Imperial Units_, differing mostly in units of volume
968`jpsystem`        | Traditional units used in Japan. For examples, see [Japanese units of measurement](https://en.wikipedia.org/wiki/Japanese_units_of_measurement).
969`astronomical`    | Additional units used in astronomy. Examples: parsec, light-year, earth-mass
970`person_age`      | Special units used for people’s ages in some languages. Except for translation, they have the same system as the associated regular units.
971`currency`        | Currency units. These are constructed algorithmically from the Unicode currency identifiers, and do not occur in the child elements of `convertUnits`. Examples: curr-usd (US dollar), curr-eur (Euro).
972`prefixable`      | Those units that typically use SI prefixes or the [IEC binary prefixes](https://www.nist.gov/pml/special-publication-811/nist-guide-si-appendix-d-bibliography#05). This can include measures like `parsec` that are not SI units. It allows implementations to group those units together, and to do sanity checks on the prefix+unit combinations, if they choose. However, implementations may choose to allow prefixes on other units, especially since there is a significant variance in usage: even a term like `megafoot` might be acceptable in some contexts.
973
974Over time, additional systems may be added, and the systems for a particular unit may be refined.
975
976#### Derived Unit System
977
978The systems attributes also apply to compound units, and are computed in the following way.
979
9801. The `prefixable` system is only applicable to base_components, and is thus removed
9812. The `number_prefixes`, `dimensionality_prefix`, `si_prefix`, and `binary_prefix` are ignored
982   * Example: systems(square-kilometer) = systems(meter)
9833. Currency units have the `currency` system
984   * Example: systems(curr-usd) = {currency}
9854. Units linked by `-and-`, `-per-`, and *adjacency* are resolved using a modified intersection, where:
986   1. The intersection of {… si …} and {… si_acceptable … } is {… si_acceptable …}
987   2. The intersection of {… metric …} and {… metric_adjacent … } is {… metric_adjacent …}
988
989Examples:
990```
991systems(liter-per-hectare)
	= {si_acceptable metric} ∩ {si_acceptable metric}
993	= {si_acceptable metric}
994systems(meter-per-hectare)
995	= {si metric} ∩ {si_acceptable metric}
996	= {si_acceptable metric}
997systems(mile-scandinavian-per-hour)
998	= {metric_adjacent} ∩ {si_acceptable metric_adjacent}
999	= {metric_adjacent}
1000```
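
One possible reading of the modified intersection is sketched below in Python. The function names are hypothetical, and the per-unit system sets passed in are assumptions taken from the examples above, not read from units.xml.

```python
from functools import reduce

# Pairs where the weaker system survives the modified intersection.
WEAKER = {"si": "si_acceptable", "metric": "metric_adjacent"}

def modified_intersection(a, b):
    """Plain intersection, plus the two special rules from step 4."""
    result = set(a) & set(b)
    for strong, weak in WEAKER.items():
        if (strong in a and weak in b) or (weak in a and strong in b):
            result.add(weak)
    return result

def compound_systems(component_systems):
    """Systems of a compound unit from the systems of its components.

    `prefixable` only applies to base components, so it is dropped first.
    Expects at least one component.
    """
    cleaned = [set(s) - {"prefixable"} for s in component_systems]
    return reduce(modified_intersection, cleaned)
```

For instance, passing the sets for meter and hectare from the example above yields `{"si_acceptable", "metric"}`.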
1001
1002#### Conversion Mechanisms
1003
1004CLDR follows conversion values where possible from:
1005* [NIST Special Publication 1038](https://www.govinfo.gov/content/pkg/GOVPUB-C13-f10c2ff9e7af2091314396a2d53213e4/pdf/GOVPUB-C13-f10c2ff9e7af2091314396a2d53213e4.pdf)
1006* [International Astronomical Union General Assembly](https://arxiv.org/pdf/1510.07674.pdf)
1007
1008See also [NIST Guide to the SI, Chapter 4: The Two Classes of SI Units and the SI Prefixes](https://www.nist.gov/pml/special-publication-811/nist-guide-si-chapter-4-two-classes-si-units-and-si-prefixes)
1009
1010For complex units, such as _pound-force-per-square-inch_, the conversions are computed by combining the conversions of each of the simple units: _pound-force_ and _inch_. Because the conversions in convertUnit are reversible, the computation can go from complex source unit to complex base unit to complex target units.
1011
1012Here is an example:
1013
1014> **50 foot-per-minute ⟹ X mile-per-hour**
1015> ⟹ source: 1 foot
1016> ⟹ factor: 381 / 1250 = 0.3048 meter
1017> ⟹ source: 1 minute
1018> ⟹ factor: 60 second
1019> ⟹ intermediate: 127 / 500 = 0.254 meter-per-second
1020> ⟹ mile-per-hour
1021> ⟹ source: 1 mile
1022> ⟹ factor: 201168 / 125 = 1609.344 meter
1023> ⟹ source: 1 hour
1024> ⟹ factor: 3600 second
1025> ⟹ target: 25 / 44 ≅ 0.5681818 mile-per-hour
1026
1027**Reciprocals.** When you convert a complex unit to another complex unit, you typically convert the source to a complex base unit (like _meter-per-cubic-meter_), then convert the latter backwards to the desired target. However, there may not be a matching conversion from that complex base unit to the desired target unit. That is the case for converting from _mile-per-gallon_ (used in the US) to _liter-per-100-kilometer_ (used in Europe and elsewhere). When that happens, the reciprocal of the complex base unit is used, as in the following example:
1028
1029> **50 mile-per-gallon ⟹ X liter-per-100-kilometer**
1030> ⟹ source: 1 mile
1031> ⟹ factor: 201168 / 125 = 1609.344 meter
1032> ⟹ source: 1 gallon
1033> ⟹ factor: 473176473 / 125000000000 ≅ 0.003785412 cubic-meter
1034> ⟹ intermediate: 2400000000000 / 112903 ≅ 2.125719E7 meter-per-cubic-meter
1035> ⟹ liter-per-100-kilometer
1036> ⟹ source: 1 liter
1037> ⟹ factor: 1 / 1000 = 0.001 cubic-meter
1038> ⟹ source: 1 100-kilometer
1039> ⟹ factor: 100000 meter
1040> **⟹ 1/intermediate: 112903 / 2400000000000 ≅ 4.704292E-8 cubic-meter-per-meter**
1041> ⟹ target: 112903 / 24000 ≅ 4.704292 liter-per-100-kilometer
1042
1043This applies to more than just these cases: one can convert from any unit to related reciprocals as in the following example:
1044
1045> **50 foot-per-minute ⟹ X hour-per-mile**
1046> ⟹ source: 1 foot
1047> ⟹ factor: 381 / 1250 = 0.3048 meter
1048> ⟹ source: 1 minute
1049> ⟹ factor: 60 second
1050> ⟹ intermediate: 127 / 500 = 0.254 meter-per-second
1051> ⟹ hour-per-mile
1052> ⟹ source: 1 hour
1053> ⟹ factor: 3600 second
1054> ⟹ source: 1 mile
1055> ⟹ factor: 201168 / 125 = 1609.344 meter
1056> **⟹ 1/intermediate: 500 / 127 ≅ 3.937008 second-per-meter**
1057> ⟹ target: 44 / 25 = 1.76 hour-per-mile
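
The worked examples above can be reproduced with a short Python sketch. It assumes a hand-copied table of the simple-unit factors shown in the examples, represents a complex unit as a hypothetical `(numerator, denominator)` pair of lists, and resolves dimensions fully (so _meter-per-cubic-meter_ is treated as meter to the power -2) in order to detect reciprocals.

```python
from fractions import Fraction

# simple unit -> (factor to base unit, base-unit exponents); subset for illustration
SIMPLE = {
    "foot": (Fraction(381, 1250), {"meter": 1}),
    "mile": (Fraction(201168, 125), {"meter": 1}),
    "100-kilometer": (Fraction(100000), {"meter": 1}),
    "minute": (Fraction(60), {"second": 1}),
    "hour": (Fraction(3600), {"second": 1}),
    "gallon": (Fraction(473176473, 125000000000), {"meter": 3}),
    "liter": (Fraction(1, 1000), {"meter": 3}),
}

def to_base(numerator, denominator):
    """Return (factor, dimensions) for the unit numerator-per-denominator."""
    factor = Fraction(1)
    dims = {}
    for sign, units in ((1, numerator), (-1, denominator)):
        for unit in units:
            unit_factor, unit_dims = SIMPLE[unit]
            factor *= unit_factor ** sign
            for base, exponent in unit_dims.items():
                dims[base] = dims.get(base, 0) + sign * exponent
    return factor, {b: e for b, e in dims.items() if e != 0}

def convert(value, source, target):
    """Convert via the complex base unit, using the reciprocal if needed."""
    source_factor, source_dims = to_base(*source)
    target_factor, target_dims = to_base(*target)
    base_value = Fraction(value) * source_factor
    if source_dims == target_dims:
        return base_value / target_factor
    if source_dims == {b: -e for b, e in target_dims.items()}:
        # Reciprocal case: invert the intermediate value (fails for zero).
        return 1 / (base_value * target_factor)
    raise ValueError("units are not convertible")
```

For example, `convert(50, (["mile"], ["gallon"]), (["liter"], ["100-kilometer"]))` takes the reciprocal branch and returns the fraction 112903 / 24000 given above.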
1058
1059#### Exceptional Cases
1060
1061##### Identities
1062
1063For completeness, identity mappings are also provided for the base units themselves, such as:
1064
1065```xml
1066<convertUnit source='meter' baseUnit='meter' />
1067```
1068
1069##### Aliases
1070
1071In a few instances the old identifiers are deprecated in favor of regular syntax. Implementations should handle both on input:
1072
1073```xml
1074<unitAlias type="meter-per-second-squared" replacement="meter-per-square-second" reason="deprecated"/>
1075<unitAlias type="liter-per-100kilometers" replacement="liter-per-100-kilometer" reason="deprecated"/>
1076<unitAlias type="pound-foot" replacement="pound-force-foot" reason="deprecated"/>
1077<unitAlias type="pound-per-square-inch" replacement="pound-force-per-square-inch" reason="deprecated"/>
1078```
1079
1080These use the standard alias elements in XML, and are also included in the [units.xml](https://github.com/unicode-org/cldr/blob/main/common/supplemental/units.xml) file.
1081
1082##### “Duplicate” Units
1083
Some CLDR units are provided simply because they have different names in some languages. For example, year and year-person, or foodcalorie and kilocalorie. One CLDR unit (temperature-generic) is not convertible; it is only used for translation (where the exact unit would be understood by context).
1085
1086##### Discarding Offsets
1087
1088The temperature units are special. When they represent a scale, they have an offset. But where they represent an amount, such as in complex units, they do not. So celsius-per-second is the same as kelvin-per-second.
1089
1090#### Unresolved Units
1091
1092Some SI units contain the same units in the numerator and denominator, so those cannot be resolved. For example, if cubic-meter-per-meter were always resolved, then _consumption_ (like “liter-per-kilometer”) could not be distinguished from _area_ (square-meter).
1093
However, in conversion, it may be necessary to resolve them in order to find a match. For example, kilowatt-hour maps to the base unit kilogram-square-meter-second-per-cubic-second, but that needs to be resolved to kilogram-square-meter-per-square-second in order to be matched against _energy_.
1095
1096## Quantities and Base Units
1097
1098```xml
1099<!ELEMENT unitQuantities ( unitQuantity* ) >
1100
1101<!ELEMENT unitQuantity EMPTY >
1102
1103<!ATTLIST unitQuantity baseUnit NMTOKEN #REQUIRED >
1104
1105<!ATTLIST unitQuantity quantity NMTOKENS #REQUIRED >
1106
1107<!ATTLIST unitQuantity status NMTOKEN #IMPLIED >
1108
1109<!ATTLIST unitQuantity description CDATA #IMPLIED >
1110```
1111
1112Conversion is supported between comparable units. Those can be simple units, such as length, or more complex ‘derived’ units that are built up from _base units_. The `<unitQuantities>` element provides information on the base units used for conversion. It also supplies information about their _quantity_: mass, length, time, etc., and whether they are simple or not.
1113
1114Examples:
1115
1116```xml
1117<unitQuantity baseUnit='kilogram' quantity='mass' status='simple'/>
1118<unitQuantity baseUnit='meter-per-second' quantity='speed'/>
1119```
1120
1121The order of the elements in the file is significant, since it is used in [Unit_Identifier_Normalization](#Unit_Identifier_Normalization).
1122
1123The quantity values themselves are informative. For example, _force per area_ can be referenced as either _pressure_ or _stress_. The quantity for a complex unit that has a reciprocal is formed by prepending “inverse-” to the quantity, such as _inverse-consumption._
1124
1125The base units for the quantities and the quantities themselves are based on [NIST Special Publication 811](https://www.nist.gov/pml/special-publication-811) and the earlier [NIST Special Publication 1038](https://www.govinfo.gov/content/pkg/GOVPUB-C13-f10c2ff9e7af2091314396a2d53213e4/pdf/GOVPUB-C13-f10c2ff9e7af2091314396a2d53213e4.pdf). In some cases, a different unit is chosen for the base. For example, a _revolution_ (360°) is chosen for the base unit for angles instead of the SI _radian_, and _item_ instead of the SI _mole_. Additional base units are added where necessary, such as _bit_ and _pixel_.
1126
1127This data is not necessary for conversion, but is needed for [Unit_Identifier_Normalization](#Unit_Identifier_Normalization). Some of the `unitQuantity` elements are not needed to convert CLDR units, but are included for completeness. Example:
1128
1129```xml
1130<unitQuantity baseUnit='ampere-per-square-meter' quantity='current-density'/>
1131```
1132
1133### UnitType vs Quantity
1134
1135The unitType (as in “length-meter”) is not the same as the quantity. It is often broader: for example, the unitType _electric_ corresponds to the quantities _electric-current, electric-resistance,_ and _voltage_. The unitType itself is also informative, and can be dropped from a long unit identifier to get a still-unique short unit identifier.
1136
1137### <a name="Unit_Identifier_Normalization" href="#Unit_Identifier_Normalization">Unit Identifier Normalization</a>
1138
1139There are many possible ways to construct complex units. For comparison of unit identifiers, an implementation can normalize in the following way:
1140
11411. Convert all but the first -per- to simple multiplication. The result then has the format of /numerator ( -per- denominator)?/
1142   * foot-per-second-per-second ⇒ foot-per-second-second
11432. Within each of the numerator and denominator:
11443. Convert multiple instances of a unit into the appropriate power.
1145   * foot-per-second-second ⇒ foot-per-square-second
1146   * kilogram-meter-kilogram ⇒ meter-square-kilogram
11474. For each single unit, disregarding prefixes and powers, get the order of the _simple_ unit among the `unitQuantity` elements in the [units.xml](https://github.com/unicode-org/cldr/blob/main/common/supplemental/units.xml). Sort the single units by that order, using a stable sort. If there are private-use single units, sort them after all the non-private use single units.
   * meter-square-kilogram ⇒ square-kilogram-meter
1149   * meter-square-gram ⇒ square-gram-meter
11505. As an edge case, there could be two adjacent single units with the same _simple_ unit but different prefixes, such as _meter-kilometer_. In that case, sort the larger prefixes first, such as _kilometer-meter_ or _kibibyte-kilobyte_.
11516. Within private-use single units, sort by the simple unit alphabetically.
1152
1153The examples in #4 are due to the following ordering of the `unitQuantity` elements:
1154
1155```xml
11561.  <unitQuantity baseUnit='candela' quantity='luminous-intensity' status='simple'/>
11572.  <unitQuantity baseUnit='kilogram' quantity='mass' status='simple'/>
11583.  <unitQuantity baseUnit='meter' quantity='length' status='simple'/>
11594.  …
1160```
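
The normalization steps can be sketched in Python as follows. This is a simplified illustration, not a full implementation: it assumes single-token simple units, a small hypothetical prefix table, powers up to cubic, and no private-use units. The `ORDER` list uses the simple unit _gram_ in place of the base unit _kilogram_, and the position of _second_ is an assumption, since only the first three elements are shown above.

```python
POWERS = {"square": 2, "cubic": 3}
POWER_NAMES = {1: "", 2: "square-", 3: "cubic-"}
# Hypothetical subset of SI prefixes, mapped to their power of ten.
PREFIXES = {"": 0, "kilo": 3, "centi": -2, "milli": -3}
# Simple units in unitQuantity order (assumed beyond the first three).
ORDER = ["candela", "gram", "meter", "second"]

def split_prefix(unit):
    """Split 'kilogram' into ('kilo', 'gram'); unprefixed units get ''."""
    for prefix in PREFIXES:
        if prefix and unit.startswith(prefix) and unit[len(prefix):] in ORDER:
            return prefix, unit[len(prefix):]
    return "", unit

def normalize_side(side):
    """Normalize a numerator or denominator (steps 3 to 5)."""
    totals = {}
    power = 1
    for token in side.split("-"):
        if token in POWERS:
            power = POWERS[token]
            continue
        key = split_prefix(token)
        totals[key] = totals.get(key, 0) + power  # merge repeated units
        power = 1
    # Sort by unitQuantity order, then larger prefixes first.
    ordered = sorted(totals, key=lambda k: (ORDER.index(k[1]), -PREFIXES[k[0]]))
    return "-".join(POWER_NAMES[totals[k]] + k[0] + k[1] for k in ordered)

def normalize(identifier):
    """Step 1: everything after the first -per- forms the denominator."""
    numerator, *denominators = identifier.split("-per-")
    result = normalize_side(numerator)
    if denominators:
        result += "-per-" + normalize_side("-".join(denominators))
    return result
```

Under these assumptions, `normalize("kilogram-meter-kilogram")` produces `square-kilogram-meter`, matching the examples in steps 3 and 4.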
1161
1162## Mixed Units
1163
1164Mixed units, or unit sequences, are units with the same base unit which are listed in sequence.
1165Common examples are feet and inches; meters and centimeters; hours, minutes, and seconds; degrees, minutes, and seconds.
1166Mixed unit identifiers are expressed using the "-and-" infix, as in "foot-and-inch", "meter-and-centimeter", "hour-and-minute-and-second", "degree-and-arc-minute-and-arc-second."
1167
1168Scalar values for mixed units are expressed in the largest unit, according to the sort order discussed above in "Normalization".
1169For example, numbers for "foot-and-inch" are expressed in feet.
1170
1171Mixed unit identifiers should be from highest to lowest (eg foot-and-inch instead of inch-and-foot), and that is reflected in the display.
1172If it turns out that some locales present certain mixed units in a different order, additional structure will be needed in CLDR.
1173
1174Only the lowest unit can have decimal fractions; the higher units will be integers, so no "3.5 feet 3 inches".
1175If a number is negative, then only the highest unit shows the minus sign: eg, "-3 hours 27 minutes".
1176If one of the units is zero, then it is normally omitted: eg, "3 feet" instead of "3 feet 0 inches".
1177However, when all of the units would be omitted, then the highest unit is shown with zero: eg "0 feet".
1178
1179Implementations may offer mechanisms to control the precision of the formatted mixed unit. Examples include, but are not limited to:
1180* An implementation could apply the precision of a number formatter to the final unit.
  However, this mechanism has a couple of disadvantages, such as matching precision across user preferences. For example, suppose the input amount is 1.5254 and the precision is 2 decimals.
    * Locale A uses decimal degrees and gets 1.53°.
    * Locale B uses degrees, minutes, seconds, and gets 1° 31′ 31.44″.
    * Locale B has an unnecessarily precise result: the equivalent of 1.52540 in precision.
1185* An implementation could allow a percentage precision;
1186  thus 1612 meters with ±1% precision would be represented by **1 mile** rather than **1 mile 9 feet**.
1187
1188The default behavior is to round the lowest unit to the nearest integer.
1189Thus 1.99959 degree-and-arc-minute-and-arc-second would be (before rounding) **1 degree 59 minutes 58.524 seconds**.
1190After rounding it would be **1 degree 59 minutes 59 seconds**.
1191
1192If the lowest unit would round to zero, or round up to the size of the next higher unit, then the next higher unit is rounded instead, recursively.
Thus 1.999862 degree-and-arc-minute-and-arc-second would be (before rounding) **1 degree 59 minutes 59.5032 seconds**.
1194After rounding the last unit it would be **1 degree 59 minutes 60 seconds**, which rounds up to **1 degree 60 minutes**, which rounds up to  **2 degrees**.
1195This behavior can be determined before having to compute the lower units:
1196for example, where rounding to the second, if the remainder in degrees is below 1/120 degrees or above 119/120 degrees, then the degrees can be rounded without computing the minutes or seconds.
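
The default rounding behavior can be sketched in Python as follows. This is one possible approach, not the only one: it converts the amount to the lowest unit, rounds once (half up), and then carries upward, which gives the same results as the recursive description for the examples above. The function names and the `ratios` parameter are hypothetical, and the sketch works on the absolute value, so the caller would attach any minus sign to the highest unit shown.

```python
from fractions import Fraction
from math import floor

def split_mixed(value, ratios):
    """Split `value` (in the highest unit) into integer parts.

    ratios[i] is how many of unit i+1 make one unit i,
    for example [60, 60] for degree / arc-minute / arc-second.
    """
    total = abs(Fraction(value))
    for ratio in ratios:
        total *= ratio                      # amount in the lowest unit
    remaining = floor(total + Fraction(1, 2))  # round to nearest integer
    parts = []
    for ratio in reversed(ratios):
        remaining, part = divmod(remaining, ratio)  # carry upward
        parts.append(part)
    parts.append(remaining)
    return parts[::-1]

def visible_parts(value, units, ratios):
    """Drop zero parts; if all are zero, keep the highest unit with 0."""
    parts = split_mixed(value, ratios)
    shown = [(n, u) for n, u in zip(parts, units) if n != 0]
    return shown or [(0, units[0])]
```

Passing the amounts as decimal strings, such as `split_mixed("1.999862", [60, 60])`, keeps the arithmetic exact.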
1197
1198## Testing
1199
1200The files in the directory [cldr/common/testData/units/](https://github.com/unicode-org/cldr/tree/main/common/testData/units) are provided for testing implementations.
1. The [unitsTest.txt](https://github.com/unicode-org/cldr/blob/main/common/testData/units/unitsTest.txt) file supplies a list of all the CLDR units with conversions.
2. The [unitPreferencesTest.txt](https://github.com/unicode-org/cldr/blob/main/common/testData/units/unitPreferencesTest.txt) file supplies tests for user preferences.
12033. The [unitLocalePreferencesTest.txt](https://github.com/unicode-org/cldr/blob/main/common/testData/units/unitLocalePreferencesTest.txt) file provides examples for testing the interactions between locale identifiers and unit preferences.
1204
Instructions for use are supplied in the header of each file.
1206
1207## <a name="Unit_Preferences" href="#Unit_Preferences">Unit Preferences</a>
1208
1209Different locales have different preferences for which unit or combination of units is used for a particular usage, such as measuring a person’s height. This is more fine-grained than merely a preference for metric versus US or UK measurement systems. For example, one locale may use meters alone, while another may use centimeters alone or a combination of meters and centimeters; a third may use inches alone, or (informally) a combination of feet and inches.
1210
1211### <a name="Unit_Preferences_Overrides" href="#Unit_Preferences_Overrides">Unit Preferences Overrides</a>
1212
The determination of preferred units uses the user preference data together with the **input unit**, the **input usage**, and the **input locale identifier**.
1214Within the locale identifier, the subtags that can affect the result are:
1215  * the value of the keys mu, ms, and rg
1216  * the region in the locale identifier (if there is one)
1217  * and otherwise the likely region subtag for the locale identifier
1218
1219The strongest priority is the mu key, then the ms key, then the rg key.
Beyond that the region of the locale identifier is used, and if not present, the likely-subtag region.
1221For example:
1222
1223|   | Locale                                | Result     | Comment                                                            |
1224|---|---------------------------------------|------------|--------------------------------------------------------------------|
1225| 1 | en-u-rg-uszzzz-ms-ussystem-mu-celsius | Celsius    | despite the rg and ms settings for US, and the likely region of US |
1226| 2 | en-u-rg-uszzzz-ms-metric              | Celsius    | despite the rg setting for US, and the likely region of US         |
| 3 | en-u-rg-dezzzz                        | Celsius    | despite the likely region of US                                    |
1228| 4 | en-DE                                 | Celsius    | because explicit region is DE                                      |
1229| 5 | en                                    | Fahrenheit | because the likely region for en with no region is US              |
1230
1231If any key-values are invalid, then they are ignored. Thus the following constructs are ignored:
1232
1233| subtags | reason |
1234| --- | --- |
1235| -mu-smoot | invalid unit |
1236| -ms-stanford | invalid unit system |
1237| -rg-abzzzz | invalid region 'AB' ‡|
1238| -AB | invalid region 'AB'|
1239
1240‡ Only the region portion is currently used.
1241The -rg-abzzzz is ignored because AB is invalid;
1242if it were -rg-ustuvxy, it would not be ignored because US is valid.
1243The table below shows when the region portion is valid or not.
1244
1245| Key-value | Region | Valid? | Comment |
1246| --- | --- | --- | --- |
1247| -rg-usut | US | Yes | Both the region portion (US) and the subdivision portion (ut = Utah) are valid. |
1248| -rg-uszzzz | US | Yes | Both the region portion (US) and the subdivision portion (zzzz = all) are valid. |
1249| -rg-usabc | US | Yes | The region portion (US) is valid, but the subdivision portion (abc) is not. |
| -rg-abzzzz | AB | No, ignored | The region portion (AB) is invalid, and thus the -rg is ignored, no matter what the subdivision portion (zzzz) is. |
1251
1252The following algorithm is used to compute the override units, regions, and category.
1253The latter two items are used in the [Unit Preferences Data](#Unit_Preferences_Data).
1254
1255#### Compute override units
If there is a valid -mu value then let the **output unit** be that value, and return it.
1257This terminates the algorithm; there is no need to use the unit preferences information.
1258
#### Compute regions
If there is no valid -mu value, the following steps are used to determine a region R (and optionally a Unit Systems Match (USM)) from the **input locale identifier**:

1. If there is a valid -ms value, then let USM be the corresponding value in column 2 of the table below.
   Otherwise there is no USM, and the Fallback Region in column 3 is not used. In either case continue with step 2.
2. If there is a valid region portion of the -rg value, let R be that region, and go to Compute the category.
    * See the table above for the examples `usut`, `usabc`, and `abzzzz`.
3. If there is a valid region in the locale, let R be that region, and go to Compute the category.
4. Otherwise, compute the likely subtags for the locale.
    1. If there is a likely region, then let R be that region, and go to Compute the category.
    2. Otherwise, let R be 001, and go to Compute the category.
1271
1272| Key-Value   | Unit Systems Match          | Fallback Region for Unit Preferences |
1273|-------------|-----------------------------|--------------------------------------|
1274| ms-metric   | metric OR metric_adjacent   | 001                                  |
1275| ms-ussystem | ussystem                    | US                                   |
| ms-uksystem | uksystem                    | GB                                   |
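
The region steps above can be sketched in Python as follows. The sketch assumes the locale identifier has already been parsed into its -ms value, -rg value, explicit region, and likely region, and that any valid -mu value has already been handled. The function name, the parameter names, and the small set of valid regions are hypothetical.

```python
# Hypothetical subset of valid region codes; a real implementation would
# use the CLDR validity data.
VALID_REGIONS = {"001", "DE", "GB", "US"}

# -ms value -> Unit Systems Match, from column 2 of the table above.
UNIT_SYSTEMS_MATCH = {
    "metric": {"metric", "metric_adjacent"},
    "ussystem": {"ussystem"},
    "uksystem": {"uksystem"},
}

def compute_region(ms=None, rg=None, region=None, likely_region=None):
    """Return (R, USM); USM is None when there is no valid -ms value."""
    usm = UNIT_SYSTEMS_MATCH.get(ms)
    # Only the region portion of -rg is used; this sketch assumes it is
    # a two-letter code at the start of the value.
    rg_region = rg[:2].upper() if rg else None
    for candidate in (rg_region, region, likely_region):
        if candidate in VALID_REGIONS:
            return candidate, usm
    return "001", usm
```

For `en-u-rg-dezzzz`, the call `compute_region(rg="dezzzz", likely_region="US")` returns DE with no USM, in line with row 3 of the first table in this section.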
1277
1278#### Compute the category
1279
1280A **category** is determined as follows from the input unit:
1281
12821. From the input unit, use the conversion data in [baseUnit](tr35-info.md#Unit_Conversion) and let the **input base unit** be the baseUnit attribute value.
1283    * eg, for `pound-force` the baseUnit is `kilogram-meter-per-square-second`.
2. If there is no such base unit (such as for an unusual unit like `ampere-pound-per-foot-square-minute`),
1285   convert the input unit to a combination of base units, reduce to lowest terms, and normalize.
1286   Let the **input base unit** be that value.
1287       * eg, `ampere-pound-per-foot-square-minute` ⇒ `kilogram-ampere-per-meter-square-second`
12883. If the **input base unit** has a unitQuantity element, then let the **category** be the quantity attribute value.
1289       * eg, `force` from `<unitQuantity baseUnit='kilogram-meter-per-square-second' quantity='force'/>`
12904. If the **input base unit** does not have a unitQuantity, let the output unit be the input base unit.
1291   An implementation may also set it to an equivalent metric/SI unit, as in the example below.
1292   This terminates the algorithm; there is no need to use the unit preferences information.
1293      * For example, for `ampere-pound-per-foot-square-minute` an implementation could return `kilogram-ampere-per-meter-square-second` or `pascal-ampere`.
      * That is, an implementation can use shorter metric/SI units as long as the combination is equivalent in value.
1295
1296### <a name="Unit_Preferences_Data" href="#Unit_Preferences_Data">Unit Preferences Data</a>
1297
1298The CLDR data is intended to map from a particular usage — e.g. measuring the height of a person or the fuel consumption of an automobile — to the unit or combination of units typically used for that usage in a given region. Considerations for such a mapping include:
1299
1300* The list of possible usages is large and open-ended, and will be extended in the future.
* Even for a given usage such as measuring a road distance, there are different choices of units based on the particular distance.
1302  For example, one set of units may be used for indicating the distance to the next city (kilometers or miles), while another may be used for indicating the distance to the next exit (meters, yards, or feet).
1303* There are also differences between more formal usage (official signage, medical records) and more informal usage (conversation, texting).
1304* For some usages, the measurement may be expressed using a sequence of units, such as “1 meter, 78 centimeters” or “12 stone, 2 pounds”.
1305
1306The DTD structure is as follows:
1307
1308```xml
1309<!ELEMENT unitPreferenceData ( unitPreferences* ) >
1310
1311<!ELEMENT unitPreferences ( unitPreference* ) >
1312<!ATTLIST unitPreferences category NMTOKEN #REQUIRED >
1313<!ATTLIST unitPreferences usage NMTOKENS #REQUIRED >
1314
1315<!ELEMENT unitPreference ( #PCDATA ) >
1316<!ATTLIST unitPreference regions NMTOKENS #REQUIRED >
1317<!ATTLIST unitPreference geq NMTOKEN #IMPLIED >
1318<!ATTLIST unitPreference skeleton CDATA #IMPLIED >
1319```
1320
1321| Term | Description |
1322|---|---|
1323| category | A unit quantity, such as “area” or “length”. See [Unit Conversion](#Unit_Conversion) |
1324| usage | A type of usage, such as person-height. |
1325| regions | One or more region identifiers (macroregions or regions), such as 001, US. (Note that this field may be extended in the future to also include subdivision identifiers and/or language identifiers, such as usca, and de-CH.) |
| geq | A threshold value, in a unit determined by the unitPreference element value. The unitPreference element is only used for values at or above this threshold (and below any higher threshold).<br/>The value must be non-negative. For negative amounts (-3 meters), use the absolute value to pick the unit. |
1327| skeleton | A skeleton in the ICU number format syntax, that is to be used to format the output unit amount. |
1328
1329
1330Logically, the unit preferences data is a map from categories to a map of usages to a map of regions to a list of ranked units and optional formats.
1331
1332**Note:** As of CLDR 37, the `<unitPreference>` `geq` attribute replaces the now-deprecated `<unitPreferences>` `scope` attribute.
1333
1334#### Examples:
1335
1336```xml
1337<unitPreferences category="length" usage="default">
1338    <unitPreference regions="001">kilometer</unitPreference>
1339    <unitPreference regions="001">meter</unitPreference>
1340    <unitPreference regions="001">centimeter</unitPreference>
1341    <unitPreference regions="US GB">mile</unitPreference>
1342    <unitPreference regions="US GB">foot</unitPreference>
1343    <unitPreference regions="US GB">inch</unitPreference>
1344</unitPreferences>
1345```
1346
The above information says that for default usage, people in the US and GB use mile, foot, and inch, whereas people in the rest of the world (001) use kilometer, meter, and centimeter. Take another example:
1348
1349```xml
1350<unitPreferences category="length" usage="road">
1351    <unitPreference regions="001" geq="0.9">kilometer</unitPreference>
1352    <unitPreference regions="001" geq="300.0" skeleton="precision-increment/50">meter</unitPreference>
1353    <unitPreference regions="001" skeleton="precision-increment/10">meter</unitPreference>
1354    <unitPreference regions="001">meter</unitPreference>
1355    <unitPreference regions="US" geq="0.5">mile</unitPreference>
1356    <unitPreference regions="US" geq="100.0" skeleton="precision-increment/50">foot</unitPreference>
1357    <unitPreference regions="US" skeleton="precision-increment/10">foot</unitPreference>
1358    <unitPreference regions="GB" geq="0.5">mile</unitPreference>
1359    <unitPreference regions="GB" geq="100.0" skeleton="precision-increment/50">yard</unitPreference>
1360    <unitPreference regions="GB">yard</unitPreference>
1361    <unitPreference regions="SE" geq="0.1">mile-scandinavian</unitPreference>
1362</unitPreferences>
1363```
1364
1365The following is the algorithm for computing the preferred output unit from the category, usage, region, and USM.
1366
1367#### Compute the preferred output unit
1368
13691. Let category preferences be the result of a lookup of **category** in the unit preferences.
1370    1. If the lookup fails, let the **output unit** be the input base unit or an equivalent metric/SI unit, and return. This terminates the algorithm.
13712. Let category-usage preferences be the result of a lookup of **input usage** in the category preferences.
    1. If the lookup fails, let the **input usage** be its containing usage, and repeat. (This will always terminate because there is always a 'default' usage for each category.)
    2. The containing usage is the result of truncating the last '-' and following text, if there is a '-', and otherwise 'default'.
1374        * For example, land-agriculture-grain ⊂ land-agriculture ⊂ land ⊂ default
13753. Let ranked units be the result of a lookup of R in the category-usage preferences. There may be both region values and [containment regions](https://www.unicode.org/cldr/charts/latest/supplemental/territory_containment_un_m_49.html).
1376    1. If the lookup of R fails, set R to its containing region and repeat. (This will always terminate because 001 is always present.)
1377        * For example, CH (Switzerland) ⊂ 155 (Western Europe) ⊂ 150 (Europe) ⊂ 001 (World).
1378        * This loop can be optimized to only include containing regions that occur in the data (eg, only 001 in LDML 45).
13794. If there is a USM, and the corresponding Fallback Region is different than R, and any of the units in the ranked list don't match the USM, then let the ranked units be the result of a lookup of the Fallback Region in the category-usage preferences.
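
Steps 1 through 3 above can be sketched in Python as follows. The data structure is a hypothetical nested dictionary (category, then usage, then region) with simplified unit lists, the containment table holds only the chain from the example, and `road-highway` is an invented usage. The USM check of step 4 is omitted.

```python
# Hypothetical, simplified data: category -> usage -> region -> ranked units.
PREFS = {
    "length": {
        "default": {"001": ["kilometer", "meter", "centimeter"],
                    "US": ["mile", "foot", "inch"]},
        "road": {"001": ["kilometer", "meter"],
                 "SE": ["mile-scandinavian"]},
    },
}

# Region -> containing region, from the example in step 3.
CONTAINMENT = {"CH": "155", "155": "150", "150": "001"}

def containing_usage(usage):
    """Truncate the last '-' and following text, else fall back to 'default'."""
    head, sep, _ = usage.rpartition("-")
    return head if sep else "default"

def ranked_units(prefs, category, usage, region):
    """Return the ranked units, or None if the category lookup fails."""
    by_usage = prefs.get(category)
    if by_usage is None:
        return None  # caller falls back to the input base unit
    while usage not in by_usage:
        if usage == "default":
            raise KeyError("category has no 'default' usage")
        usage = containing_usage(usage)
    by_region = by_usage[usage]
    while region not in by_region:
        if region == "001":
            raise KeyError("usage has no 001 region")
        region = CONTAINMENT.get(region, "001")
    return by_region[region]
```

The two guards only fire for data that violates the constraint that every category has a 'default' usage and every usage has a 001 region.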

#### Search the ranked units

The ranked units will be of the following form:
  ```xml
  <unitPreference regions="GB" geq="0.5">mile</unitPreference>
  <unitPreference regions="GB" geq="100.0" skeleton="precision-increment/50">yard</unitPreference>
  <unitPreference regions="GB">yard</unitPreference>
  ```

* The `geq` item gives the threshold value, expressed in the unit in the element value (or in the largest unit, for mixed units). For example,
  * `...geq="0.5">mile<...` is ≥ 0.5 miles
  * `...geq="100.0">foot-and-inch<...` is ≥ 100 feet
* If there is no `geq` attribute, then the implicit value is 1.0.
* Implementations will probably convert the values into the base units, so that the comparison is fast. Thus the two examples above would be converted internally to something like:
  * ≥ 804.672 meters ⇒ mile
  * ≥ 30.48 meters ⇒ foot-and-inch
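The conversion of `geq` values into base units might look like the sketch below. It assumes that a mixed unit identifier lists its largest subunit first, joined by `-and-`; the `METERS_PER_UNIT` table and the function name are illustrative, not part of any CLDR or ICU API.

```python
from typing import Dict

# Conversion factors to the base unit (meter) for a few length units.
METERS_PER_UNIT: Dict[str, float] = {
    "meter": 1.0,
    "foot": 0.3048,
    "yard": 0.9144,
    "mile": 1609.344,
}


def threshold_in_base_units(unit: str, geq: float = 1.0) -> float:
    """Convert a geq threshold into meters.

    For a mixed unit such as foot-and-inch, the threshold is expressed in
    the largest subunit, assumed here to be the first component.
    A missing geq attribute corresponds to the implicit value 1.0.
    """
    largest = unit.split("-and-")[0]
    return geq * METERS_PER_UNIT[largest]
```

For instance, `threshold_in_base_units("mile", 0.5)` gives 804.672 meters, and `threshold_in_base_units("foot-and-inch", 100.0)` gives approximately 30.48 meters, matching the two bullets above.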

1. Search for the first matching unitPreference for the absolute value of the input measure. If there is no match (eg < 100 feet in the above example), take the last unitPreference. That is, the last unitPreference is effectively geq="0". In the above example, `<unitPreference regions="GB">yard</unitPreference>` is equivalent to `<unitPreference geq="0" regions="GB">yard</unitPreference>`.

For completeness, when comparing doubles to the geq values:
* Negative numbers are treated as if they were positive, so in the above example -804.672 meters will format as "-0.5 mile".
* _infinity_, NaN, and -_infinity_ match the largest possible value. Thus -∞ meters will format as "-∞ miles", not "-∞ yards".

2. Once a matching `unitPreference` element is found:

* The unit is the element value.
* The skeleton (if there is one) supplies formatting information for the unit. API settings may allow that to be overridden.
  * The syntax and semantics for the skeleton value are defined by the [ICU Number Skeletons](https://unicode-org.github.io/icu/userguide/format_parse/numbers/skeletons.html) document.
* If the skeleton is missing, the default is skeleton="**precision-integer/@@\***". However, the client can also override or tune the number formatting.
* If the unit is mixed (eg foot-and-inch) the skeleton applies to the final subunit; the higher subunits are formatted as integers.
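The search and the default skeleton can be sketched as below. This is an illustrative Python reading of the steps above, assuming the thresholds have already been converted to base units (meters) and that the list is in descending order; the `UnitPreference` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_SKELETON = "precision-integer/@@*"


@dataclass
class UnitPreference:
    unit: str
    geq_base: float                 # threshold, already converted to base units
    skeleton: Optional[str] = None  # None when the attribute is absent


def select_preference(ranked: List[UnitPreference], value_in_base: float) -> UnitPreference:
    """Pick the first preference whose threshold the absolute value reaches."""
    magnitude = abs(value_in_base)
    if math.isnan(magnitude):
        # NaN compares false with everything, so handle it explicitly:
        # it matches the largest value, which is the first entry.
        return ranked[0]
    for pref in ranked:
        if magnitude >= pref.geq_base:
            return pref
    # No match: the last entry is effectively geq="0".
    return ranked[-1]


def effective_skeleton(pref: UnitPreference) -> str:
    """Use the element's skeleton, or the default when it is missing."""
    return pref.skeleton or DEFAULT_SKELETON


# The GB example above, with thresholds converted to meters.
GB_ROAD = [
    UnitPreference("mile", 0.5 * 1609.344),
    UnitPreference("yard", 100.0 * 0.9144, "precision-increment/50"),
    UnitPreference("yard", 1.0 * 0.9144),
]
```

Treating NaN separately is a design choice forced by IEEE 754 comparison semantics; positive and negative infinity need no special case once the absolute value is taken.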

### Constraints

* For a given category, there is always a “default” usage.
* For a given category and usage:
  * There is always a 001 region.
  * None of the sets of regions can overlap. That is, you can’t have “US” on one line and “US GB” on another. You _can_ have two lines with “US”, for different sizes of units.
* For a given category, usage, and region-set:
  * The unitPreferences are in descending order (largest first).
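A data loader could check these constraints with something like the following sketch. It assumes each row has been reduced to a region set plus a threshold in base units, and it interprets "descending order" as descending thresholds; both the data shape and the function name are assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

Row = Tuple[FrozenSet[str], float]  # (region set, threshold in base units)


def check_constraints(category_data: Dict[str, List[Row]]) -> List[str]:
    """Return a list of constraint violations for one category (usage -> rows)."""
    problems: List[str] = []
    if "default" not in category_data:
        problems.append("missing 'default' usage")

    for usage, rows in category_data.items():
        # Collect the distinct region sets, preserving document order.
        region_sets: List[FrozenSet[str]] = []
        for regions, _threshold in rows:
            if regions not in region_sets:
                region_sets.append(regions)

        if not any("001" in regions for regions in region_sets):
            problems.append(f"{usage}: no 001 region")

        # Distinct region sets must not share any region.
        for i, first in enumerate(region_sets):
            for second in region_sets[i + 1:]:
                if first & second:
                    problems.append(
                        f"{usage}: overlapping region sets "
                        f"{sorted(first)} and {sorted(second)}"
                    )

        # Within one region set, thresholds must be in descending order.
        for regions in region_sets:
            thresholds = [t for r, t in rows if r == regions]
            if thresholds != sorted(thresholds, reverse=True):
                problems.append(f"{usage}: {sorted(regions)} not in descending order")
    return problems
```

Two rows with the identical set {"US"} pass the overlap check, while {"US"} alongside {"US", "GB"} is reported.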

#### Examples

**Example A: xx-SE-u-ms-metric, length, road**
1. Fetch the data from `<unitPreferences category="length" usage="road">` for xx-SE
```
<unitPreference regions="SE">mile-scandinavian</unitPreference>
<unitPreference regions="SE">kilometer</unitPreference>
<unitPreference regions="SE" geq="300.0" skeleton="precision-increment/50">meter</unitPreference>
<unitPreference regions="SE" geq="10" skeleton="precision-increment/10">meter</unitPreference>
<unitPreference regions="SE" skeleton="precision-increment/1">meter</unitPreference>
```
2. Meter is **metric** and mile-scandinavian is **metric_adjacent**, so both match the key-value ms-**metric** and no change is made.

**Example B: xx-GB-u-ms-ussystem, volume, fluid**
1. Fetch the data from `<unitPreferences category="volume" usage="fluid">` for xx-GB
```
<unitPreference regions="GB">gallon-imperial</unitPreference>
<unitPreference regions="GB">fluid-ounce-imperial</unitPreference>
```
2. At least one of {gallon-imperial, fluid-ounce-imperial} does not match ms-**ussystem**, so the locale is shifted to xx-**US**, and the following data is used:
```
<unitPreference regions="US">gallon</unitPreference>
<unitPreference regions="US">quart</unitPreference>
<unitPreference regions="US">pint</unitPreference>
<unitPreference regions="US">cup</unitPreference>
<unitPreference regions="US">fluid-ounce</unitPreference>
<unitPreference regions="US">tablespoon</unitPreference>
<unitPreference regions="US">teaspoon</unitPreference>
```
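Examples A and B can be traced with the sketch below, which applies the measurement-system check to an already fetched ranked list. The tables are assumptions made for illustration: the unit-to-system assignments for the imperial units and the 001 fallback region for `metric` are not stated in the examples above and should be checked against the CLDR data; the function and table names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set

# Assumed unit systems for the units in Examples A and B.
UNIT_SYSTEMS: Dict[str, Set[str]] = {
    "meter": {"metric"},
    "kilometer": {"metric"},
    "mile-scandinavian": {"metric_adjacent"},
    "gallon-imperial": {"uksystem"},        # assumption
    "fluid-ounce-imperial": {"uksystem"},   # assumption
}

# Systems accepted by each ms value, and its fallback region.
MATCHING_SYSTEMS: Dict[str, Set[str]] = {
    "metric": {"metric", "metric_adjacent"},
    "ussystem": {"ussystem"},
}
FALLBACK_REGION: Dict[str, str] = {"metric": "001", "ussystem": "US"}  # 001 is an assumption


def apply_measurement_system(
    ranked_units: List[str],
    region: str,
    usm: Optional[str],
    lookup: Callable[[str], List[str]],
) -> List[str]:
    """Switch to the fallback region's units if any unit fails to match the USM."""
    if usm is None:
        return ranked_units
    fallback = FALLBACK_REGION[usm]
    if fallback == region:
        return ranked_units
    accepted = MATCHING_SYSTEMS[usm]
    if all(UNIT_SYSTEMS[unit] & accepted for unit in ranked_units):
        return ranked_units
    return lookup(fallback)


SE_ROAD = ["mile-scandinavian", "kilometer", "meter", "meter", "meter"]
GB_FLUID = ["gallon-imperial", "fluid-ounce-imperial"]
US_FLUID = ["gallon", "quart", "pint", "cup", "fluid-ounce", "tablespoon", "teaspoon"]
FLUID_BY_REGION = {"GB": GB_FLUID, "US": US_FLUID}
```

In Example A every unit matches `metric`, so `SE_ROAD` is returned unchanged; in Example B the imperial units fail to match `ussystem`, so the lookup is repeated for US.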

## Unit APIs
APIs should clearly allow for both the use of unit preferences with the above process, and for the _invariant use_ of a unit measure.
That is, while an application will usually want to obey the preferences for the locale or in the locale ID, there will definitely be instances where it will not want to use them.
For example, in showing the weather, an application may want to show:

High today: 68°F (20°C)

To do that, the application needs to show the first value with the locale information, and then (a) query what the alternative unit is, and (b) show the temperature in that unit.
As an example, ICU only uses the unit preferences (with rg, ms, and/or mu and the likely region) in formatting units when a **usage** parameter is set.
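The weather example can be sketched in a few lines. This is a hypothetical illustration, not an ICU API: the caller is assumed to have already resolved the locale's preferred temperature unit, and the `ALTERNATIVE` table stands in for the "query what the alternative is" step.

```python
SYMBOL = {"fahrenheit": "°F", "celsius": "°C"}
ALTERNATIVE = {"fahrenheit": "celsius", "celsius": "fahrenheit"}


def convert(value: float, source: str, target: str) -> float:
    """Convert between the two temperature units handled by this sketch."""
    if source == target:
        return value
    if source == "fahrenheit":
        return (value - 32.0) * 5.0 / 9.0
    return value * 9.0 / 5.0 + 32.0


def format_high(value: float, unit: str, preferred_unit: str) -> str:
    """Show the preferred unit first, then the alternative in parentheses."""
    alternative_unit = ALTERNATIVE[preferred_unit]
    primary = round(convert(value, unit, preferred_unit))
    secondary = round(convert(value, unit, alternative_unit))
    return (
        f"High today: {primary}{SYMBOL[preferred_unit]} "
        f"({secondary}{SYMBOL[alternative_unit]})"
    )
```

The first value follows the locale preference, while the second is an invariant use of a specific unit, which is why an API needs to support both paths.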

* * *

© 2024–2024 Unicode, Inc.
This publication is protected by copyright, and permission must be obtained from Unicode, Inc.
prior to any reproduction, modification, or other use not permitted by the [Terms of Use](https://www.unicode.org/copyright.html).
Specifically, you may make copies of this publication and may annotate and translate it solely for personal or internal business purposes and not for public distribution,
provided that any such permitted copies and modifications fully reproduce all copyright and other legal notices contained in the original.
You may not make copies of or modifications to this publication for public distribution, or incorporate it in whole or in part into any product or publication without the express written permission of Unicode.

Use of all Unicode Products, including this publication, is governed by the Unicode [Terms of Use](https://www.unicode.org/copyright.html).
The authors, contributors, and publishers have taken care in the preparation of this publication,
but make no express or implied representation or warranty of any kind and assume no responsibility or liability for errors or omissions or for consequential or incidental damages that may arise therefrom.
This publication is provided “AS-IS” without charge as a convenience to users.

Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.