<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"https://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <meta name="generator" content=
  "HTML Tidy for HTML5 for Apple macOS version 5.6.0">
  <meta http-equiv="Content-Type" content=
  "text/html; charset=utf-8">
  <meta http-equiv="Content-Language" content="en-us">
  <link rel="stylesheet" href=
  "../reports.css" type="text/css">
  <title>UTS #35: Unicode LDML: Supplemental</title>
  <style type="text/css">
  <!--
  .dtd {
    font-family: monospace;
    font-size: 90%;
    background-color: #CCCCFF;
    border-style: dotted;
    border-width: 1px;
  }

  .xmlExample {
    font-family: monospace;
    font-size: 80%
  }

  .blockedInherited {
    font-style: italic;
    font-weight: bold;
    border-style: dashed;
    border-width: 1px;
    background-color: #FF0000
  }

  .inherited {
    font-weight: bold;
    border-style: dashed;
    border-width: 1px;
    background-color: #00FF00
  }

  .element {
    font-weight: bold;
    color: red;
  }

  .attribute {
    font-weight: bold;
    color: maroon;
  }

  .attributeValue {
    font-weight: bold;
    color: blue;
  }

  li, p {
    margin-top: 0.5em;
    margin-bottom: 0.5em
  }

  h2, h3, h4, table {
    margin-top: 1.5em;
    margin-bottom: 0.5em;
  }
  -->
  </style>
</head>
<body>
  <table class="header" width="100%">
    <tr>
      <td class="icon"><a href="https://unicode.org"><img alt=
      "[Unicode]" src="../logo60s2.gif"
      width="34" height="33" style=
      "vertical-align: middle; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px; border-top-width: 0px;"></a>
      <a class="bar" href=
      "https://www.unicode.org/reports/">Technical Reports</a></td>
    </tr>
    <tr>
      <td class="gray">&nbsp;</td>
    </tr>
  </table>
  <div class="body">
    <h2 style="text-align: center">Unicode Technical Standard #35</h2>
    <h1>Unicode Locale Data Markup Language (LDML)<br>
    Part 6: Supplemental</h1>
    <!-- At
    least the first row of this header table should be identical across the parts of this UTS. -->
    <table border="1" cellpadding="2" cellspacing="0" class="wide">
      <tr>
        <td>Version</td>
        <td>38</td>
      </tr>
      <tr>
        <td>Editors</td>
        <td>Steven Loomis (<a href=
        "mailto:srl@icu-project.org">srl@icu-project.org</a>) and
        <a href="tr35.html#Acknowledgments">other CLDR committee
        members</a></td>
      </tr>
    </table>
    <p>For the full header, summary, and status, see <a href=
    "tr35.html">Part 1: Core</a>.</p>
    <h3><i>Summary</i></h3>
    <p>This document describes parts of an XML format
    (<i>vocabulary</i>) for the exchange of structured locale data.
    This format is used in the <a href=
    "https://unicode.org/cldr/">Unicode Common Locale Data
    Repository</a>.</p>
    <p>This is a partial document, describing only those parts of
    the LDML that are relevant for supplemental data. For the other
    parts of the LDML see the <a href="tr35.html">main LDML
    document</a> and the links above.</p>
    <h3><i>Status</i></h3>

    <!-- NOT YET APPROVED
    <p>
    <i class="changed">This is a<b><font color="#ff3333">
    draft </font></b>document which may be updated, replaced, or superseded by
    other documents at any time. Publication does not imply endorsement
    by the Unicode Consortium. This is not a stable document; it is
    inappropriate to cite this document as other than a work in
    progress.
    </i>
    </p>
    END NOT YET APPROVED -->
    <!-- APPROVED -->
    <p><i>This document has been reviewed by Unicode members and
    other interested parties, and has been approved for publication
    by the Unicode Consortium. This is a stable document and may be
    used as reference material or cited as a normative reference by
    other specifications.</i></p>
    <!-- END APPROVED -->

    <blockquote>
      <p><i><b>A Unicode Technical Standard (UTS)</b> is an
      independent specification.
      Conformance to the Unicode
      Standard does not imply conformance to any UTS.</i></p>
    </blockquote>
    <p><i>Please submit corrigenda and other comments with the CLDR
    bug reporting form [<a href="tr35.html#Bugs">Bugs</a>]. Related
    information that is useful in understanding this document is
    found in the <a href="tr35.html#References">References</a>. For
    the latest version of the Unicode Standard see [<a href=
    "tr35.html#Unicode">Unicode</a>]. For a list of current Unicode
    Technical Reports see [<a href=
    "tr35.html#Reports">Reports</a>]. For more information about
    versions of the Unicode Standard, see [<a href=
    "tr35.html#Versions">Versions</a>].</i></p>
    <!-- This section of Parts should be identical in all of the parts of this UTS. -->
    <h2><a name="Parts" href="#Parts" id="Parts">Parts</a></h2>
    <p>The LDML specification is divided into the following
    parts:</p>
    <ul class="toc">
      <li>Part 1: <a href="tr35.html#Contents">Core</a> (languages,
      locales, basic structure)</li>
      <li>Part 2: <a href="tr35-general.html#Contents">General</a>
      (display names &amp; transforms, etc.)</li>
      <li>Part 3: <a href="tr35-numbers.html#Contents">Numbers</a>
      (number &amp; currency formatting)</li>
      <li>Part 4: <a href="tr35-dates.html#Contents">Dates</a>
      (date, time, time zone formatting)</li>
      <li>Part 5: <a href=
      "tr35-collation.html#Contents">Collation</a> (sorting,
      searching, grouping)</li>
      <li>Part 6: <a href=
      "tr35-info.html#Contents">Supplemental</a> (supplemental
      data)</li>
      <li>Part 7: <a href=
      "tr35-keyboards.html#Contents">Keyboards</a> (keyboard
      mappings)</li>
    </ul>
    <h2><a name="Contents" href="#Contents" id="Contents">Contents
    of Part 6, Supplemental</a></h2>
    <!-- START Generated TOC: CheckHtmlFiles -->
    <ul class="toc">
      <li>1 <a href="#Supplemental_Data">Introduction Supplemental
      Data</a></li>
      <li>2 <a
      href="#Territory_Data">Territory Data</a>
        <ul class="toc">
          <li>2.1 <a href=
          "#Supplemental_Territory_Containment">Supplemental
          Territory Containment</a></li>
          <li>2.2 <a href="#Subdivision_Containment">Subdivision
          Containment</a></li>
          <li>2.3 <a href=
          "#Supplemental_Territory_Information">Supplemental
          Territory Information</a></li>
          <li>2.4 <a href=
          "#Territory_Based_Preferences">Territory-Based
          Preferences</a>
            <ul class="toc">
              <li>2.4.1 <a href=
              "#Preferred_Units_For_Usage">Preferred Units for
              Specific Usages</a>
                <ul class="toc">
                  <li>Table: <a href=
                  "#Unit_Preferences">Unit Preference
                  Categories</a></li>
                </ul>
              </li>
            </ul>
          </li>
          <li>2.5 <a href="#rgScope">&lt;rgScope&gt;: Scope of the
          “rg” Locale Key</a></li>
        </ul>
      </li>
      <li>3 <a href="#Supplemental_Language_Data">Supplemental
      Language Data</a>
        <ul class="toc">
          <li>3.1 <a href=
          "#Supplemental_Language_Grouping">Supplemental Language
          Grouping</a></li>
        </ul>
      </li>
      <li>4 <a href="#Supplemental_Code_Mapping">Supplemental Code
      Mapping</a></li>
      <li>5 <a href="#Telephone_Code_Data">Telephone Code Data</a>
      (Deprecated)</li>
      <li>6 <a href="#Postal_Code_Validation">Postal Code
      Validation (Deprecated)</a></li>
      <li>7 <a href=
      "#Supplemental_Character_Fallback_Data">Supplemental
      Character Fallback Data</a></li>
      <li>8 <a href="#Coverage_Levels">Coverage Levels</a>
        <ul class="toc">
          <li>8.1 <a href=
          "#Coverage_Level_Definitions">Definitions</a></li>
          <li>8.2 <a href="#Coverage_Level_Data_Requirements">Data
          Requirements</a></li>
          <li>8.3 <a href="#Coverage_Level_Default_Values">Default
          Values</a></li>
        </ul>
      </li>
      <li>9 <a href="#Appendix_Supplemental_Metadata">Supplemental
      Metadata</a>
        <ul class="toc">
          <li>9.1 <a href=
          "#Supplemental_Alias_Information">Supplemental Alias
          Information</a>
            <ul
            class="toc">
              <li>Table: <a href="#Alias_Attribute_Values">Alias
              Attribute Values</a></li>
            </ul>
          </li>
          <li>9.2 <a href=
          "#Supplemental_Deprecated_Information">Supplemental
          Deprecated Information (Deprecated)</a></li>
          <li>9.3 <a href="#Default_Content">Default
          Content</a></li>
        </ul>
      </li>
      <li>10 <a href="#Metadata_Elements">Locale Metadata
      Elements</a></li>
      <li>11 <a href="#Version_Information">Version
      Information</a></li>
      <li>12 <a href="#Parent_Locales">Parent Locales</a></li>
      <li>13 <a href="#Unit_Conversion">Unit Conversion</a></li>
      <li>14 <a href="#Unit_Preferences">Unit Preferences</a></li>
    </ul>
    <!-- END Generated TOC: CheckHtmlFiles -->
    <h2>1 Introduction <a name="Supplemental_Data" href=
    "#Supplemental_Data" id="Supplemental_Data">Supplemental
    Data</a></h2>
    <p>The following represents the format for additional
    supplemental information. This is information that is important
    for internationalization and proper use of CLDR, but is not
    contained in the locale hierarchy. It is not localizable, nor
    is it overridden by locale data.
    The current CLDR data can be
    viewed in the <a href=
    "https://unicode-org.github.io/cldr-staging/charts/38/supplemental/index.html">
    Supplemental Charts</a>.</p>
    <p class="dtd">
    &lt;!ELEMENT supplementalData (version, generation?,
    cldrVersion?, currencyData?, territoryContainment?,
    subdivisionContainment?, languageData?, territoryInfo?,
    postalCodeData?, calendarData?, calendarPreferenceData?,
    weekData?, timeData?, measurementData?, unitPreferenceData?,
    timezoneData?, characters?, transforms?, metadata?,
    codeMappings?, parentLocales?, likelySubtags?, metazoneInfo?,
    plurals?, telephoneCodeData?, numberingSystems?,
    bcp47KeywordMappings?, gender?, references?, languageMatching?,
    dayPeriodRuleSet*, metaZones?, primaryZones?, windowsZones?,
    coverageLevels?, idValidity?, rgScope?) &gt;</p>
    <p>The data in CLDR is presently split into multiple files:
    supplementalData.xml, supplementalMetadata.xml, characters.xml,
    likelySubtags.xml, ordinals.xml, plurals.xml,
    telephoneCodeData.xml, genderList.xml, plus transforms (see
    <i>Part 2 Section 10 <a href=
    "tr35-general.html#Transforms">Transforms</a></i> and <i>Part 2
    Section 10.3 <a href=
    "tr35-general.html#Transform_Rules_Syntax">Transform Rule
    Syntax</a></i>). The split is only for convenience: logically,
    the files are treated as though they were a single file. Future
    versions of CLDR may split the data differently, so
    implementations should not depend on any specific XML filename
    or path for supplemental data.</p>
    <p>Note that <a href="#Metadata_Elements">Section 10</a>
    presents information about metadata that is maintained on a
    per-locale basis.
    It is included in this section because it is
    not intended to be used as part of the locale itself.</p>
    <h2>2 <a name="Territory_Data" href="#Territory_Data" id=
    "Territory_Data">Territory Data</a></h2>
    <h3>2.1 <a name="Supplemental_Territory_Containment" href=
    "#Supplemental_Territory_Containment" id=
    "Supplemental_Territory_Containment">Supplemental Territory
    Containment</a></h3>
    <p class="dtd">&lt;!ELEMENT territoryContainment ( group* )
    &gt;<br>
    &lt;!ELEMENT group EMPTY &gt;<br>
    &lt;!ATTLIST group type NMTOKEN #REQUIRED &gt;<br>
    &lt;!ATTLIST group contains NMTOKENS #IMPLIED &gt;<br>
    &lt;!ATTLIST group grouping ( true | false ) #IMPLIED &gt;<br>
    &lt;!ATTLIST group status ( deprecated | grouping ) #IMPLIED
    &gt;</p>
    <p>The following data provides information that shows groupings
    of countries (regions). The data is based on [<a href=
    "tr35.html#UNM49">UNM49</a>]. There is one special code,
    <code>QO</code>, which is used for outlying areas of Oceania
    that are typically uninhabited. The territory containment forms
    a tree with the following levels:</p>
    <p align="center">World</p>
    <p align="center">Continent</p>
    <p align="center">Subcontinent</p>
    <p align="center">Country</p>
    <p>Excluding groupings, in this tree:</p>
    <ul>
      <li>All non-overlapping regions form a strict tree rooted at
      World.</li>
      <li>All leaf nodes (countries) are at depth 4. Some of
      these “country” regions are actually parts of other
      countries, such as Hong Kong (part of China). Such
      relationships are not part of the containment data.</li>
    </ul>
    <p>For a chart showing the relationships (plus the included
    timezones), see the <a href=
    "https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_containment_un_m_49.html">
    Territory Containment Chart</a>.
    The XML structure has the
    following form.</p>
    <pre>&lt;territoryContainment&gt;</pre>
    <blockquote>
      <pre>
&lt;group type="001" contains="002 009 019 142 150"/&gt; &lt;!--World --&gt;
&lt;group type="011" contains="BF BJ CI CV GH GM GN GW LR ML MR NE NG SH SL SN TG"/&gt; &lt;!--Western Africa --&gt;
&lt;group type="013" contains="BZ CR GT HN MX NI PA SV"/&gt; &lt;!--Central America --&gt;
&lt;group type="014" contains="BI DJ ER ET KE KM MG MU MW MZ RE RW SC SO TZ UG YT ZM ZW"/&gt; &lt;!--Eastern Africa --&gt;
&lt;group type="142" contains="030 035 062 145"/&gt; &lt;!--Asia --&gt;
&lt;group type="145" contains="AE AM AZ BH CY GE IL IQ JO KW LB OM PS QA SA SY TR YE"/&gt; &lt;!--Western Asia --&gt;
&lt;group type="015" contains="DZ EG EH LY MA SD TN"/&gt; &lt;!--Northern Africa --&gt;
...</pre>
    </blockquote>
    <p>There are groupings that don't follow this regular
    structure, such as:</p>
    <pre>
&lt;group type="003" contains="013 021 029" grouping="true"/&gt; &lt;!--North America --&gt;</pre>
    <p>These are marked with the attribute <span class=
    "attribute">grouping</span>="<span class=
    "attributeValue">true</span>".</p>
    <p>When groupings have been deprecated but kept around for
    backwards compatibility, they are marked with the attribute
    <span class="attribute">status</span>="<span class=
    "attributeValue">deprecated</span>", like this:</p>
    <pre>
&lt;group type="029" contains="AN" status="deprecated"/&gt; &lt;!--Caribbean --&gt;</pre>
    <p>When the containment relationship itself is a grouping, it
    is marked with the attribute <span class=
    "attribute">status</span>="<span class=
    "attributeValue">grouping</span>", like this:</p>
    <pre>
&lt;group type="150" contains="EU" status="grouping"/&gt; &lt;!--Europe --&gt;</pre>
    <p>That is, the type value isn’t a grouping, but if you filter
    out groupings you can drop this containment.
    In the example
    above, EU is a grouping, and contained in 150.</p>
    <h3>2.2 <a name="Subdivision_Containment" href=
    "#Subdivision_Containment" id=
    "Subdivision_Containment">Subdivision Containment</a></h3>
    <p class="dtd">&lt;!ELEMENT subdivisionContainment ( subgroup*
    ) &gt;<br>
    <br>
    &lt;!ELEMENT subgroup EMPTY &gt;<br>
    &lt;!ATTLIST subgroup type NMTOKEN #REQUIRED &gt;<br>
    &lt;!ATTLIST subgroup contains NMTOKENS #IMPLIED &gt;</p>
    <p>The subdivision containment data is similar to the territory
    containment. It is based on ISO 3166-2 data, but may diverge
    from it in the future.</p>
    <p class="xmlExample">&lt;subgroup type="BD" contains="bda bdb
    bdc bdd bde bdf bdg bdh"/&gt;<br>
    &lt;subgroup type="bda" contains="bd02 bd06 bd07 bd25 bd50
    bd51"/&gt;</p>
    <p>The <strong>type</strong> is a <code><a href=
    "tr35.html#unicode_region_subtag">unicode_region_subtag</a></code>
    (territory) identifier for the top level of containment, or a
    <code><a href=
    "tr35.html#unicode_subdivision_subtag">unicode_subdivision_id</a></code>
    for lower levels of containment when there are multiple levels.
    The <strong>contains</strong> value is a space-delimited list
    of one or more <code><a href=
    "tr35.html#unicode_subdivision_subtag">unicode_subdivision_id</a></code>
    values.
    In the example above, subdivision bda contains the
    subdivisions bd02, bd06, bd07, bd25, bd50, and bd51.</p>
    <p>Note: Formerly (in CLDR 28 through 30):</p>
    <ul>
      <li>The <strong>type</strong> attribute could only contain a
      <code>unicode_region_subtag</code>;</li>
      <li>The <strong>contains</strong> attribute contained
      <code>unicode_subdivision_suffix</code> values; these are not
      unique across multiple territories, so...</li>
      <li>For lower containment levels, a now-deprecated
      <strong>subtype</strong> attribute was used to specify the parent
      <code>unicode_subdivision_suffix</code>.</li>
    </ul>
    <h3>2.3 <a name="Supplemental_Territory_Information" href=
    "#Supplemental_Territory_Information" id=
    "Supplemental_Territory_Information">Supplemental Territory
    Information</a></h3>
    <p class="dtd">&lt;!ELEMENT territory ( languagePopulation* )
    &gt;<br>
    &lt;!ATTLIST territory type NMTOKEN #REQUIRED &gt;<br>
    &lt;!ATTLIST territory gdp NMTOKEN #REQUIRED &gt;<br>
    &lt;!ATTLIST territory literacyPercent NMTOKEN #REQUIRED
    &gt;<br>
    &lt;!ATTLIST territory population NMTOKEN #REQUIRED &gt;<br>
    <br>
    &lt;!ELEMENT languagePopulation EMPTY &gt;<br>
    &lt;!ATTLIST languagePopulation type NMTOKEN #REQUIRED &gt;<br>
    &lt;!ATTLIST languagePopulation literacyPercent NMTOKEN
    #IMPLIED &gt;<br>
    &lt;!ATTLIST languagePopulation writingPercent NMTOKEN #IMPLIED
    &gt;<br>
    &lt;!ATTLIST languagePopulation populationPercent NMTOKEN
    #REQUIRED &gt;<br>
    &lt;!ATTLIST languagePopulation officialStatus
    (de_facto_official | official | official_regional |
    official_minority) #IMPLIED &gt;</p>
    <p>This data provides testing information for language and
    territory
    populations. The main goal is to provide approximate
    figures for the literate, functional population for each
    language in each territory: that is, the population that is
    able to read and write each language, and is comfortable enough
    to use it with computers. For a chart of this data, see
    <a href='https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_language_information.html'>
    Territory-Language Information</a>.</p>
    <p><em>Example</em></p>
    <pre style='font-size: 70%'>
&lt;territory type="AO" gdp="175500000000" literacyPercent="70.4" population="19088100"&gt; &lt;!--Angola--&gt;
    &lt;languagePopulation type="pt" populationPercent="67" officialStatus="official"/&gt; &lt;!--Portuguese--&gt;
    &lt;languagePopulation type="umb" populationPercent="29"/&gt; &lt;!--Umbundu--&gt;
    &lt;languagePopulation type="kmb" writingPercent="10" populationPercent="25" references="R1034"/&gt; &lt;!--Kimbundu--&gt;
    &lt;languagePopulation type="ln" populationPercent="0.67" references="R1010"/&gt; &lt;!--Lingala--&gt;
&lt;/territory&gt;</pre>
    <p>Note that reliable information is difficult to obtain; the
    information in CLDR is an estimate culled from different
    sources, including the World Bank, CIA Factbook, and others.
    The GDP and country literacy figures are taken from the World
    Bank where available, otherwise supplemented by Factbook data
    and other sources. The GDP figures are “PPP (constant 2000
    international $)”. Much of the per-language data is taken from
    the Ethnologue, but is supplemented and processed using many
    other sources, including per-country census data. (The focus of
    the Ethnologue is native speakers, which includes people who
    are not literate, and excludes people who are functional
    second-language users.)
    Some references are marked in the XML
    files, with attributes such as <code>references="R1010"</code>.</p>
    <p>The percentages may add up to more than 100% due to
    multilingual populations, or may be less than 100% due to
    illiteracy or because the data has not yet been gathered or
    processed. Languages with smaller populations might not be
    included.</p>
    <p>The following describes the meaning of some of these
    terms—as used in CLDR—in more detail.</p>
    <p><a name="literacy_percent" href="#literacy_percent" id=
    "literacy_percent">literacy percent for the
    territory</a> — an estimate of the percentage of the
    country’s population that is functionally literate.</p>
    <p><a name="language_population_percent" href=
    "#language_population_percent" id=
    "language_population_percent">language population
    percent</a> — an estimate of the percentage of people in that
    country who are functional in that language, including both
    first and second language speakers. The level of fluency is
    that necessary to use a UI on a computer, smartphone, or
    similar device, rather than complete fluency.</p>
    <p><a name="literacy_percent_for_langPop" href=
    "#literacy_percent_for_langPop" id=
    "literacy_percent_for_langPop">literacy percent for language
    population</a> — Within the set of people who are
    functional in the corresponding language (as specified by
    <a href="#language_population_percent">language population
    percent</a>), this is an estimate of the percentage of those
    people who are functionally literate in that language, that is,
    who are <em>capable</em> of reading or writing in that
    language, even if they do not regularly use it for reading or
    writing.
    If not specified, this defaults to the <a href=
    "#literacy_percent">literacy percent for the territory</a>.</p>
    <p><a name="writing_percent" href="#writing_percent" id=
    "writing_percent">writing percent</a> — Within the set of
    people who are functional in the corresponding language (as
    specified by <a href="#language_population_percent">language
    population percent</a>), this is an estimate of the percentage
    of those people who regularly read or write a significant
    amount in that language. Ideally, the regularity would be
    measured as “7-day actives”. If it is known that the language
    is not widely or commonly written, but there are no solid
    figures, the value is typically given as 1%-5%.</p>
    <p>For a language such as Swiss German, which is typically not
    written, even though nearly the whole native Germanophone
    population <em>could</em> write in Swiss German, the
    <a href="#literacy_percent_for_langPop">literacy percent for
    language population</a> is high, but the <a href=
    "#writing_percent">writing percent</a> is low.</p>
    <p><a name="official_language" href="#official_language" id=
    "official_language">official language</a> — as used in
    CLDR, a language that can generally be used in all
    communications with a central government. That is, people can
    expect that essentially all communication from the government
    is available in that language (ballots, information pamphlets,
    legal documents, …) and that they can use that language in any
    communication to the central government (petitions, forms,
    filing lawsuits, …).</p>
    <p>Official languages for a country in this sense are not
    necessarily the same as those with official legal status in the
    country. For example, Irish is declared to be an official
    language in Ireland, but English has no such formal status in
    the United States. Languages such as the latter are
    called <em>de facto</em> official languages.
    As
    another example, German has legal status in Italy, but cannot
    be used in all communications with the central government, and
    is thus not an official language <em>of Italy</em> for CLDR
    purposes. It is, however, an <em>official regional
    language</em>. Other languages are declared to be official, but
    can’t actually be used for all communication with any major
    governmental entity in the country. There is no intention to
    mark such nominally official languages as “official” in the
    CLDR data.</p>
    <p><a name="official_regional_language" href=
    "#official_regional_language" id=
    "official_regional_language">official regional
    language</a> — a language that is official (<em>de
    jure</em> or <em>de facto</em>) in a major region within a
    country, but does not qualify as an official language of the
    country as a whole. For example, it can be used in an official
    petition to a provincial government, but not the central
    government. The term “major” is meant to distinguish from
    smaller-scale usage, such as for a town or village.</p>
    <h3>2.4 <a name="Territory_Based_Preferences" href=
    "#Territory_Based_Preferences" id=
    "Territory_Based_Preferences">Territory-Based
    Preferences</a></h3>
    <p>The default preference for several locale items is based
    solely on a <a href=
    "tr35.html#unicode_region_subtag">unicode_region_subtag</a>,
    which may either be specified as part of a <a href=
    "tr35.html#unicode_language_id">unicode_language_id</a>,
    inferred from other locale ID elements using the <a href=
    "tr35.html#Likely_Subtags">Likely Subtags</a> mechanism, or
    provided explicitly using an “rg” <a href=
    "tr35.html#RegionOverride">Region Override</a> locale key. For
    more information on this process see <a href=
    "tr35.html#Locale_Inheritance">Locale Inheritance and
    Matching</a>.
    The specific items that are handled in this way
    are:</p>
    <ul>
      <li>Default calendar (see <a href=
      "tr35-dates.html#Calendar_Preference_Data">Calendar
      Preference Data</a>)</li>
      <li>Default week conventions (first day of week and weekend
      days; see <a href="tr35-dates.html#Week_Data">Week
      Data</a>)</li>
      <li>Default hour cycle (see <a href=
      "tr35-dates.html#Time_Data">Time Data</a>)</li>
      <li>Default currency (see <a href=
      "tr35-numbers.html#Supplemental_Currency_Data">Supplemental
      Currency Data</a>)</li>
      <li>Default measurement system and paper size (see <a href=
      "tr35-general.html#Measurement_System_Data">Measurement
      System Data</a>)</li>
      <li>Default units for specific usage (see <a href=
      "#Preferred_Units_For_Usage">Preferred Units for Specific
      Usages</a>, below)</li>
    </ul>
    <h4>2.4.1 <a name="Preferred_Units_For_Usage" href=
    "#Preferred_Units_For_Usage" id=
    "Preferred_Units_For_Usage">Preferred Units for Specific
    Usages</a></h4>
    <p><em>For information about preferred units and unit conversion, see Section 13 <a href="#Unit_Conversion">Unit Conversion</a> and Section 14 <a href="#Unit_Preferences">Unit Preferences</a>.</em></p>
    <h3>2.5 <a name="rgScope" href="#rgScope" id=
    "rgScope">&lt;rgScope&gt;: Scope of the “rg” Locale
    Key</a></h3>
    <p>The supplemental &lt;rgScope&gt; element specifies the data
    paths for which the region used for data lookup is determined
    by the value of any “rg” key present in the locale identifier
    (see <a href="tr35.html#RegionOverride">Region Override</a>).
    If no “rg” key is present, the region used for lookup is
    determined as usual: from the unicode_region_subtag if present,
    else inferred from the unicode_language_subtag.
    The DTD
    structure is as follows:</p>
    <p class="dtd">&lt;!ELEMENT rgScope ( rgPath* ) &gt;<br>
    <br>
    &lt;!ELEMENT rgPath EMPTY &gt;<br>
    &lt;!ATTLIST rgPath path CDATA #REQUIRED &gt;</p>
    <p>The &lt;rgScope&gt; element contains a list of
    &lt;rgPath&gt; elements, each of which specifies a data path for
    which any “rg” key determines the region for lookup. For
    example:</p>
    <pre>
&lt;rgScope&gt;
    &lt;rgPath path="//supplementalData/currencyData/fractions/info[@iso4217='#'][@digits='*'][@rounding='*'][@cashDigits='*'][@cashRounding='*']" draft="provisional" /&gt;
    &lt;rgPath path="//supplementalData/currencyData/fractions/info[@iso4217='#'][@digits='*'][@rounding='*'][@cashRounding='*']" draft="provisional" /&gt;
    &lt;rgPath path="//supplementalData/currencyData/fractions/info[@iso4217='#'][@digits='*'][@rounding='*']" draft="provisional" /&gt;
    &lt;rgPath path="//supplementalData/calendarPreferenceData/calendarPreference[@territories='#'][@ordering='*']" draft="provisional" /&gt;
    ...
    &lt;rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*'][@scope='*']/unitPreference[@regions='#'][@alt='*']" draft="provisional" /&gt;
    &lt;rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*'][@scope='*']/unitPreference[@regions='#']" draft="provisional" /&gt;
    &lt;rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*']/unitPreference[@regions='#'][@alt='*']" draft="provisional" /&gt;
    &lt;rgPath path="//supplementalData/unitPreferenceData/unitPreferences[@category='*'][@usage='*']/unitPreference[@regions='#']" draft="provisional" /&gt;
&lt;/rgScope&gt;
</pre>
    <p>The exact format of the path is provisional in CLDR 29, but
    as currently shown:</p>
    <ul>
      <li>An attribute value of '*' indicates that the path applies
      regardless of the value of the attribute.</li>
      <li>Each path must have exactly one attribute whose value is
      marked here as '#'; in actual data
      items with this path, the
      corresponding value is a list of region codes. It is the
      region codes in this list that are compared with the region
      specified by the “rg” key to determine which data item to use
      for this path.</li>
    </ul>
    <h2>3 <a name="Supplemental_Language_Data" href=
    "#Supplemental_Language_Data" id=
    "Supplemental_Language_Data">Supplemental Language
    Data</a></h2>
    <p class="dtd">&lt;!ELEMENT languageData ( language* ) &gt;<br>
    &lt;!ELEMENT language EMPTY &gt;<br>
    &lt;!ATTLIST language type NMTOKEN #REQUIRED &gt;<br>
    &lt;!ATTLIST language scripts NMTOKENS #IMPLIED &gt;<br>
    &lt;!ATTLIST language territories NMTOKENS #IMPLIED &gt;<br>
    &lt;!ATTLIST language variants NMTOKENS #IMPLIED &gt;<br>
    &lt;!ATTLIST language alt NMTOKENS #IMPLIED &gt;<br>
    </p>
    <p>The language data is used for consistency checking and
    testing. It provides a list of which languages are used with
    which scripts and in which countries. To a large extent,
    however, the territory list has been superseded by the data in
    <em>Section 2.3 <a href=
    "#Supplemental_Territory_Information">Supplemental Territory
    Information</a></em>.</p>
    <pre>
&lt;languageData&gt;
    &lt;language type="af" scripts="Latn" territories="ZA"/&gt;
    &lt;language type="am" scripts="Ethi" territories="ET"/&gt;
    &lt;language type="ar" scripts="Arab" territories="AE BH DZ EG IN IQ JO KW LB
LY MA OM PS QA SA SD SY TN YE"/&gt;
    ...</pre>
    <p>If the language is not a modern language, or the script is
    not a modern script, or the language is not a major language of
    the territory, then the alt attribute is set to secondary.</p>
    <pre>
    &lt;language type="fr" scripts="Latn" territories="IT US" alt="secondary" /&gt;
    ...</pre>
    <h3>3.1 <a name="Supplemental_Language_Grouping" href=
    "#Supplemental_Language_Grouping" id=
    "Supplemental_Language_Grouping">Supplemental Language
    Grouping</a></h3>
    <p class="dtd">&lt;!ELEMENT languageGroups ( languageGroup* ) &gt;<br>
    &lt;!ELEMENT
    languageGroup ( #PCDATA ) &gt;<br>
    &lt;!ATTLIST languageGroup parent NMTOKEN #REQUIRED &gt;</p>
    <p>The language groups supply language containment. For
    example, the following indicates that fiu is the Unicode
    language code for a language group that contains chm, et, fi,
    etc.</p><code>&lt;languageGroup
    parent="<strong>fiu</strong>"&gt;chm et <strong>fi</strong> fit
    fkv hu izh kca koi krl kv liv mdf mns mrj myv smi udm vep vot
    vro&lt;/languageGroup&gt;</code>
    <p>The vast majority of the languageGroup data is extracted
    from Wikidata, but may be overridden in some cases. The
    Wikidata information is more fine-grained, but makes use of
    language groups that don't have ISO or Unicode language codes.
    Those language groups are omitted from the data. For example,
    Wikidata has the following child-parent chain, of which only
    the first and last elements are present in the language
    groups.</p>
    <table>
      <tr>
        <td>Name</td>
        <td>Wikidata Code</td>
        <td>Language Code</td>
      </tr>
      <tr>
        <td>Finnish</td>
        <td><a href=
        "https://www.wikidata.org/wiki/Q1412">Q1412</a></td>
        <td>fi</td>
      </tr>
      <tr>
        <td>Finnic languages</td>
        <td><a href=
        "https://www.wikidata.org/wiki/Q33328">Q33328</a></td>
      </tr>
      <tr>
        <td>Finno-Samic languages</td>
        <td><a href=
        "https://www.wikidata.org/wiki/Q163652">Q163652</a></td>
      </tr>
      <tr>
        <td>Finno-Volgaic languages</td>
        <td><a href=
        "https://www.wikidata.org/wiki/Q161236">Q161236</a></td>
      </tr>
      <tr>
        <td>Finno-Permic languages</td>
        <td><a href=
        "https://www.wikidata.org/wiki/Q161240">Q161240</a></td>
      </tr>
      <tr>
        <td>Finno-Ugric languages</td>
        <td><a href=
        "https://www.wikidata.org/wiki/Q79890">Q79890</a></td>
        <td>fiu</td>
      </tr>
    </table><br>
    <h2>4 <a name="Supplemental_Code_Mapping" href=
    "#Supplemental_Code_Mapping" id=
    "Supplemental_Code_Mapping">Supplemental Code Mapping</a></h2>
    <p class="dtd"><!ELEMENT codeMappings (languageCodes*,
    territoryCodes*, currencyCodes*) ></p>
    <p class="dtd"><!ELEMENT languageCodes EMPTY ><br>
    <!ATTLIST languageCodes type NMTOKEN #REQUIRED><br>
    <!ATTLIST languageCodes alpha3 NMTOKEN #REQUIRED></p>
    <p class="dtd"><!ELEMENT territoryCodes EMPTY ><br>
    <!ATTLIST territoryCodes type NMTOKEN #REQUIRED><br>
    <!ATTLIST territoryCodes numeric NMTOKEN #REQUIRED><br>
    <!ATTLIST territoryCodes alpha3 NMTOKEN #REQUIRED><br>
    <!ATTLIST territoryCodes fips10 NMTOKEN #IMPLIED><br>
    <!ATTLIST territoryCodes internet NMTOKENS #IMPLIED>
    [deprecated]</p>
    <p class="dtd"><!ELEMENT currencyCodes EMPTY ><br>
    <!ATTLIST currencyCodes type NMTOKEN #REQUIRED><br>
    <!ATTLIST currencyCodes numeric NMTOKEN #REQUIRED></p>
    <p>The code mapping information provides mappings between the
    subtags used in the CLDR locale IDs (from BCP 47) and other
    coding systems or related information. Language code mappings
    are provided only for codes that have two letters in BCP 47,
    mapping them to their ISO three-letter equivalents. The
    territory codes provide mappings to numeric (UN M.49 [<a href=
    "tr35.html#UNM49">UNM49</a>] codes, equivalent to ISO numeric
    codes), ISO three-letter codes, FIPS 10 codes, and the internet
    top-level domain codes.</p>
    <p>The alphabetic codes are only provided where different from
    the type. For example:</p>
    <pre>
<territoryCodes type="AA" numeric="958" alpha3="AAA"/>
<territoryCodes type="AD" numeric="020" alpha3="AND" fips10="AN"/>
<territoryCodes type="AE" numeric="784" alpha3="ARE"/>
...
<territoryCodes type="GB" numeric="826" alpha3="GBR" fips10="UK"/>
...
<territoryCodes type="QU" numeric="967" alpha3="QUU" internet="EU"/>
...
766<territoryCodes type="XK" numeric="983" alpha3="XKK"/> 767...</pre> 768 <p>Where there is no corresponding code, sometimes private use 769 codes are used, such as the numeric code for XK.</p> 770 <p>The currencyCodes are mappings from three letter currency 771 codes to numeric values (ISO 4217 <a href= 772 "https://www.currency-iso.org/en/home/tables/table-a1.html">Current 773 currency & funds code list</a>.) The mapping currently 774 covers only current codes and does not include historic 775 currencies. For example:</p> 776 <pre> 777<currencyCodes type="AED" numeric="784"/> 778<currencyCodes type="AFN" numeric="971"/> 779... 780<currencyCodes type="EUR" numeric="978"/> 781... 782<currencyCodes type="ZAR" numeric="710"/> 783<currencyCodes type="ZMW" numeric="967"/> 784</pre> 785 <h2>5 <a name="Telephone_Code_Data" href="#Telephone_Code_Data" 786 id="Telephone_Code_Data">Telephone Code Data</a> 787 (Deprecated)</h2> 788 <p>Deprecated in CLDR v34, and data removed.</p> 789 <p class="dtd"><!ELEMENT telephoneCodeData ( 790 codesByTerritory* ) ><br> 791 <br> 792 <!ELEMENT codesByTerritory ( telephoneCountryCode+ ) 793 ><br> 794 <!ATTLIST codesByTerritory territory NMTOKEN #REQUIRED 795 ><br> 796 <br> 797 <!ELEMENT telephoneCountryCode EMPTY ><br> 798 <!ATTLIST telephoneCountryCode code NMTOKEN #REQUIRED 799 ><br> 800 <!ATTLIST telephoneCountryCode from NMTOKEN #IMPLIED 801 ><br> 802 <!ATTLIST telephoneCountryCode to NMTOKEN #IMPLIED ></p> 803 <p>This data specifies the mapping between ITU telephone 804 country codes [<a href="tr35.html#ITUE164">ITUE164</a>] and 805 CLDR-style territory codes (ISO 3166 2-letter codes or 806 non-corresponding UN M.49 [<a href="tr35.html#UNM49">UNM49</a>] 807 3-digit codes). 
There are several things to note:</p> 808 <ul> 809 <li>A given telephone country code may map to multiple CLDR 810 territory codes; +1 (North America Numbering Plan) covers the 811 US and Canada, as well as many islands in the Caribbean and 812 some in the Pacific</li> 813 <li>Some telephone country codes are for global services (for 814 example, some satellite services), and thus correspond to 815 territory code 001.</li> 816 <li>The mappings change over time (territories move from one 817 telephone code to another). These changes are usually planned 818 several years in advance, and there may be a period during 819 which either telephone code can be used to reach the 820 territory. While the CLDR telephone code data is not intended 821 to include past changes, it is intended to incorporate known 822 information on planned future changes, using "from" and "to" 823 date attributes to indicate when mappings are valid.</li> 824 </ul> 825 <p>A subset of the telephone code data might look like the 826 following (showing a past mapping change to illustrate the from 827 and to attributes):</p> 828 <pre><codesByTerritory territory="001"> 829 <telephoneCountryCode code="800"/> <!-- International Freephone Service --> 830 <telephoneCountryCode code="808"/> <!-- International Shared Cost Services (ISCS) --> 831 <telephoneCountryCode code="870"/> <!-- Inmarsat Single Number Access Service (SNAC) --> 832</codesByTerritory> 833<codesByTerritory territory="AS"> <!-- American Samoa --> 834 <telephoneCountryCode code="1" from="2004-10-02"/> <!-- +1 684 in North America Numbering Plan --> 835 <telephoneCountryCode code="684" to="2005-04-02"/> <!-- +684 now a spare code --> 836</codesByTerritory> 837<codesByTerritory territory="CA"> 838 <telephoneCountryCode code="1"/> <!-- North America Numbering Plan --> 839</codesByTerritory></pre> 840 <h2>6 <a name="Postal_Code_Validation" href= 841 "#Postal_Code_Validation" id="Postal_Code_Validation">Postal 842 Code Validation 
(Deprecated)</a></h2> 843 <p>Deprecated in v27. Please see other services that are kept 844 up to date, such as:</p> 845 <ul> 846 <li><a href= 847 "https://i18napis.appspot.com/address/data/US">https://i18napis.appspot.com/address/data/US</a></li> 848 <li><a href= 849 "https://i18napis.appspot.com/address/data/CH">https://i18napis.appspot.com/address/data/CH</a></li> 850 <li>...<br></li> 851 </ul> 852 <p class="dtd"><!ELEMENT postalCodeData (postCodeRegex*) 853 ><br> 854 <!ELEMENT postCodeRegex (#PCDATA) ><br> 855 <!ATTLIST postCodeRegex territoryId NMTOKEN 856 #REQUIRED><br></p> 857 <p>The Postal Code regex information can be used to validate 858 postal codes used in different countries. In some cases, the 859 regex is quite simple, such as for Germany:</p> 860 <pre> 861 <postCodeRegex territoryId="DE" >\d{5}</postCodeRegex></pre> 862 <p>The US code is slightly more complicated, since there is an 863 optional portion:</p> 864 <pre> 865 <postCodeRegex territoryId="US" >\d{5}([ \-]\d{4})?</postCodeRegex></pre> 866 <p>The most complicated currently is the UK.</p> 867 <h2>7 <a name="Supplemental_Character_Fallback_Data" href= 868 "#Supplemental_Character_Fallback_Data" id= 869 "Supplemental_Character_Fallback_Data">Supplemental Character 870 Fallback Data</a></h2> 871 <p class="dtd"><!ELEMENT characters ( character-fallback*) 872 ><br> 873 <br> 874 <!ELEMENT character-fallback ( character* ) ><br> 875 <!ELEMENT character (substitute*) ><br> 876 <!ATTLIST character value CDATA #REQUIRED ><br> 877 <br> 878 <!ELEMENT substitute (#PCDATA) ></p> 879 <p>The characters element provides a way for non-Unicode 880 systems, or systems that only support a subset of Unicode 881 characters, to transform CLDR data. It gives a list of 882 characters with alternative values that can be used if the main 883 value is not available. 
For example:</p> 884 <pre><characters> 885 <character-fallback> 886 <character value = "ß"> 887 <substitute>ss</substitute> 888 </character> 889 <character value = "Ø"> 890 <substitute>Ö</substitute> 891 <substitute>O</substitute> 892 </character> 893 <character value = "<span style= 894"font-size: 150%">₧</span>"> 895 <substitute>Pts</substitute> 896 </character> 897 <character value = "<span style= 898"font-size: 150%">₣</span>"> 899 <substitute>Fr.</substitute> 900 </character> 901 </character-fallback> 902</characters></pre> 903 <p>The ordering of the substitute elements indicates the 904 preference among them.</p>That is, this data provides 905 recommended fallbacks for use when a charset or supported 906 repertoire does not contain a desired character. There is more 907 than one possible fallback: the recommended usage is that when 908 a character <i>value</i> is not in the desired repertoire the 909 following process is used, whereby the first value that is 910 wholly in the desired repertoire is used. 911 <ul> 912 <li style="margin-top: 0.5em; margin-bottom: 0.5em"> 913 <code>toNFC</code>(<i>value</i>)</li> 914 <li style="margin-top: 0.5em; margin-bottom: 0.5em">other 915 canonically equivalent sequences, if there are any</li> 916 <li style="margin-top: 0.5em; margin-bottom: 0.5em">the 917 explicit <i>substitutes</i> value (in order)</li> 918 <li style="margin-top: 0.5em; margin-bottom: 0.5em"> 919 <code>toNFKC</code>(<i>value</i>)</li> 920 </ul> 921 <h2>8 <a name="Coverage_Levels" href="#Coverage_Levels" id= 922 "Coverage_Levels">Coverage Levels</a></h2> 923 <p>The following describes the coverage levels used for the 924 current version of CLDR. This list will change between releases 925 of CLDR. 
Each level adds to what is in the lower level.</p> 926 <table border="1" cellpadding="0" cellspacing="1"> 927 <!-- nocaption --> 928 <tr> 929 <th nowrap> 930 <div align="right"> 931 Level 932 </div> 933 </th> 934 <th colspan="2">Description</th> 935 </tr> 936 <tr> 937 <td nowrap> 938 <div align="right"> 939 0 940 </div> 941 </td> 942 <td>undetermined</td> 943 <td>Does not meet any of the following levels.</td> 944 </tr> 945 <tr> 946 <td nowrap> 947 <div align="right"> 948 10 949 </div> 950 </td> 951 <td>core</td> 952 <td>The CLDR "core" data, which is defined as the basic 953 information about the language and writing system that is 954 required before other information can be added using the 955 CLDR survey tool. See <a href= 956 "http://cldr.unicode.org/index/cldr-spec/minimaldata">http://cldr.unicode.org/index/cldr-spec/minimaldata</a></td> 957 </tr> 958 <tr> 959 <td nowrap> 960 <div align="right"> 961 40 962 </div> 963 </td> 964 <td>basic</td> 965 <td>The minimum amount of locale data deemed necessary to 966 create a "viable" locale in CLDR. Contains names for the 967 languages, scripts, and territories associated with the 968 language, numbering systems used in those languages, date 969 and number formats, plus a few key values such as the 970 values in Section 3.1 <a href= 971 "tr35.html#Unknown_or_Invalid_Identifiers">Unknown or 972 Invalid Identifiers</a>. Also contains data associated with 973 the most prominent languages and countries.</td> 974 </tr> 975 <tr> 976 <td nowrap> 977 <div align="right"> 978 60 979 </div> 980 </td> 981 <td>moderate</td> 982 <td>Contains more types of data and more language and 983 territory names than the basic level. 
If the language is
        associated with an EU country, then the moderate level
        attempts to complete the data as it pertains to all EU
        member countries.</td>
      </tr>
      <tr>
        <td nowrap>
          <div align="right">
            80
          </div>
        </td>
        <td>modern</td>
        <td>Contains all fields in normal modern use, including all
        country names, and currencies in use.</td>
      </tr>
      <tr>
        <td nowrap>
          <div align="right">
            100
          </div>
        </td>
        <td>comprehensive</td>
        <td>Contains complete localizations (or valid inheritance)
        for every possible field.</td>
      </tr>
    </table>
    <p>Levels 40 through 80 are based on the definitions and
    specifications listed in <strong>8.1-8.3</strong>. However,
    these principles are continually being refined by the CLDR
    technical committee, and so do not completely reflect the data
    that is actually used for coverage determination, which is
    under the XPath
    <strong>//supplementalData/coverageLevels</strong>. For a view
    of the trunk version of this data, see
    <a href=
    "https://github.com/unicode-org/cldr/releases/tag/latest/common/supplemental/coverageLevels.xml">
    coverageLevels.xml</a>.
(As described in the <a href=
    "tr35-info.html#Supplemental_Data">introduction to Supplemental
    Data</a>, the specific XML filename may change.)</p>
    <p class="dtd"><!ELEMENT coverageLevels (
    approvalRequirements, coverageVariable*, coverageLevel* )
    ><br>
    <!ELEMENT coverageLevel EMPTY ><br>
    <!ATTLIST coverageLevel inLanguage CDATA #IMPLIED ><br>
    <!ATTLIST coverageLevel inScript CDATA #IMPLIED ><br>
    <!ATTLIST coverageLevel inTerritory CDATA #IMPLIED ><br>
    <!ATTLIST coverageLevel value CDATA #REQUIRED ><br>
    <!ATTLIST coverageLevel match CDATA #REQUIRED ></p>
    <p>Here is an example coverageLevel line:</p>
    <pre><coverageLevel<br> value="30"
 inLanguage="(de|fi)" <br> match="localeDisplayNames/types/type[@type='phonebook'][@key='collation']"/></pre>
    <p>The coverageLevel elements are read in order, and the first
    match results in a coverage level value. The element matches
    based on the <span class="attribute">inLanguage</span>,
    <span class="attribute">inScript</span>, <span class=
    "attribute">inTerritory</span>, and <span class=
    "attribute">match</span> attribute values, which are regular
    expressions. In the above example, a match occurs
    if the language is de or fi, and if the path is a locale
    display name for collation=phonebook.</p>
    <p>The <span class="attribute">match</span> attribute value
    logically has "//ldml/" prefixed before it is applied. In
    addition, the "[@" is automatically quoted. Otherwise standard
    Perl/Java style regular expression syntax is used.</p>
    <p class="dtd"><!ELEMENT coverageVariable EMPTY ><br>
    <!ATTLIST coverageVariable key CDATA #REQUIRED ><br>
    <!ATTLIST coverageVariable value CDATA #REQUIRED ></p>
    <p>The coverageVariable element allows us to create variables
    for certain regular expressions that are used frequently in the
    coverageLevel definitions above.
Each coverage variable must
    contain a key / value pair of attributes, which can then be
    substituted into coverageLevel definitions.</p>
    <p>For example, here is a coverageLevel line using
    coverageVariable substitution.</p>
    <pre>
<coverageVariable key="%dayTypes" value="(sun|mon|tue|wed|thu|fri|sat)"/><br>
<coverageVariable key="%wideAbbr" value="(wide|abbreviated)"/><br>
<coverageLevel value="20" match="dates/calendars/calendar[@type='gregorian']/days/dayContext[@type='format']/dayWidth[@type='%wideAbbr']/day[@type='%dayTypes']"/></pre>
    <p>In this example, the coverage variables %dayTypes and
    %wideAbbr are used to substitute their respective values into
    the match expression. This allows us to reuse the same variable
    for other coverageLevel matches that use the same regular
    expression fragment.</p>
    <p class="dtd"><br>
    <!ELEMENT approvalRequirements ( approvalRequirement* )
    ><br>
    <!ELEMENT approvalRequirement EMPTY ><br>
    <!ATTLIST approvalRequirement votes CDATA #REQUIRED><br>
    <!ATTLIST approvalRequirement locales CDATA
    #REQUIRED><br>
    <!ATTLIST approvalRequirement paths CDATA
    #REQUIRED><br></p>
    <p>The approvalRequirements element specifies the number of
    Survey Tool votes required for approval, based on locale,
    path, or both. Certain locales require a higher
    voting threshold (usually 8 votes instead of 4), in order to
    promote greater stability in the data.
Furthermore, certain
    very high-visibility fields, such as number
    formats, require a CLDR Technical Committee (TC) member's vote
    for approval.</p>
    <p>Here is an example of the approvalRequirements section.</p>
    <pre>
<approvalRequirements><br>    <!-- "high bar" items -->
    <approvalRequirement votes="20" locales="*" paths="//ldml/numbers/symbols[^/]++/(decimal|group)"/>
    <!-- established locales - http://cldr.unicode.org/index/process#TOC-Draft-Status-of-Optimal-Field-Value -->
    <approvalRequirement votes="8" locales="ar ca cs da de el es fi fr he hi hr hu it ja ko nb nl pl pt pt_PT ro ru sk sl sr sv th tr uk vi zh zh_Hant" paths=""/>
    <!-- all other items -->
    <approvalRequirement votes="4" locales="*" paths=""/><br></approvalRequirements> </pre>
    <p>This section specifies that a TC vote (20 votes) is required
    for decimal and grouping separators. Furthermore, it specifies
    that any field in the established locales list (i.e. ar, ca,
    cs, etc.)
requires 8 votes, and that all other locales require
    4 votes only.</p>
    <p>For more information on the CLDR voting process, see
    <a href="http://cldr.unicode.org/index/process">http://cldr.unicode.org/index/process</a>.</p>
    <h3>8.1 <a name="Coverage_Level_Definitions" href=
    "#Coverage_Level_Definitions" id=
    "Coverage_Level_Definitions">Definitions</a></h3>
    <ul>
      <li><i>Target-Language</i> is the language under
      consideration.</li>
      <li><i>Target-Territories</i> is the list of territories
      found by looking up <i>Target-Language</i> in the
      <languageData> elements in <a href=
      "tr35-info.html#Supplemental_Language_Data">Supplemental
      Language Data</a>.</li>
      <li>
        <i>Language-List</i> is <i>Target-Language</i>, plus
        <ul>
          <li><b>basic:</b> Chinese, English, French, German,
          Italian, Japanese, Portuguese, Russian, Spanish, Unknown
          (de, en, es, fr, it, ja, pt, ru, zh, und)</li>
          <li><b>moderate:</b> basic + Arabic, Hindi, Korean,
          Indonesian, Dutch, Bengali, Turkish, Thai, Polish (ar,
          hi, ko, id, nl, bn, tr, th, pl).
If an EU language, add
          the remaining official EU languages, currently: Danish,
          Greek, Finnish, Swedish, Czech, Estonian, Latvian,
          Lithuanian, Hungarian, Maltese, Slovak, Slovene (da, el,
          fi, sv, cs, et, lv, lt, hu, mt, sk, sl)</li>
          <li><b>modern:</b> all languages that are official or
          major commercial languages of modern territories</li>
        </ul>
      </li>
      <li><i>Target-Scripts</i> is the list of scripts in which
      <i>Target-Language</i> can be customarily written (found by
      looking up <i>Target-Language</i> in the <languageData>
      elements in <a href=
      "tr35-info.html#Supplemental_Language_Data">Supplemental
      Language Data</a>), plus Unknown (Zzzz).</li>
      <li>
        <i>Script-List</i> is the <i>Target-Scripts</i> plus the
        major scripts used for multiple languages
        <ul>
          <li>Latin, Simplified Chinese, Traditional Chinese,
          Cyrillic, Arabic (Latn, Hans, Hant, Cyrl, Arab)</li>
        </ul>
      </li>
      <li>
        <i>Territory-List</i> is the list of territories formed by
        taking the <i>Target-Territories</i> and adding:
        <ul>
          <li><b>basic:</b> Brazil, China, France, Germany, India,
          Italy, Japan, Russia, United Kingdom, United States,
          Unknown (BR, CN, DE, GB, FR, IN, IT, JP, RU, US, ZZ)</li>
          <li><b>moderate:</b> basic + Spain, Canada, Korea,
          Mexico, Australia, Netherlands, Switzerland, Belgium,
          Sweden, Turkey, Austria, Indonesia, Saudi Arabia, Norway,
          Denmark, Poland, South Africa, Greece, Finland, Ireland,
          Portugal, Thailand, Hong Kong SAR China, Taiwan (ES, CA,
          KR, MX, AU, NL, CH, BE, SE, TR, AT, ID, SA, NO, DK, PL,
          ZA, GR, FI, IE, PT, TH, HK, TW).
If an EU language, add the remaining EU member
          countries: Luxembourg, Czech Republic, Hungary, Estonia,
          Lithuania, Latvia, Slovenia, Slovakia, Malta (LU, CZ, HU,
          EE, LT, LV, SI, SK, MT).</li>
          <li><b>modern:</b> all current ISO 3166 territories, plus
          the UN M.49 [<a href="tr35.html#UNM49">UNM49</a>] regions
          in <a href=
          "tr35-info.html#Supplemental_Territory_Containment">Supplemental
          Territory Containment</a>.</li>
        </ul>
      </li>
      <li><i>Currency-List</i> is the list of current official
      currencies used in any of the territories in
      <i>Territory-List</i>, found by looking at the region
      elements in <a href=
      "tr35-info.html#Supplemental_Territory_Containment">Supplemental
      Territory Containment</a>, plus Unknown (XXX).</li>
      <li><i>Calendar-List</i> is the set of calendars in customary
      use in any of <i>Target-Territories</i>, plus Gregorian.</li>
      <li><em>Number-System-List</em> is the set of number systems
      in customary use in the language.</li>
    </ul>
    <h3>8.2 <a name="Coverage_Level_Data_Requirements" href=
    "#Coverage_Level_Data_Requirements" id=
    "Coverage_Level_Data_Requirements">Data Requirements</a></h3>
    <p>The required data to qualify for the level is then the
    following.</p>
    <ol>
      <li>localeDisplayNames
        <ol>
          <li><i>languages:</i> localized names for all languages
          in <i>Language-List.</i></li>
          <li><i>scripts:</i> localized names for all scripts in
          <i>Script-List</i>.</li>
          <li><i>territories:</i> localized names for all
          territories in <i>Territory-List</i>.</li>
          <li><i>variants, keys, types:</i> localized names for any
          in use in <i>Target-Territories</i>; for example, a
          translation for PHONEBOOK in a German locale.</li>
        </ol>
      </li>
      <li>dates: all of the following for each calendar in
      <i>Calendar-List</i>.
1197 <ol> 1198 <li>calendars: localized names</li> 1199 <li>month names, day names, era names, and quarter names 1200 <ul> 1201 <li>context=format and width=narrow, wide, & 1202 abbreviated</li> 1203 <li>plus context=standAlone and width=narrow, wide, 1204 & abbreviated, <i>if the grammatical forms of 1205 these are different than for context=format.</i></li> 1206 </ul> 1207 </li> 1208 <li>week: minDays, firstDay, weekendStart, weekendEnd 1209 <ul> 1210 <li>if some of these vary in territories in 1211 <i>Territory-List</i>, include territory locales for 1212 those that do.</li> 1213 </ul> 1214 </li> 1215 <li>am, pm, eraNames, eraAbbr</li> 1216 <li>dateFormat, timeFormat: full, long, medium, 1217 short</li> 1218 <li> 1219 <p>intervalFormatFallback</p> 1220 </li> 1221 </ol> 1222 </li> 1223 <li>numbers: symbols, decimalFormats, scientificFormats, 1224 percentFormats, currencyFormats for each number system in 1225 <em>Number-System-List</em>.</li> 1226 <li>currencies: displayNames and symbol for all currencies in 1227 <i>Currency-List</i>, for all plural forms</li> 1228 <li>transforms: (moderate and above) transliteration between 1229 Latin and each other script in <i>Target-Scripts.</i></li> 1230 </ol> 1231 <h3>8.3 <a name="Coverage_Level_Default_Values" href= 1232 "#Coverage_Level_Default_Values" id= 1233 "Coverage_Level_Default_Values">Default Values</a></h3> 1234 <p>Items should <i>only</i> be included if they are not the 1235 same as the default, which is:</p> 1236 <ul> 1237 <li>what is in root, if there is something defined 1238 there.</li> 1239 <li>for timezone IDs: the name computed according to 1240 <i><a href="tr35.html#Time_Zone_Fallback">Appendix J: Time 1241 Zone Display Names</a></i></li> 1242 <li>for collation sequence, the UCA DUCET (Default Unicode 1243 Collation Element Table), as modified by CLDR. 
1244 <ul> 1245 <li>however, in that case the locale must be added to the 1246 validSubLocale list in <a href= 1247 "https://github.com/unicode-org/cldr/blob/master/common/collation/root.xml">collation/root.xml</a>.</li> 1248 </ul> 1249 </li> 1250 <li>for currency symbol, language, territory, script names, 1251 variants, keys, types, the internal code identifiers, for 1252 example, 1253 <ul> 1254 <li>currencies: EUR, USD, JPY, ...</li> 1255 <li>languages: en, ja, ru, ...</li> 1256 <li>territories: GB, JP, FR, ...</li> 1257 <li>scripts: Latn, Thai, ...</li> 1258 <li>variants: PHONEBOOK,...</li> 1259 </ul> 1260 </li> 1261 </ul><!-- end section 8 --> 1262 <!-- begin section 9 supplemental metadata --> 1263 <h2>9 <a name="Appendix_Supplemental_Metadata" href= 1264 "#Appendix_Supplemental_Metadata" id= 1265 "Appendix_Supplemental_Metadata">Supplemental Metadata</a></h2> 1266 <p>Note that this section discusses the 1267 <code><metadata></code> element within the 1268 <code><supplementalData></code> element. For the 1269 per-locale metadata used in tests and the Survey Tool, see 1270 <a href="#Metadata_Elements">10: Locale Metadata 1271 Element</a>.</p> 1272 <p>The supplemental metadata contains information about the 1273 CLDR file itself, used to test validity and provide information 1274 for locale inheritance. 
A number of these elements are 1275 described in</p> 1276 <ul class="toc"> 1277 <li style="margin-top: 0.5em; margin-bottom: 0.5em">Appendix 1278 I: <a href="tr35.html#Inheritance_and_Validity">Inheritance 1279 and Validity</a></li> 1280 <li style="margin-top: 0.5em; margin-bottom: 0.5em">Appendix 1281 K: <a href="tr35.html#Valid_Attribute_Values">Valid Attribute 1282 Values</a></li> 1283 <li style="margin-top: 0.5em; margin-bottom: 0.5em">Appendix 1284 L: <a href="tr35.html#Canonical_Form">Canonical Form</a></li> 1285 <li style="margin-top: 0.5em; margin-bottom: 0.5em">Appendix 1286 M: <a href="#Coverage_Levels">Coverage Levels</a></li> 1287 </ul> 1288 <h3>9.1 <a name="Supplemental_Alias_Information" href= 1289 "#Supplemental_Alias_Information" id= 1290 "Supplemental_Alias_Information">Supplemental Alias 1291 Information</a></h3> 1292 <p class="dtd"><!ELEMENT alias 1293 (languageAlias*,scriptAlias*,territoryAlias*,subdivisionAlias*,variantAlias*,zoneAlias*) 1294 ><br> 1295 <br> 1296 <em>The following are common attributes for subelements of 1297 <alias>:</em><br> 1298 <!ELEMENT *Alias EMPTY ><br> 1299 <!ATTLIST *Alias type NMTOKEN #IMPLIED ><br> 1300 <!ATTLIST *Alias replacement NMTOKEN #IMPLIED ><br> 1301 <!ATTLIST *Alias reason ( deprecated | overlong ) 1302 #IMPLIED><br> 1303 <br> 1304 <em>The languageAlias has additional reasons</em><br> 1305 <!ATTLIST languageAlias reason ( deprecated | overlong | 1306 macrolanguage | legacy | bibliographic ) #IMPLIED></p> 1307 <p>This element provides information as to parts of locale IDs 1308 that should be substituted when accessing CLDR data. This 1309 logical substitution should be done to both the locale id, and 1310 to any lookup for display names of languages, territories, and 1311 so on. 
The replacement for the language and territory types is
    more complicated: see <em>Part 1: <a href=
    "tr35.html#Contents">Core</a>, Section 3.3.1 <a href=
    "tr35.html#BCP_47_Language_Tag_Conversion">BCP 47 Language Tag
    Conversion</a></em> for details.</p>
    <pre><alias>
  <languageAlias type="in" replacement="id"/>
  <languageAlias type="sh" replacement="sr"/>
  <languageAlias type="sh_YU" replacement="sr_Latn_YU"/>
...
  <territoryAlias type="BU" replacement="MM"/>
...
</alias></pre>
    <p>Attribute values for the *Alias elements include the
    following:</p>
    <table>
      <caption>
        <a name="Alias_Attribute_Values" href=
        "#Alias_Attribute_Values" id="Alias_Attribute_Values">Alias
        Attribute Values</a>
      </caption>
      <tr>
        <th scope="col">Attribute</th>
        <th scope="col">Value</th>
        <th scope="col">Description</th>
      </tr>
      <tr>
        <td>type</td>
        <td>NMTOKEN</td>
        <td>The code to be replaced</td>
      </tr>
      <tr>
        <td>replacement</td>
        <td>NMTOKEN</td>
        <td>The code(s) to replace it, space-delimited.</td>
      </tr>
      <tr>
        <td rowspan="5">reason</td>
        <td>deprecated</td>
        <td>The code in type is deprecated, such as 'iw' by 'he',
        or 'CS' by 'RS ME'.</td>
      </tr>
      <tr>
        <td>overlong</td>
        <td>The code in type is too long, such as 'eng' by 'en' or
        'USA' or '840' by 'US'.</td>
      </tr>
      <tr>
        <td>macrolanguage</td>
        <td>The code in type is an encompassed language that is
        replaced by a macrolanguage, such as '<a href=
        "https://www.sil.org/iso639-3/documentation.asp?id=arb">arb</a>'
        by 'ar'.</td>
      </tr>
      <tr>
        <td>legacy</td>
        <td>The code in type is a legacy code that is replaced by
        another code for compatibility with established legacy
        usage, such as 'sh' by 'sr_Latn'.</td>
      </tr>
      <tr>
        <td>bibliographic</td>
        <td>The code in type is a <a href=
"https://www.loc.gov/standards/iso639-2/langhome.html">bibliographic 1375 code</a>, which is replaced by a terminology code, such as 1376 'alb' by 'sq'.</td> 1377 </tr> 1378 </table> 1379 <h3>9.2 <a name="Supplemental_Deprecated_Information" href= 1380 "#Supplemental_Deprecated_Information" id= 1381 "Supplemental_Deprecated_Information">Supplemental Deprecated 1382 Information (Deprecated)</a></h3> 1383 <pre class="dtd"> 1384 <!ELEMENT deprecated ( deprecatedItems* ) > 1385<!ATTLIST deprecated draft ( approved | contributed | provisional | unconfirmed | true | false ) #IMPLIED > <!-- true and false are deprecated. --> 1386 1387<!ELEMENT deprecatedItems EMPTY > 1388<!ATTLIST deprecatedItems type ( standard | supplemental | ldml | supplementalData | ldmlBCP47 ) #IMPLIED > <!-- standard | supplemental are deprecated --> 1389<!ATTLIST deprecatedItems elements NMTOKENS #IMPLIED > 1390<!ATTLIST deprecatedItems attributes NMTOKENS #IMPLIED > 1391<!ATTLIST deprecatedItems values CDATA #IMPLIED ></pre> 1392 <p>The deprecated items element was used to indicate elements, 1393 attributes, and attribute values that are deprecated. This 1394 means that the items are valid, but that their usage is 1395 strongly discouraged. This element and its subelements have 1396 been deprecated in favor of <a href= 1397 "tr35.html#DTD_Annotations">DTD Annotations</a>.</p> 1398 <p>Where particular values are deprecated (such as territory 1399 codes like SU for Soviet Union), the names for such codes may 1400 be removed from the common/main translated data after some 1401 period of time. However, typically supplemental information for 1402 deprecated codes is retained, such as containment, likely 1403 subtags, older currency codes usage, etc. 
The English name may 1404 also be retained, for debugging purposes.</p> 1405 <h3>9.3 <a name="Default_Content" href="#Default_Content" id= 1406 "Default_Content">Default Content</a></h3> 1407 <pre class="dtd"><!ELEMENT defaultContent EMPTY > 1408 <!ATTLIST defaultContent locales NMTOKENS #IMPLIED ></pre> 1409 <p>In CLDR, locales without territory information (or where 1410 needed, script information) provide data appropriate for what 1411 is called the <i>default content locale</i>. For example, the 1412 <i>en</i> locale contains data appropriate for <i>en-US</i>, 1413 while the <i>zh</i> locale contains content for 1414 <i>zh-Hans-CN</i>, and the <i>zh-Hant</i> locale contains 1415 content for <i>zh-Hant-TW</i>. The default content locales 1416 themselves thus inherit all of their contents, and are 1417 empty.</p> 1418 <p>The choice of content is typically based on the largest 1419 literate population of the possible choices. Thus if an 1420 implementation only provides the base language (such as 1421 <i>en</i>), it will still get a complete and consistent set of 1422 data appropriate for a locale which is reasonably likely to be 1423 the one meant. Where other information is available, such as 1424 independent country information, that information can always be 1425 used to pick a different locale (such as <i>en-CA</i> for a 1426 website targeted at Canadian users).</p> 1427 <p>If an implementation is to use a different default locale, 1428 then the data needs to be <i>pivoted</i>; all of the data from 1429 the CLDR for the current default locale pushed out to the 1430 locales that inherit from it, then the new default content 1431 locale's data moved into the base. 
There are tools in CLDR to
    perform this operation.</p>
    <p>For the relationship between <span>Inheritance,
    DefaultContent, LikelySubtags, and LocaleMatching, see
    <strong><em>Section 4.2.6 <a href=
    "tr35.html#Inheritance_vs_Related">Inheritance vs Related
    Information</a></em></strong>.</span></p>
    <!-- end section 9 supp metadata -->
    <!-- begin section 10 the metadata element -->
    <h2>10 <a name="Metadata_Elements" href="#Metadata_Elements"
    id="Metadata_Elements">Locale Metadata
    Element</a></h2>
    <p>Note: This section refers to the per-locale
    <code><metadata></code> element, containing metadata
    about a particular locale. This is in contrast to the <a href=
    "#Appendix_Supplemental_Metadata"><em>Supplemental</em>
    Metadata</a>, which is in the supplemental tree and is not
    specific to a locale.</p>
    <p class="dtd"><!ELEMENT metadata ( alias | ( casingData?,
    special* ) ) ><br>
    <!ELEMENT casingData ( alias | ( casingItem*, special* ) )
    ><br>
    <!ELEMENT casingItem ( #PCDATA ) ><br>
    <!ATTLIST casingItem type CDATA #REQUIRED ><br>
    <!ATTLIST casingItem override (true | false) #IMPLIED
    ><br>
    <!ATTLIST casingItem forceError (true | false) #IMPLIED
    ><br></p>
    <p>The <metadata> element contains metadata about the
    locale for use by the Survey Tool or other tools in checking
    locale data; this data is not intended for export as part of
    the locale itself.</p>
    <p>The <casingItem> element specifies the capitalization
    intended for the majority of the data in a given category within
    the locale. Its purpose is to let tools warn
    translators that anything deviating from that capitalization
    should be carefully reviewed.
Its type attribute has one of the
values used for the &lt;contextTransformUsage&gt; element
above, with the exception of the special value "all"; the element
value is one of the following:</p>
<ul>
<li>lowercase</li>
<li>titlecase</li>
</ul>
<p>The &lt;casingItem&gt; data is generated by a tool based on
the data available in CLDR. In cases where the generated casing
information is incorrect and needs to be manually edited, the
override attribute is set to "true" so that the tool will not
override the manual edits. When the casing information is known
to be both correct and something that should apply to all
elements of the specified type in a given locale, the forceError
attribute may be set to "true" to force an error instead of a
warning for items that do not match the casing information.</p>
<!-- end section Info-A metadata element -->
<!-- begin section 11 Version Information -->
<h2>11 <a name="Version_Information" href=
"#Version_Information" id="Version_Information">Version
Information</a></h2>
<p class="dtd">&lt;!ELEMENT version EMPTY &gt;<br>
&lt;!ATTLIST version cldrVersion CDATA #FIXED "27" &gt;<br>
&lt;!ATTLIST version unicodeVersion CDATA #FIXED "7.0.0"
&gt;<br></p>
<p>The cldrVersion attribute defines the CLDR version
for this data, as published on <a href=
"http://cldr.unicode.org/index/downloads">CLDR
Releases/Downloads</a>.</p>
<p>The unicodeVersion attribute defines the version of
the Unicode standard that is used to interpret data.
Specifically, some data elements such as exemplar characters
are expressed in terms of UnicodeSets.
Since UnicodeSets can be
expressed in terms of Unicode properties, their meaning depends
on the Unicode version from which property values are
derived.</p><!-- end section Version Information metadata element -->
<h2>12 <a name="Parent_Locales" href="#Parent_Locales" id=
"Parent_Locales">Parent Locales</a></h2>
<p>The parentLocales data is supplemental data, but is
described in detail in the <a href=
"tr35.html#Parent_Locales">core specification section
4.1.3.</a></p>
<h2>13 <a href="#Unit_Conversion" name="Unit_Conversion">Unit Conversion</a></h2>
<p>
The unit conversion data (<a href="https://github.com/unicode-org/cldr/blob/master/common/supplemental/units.xml">units.xml</a>) provides the data for converting all of the CLDR unit identifiers to base units, and back. That allows conversion between any two convertible units, such as two units of length. For any two convertible units (such as acre and dunum) the first can be converted to the base unit (square-meter), then that base unit can be converted to the second unit.
</p>
<p class="dtd">
&lt;!ELEMENT unitConstants ( unitConstant* ) &gt;
</p>
<p class="dtd">
&lt;!ELEMENT unitConstant EMPTY &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitConstant constant NMTOKEN #REQUIRED &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitConstant value CDATA #REQUIRED &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitConstant status NMTOKEN #IMPLIED &gt;
</p>
<h2>Constants</h2>
<p>
The data uses a small set of constants for readability, such as:
</p>
<blockquote>
<p>
&lt;unitConstant constant=<em>"ft_to_m"</em> value=<em>"0.3048"</em>/&gt;
</p>
<p>
&lt;unitConstant constant=<em>"ft2_to_m2"</em> value=<em>"ft_to_m*ft_to_m"</em>/&gt;
</p>
</blockquote>
<p>
The order of the elements in the file is significant, because a value can refer only to constants defined before it.
</p>
<p>
Each constant can have a value based on simple expressions using numbers, previous constants, plus the operators * and /. Parentheses are not allowed. The operator * binds more tightly than /, which may be unexpected. Thus a * b / c * d is interpreted as (a * b) / (c * d). A consequence of that is that a * b / c * d = a * b / c / d. In the value, the numbers represent rational values. So 0.3048 is interpreted as exactly 3048 / 10000.
</p>
<p>
In the above case, ft2_to_m2 is a conversion constant for going from square feet to square meters. The expression evaluates to 0.09290304. Where the constants cannot be expressed as rationals, or where their interpretation is fluid, that is marked with a status value:
</p>
<blockquote>
&lt;unitConstant constant=<em>"PI"</em> value=<em>"411557987 / 131002976"</em> status=<em>'approximate'</em>/&gt;
</blockquote>
<p>
In such cases, software may decide to use different values for accuracy.
</p>
<p>
An implementation need not use rationals directly for conversion; it could use doubles, for example, if only double accuracy is needed.
</p>
<h2>Conversion Data</h2>
<p class="dtd">
&lt;!ELEMENT convertUnits ( convertUnit* ) &gt;
</p>
<p class="dtd">
&lt;!ELEMENT convertUnit EMPTY &gt;
</p>
<p class="dtd">
&lt;!ATTLIST convertUnit source NMTOKEN #REQUIRED &gt;
</p>
<p class="dtd">
&lt;!ATTLIST convertUnit baseUnit NMTOKEN #REQUIRED &gt;
</p>
<p class="dtd">
&lt;!ATTLIST convertUnit factor CDATA #IMPLIED &gt;
</p>
<p class="dtd">
&lt;!ATTLIST convertUnit offset CDATA #IMPLIED &gt;
</p>
<p>
The conversion data provides the data for converting all of the CLDR unit identifiers to base units, and back. That allows conversion between any two convertible units, such as two units of length.
For any two convertible units (such as acre and dunum) the first can be converted to the base unit (square-meter), then that base unit can be converted to the second unit.
</p>
<p>
The data is expressed as conversions to the base unit. The information can also be used for the conversion back.
</p>
<p>
Examples:
</p>
<blockquote>
<p>
&lt;convertUnit source=<em>'carat'</em> baseUnit=<em>'kilogram'</em> factor=<em>'0.0002'</em>/&gt;
</p>
<p>
&lt;convertUnit source=<em>'gram'</em> baseUnit=<em>'kilogram'</em> factor=<em>'0.001'</em>/&gt;
</p>
<p>
&lt;convertUnit source=<em>'ounce'</em> baseUnit=<em>'kilogram'</em> factor=<em>'lb_to_kg/16'</em> systems=<em>"ussystem uksystem"</em>/&gt;
</p>
<p>
&lt;convertUnit source=<em>'fahrenheit'</em> baseUnit=<em>'kelvin'</em> factor=<em>'5/9'</em> offset=<em>'2298.35/9'</em> systems=<em>"ussystem uksystem"</em>/&gt;
</p>
</blockquote>
<p>
For example, to convert from 3 carats to kilograms, the factor 0.0002 is used, resulting in 0.0006. To convert between carats and ounces, first the carats are converted to kilograms, then the kilograms to ounces (by reversing the mapping).
</p>
<p>
The factor and offset use the same structure as the value in unitConstant: they can be simple expressions, and in particular, * binds more tightly than /.
</p>
<p>
The conversion may also require an offset, such as the following:
</p>
<blockquote>
&lt;convertUnit source=<em>'fahrenheit'</em> baseUnit=<em>'kelvin'</em> factor=<em>'5/9'</em> offset=<em>'2298.35/9'</em> systems=<em>"ussystem uksystem"</em>/&gt;
</blockquote>
<p>
Here the factor is applied first and the offset is then added, so 32 fahrenheit converts to 32 × 5/9 + 2298.35/9 = 273.15 kelvin.
</p>
<p>
Where a factor is not present, the value is 1; where an offset is not present, the value is 0.
The systems attribute indicates where the value is not metric; currently the attribute values just include the <em>ussystem</em> and <em>uksystem</em> systems. The term <em>metric</em> is used in a broad sense, and includes units that are simple multiples of metric units, such as pound-metric (= ½ kilogram).
</p>
<p>
For complex units, such as <em>pound-force-per-square-inch</em>, the conversions are computed by combining the conversions of each of the simple units: <em>pound-force</em> and <em>inch</em>. Because the conversions in convertUnit are reversible, the computation can go from complex source unit to complex base unit to complex target unit.
</p>
<p>
Here is an example:
</p>
<blockquote>
<p><strong>
50 foot-per-minute ⟹ X mile-per-hour</strong> </p>
<p>
⟹ source: 1 foot
</p>
<p>
⟹ factor: 381 / 1250 = 0.3048 meter
</p>
<p>
⟹ source: 1 minute
</p>
<p>
⟹ factor: 60 second
</p>
<p>
⟹ intermediate: 127 / 500 = 0.254 meter-per-second
</p>
<p>
⟹ mile-per-hour
</p>
<p>
⟹ source: 1 mile
</p>
<p>
⟹ factor: 201168 / 125 = 1609.344 meter
</p>
<p>
⟹ source: 1 hour
</p>
<p>
⟹ factor: 3600 second
</p>
<p>
⟹ target: 25 / 44 ≅ 0.5681818 mile-per-hour
</p>
</blockquote>
<p>
<strong>Reciprocals. </strong>When you convert a complex unit to another complex unit, you typically convert the source to a complex base unit (like <em>meter-per-cubic-meter</em>), then convert the latter backwards to the desired target. However, there may not be a matching conversion from that complex base unit to the desired target unit. That is the case for converting from <em>mile-per-gallon</em> (used in the US) to <em>liter-per-100-kilometer</em> (used in Europe and elsewhere).
When that happens, the reciprocal of the complex base unit is used, as in the following example:
</p>
<blockquote>
<p><strong>
50 mile-per-gallon ⟹ X liter-per-100-kilometer
</strong></p>
<p>
⟹ source: 1 mile
</p>
<p>
⟹ factor: 201168 / 125 = 1609.344 meter
</p>
<p>
⟹ source: 1 gallon
</p>
<p>
⟹ factor: 473176473 / 125000000000 ≅ 0.003785412 cubic-meter
</p>
<p>
⟹ intermediate: 2400000000000 / 112903 ≅ 2.125719E7 meter-per-cubic-meter
</p>
<p>
⟹ liter-per-100-kilometer
</p>
<p>
⟹ source: 1 liter
</p>
<p>
⟹ factor: 1 / 1000 = 0.001 cubic-meter
</p>
<p>
⟹ source: 1 100-kilometer
</p>
<p>
⟹ factor: 100000 meter
</p>
<p>
<strong>⟹ 1/intermediate: 112903 / 2400000000000 ≅ 4.704292E-8 cubic-meter-per-meter</strong>
</p>
<p>
⟹ target: 112903 / 24000 ≅ 4.704292 liter-per-100-kilometer
</p>
</blockquote>
<p>
This applies to more than just these cases: one can convert from any unit to related reciprocals as in the following example:
</p>
<blockquote>
<p><strong>
50 foot-per-minute ⟹ X hour-per-mile</strong> </p>
<p>
⟹ source: 1 foot
</p>
<p>
⟹ factor: 381 / 1250 = 0.3048 meter
</p>
<p>
⟹ source: 1 minute
</p>
<p>
⟹ factor: 60 second
</p>
<p>
⟹ intermediate: 127 / 500 = 0.254 meter-per-second
</p>
<p>
⟹ hour-per-mile
</p>
<p>
⟹ source: 1 hour
</p>
<p>
⟹ factor: 3600 second
</p>
<p>
⟹ source: 1 mile
</p>
<p>
⟹ factor: 201168 / 125 = 1609.344 meter
</p>
<p>
<strong>⟹ 1/intermediate: 500 / 127 ≅ 3.937008 second-per-meter</strong>
</p>
<p>
⟹ target: 44 / 25 = 1.76 hour-per-mile
</p>
</blockquote>
<h3>Exceptional Cases</h3>
<h4>Identities</h4>
<p>
For completeness, identity mappings are
also provided for the base units themselves, such as:
</p>
<blockquote>
&lt;convertUnit source=<em>'meter'</em> baseUnit=<em>'meter'</em>/&gt;
</blockquote>
<h4>Aliases</h4>
<p>
In a few instances the old identifiers are deprecated in favor of regular syntax. Implementations should handle both on input:
</p>
<blockquote>
<p>
&lt;unitAlias type=<em>"meter-per-second-squared"</em> replacement=<em>"meter-per-square-second"</em> reason=<em>"deprecated"</em>/&gt;
</p>
<p>
&lt;unitAlias type=<em>"liter-per-100kilometers"</em> replacement=<em>"liter-per-100-kilometer"</em> reason=<em>"deprecated"</em>/&gt;
</p>
<p>
&lt;unitAlias type=<em>"pound-foot"</em> replacement=<em>"pound-force-foot"</em> reason=<em>"deprecated"</em>/&gt;
</p>
<p>
&lt;unitAlias type=<em>"pound-per-square-inch"</em> replacement=<em>"pound-force-per-square-inch"</em> reason=<em>"deprecated"</em>/&gt;
</p>
</blockquote>
<p>
These use the standard alias elements in XML, and are also included in the <a href="https://github.com/unicode-org/cldr/blob/master/common/supplemental/units.xml">units.xml</a> file.
</p>
<h4>“Duplicate” Units</h4>
<p>
Some CLDR units are provided simply because they have different names in some languages. For example, year and year-person, or foodcalorie and kilocalorie. One CLDR unit (temperature-generic) is not convertible; it is only used for the translation (where the exact unit would be understood by context).
</p>
<h4>Discarding Offsets</h4>
<p>
The temperature units are special. When they represent a scale, they have an offset. But where they represent an amount, such as in complex units, they do not. So celsius-per-second is the same as kelvin-per-second.
</p>
<h3>Unresolved Units</h3>
<p>
Some SI units contain the same units in the numerator and denominator, so those cannot be resolved.
For example, if cubic-meter-per-meter were always resolved, then <em>consumption</em> (like “liter-per-kilometer”) could not be distinguished from <em>area</em> (square-meter).
</p>
<p>
However, in conversion, it may be necessary to resolve them in order to find a match. For example, kilowatt-hour maps to the base unit kilogram-square-meter-second-per-cubic-second, but that needs to be resolved to kilogram-square-meter-per-square-second in order to be matched against an <em>energy.</em>
</p>
<h2>Quantities and Base Units</h2>
<p class="dtd">
&lt;!ELEMENT unitQuantities ( unitQuantity* ) &gt;
</p>
<p class="dtd">
&lt;!ELEMENT unitQuantity EMPTY &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitQuantity baseUnit NMTOKEN #REQUIRED &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitQuantity quantity NMTOKENS #REQUIRED &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitQuantity status NMTOKEN #IMPLIED &gt;
</p>
<p>
Conversion is supported between comparable units. Those can be simple units, such as length, or more complex ‘derived’ units that are built up from <em>base units</em>. The &lt;unitQuantities&gt; element provides information on the base units used for conversion. It also supplies information about their <em>quantity</em>: mass, length, time, etc., and whether they are simple or not. </p>
<p>Examples: </p>
<blockquote>
<p>
&lt;unitQuantity baseUnit=<em>'kilogram'</em> quantity=<em>'mass'</em> status=<em>'simple'</em>/&gt;
</p>
<p>
&lt;unitQuantity baseUnit=<em>'meter-per-second'</em> quantity=<em>'speed'</em>/&gt;
</p>
</blockquote>
<p>
The order of the elements in the file is significant, since it is used in
<a href="#Unit_Identifier_Normalization" >Unit Identifier Normalization</a>.
</p>
<p>
The quantity values themselves are informative. There may be more than one quantity for a base unit (reflecting that <em>force per area</em> can be referenced as either <em>pressure</em> or <em>stress</em>, for example).
The quantity for a complex unit that has a reciprocal is formed by prepending “inverse-” to the quantity, such as <em>inverse-consumption.</em> 1848</p> 1849<p> 1850The base units for the quantities and the quantities themselves are based on <a href="https://www.nist.gov/pml/special-publication-811">NIST special publication 811</a> and the earlier <a href="https://www.govinfo.gov/content/pkg/GOVPUB-C13-f10c2ff9e7af2091314396a2d53213e4/pdf/GOVPUB-C13-f10c2ff9e7af2091314396a2d53213e4.pdf">NIST Special Publication 1038</a>. In some cases, a different unit is chosen for the base. For example, a <em>revolution</em> (360°) is chosen for the base unit for angles instead of the SI <em>radian</em>, and <em>item</em> instead of the SI <em>mole</em>. Additional base units are added where necessary, such as <em>bit</em> and <em>pixel</em>. 1851</p> 1852<p> 1853This data is not necessary for conversion, but is needed for 1854 1855 <a href="#Unit_Identifier_Normalization" >Unit_Identifier_Normalization</a>. Some of the unitQuantity elements are not needed to convert CLDR units, but are included for completeness. Example: 1856</p> 1857 1858<blockquote> 1859 <unitQuantity baseUnit=<em>'ampere-per-square-meter'</em> quantity=<em>'current-density'</em>/> 1860</blockquote> 1861<h3>UnitType vs Quantity</h3> 1862 1863 1864<p> 1865The unitType (as in “length-meter”) is not the same as the quantity. It is often broader: for example, the unitType <em>electric</em> corresponds to the quantities <em>electric-current, electric-resistance, </em>and<em> voltage</em>. The unitType itself is also informative, and can be dropped from a long unit identifier to get a still-unique short unit identifier. 1866</p> 1867<h3><a href="#Unit_Identifier_Normalization" name="Unit_Identifier_Normalization">Unit Identifier Normalization</a></h3> 1868 1869 1870<p> 1871There are many possible ways to construct complex units. 
For comparison of unit identifiers, an implementation can normalize in the following way:
</p>
<ol>
<li>Convert all but the first -per- to simple multiplication. The result then has the format of /numerator ( -per- denominator)?/ <ul>
<li>foot-per-second-per-second ⇒ foot-per-second-second</li>
</ul>
</li>
<li>Within each of the numerator and denominator:</li>
<li>Convert multiple instances of a unit into the appropriate power.
<ul>
<li>foot-per-second-second ⇒ foot-per-square-second
</li>
<li>kilogram-meter-kilogram ⇒ meter-square-kilogram
</li>
</ul>
</li>
<li>For each single unit, disregarding prefixes and powers, find its base unit using &lt;convertUnit&gt;, then get the order of that base unit among the unitQuantity elements in the <a href="https://github.com/unicode-org/cldr/blob/master/common/supplemental/units.xml">units.xml</a>. Then sort the single units by that order.
<ul>
<li>meter-square-kilogram ⇒ square-kilogram-meter
</li>
<li>meter-square-gram ⇒ square-gram-meter
</li>
</ul>
</li>
<li>If two single units have the same simple unit but different SI prefixes, such as "kilometer-meter", sort the higher-power SI prefixes first. </li>
<li>Within private-use single units, sort by the simple unit alphabetically.</li>
</ol>
<p>
The examples in #4 are due to the following ordering of the unitQuantity elements:
</p><ol>
<li>&lt;unitQuantity baseUnit=<em>'candela'</em> quantity=<em>'luminous-intensity'</em> status=<em>'simple'</em>/&gt;</li>
<li>&lt;unitQuantity baseUnit=<em>'kilogram'</em> quantity=<em>'mass'</em> status=<em>'simple'</em>/&gt;</li>
<li>&lt;unitQuantity baseUnit=<em>'meter'</em> quantity=<em>'length'</em> status=<em>'simple'</em>/&gt;</li>
<li>…</li></ol>
<h2>Mixed Units</h2>
<p>
Mixed units, or unit sequences, are units with the same base unit which are listed in sequence.
Common examples are feet and inches, meters and centimeters, and hours, minutes, and seconds. Mixed unit identifiers are expressed using the "-and-" infix, as in "foot-and-inch", "meter-and-centimeter", and "hour-and-minute-and-second".
</p>
<p>
Scalar values for mixed units are expressed in the largest unit, according to the sort order discussed above in "Normalization". For example, numbers for "foot-and-inch" are expressed in feet.
</p>
<p>
Mixed units are expected to be rendered in the order of the tokens in the identifier. For example, the value 1.25 with the identifier "foot-and-inch" should be rendered as "1 foot and 3 inches", and the value 1.25 with the identifier "inch-and-foot" should be rendered as "3 inches and 1 foot". <strong>NOTE: </strong>the correct application of this may require adding locales to the regions attribute set.
</p>
<h2>Testing</h2>
<p>
The <a href="https://github.com/unicode-org/cldr/blob/master/common/testData/units/unitsTest.txt">unitsTest.txt</a> file supplies a list of all the CLDR units with conversions, for testing implementations. Instructions for use are supplied in the header of the file.
</p>
<h2>14 <a href="#Unit_Preferences" name="Unit_Preferences">Unit Preferences</a></h2>
<p>
Different locales have different preferences for which unit or combination of units is used for a particular usage, such as measuring a person’s height. This is more fine-grained than merely a preference for metric versus US or UK measurement systems. For example, one locale may use meters alone, while another may use centimeters alone or a combination of meters and centimeters; a third may use inches alone, or (informally) a combination of feet and inches.
</p>
<p>
The CLDR data is intended to map from a particular usage (for example, measuring the height of a person or the fuel consumption of an automobile) to the unit or combination of units typically used for that usage in a given region.
Considerations for such a mapping include:
</p><ul>
<li>The list of possible usages is large and open-ended. The intent here is to start with a small set for which there is an urgent need, and expand as necessary.</li>
<li>Even for a given usage such as measuring a road distance, there are multiple ranges in use. For example, one set of units may be used for indicating the distance to the next city (kilometers or miles), while another may be used for indicating the distance to the next exit (meters, yards, or feet).</li>
<li>There are also differences between more formal usage (official signage, medical records) and more informal usage (conversation, texting).</li>
<li>For some usages, the measurement may be expressed using a sequence of units, such as “1 meter, 78 centimeters” or “12 stone, 2 pounds”.</li></ul>
<p>
The DTD structure is as follows:
</p>
<p class="dtd">
&lt;!ELEMENT unitPreferenceData ( unitPreferences* ) &gt;
</p>
<p class="dtd">
&lt;!ELEMENT unitPreferences ( unitPreference* ) &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitPreferences category NMTOKEN #REQUIRED &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitPreferences usage NMTOKENS #REQUIRED &gt;
</p>
<p class="dtd">
&lt;!ELEMENT unitPreference ( #PCDATA ) &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitPreference regions NMTOKENS #REQUIRED &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitPreference geq NMTOKEN #IMPLIED &gt;
</p>
<p class="dtd">
&lt;!ATTLIST unitPreference skeleton CDATA #IMPLIED &gt;
</p>
<table>
<tr>
<td>category
</td>
<td>A unit quantity, such as “area” or “length”. See Section 13 Unit Conversion.
</td>
</tr>
<tr>
<td>usage
</td>
<td>A type of usage, such as person-height.
</td>
</tr>
<tr>
<td>regions
</td>
<td>One or more region identifiers (macroregions or regions), subdivision identifiers, or language identifiers, such as 001, US, usca, and de-CH.
</td>
</tr>
<tr>
<td>geq
</td>
<td>A threshold value, in a unit determined by the unitPreference element value. The unitPreference element is only used for values greater than or equal to this value (and lower than any higher threshold).
<p>
The value must be non-negative. For negative measures (such as -3 meters), use the absolute value to pick the unit.
</td>
</tr>
<tr>
<td>skeleton
</td>
<td>A skeleton in the ICU number format syntax that can be used to format the unit.
</td>
</tr>
</table>
<p><strong>Note:</strong> As of CLDR 37, the &lt;unitPreference&gt; geq attribute replaces
the now-deprecated &lt;unitPreferences&gt; scope attribute.</p>
<p>
Example:
</p>
<blockquote>
<p>
&lt;unitPreferences category=<em>"length"</em> usage=<em>"default"</em>&gt;
</p>
<blockquote>
<p>
&lt;unitPreference regions=<em>"001"</em>&gt;kilometer&lt;/unitPreference&gt;
</p>
<p>
&lt;unitPreference regions=<em>"001"</em>&gt;meter&lt;/unitPreference&gt;
</p>
<p>
&lt;unitPreference regions=<em>"001"</em>&gt;centimeter&lt;/unitPreference&gt;
</p>
<p>
&lt;unitPreference regions=<em>"US GB"</em>&gt;mile&lt;/unitPreference&gt;
</p>
<p>
&lt;unitPreference regions=<em>"US GB"</em>&gt;foot&lt;/unitPreference&gt;
</p>
<p>
&lt;unitPreference regions=<em>"US GB"</em>&gt;inch&lt;/unitPreference&gt;
</p>
</blockquote>
<p>
&lt;/unitPreferences&gt;
</p>
</blockquote>
<p>
The above information says that for default usage, people in the US and GB use mile, foot, and inch, whereas people in the rest of the world (001) use kilometer, meter, and centimeter.
2042Take another example:</p> 2043 <blockquote> 2044<p> 2045 <unitPreferences category=<em>"length"</em> usage=<em>"road"</em>> 2046</p> 2047 <blockquote> 2048<p> 2049 <unitPreference regions=<em>"001"</em> geq=<em>"0.9"</em>>kilometer</unitPreference> 2050</p> 2051<p> 2052 <unitPreference regions=<em>"001"</em> geq=<em>"300.0"</em> skeleton=<em>"precision-increment/50"</em>>meter</unitPreference> 2053</p> 2054<p> 2055 <unitPreference regions=<em>"001"</em> skeleton=<em>"precision-increment/10"</em>>meter</unitPreference> 2056</p> 2057<p> 2058 <unitPreference regions=<em>"001"</em>>meter</unitPreference> 2059</p> 2060<p> 2061 <unitPreference regions=<em>"US"</em> geq=<em>"0.5"</em>>mile</unitPreference> 2062</p> 2063<p> 2064 <unitPreference regions=<em>"US"</em> geq=<em>"100.0"</em> skeleton=<em>"precision-increment/50"</em>>foot</unitPreference> 2065</p> 2066<p> 2067 <unitPreference regions=<em>"US"</em> skeleton=<em>"precision-increment/10"</em>>foot</unitPreference> 2068</p> 2069<p> 2070 <unitPreference regions=<em>"GB"</em> geq=<em>"0.5"</em>>mile</unitPreference> 2071</p> 2072<p> 2073 <unitPreference regions=<em>"GB"</em> geq=<em>"100.0"</em> skeleton=<em>"precision-increment/50"</em>>yard</unitPreference> 2074</p> 2075<p> 2076 <unitPreference regions=<em>"GB"</em>>yard</unitPreference> 2077</p> 2078<p> 2079 <unitPreference regions=<em>"SE"</em> geq=<em>"0.1"</em>>mile-<span style="text-decoration:underline;">scandinavian</span></unitPreference> 2080</p> 2081 </blockquote> 2082<p> 2083 </unitPreferences> 2084</p> 2085</blockquote> 2086<p> 2087The intended usage is to take the measure to be formatted, and the desired category, usage, and region and find the best match as follows. 2088</p> 2089<ul> 2090 2091<li>First, see if there is an exact match, producing a list of one or more unitPreference elements. 
For example, length/road/GB has a match above, giving 2092<blockquote> 2093<p> 2094 <unitPreference regions=<em>"GB"</em> geq=<em>"0.5"</em>>mile</unitPreference> 2095</p> 2096<p> 2097 <unitPreference regions=<em>"GB"</em> geq=<em>"100.0"</em> skeleton=<em>"precision-increment/50"</em>>yard</unitPreference> 2098</p> 2099<p> 2100 <unitPreference regions=<em>"GB"</em>>yard</unitPreference> 2101</p> 2102 </blockquote> 2103</li> 2104 <li>If there is no match for the category, then the data is not available.</li> 2105 <li>Otherwise, given the category: <ul> 2106 2107 <li>If there is an exact match for the usage, but not for the region, try region=”001”.</li></ul> 2108 2109 <li>The specification allows for <a href="https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_containment_un_m_49.html">containment regions</a> , <a href="https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_subdivisions.html">region subdivisions</a>. 2110 <li>While in version 37 only 001 is used, in the future the data may contain others. 
<li>The fallback is: subdivision2 ⇒ subdivision1 ⇒ region/country ⇒ subcontinent ⇒ continent ⇒ world
<li>Example:
<blockquote>
<table>
<tr>
<td>
<strong>Region/subdivision</strong>
</td>
<td><strong>Code</strong>
</td>
</tr>
<tr>
<td>Blackpool
</td>
<td>gbbpl
</td>
</tr>
<tr>
<td>England
</td>
<td>gbeng
</td>
</tr>
<tr>
<td>United Kingdom
</td>
<td>GB
</td>
</tr>
<tr>
<td>Northern Europe
</td>
<td>154
</td>
</tr>
<tr>
<td>Europe
</td>
<td>150
</td>
</tr>
<tr>
<td>World
</td>
<td>001
</td>
</tr>
</table>
</blockquote>
<li>If there is an exact match for the region, but not for the usage, <ul>
<li>If the usage has multiple parts (eg land-agriculture-grain) drop the last part (eg land-agriculture)
<li>Repeat dropping the last part and trying the result (eg land)
<li>If you eliminate all of them, try usage=”default”.
<li>If there is no exact match for either one, try usage=”default”, region=”001”. That will always match.</li> </ul>
</li> </ul>
<p>
Once you have a list of unitPreference elements, find the applicable unitPreference. For a given category, usage, and set of regions (eg “US GB”), the units are ordered from largest to smallest.
</p>
<ul>
<li>The geq item gives the value for the unit in the element value (or for the largest unit for mixed units). For example,<ul>
<li>...geq=<em>"0.5"</em>&gt;mile&lt;... means 0.5 miles</li>
<li>...geq=<em>"100.0"</em>&gt;foot:inch&lt;... means 100 feet</li></ul></li>
<li>If there is no geq attribute, then the implicit value is 1.0.</li>
<li>Implementations will probably convert the values into the base units, so that the comparison is fast.
Thus the above would be converted internally to something like: <ul>
<li>≥ 804.672 meters ⇒ mile</li>
<li>≥ 30.48 meters ⇒ foot:inch</li></ul></li>
<li>Search for the first matching unitPreference for the input measure. If there is no match (eg &lt; 100 feet in the above example), take the last unitPreference.
That is, the last unitPreference is effectively geq="0".</li>
</ul>
<p>
Once a matching unitPreference element is found:
</p><ul>
<li>The unit is the element value.
<li>The skeleton (if there is one) supplies formatting information for the unit. API settings may allow that to be overridden.
<ul>
<li>The syntax and semantics for the skeleton value are defined by the <a href="https://unicode-org.github.io/icu/userguide/format_parse/numbers/skeletons.html">ICU Number Skeletons</a> document.</li>
</ul>
<li>If the unit is mixed (eg foot:inch) the skeleton applies to the final subunit; the higher subunits are formatted as integers.
<li>If the skeleton is missing, the default is skeleton="<strong>precision-integer/@@*</strong>". However, the client can also override or tune the number formatting.</li></ul>
<h3>Constraints</h3>
<ul>
<li>For a given category, there is always a “default” usage.
<li>For a given category, and usage: <ul>
<li>There is always a 001 region.
<li>None of the sets of regions can overlap. That is, you can’t have “US” on one line and “US GB” on another. You <em>can</em> have two lines with “US”, for different sizes of units.</li></ul></li>
<li>For a given category, usage, and region-set <ul>
<li>The unitPreferences are in descending order.</li>
</ul>
</li>
</ul>
<h3>Caveats</h3>
<p>The extended unit support is still being developed further. See the Known Issues on the release page for further information.</p>
<hr>
<p class="copyright">Copyright © 2001–2020 Unicode, Inc.
All 2217 Rights Reserved. The Unicode Consortium makes no expressed or 2218 implied warranty of any kind, and assumes no liability for 2219 errors or omissions. No liability is assumed for incidental and 2220 consequential damages in connection with or arising out of the 2221 use of the information or programs contained or accompanying 2222 this technical report. The Unicode <a href= 2223 "https://unicode.org/copyright.html">Terms of Use</a> apply.</p> 2224 <p class="copyright">Unicode and the Unicode logo are 2225 trademarks of Unicode, Inc., and are registered in some 2226 jurisdictions.</p> 2227 </div> 2228</body> 2229</html> 2230