<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<link rel="stylesheet" href="http://unicode.org/reports/reports.css"
  type="text/css">
<title>UTS #35: Unicode Locale Data Markup Language</title>
<style type="text/css">
<!--
.dtd {
  font-family: monospace;
  font-size: 90%;
  background-color: #CCCCFF;
  border-style: dotted;
  border-width: 1px;
}

.xmlExample {
  font-family: monospace;
  font-size: 80%
}

.blockedInherited {
  font-style: italic;
  font-weight: bold;
  border-style: dashed;
  border-width: 1px;
  background-color: #FF0000
}

.inherited {
  font-weight: bold;
  border-style: dashed;
  border-width: 1px;
  background-color: #00FF00
}

.element {
  font-weight: bold;
  color: red;
}

.attribute {
  font-weight: bold;
  color: maroon;
}

.attributeValue {
  font-weight: bold;
  color: blue;
}

li, p {
  margin-top: 0.5em;
  margin-bottom: 0.5em
}

h2, h3, h4, h5, table {
  margin-top: 1.5em;
  margin-bottom: 0.5em;
}

h5 {
  font-size: medium;
  font-style: italic
}
-->
</style>
</head>

<body>

  <table class="header" width="100%">
    <tr>
      <td class="icon"><a href="http://unicode.org"> <img
          alt="[Unicode]" src="http://unicode.org/webscripts/logo60s2.gif"
          width="34" height="33"
          style="vertical-align: middle; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px; border-top-width: 0px;"></a>
        <a class="bar" href="http://www.unicode.org/reports/">Technical
          Reports</a></td>
    </tr>
    <tr>
      <td class="gray"> </td>
    </tr>
  </table>
  <div class="body">
    <h2 style="text-align: center">
      Unicode Technical Standard #35
    </h2>
    <h1 style="text-align: center">Unicode Locale Data Markup Language (LDML)</h1>
    <!-- At least the first row of this header table should be identical across the parts of this UTS. -->
    <table border="1" cellpadding="2" cellspacing="0" class="wide">
      <tr>
        <td>Version</td>
        <td>34</td>
      </tr>
      <tr>
        <td>Editors</td>
        <td><a
          href="https://plus.google.com/114199149796022210033?rel=author">
            Mark Davis</a> (<a href="mailto:markdavis@google.com">markdavis@google.com</a>)
          and <a href="tr35.html#Acknowledgments">other CLDR committee
            members</a></td>
      </tr>
      <tr>
        <td>Date</td>
        <td>2018-10-10</td>
      </tr>
      <tr>
        <!-- This link must be made live when posting the final version but is disabled during proposed update stage. -->
        <td>This Version</td>
        <td>
          <a href="http://www.unicode.org/reports/tr35/tr35-53/tr35.html">
            http://www.unicode.org/reports/tr35/tr35-53/tr35.html</a></td>
      </tr>
      <tr>
        <td>Previous Version</td>
        <td>
          <a href="http://www.unicode.org/reports/tr35/tr35-51/tr35.html">http://www.unicode.org/reports/tr35/tr35-51/tr35.html</a></td>
      </tr>
      <tr>
        <td>Latest Version</td>
        <td><a href="http://www.unicode.org/reports/tr35/">http://www.unicode.org/reports/tr35/</a></td>
      </tr>
      <tr>
        <td>Corrigenda</td>
        <td><a href="http://unicode.org/cldr/corrigenda.html">http://unicode.org/cldr/corrigenda.html</a></td>
      </tr>
      <tr>
        <td>Latest Proposed Update</td>
        <td><a href="http://www.unicode.org/reports/tr35/proposed.html">http://www.unicode.org/reports/tr35/proposed.html</a></td>
      </tr>
      <tr>
        <td>Namespace</td>
        <td><a href="http://cldr.unicode.org/">http://cldr.unicode.org/</a></td>
      </tr>
      <tr>
        <td>DTDs</td>
        <td><a href="http://unicode.org/cldr/dtd/34/">
            http://unicode.org/cldr/dtd/34/</a></td>
      </tr>
      <tr>
        <td>Revision</td>
        <td><a href="#Modifications">53</a></td>
      </tr>
    </table>
    <h3>
      <i>Summary</i>
    </h3>
    <p>
      This document describes an XML format
(<i>vocabulary</i>) for the
      exchange of structured locale data. This format is used in the <a
        href="http://cldr.unicode.org/">Unicode Common Locale Data
        Repository</a>.
    </p>

    <h3>
      <i>Status</i>
    </h3>

    <!-- NOT YET APPROVED
    <p>
      <i class="changed">This is a<b><font color="#ff3333">
        draft </font></b>document which may be updated, replaced, or superseded by
        other documents at any time. Publication does not imply endorsement
        by the Unicode Consortium. This is not a stable document; it is
        inappropriate to cite this document as other than a work in
        progress.
      </i>
    </p>
    END NOT YET APPROVED -->
    <!-- APPROVED -->
    <p>
      <i>This document has been reviewed by Unicode members and other
        interested parties, and has been approved for publication by the
        Unicode Consortium. This is a stable document and may be used as
        reference material or cited as a normative reference by other
        specifications.</i>
    </p>
    <!-- END APPROVED -->

    <blockquote>
      <p>
        <i><b>A Unicode Technical Standard (UTS)</b> is an independent
          specification. Conformance to the Unicode Standard does not imply
          conformance to any UTS.</i>
      </p>
    </blockquote>
    <p>
      <i>Please submit corrigenda and other comments with the CLDR bug
        reporting form [<a href="http://cldr.unicode.org/index/bug-reports">Bugs</a>].
        Related information that is useful in understanding this document is
        found in the <a href="#References">References</a>. For the latest
        version of the Unicode Standard see [<a
        href="http://www.unicode.org/versions/latest/">Unicode</a>]. For a
        list of current Unicode Technical Reports see [<a
        href="http://www.unicode.org/reports/">Reports</a>]. For more
        information about versions of the Unicode Standard, see [<a
        href="http://www.unicode.org/versions/">Versions</a>].
      </i>
    </p>

    <!-- This section of Parts should be identical in all of the parts of this UTS. -->
    <h2>
      <a name="Parts" href="#Parts">Parts</a>
    </h2>
    <p>The LDML specification is divided into the following parts:</p>
    <ul class="toc">
      <li>Part 1: <a href="tr35.html#Contents">Core</a> (languages,
        locales, basic structure)
      </li>
      <li>Part 2: <a href="tr35-general.html#Contents">General</a>
        (display names & transforms, etc.)
      </li>
      <li>Part 3: <a href="tr35-numbers.html#Contents">Numbers</a>
        (number & currency formatting)
      </li>
      <li>Part 4: <a href="tr35-dates.html#Contents">Dates</a> (date,
        time, time zone formatting)
      </li>
      <li>Part 5: <a href="tr35-collation.html#Contents">Collation</a>
        (sorting, searching, grouping)
      </li>
      <li>Part 6: <a href="tr35-info.html#Contents">Supplemental</a>
        (supplemental data)
      </li>
      <li>Part 7: <a href="tr35-keyboards.html#Contents">Keyboards</a>
        (keyboard mappings)
      </li>
    </ul>

    <h2>
      <a name="Contents" href="#Contents">Contents of Part 1, Core</a>
    </h2>
    <!-- START Generated TOC: CheckHtmlFiles -->
    <ul class="toc">
      <li>1 <a href="#Introduction">Introduction</a>
        <ul class="toc">
          <li>1.1 <a href="#Conformance">Conformance</a></li>
        </ul>
      </li>
      <li>2 <a href="#Locale">What is a Locale?</a></li>
      <li>3 <a href="#Identifiers">Unicode Language and Locale
          Identifiers</a>
        <ul class="toc">
          <li>3.1 <a href="#Unicode_language_identifier">Unicode
              Language Identifier</a></li>
          <li>3.2 <a href="#Unicode_locale_identifier">Unicode
              Locale Identifier</a></li>
          <li>3.3 <a href="#BCP_47_Conformance">BCP 47 Conformance</a>
            <ul class="toc">
              <li>3.3.1 <a href="#BCP_47_Language_Tag_Conversion">BCP
                  47 Language Tag Conversion</a></li>
            </ul>
          </li>
          <li>3.4 <a href="#Field_Definitions">Language Identifier
              Field Definitions</a>
            <ul class="toc">
              <li>Table: <a href="#Language_Locale_Field_Definitions">Language
                  Identifier Field Definitions</a></li>
            </ul>
          </li>
          <li>3.5 <a href="#Special_Codes">Special Codes</a>
            <ul class="toc">
              <li>3.5.1 <a href="#Unknown_or_Invalid_Identifiers">Unknown
                  or Invalid Identifiers</a></li>
              <li>3.5.2 <a href="#Numeric_Codes">Numeric Codes</a></li>
              <li>3.5.3 <a href="#Private_Use">Private Use Codes</a>
                <ul class="toc">
                  <li>Table: <a href="#Private_Use_CLDR">Private Use
                      Codes in CLDR</a></li>
                </ul>
              </li>
            </ul>
          </li>
          <li>3.6 <a href="#Locale_Extension_Key_and_Type_Data">Unicode
              BCP 47 U Extension</a>
            <ul class="toc">
              <li>3.6.1 <a href="#Key_And_Type_Definitions_">Key And
                  Type Definitions</a>
                <ul class="toc">
                  <li>Table: <a href="#Key_Type_Definitions">Key/Type
                      Definitions</a></li>
                </ul>
              </li>
              <li>3.6.2 <a href="#Numbering System Data">Numbering
                  System Data</a></li>
              <li>3.6.3 <a href="#Time_Zone_Identifiers">Time Zone
                  Identifiers</a></li>
              <li>3.6.4 <a href="#Unicode_Locale_Extension_Data_Files">U
                  Extension Data Files</a>
              </li>
              <li>3.6.5 <a href="#Unicode_Subdivision_Codes">Subdivision
                  Codes</a>
                <ul class="toc">
                  <li>3.6.5.1 <a href="#Validity">Validity</a></li>
                </ul>
              </li>
            </ul>
          </li>
          <li>3.7 <a href="#t_Extension">Unicode BCP 47 T Extension</a>
            <ul class="toc">
              <li>3.7.1 <a href="#Transformed_Content_Data_File">T
                  Extension Data Files</a></li>
            </ul>
          </li>
          <li>3.8 <a href="#Compatibility_with_Older_Identifiers">Compatibility
              with Older Identifiers</a>
            <ul class="toc">
              <li>3.8.1 <a href="#Old_Locale_Extension_Syntax">Old
                  Locale Extension Syntax</a>
                <ul class="toc">
                  <li>Table: <a href="#Locale_Extension_Mappings">Locale
                      Extension Mappings</a></li>
                </ul>
              </li>
              <li>3.8.2 <a href="#Legacy_Variants">Legacy Variants</a>
                <ul class="toc">
                  <li>Table: <a
                    href="#Legacy_Variant_Mappings">Legacy
                      Variant Mappings</a></li>
                </ul>
              </li>
              <li>3.8.3 <a href="#Relation_to_OpenI18n">Relation to
                  OpenI18n</a></li>
            </ul>
          </li>
          <li>3.9 <a href="#Transmitting_Locale_Information">Transmitting
              Locale Information</a>
            <ul class="toc">
              <li>3.9.1 <a href="#Message_Formatting_and_Exceptions">Message
                  Formatting and Exceptions</a></li>
            </ul>
          </li>
          <li>3.10 <a href="#Language_and_Locale_IDs">Unicode
              Language and Locale IDs</a>
            <ul class="toc">
              <li>3.10.1 <a href="#Written_Language">Written Language</a></li>
              <li>3.10.2 <a href="#Hybrid_Locale">Hybrid Locale Identifiers</a></li>
            </ul>
          </li>
          <li>3.11 <a href="#Validity_Data">Validity Data</a></li>
        </ul>
      </li>
      <li>4 <a href="#Locale_Inheritance">Locale Inheritance and
          Matching</a>
        <ul class="toc">
          <li>4.1 <a href="#Lookup">Lookup</a>
            <ul class="toc">
              <li>4.1.1 <a href="#Bundle_vs_Item_Lookup">Bundle vs
                  Item Lookup</a>
                <ul class="toc">
                  <li>Table: <a href="#Lookup-Differences">Lookup
                      Differences</a></li>
                </ul>
              </li>
              <li>4.1.2 <a href="#Multiple_Inheritance">Lateral
                  Inheritance</a>
                <ul class="toc">
                  <li>Table: <a href="#Count_Fallback_normal">Count
                      Fallback: normal</a></li>
                  <li>Table: <a href="#Count_Fallback_currency">Count
                      Fallback: currency</a></li>
                </ul>
              </li>
              <li>4.1.3 <a href="#Parent_Locales">Parent Locales</a></li>
            </ul>
          </li>
          <li>4.2 <a href="#Inheritance_and_Validity">Inheritance
              and Validity</a>
            <ul class="toc">
              <li>4.2.1 <a href="#Definitions">Definitions</a></li>
              <li>4.2.2 <a href="#Resolved_Data_File">Resolved Data
                  File</a></li>
              <li>4.2.3 <a href="#Valid_Data">Valid Data</a></li>
              <li>4.2.4 <a href="#Checking_for_Draft_Status">Checking
                  for Draft Status</a></li>
              <li>4.2.5 <a href="#Keyword_and_Default_Resolution">Keyword
                  and Default Resolution</a></li>
              <li>4.2.6 <a
                href="#Inheritance_vs_Related">Inheritance vs Related Information</a></li>
            </ul>
          </li>
          <li>4.3 <a href="#Likely_Subtags">Likely Subtags</a></li>
          <li>4.4 <a href="#LanguageMatching">Language Matching</a>
            <ul>
              <li>4.4.1 <a href="#EnhancedLanguageMatching">Enhanced Language Matching</a></li>
            </ul>
          </li>
        </ul>
      </li>
      <li>5 <a href="#XML_Format">XML Format</a>
        <ul class="toc">
          <li>5.1 <a href="#Common_Elements">Common Elements</a>
            <ul class="toc">
              <li>5.1.1 <a href="#special">Element special</a>
                <ul class="toc">
                  <li>5.1.1.1 <a href="#Sample_Special_Elements">Sample
                      Special Elements</a></li>
                </ul>
              </li>
              <li>5.1.2 <a href="#Alias_Elements">Element alias</a>
                <ul class="toc">
                  <li>Table: <a href="#Inheritance_with_source_locale_">Inheritance
                      with source="locale"</a></li>
                </ul>
              </li>
              <li>5.1.3 <a href="#Element_displayName">Element
                  displayName</a></li>
              <li>5.1.4 <a href="#Escaping_Characters">Escaping
                  Characters</a></li>
            </ul>
          </li>
          <li>5.2 <a href="#Common_Attributes">Common Attributes</a>
            <ul class="toc">
              <li>5.2.1 <a href="#Attribute_type">Attribute type</a></li>
              <li>5.2.2 <a href="#Attribute_draft">Attribute draft</a></li>
              <li>5.2.3 <a href="#alt_attribute">Attribute alt</a></li>
            </ul>
          </li>
          <li>5.3 <a href="#Common_Structures">Common Structures</a>
            <ul class="toc">
              <li>5.3.1 <a href="#Date_Ranges">Date and Date Ranges</a></li>
              <li>5.3.2 <a href="#Text_Directionality">Text
                  Directionality</a></li>
              <li>5.3.3 <a href="#Unicode_Sets">Unicode Sets</a>
                <ul class="toc">
                  <li>5.3.3.1 <a href="#Lists_of_Code_Points">Lists of
                      Code Points</a></li>
                  <li>5.3.3.2 <a href="#Unicode_Properties">Unicode
                      Properties</a></li>
                  <li>5.3.3.3 <a href="#Boolean_Operations">Boolean
                      Operations</a></li>
                  <li>5.3.3.4 <a href="#UnicodeSet_Examples">UnicodeSet
                      Examples</a></li>
                </ul>
              </li>
              <li>5.3.4 <a href="#String_Range">String Range</a></li>
            </ul>
          </li>
          <li>5.4 <a href="#Identity_Elements">Identity Elements</a></li>
          <li>5.5 <a href="#Valid_Attribute_Values">Valid Attribute
              Values</a></li>
          <li>5.6 <a href="#Canonical_Form">Canonical Form</a>
            <ul class="toc">
              <li>5.6.1 <a href="#Content">Content</a></li>
              <li>5.6.2 <a href="#Ordering">Ordering</a></li>
              <li>5.6.3 <a href="#Comments">Comments</a></li>
            </ul>
          </li>
          <li>5.7 <a href="#DTD_Annotations">DTD Annotations</a></li>
        </ul>
      </li>
      <li>6 <a href="#Property_Data">Property Data</a>
        <ul class="toc">
          <li>6.1 <a href="#Script_Metadata">Script Metadata</a></li>
          <li>6.2 <a href="#Extended_Pictographic">Extended Pictographic</a></li>
          <li>6.3 <a href="#Labels.txt">Labels.txt</a></li>
        </ul>
      </li>
      <li>7 <a href="#Format_Parse_Issues">Issues in Formatting
          and Parsing</a>
        <ul class="toc">
          <li>7.1 <a href="#Lenient_Parsing">Lenient Parsing</a>
            <ul class="toc">
              <li>7.1.1 <a href="#Motivation">Motivation</a></li>
              <li>7.1.2 <a href="#Loose_Matching">Loose Matching</a></li>
            </ul>
          </li>
          <li>7.2 <a href="#Invalid_Patterns">Handling Invalid
              Patterns</a></li>
        </ul>
      </li>
      <li>Annex A <a href="#Deprecated_Structure">Deprecated Structure</a>
        <ul class="toc">
          <li>A.1 <a href="#Fallback_Elements">Element fallback</a></li>
          <li>A.2 <a href="#BCP47_Keyword_Mapping">BCP 47 Keyword
              Mapping</a></li>
          <li>A.3 <a href="#Choice_Patterns">Choice Patterns</a></li>
          <li>A.4 <a href="#Element_default">Element default</a></li>
          <li>A.5 <a href="#Deprecated_Common_Attributes">Deprecated
              Common Attributes</a>
            <ul>
              <li>A.5.1 <a href="#Attribute_standard">Attribute
                  standard</a></li>
              <li>A.5.2 <a href="#Attribute_draft_nonLeaf">Attribute
                  draft in non-leaf elements</a></li>
            </ul>
          </li>
          <li>A.6 <a
            href="#Element_base">Element base</a></li>
          <li>A.7 <a href="#Element_rules">Element rules</a></li>
          <li>A.8 <a href="#Deprecated_subelements_of_dates">Deprecated
              subelements of &lt;dates&gt;</a></li>
          <li>A.9 <a href="#Deprecated_subelements_of_calendars">Deprecated
              subelements of &lt;calendars&gt;</a></li>
          <li>A.10 <a href="#Deprecated_subelements_of_timeZoneNames">Deprecated
              subelements of &lt;timeZoneNames&gt;</a></li>
          <li>A.11 <a href="#Deprecated_subelements_of_zone_metazone">Deprecated
              subelements of &lt;zone&gt; and &lt;metazone&gt;</a></li>
          <li>A.12 <a
            href="#Renamed_attribute_values_for_contextTransformUsage">Renamed
              attribute values for &lt;contextTransformUsage&gt; element</a></li>
          <li>A.13 <a href="#Deprecated_subelements_of_segmentations">Deprecated
              subelements of &lt;segmentations&gt;</a></li>
          <li>A.14 <a href="#Element_cp">Element cp</a></li>
          <li>A.15 <a href="#validSubLocales">Attribute
              validSubLocales</a></li>
          <li>A.16 <a href="#postCodeElements">Elements
              postalCodeData, postCodeRegex</a></li>
          <li>A.17 <a href="#telephoneCodeData">Element
              telephoneCodeData</a></li>
        </ul>
      </li>
      <li>Annex B <a href="#Links_to_Other_Parts">Links to Other Parts</a>
        <ul class="toc">
          <li>Table: <a href="#Part_2_Links">Part 2 Links: General
              (display names & transforms, etc.)</a></li>
          <li>Table: <a href="#Part_3_Links">Part 3 Links: Numbers
              (number & currency formatting)</a></li>
          <li>Table: <a href="#Part_4_Links">Part 4 Links: Dates
              (date, time, time zone formatting)</a></li>
          <li>Table: <a href="#Part_5_Links">Part 5 Links: Collation
              (sorting, searching, grouping)</a></li>
          <li>Table: <a href="#Part_6_Links">Part 6 Links:
              Supplemental (supplemental data)</a></li>
          <li>Table: <a href="#Part_7_Links">Part 7 Links: Keyboards
              (keyboard mappings)</a></li>
        </ul>
      </li>
      <li><a href="#References">References</a></li>
      <li><a
        href="#Acknowledgments">Acknowledgments</a></li>
      <li><a href="#Modifications">Modifications</a></li>
    </ul>
    <!-- END Generated TOC: CheckHtmlFiles -->
    <h2>
      <a name="Introduction" href="#Introduction">1 Introduction</a>
    </h2>
    <p>Not long ago, computer systems were like separate worlds,
      isolated from one another. The internet and related events have
      changed all that. A single system can be built of many different
      components, hardware and software, all needing to work together. Many
      different technologies have been important in bridging the gaps; in
      the internationalization arena, Unicode has provided a lingua franca
      for communicating textual data. However, there remain differences in
      the locale data used by different systems.</p>
    <p>The best practice for internationalization is to store and
      communicate language-neutral data, and format that data for the
      client. This formatting can take place on any of several
      components in a system; a server might format data based on the
      user's locale, or a client machine might do the
      formatting. The same goes for parsing data and for locale-sensitive
      analysis of data.</p>
    <p>
      But there remain significant differences across systems and
      applications in the locale-sensitive data used for such formatting,
      parsing, and analysis. Many of those differences are simply
      gratuitous: all are within acceptable limits for human beings, but
      they yield different results. In many other cases there are outright
      errors. Whatever the cause, the differences can cause discrepancies
      to creep into a heterogeneous system. This is especially serious in
      the case of collation (sort-order), where different collation causes
      not only ordering differences but also different results of queries!
      For example, if a query selects customers with names between "Abbot,
      Cosmo" and "Arnold, James", systems with
      different sort orders will return different lists. (For
      comparisons across systems formatted as HTML tables, see [<a
      href="#Comparisons">Comparisons</a>].)
    </p>
    <blockquote>
      <p class="note">
        <b>Note:</b> There are many different, equally valid ways in which
        data can be judged to be "correct" for a particular
        locale. The goal for the common locale data is to make it as
        consistent as possible with existing locale data, and acceptable to
        users in that locale.
      </p>
    </blockquote>
    <p>This document specifies an XML format for the communication of
      locale data: the Unicode Locale Data Markup Language (LDML). This
      provides a common format for systems to interchange locale data so
      that they can get the same results in the services provided by
      internationalization libraries. It also provides a standard format
      that can allow users to customize the behavior of a system. With it,
      for example, two implementations can exchange a specification of
      tailored collation (sorting) rules; using the same specification,
      both implementations will achieve the same results when comparing
      strings. Unicode LDML can also
      be used to let a user encapsulate specialized sorting behavior for a
      specific domain, or create a customized locale for a minority
      language. Unicode LDML is also used in the Unicode Common Locale Data
      Repository (CLDR). CLDR uses an open process for reconciling
      differences between the locale data used on different systems and
      for validating the data, producing a useful, common, consistent
      base of locale data.</p>
    <p>
      For more information, see the Common Locale Data Repository project
      page [<a href="#localeProject">LocaleProject</a>].
    </p>
    <p>As LDML is an interchange format, it was designed to favor ease of
      maintenance and simplicity of transformation into other formats
      over efficiency of run-time lookup and use. Implementations should
      consider converting LDML data into a more compact format prior to
      use.</p>
    <h3>
      <a name="Conformance" href="#Conformance">1.1 Conformance</a>
    </h3>
    <p>There are many ways to use the Unicode LDML format and the data
      in CLDR, and the Unicode Consortium does not restrict the ways in
      which the format or data are used. However, an implementation may
      also claim conformance to LDML or to CLDR, as follows:</p>
    <p>
      <i><b>UAX35-C1.</b> </i>An implementation that claims conformance to
      this specification shall:
    </p>
    <ol>
      <li>Identify the sections of the specification that it conforms
        to.
        <ul>
          <li>For example, an implementation might claim conformance to
            all LDML features except for <i>transforms</i> and <i>segments</i>.
          </li>
        </ul>
      </li>
      <li>Interpret the relevant elements and attributes of LDML
        documents in accordance with the descriptions in those sections.
        <ul>
          <li>For example, an implementation that claims conformance to
            the date format patterns must interpret the characters in such
            patterns according to <a
            href="tr35-dates.html#Date_Field_Symbol_Table">Date Field
              Symbol Table</a>.
          </li>
        </ul>
      </li>
      <li>Declare which types of CLDR data it uses.
        <ul>
          <li>For example, an implementation might declare that it only
            uses language names, and only those with a <i>draft</i> status of <i>contributed</i>
            or <i>approved</i>.
          </li>
        </ul>
      </li>
    </ol>
    <p>
      <i><b>UAX35-C2.</b> </i>An implementation that claims conformance to
      Unicode locale or language identifiers shall:
    </p>
    <ol>
      <li>Specify whether Unicode locale extensions are allowed.</li>
      <li>Specify the canonical form used for identifiers in terms of
        casing and field separator characters.</li>
    </ol>
    <p>External specifications may also reference particular
      components of Unicode locale or language identifiers, such as:</p>
    <blockquote>
      <p>
        <i>Field X can contain any Unicode region subtag values as given
          in Unicode Technical Standard #35: Unicode Locale Data Markup
          Language (LDML), excluding grouping codes.</i>
      </p>
    </blockquote>
    <h2>
      <a name="Locale" href="#Locale">2 What is a Locale?</a>
    </h2>
    <p>Before diving into the XML structure, it is helpful to describe
      the model behind the structure. People do not have to subscribe to
      this model to use data in LDML, but they do need to understand it so
      that the data can be correctly translated into whatever model their
      implementation uses.</p>
    <p>
      The first issue is basic: <i>what is a locale?</i> In this model, a
      locale is an identifier (id) that refers to a set of user preferences
      that tend to be shared across significant swaths of the world.
      Traditionally, the data associated with this id provides support for
      formatting and parsing of dates, times, numbers, and currencies; for
      measurement units; for sort-order (collation); and for translated names
      of time zones, languages, countries, and scripts. The data can also
      include support for text boundaries (character, word, line, and
      sentence), text transformations (including transliterations), and
      other services.
    </p>
    <p>Locale data is not cast in stone: the data used on
      someone's machine may generally reflect the US format, for
      example, but preferences can typically be set to override particular
      items, such as setting the date format to 2002.03.15, or using
      metric or Imperial measurement units. In the abstract, locales are
      simply one of many sets of preferences that, say, a website may want
      to remember for a particular user. Depending on the application, it
      may also want to remember the user's time zone, preferred
      currency, preferred character set, smoker/non-smoker preference, meal
      preference (vegetarian, kosher, and so on), music preference,
      religion, party affiliation, favorite charity, and so on.</p>
    <p>Locale data in a system may also change over time: country
      boundaries change; governments (and currencies) come and go;
      committees impose new standards; bugs are found and fixed in the
      source data; and so on. Thus the data needs to be versioned for
      stability over time.</p>
    <p>
      In general terms, the locale id is a parameter that is supplied to a
      particular service (date formatting, sorting, spell-checking, and so
      on). The format in this document does not attempt to represent all
      the data that could conceivably be used by all possible services.
      Instead, it collects together data that is in common use in systems
      and internationalization libraries for basic services. The main
      difference among locales is in terms of language; there may also be
      some differences according to different countries or regions.
      However, the line between <i>locales</i> and <i>languages</i>, as
      commonly used in the industry, is rather fuzzy. Note also that the
      vast majority of the locale data in CLDR is in fact language data;
      all non-linguistic data is kept in a separate tree.
      For more information, see <i><a href="#Language_and_Locale_IDs">Section
          3.10 Language and Locale IDs</a></i>.
    </p>
    <p>
      We will speak of data as being "in locale X". That does not
      imply that a locale <i>is</i> a collection of data; it is simply
      shorthand for "the set of data associated with the locale id
      X". Each individual piece of data is called a <i>resource</i> or
      <i>field</i>, and a tag indicating the key of the resource is called
      a <i>resource tag</i>.
    </p>
    <h2>
      <a name="Identifiers" href="#Identifiers"></a><a
      name="Unicode_Language_and_Locale_Identifiers"
      href="#Unicode_Language_and_Locale_Identifiers"> 3 Unicode
        Language and Locale Identifiers</a>
    </h2>
    <p>
      Unicode LDML uses stable identifiers based on [<a href="#BCP47">BCP47</a>]
      for distinguishing among languages, locales, regions, currencies,
      time zones, transforms, and so on. Many different identifier systems
      exist for these entities, and the Unicode LDML identifiers may not
      match the identifiers used on a particular target system. If so, some
      process of identifier translation may be required when using LDML
      data.
    </p>
    <p>
      The BCP 47 extensions (-u- and -t-) are described in <em>Section
        3.6 <a href="#u_Extension">Unicode BCP 47 U Extension</a>
      </em> and <em>Section 3.7 <a href="#BCP47_T_Extension">Unicode
          BCP 47 T Extension</a></em>.
    </p>
    <h3>
      <i><a name="Unicode_language_identifier"
        href="#Unicode_language_identifier">3.1 Unicode Language
          Identifier</a></i>
    </h3>
    <p>
      A <i>Unicode language identifier</i> has the following structure
      (given in both EBNF (Perl-based) and ABNF [<a href="#RFC5234">RFC5234</a>]).
      The following table defines syntactically well-formed identifiers:
      they are not necessarily valid identifiers. For additional validity
      criteria, see the links on the right.
    </p>
    <table>
      <tr>
        <th> </th>
        <th><div align="center">EBNF</div></th>
        <th><div align="center">ABNF</div></th>
        <th><div align="center">Validity / Comments</div></th>
      </tr>
      <tr>
        <td><code>
            <a href="#unicode_language_id" name="unicode_language_id">unicode_language_id</a>
          </code></td>
        <td><code>
            = "root"<br>
            | (unicode_language_subtag <br> (sep
            unicode_script_subtag)? <br> | unicode_script_subtag)<br>
            (sep unicode_region_subtag)? <br>
            (sep
            unicode_variant_subtag)* ;
          </code></td>
        <td><code>
            = "root"<br>
            / (unicode_language_subtag <br> [sep
            unicode_script_subtag] <br> / unicode_script_subtag)<br>
            [sep unicode_region_subtag] <br>
            *(sep
            unicode_variant_subtag)
          </code></td>
        <td>"root" is treated as a special <code>unicode_language_subtag</code></td>
      </tr>
      <tr>
        <td><code>
            <a href="#unicode_language_subtag" name="unicode_language_subtag">unicode_language_subtag</a>
          </code></td>
        <td><code> = alpha{2,3} | alpha{5,8}; </code></td>
        <td><code> = 2*3ALPHA / 5*8ALPHA </code></td>
        <td><code>
            <a href='#unicode_language_subtag_validity'>validity</a><br>
            <a href='http://unicode.org/cldr/latest/common/validity/language.xml'>latest-data</a>
          </code></td>
      </tr>
      <tr>
        <td><code>
            <a href="#unicode_script_subtag" name="unicode_script_subtag">unicode_script_subtag</a>
          </code></td>
        <td><code>= alpha{4} ;</code></td>
        <td><code>= 4ALPHA</code></td>
        <td><code>
            <a href='#unicode_script_subtag_validity'>validity</a><br>
            <a href='http://unicode.org/cldr/latest/common/validity/script.xml'>latest-data</a>
          </code></td>
      </tr>
      <tr>
        <td><code>
            <a href="#unicode_region_subtag" name="unicode_region_subtag">unicode_region_subtag</a>
          </code></td>
        <td><code>= (alpha{2} | digit{3}) ;</code></td>
        <td><code>= 2ALPHA / 3DIGIT</code></td>
        <td><code>
            <a
            href='#unicode_region_subtag_validity'>validity</a><br>
            <a href='http://unicode.org/cldr/latest/common/validity/region.xml'>latest-data</a>
          </code></td>
      </tr>
      <tr>
        <td><code>
            <a href="#unicode_variant_subtag" name="unicode_variant_subtag">unicode_variant_subtag</a>
          </code></td>
        <td><code>
            = (alphanum{5,8} <br> | digit alphanum{3}) ;
          </code></td>
        <td><code>
            = 5*8alphanum<br>/ (DIGIT 3alphanum)
          </code></td>
        <td><code>
            <a href='#unicode_variant_subtag_validity'>validity</a><br>
            <a href='http://unicode.org/cldr/latest/common/validity/variant.xml'>latest-data</a>
          </code></td>
      </tr>
      <tr>
        <td><code>sep</code></td>
        <td><code>= [-_] ;</code></td>
        <td><code>= "-" / "_"</code></td>
      </tr>
      <tr>
        <td><code>digit</code></td>
        <td><code>= [0-9] ;</code></td>
        <td><code> </code></td>
      </tr>
      <tr>
        <td><code>alpha</code></td>
        <td><code>= [A-Z a-z] ;</code></td>
        <td><code> </code></td>
      </tr>
      <tr>
        <td><code>alphanum</code></td>
        <td><code>= [0-9 A-Z a-z] ;</code></td>
        <td><code>= ALPHA / DIGIT</code></td>
      </tr>
    </table>
    <p>
      The semantics of the various subtags are explained in <em>Section
        3.4 <a href="#Field_Definitions">Language Identifier Field
          Definitions</a>
      </em>; there are also direct links from
      <code><a href="#unicode_language_subtag">unicode_language_subtag</a></code>,
      etc. Although in theory a
      <code><a href="#unicode_language_subtag">unicode_language_subtag</a></code>
      could have more than 3 letters through the IANA registration process,
      in practice that has not occurred.
The 870 <code> 871 <a href="#unicode_language_subtag">unicode_language_subtag</a> 872 </code> 873 "und" may be omitted when there is a 874 <code> 875 <a href="#unicode_script_subtag">unicode_script_subtag</a> 876 </code> 877 ; for that reason 878 <code> 879 <a href="#unicode_language_subtag">unicode_language_subtag</a> 880 </code> 881 values with 4 letters are not permitted. However, such 882 <code> 883 <a href="#unicode_language_id">unicode_language_id</a> 884 </code> 885 values are not intended for general interchange, because they are not 886 valid BCP 47 tags. Instead, they are intended for certain protocols 887 such as the identification of transliterators or font ScriptLangTag 888 values. 889 </p> 890 <p>For example, "en-US" (American English), 891 "en_GB" (British English), "es-419" (Latin 892 American Spanish), and "uz-Cyrl" (Uzbek in Cyrillic) are 893 all valid Unicode language identifiers.</p> 894 <h3> 895 <i><a name="Unicode_locale_identifier" 896 href="#Unicode_locale_identifier">3.2 Unicode Locale Identifier</a></i> 897 </h3> 898 <p> 899 A <i>Unicode locale identifier</i> is composed of a Unicode language 900 identifier plus (optional) locale extensions. It has the 901 following structure. The semantics of the U and T extensions are 902 explained in <em>Section 3.6 <a href="#u_Extension">Unicode 903 BCP 47 U Extension</a> 904 </em> and <em>Section 3.7 <a href="#BCP47_T_Extension">Unicode 905 BCP 47 T Extension</a></em>. Other extensions and private use extensions are supported for pass-through. The following table defines syntactically 906 <em>well-formed</em> identifiers: they are not necessarily <em>valid</em> identifiers. 907 For additional validity criteria, see the links on the right. 
</p>
<table border="0">
  <tr>
    <th> </th>
    <th><div align="center">EBNF</div></th>
    <th><div align="center">ABNF</div></th>
    <th><div align="center">Validity</div></th>
  </tr>
  <tr>
    <td><code>
      <a href="#unicode_locale_id" name="unicode_locale_id">unicode_locale_id</a>
    </code></td>
    <td><code>
      = unicode_language_id<br>
      extensions*<br>
      pu_extensions? ;
    </code></td>
    <td><code>
      = unicode_language_id<br>
      *extensions<br>
      [pu_extensions]
    </code></td>
  </tr>
  <tr>
    <td><code>
      <a href="#extensions" name="extensions">extensions</a>
    </code></td>
    <td><code>
      = unicode_locale_extensions <br>
      | transformed_extensions <br>
      | other_extensions ;</code></td>
    <td><code>= unicode_locale_extensions <br>
      / transformed_extensions <br>
      / other_extensions</code></td>
  </tr>
  <tr>
    <td><code>
      <a href="#unicode_locale_extensions"
      name="unicode_locale_extensions">unicode_locale_extensions</a>
    </code></td>
    <td><code>
      = sep [uU]<br> ((sep keyword)+ <br> | (sep attribute)+
      (sep keyword)*) ;
    </code></td>
    <td><code>
      = sep "u" <br> (1*(sep keyword) <br> / 1*(sep
      attribute) *(sep keyword))
    </code></td>
  </tr>
  <tr>
    <td><code>
      <a href="#transformed_extensions" name="transformed_extensions">transformed_extensions</a>
    </code></td>
    <td><code>
      = sep [tT] <br> ((sep tlang (sep tfield)*) <br>
      | (sep tfield)+) ; </code></td>
    <td><code>
      = sep "t" <br> ((sep tlang
      *(sep tfield)) <br> / 1*(sep tfield)) </code></td>
  </tr>
  <tr>
    <td><code><a href="#pu_extensions" name="pu_extensions">pu_extensions</a></code></td>
    <td><code>= sep [xX] <br>
      (sep alphanum{1,8})+ ;</code></td>
    <td><code>= sep "x" <br>
      1*(sep 1*8alphanum)</code></td>
  </tr>
  <tr>
    <td><code><a href="#other_extensions" name="other_extensions">other_extensions</a></code></td>
    <td><code>= sep [alphanum-[tTuUxX]]<br>
      (sep alphanum{2,8})+ ;</code></td>
    <td><code>= sep (DIGIT<br>
      / %x41-53 / %x56-57 / %x59-5A<br>
      / %x61-73 / %x76-77 / %x79-7A)<br>
      1*(sep 2*8alphanum)</code></td>
  </tr>
  <tr>
    <td><code>keyword</code></td>
    <td><code>= key (sep type)? ;</code></td>
    <td><code>= key [sep type]</code></td>
  </tr>
  <tr>
    <td><code>key</code></td>
    <td><code>
      = alphanum alpha ;
    </code></td>
    <td><code>
      = alphanum ALPHA
    </code></td>
    <td><code>
      <a href="#Key_Type_Definitions">validity</a><br>
      <a
      href='http://unicode.org/cldr/latest/common/bcp47'>latest-data</a>
    </code></td>
  </tr>
  <tr>
    <td><code>type</code></td>
    <td><code>
      = alphanum{3,8}<br> (sep alphanum{3,8})* ;
    </code></td>
    <td><code>
      = 3*8alphanum<br> *(sep 3*8alphanum)
    </code></td>
    <td><code>
      <a href="#Key_Type_Definitions">validity</a><br>
      <a
      href='http://unicode.org/cldr/latest/common/bcp47'>latest-data</a>
    </code></td>
  </tr>
  <tr>
    <td><code>attribute</code></td>
    <td><code>= alphanum{3,8} ;</code></td>
    <td><code>= 3*8alphanum</code></td>
  </tr>
  <tr>
    <td><code>
      <a name="unicode_subdivision_id" href="#unicode_subdivision_id">unicode_subdivision_id</a><a
      name="unicode_subdivision_subtag"></a><a
      name="subdivision_attribute"></a>
    </code></td>
    <td><code>
      = <a href="#unicode_region_subtag">unicode_region_subtag</a> unicode_subdivision_suffix ;
    </code></td>
    <td><code>
      = <a href="#unicode_region_subtag">unicode_region_subtag</a> unicode_subdivision_suffix
    </code></td>
    <td><code>
      <a href='#unicode_subdivision_subtag_validity'>validity</a><br>
      <a
      href='http://unicode.org/cldr/latest/common/validity/subdivision.xml'>latest-data</a>
    </code></td>
  </tr>
  <tr>
    <td><code>unicode_subdivision_suffix</code></td>
    <td><code>= alphanum{1,4} ;</code></td>
    <td><code>= 1*4alphanum</code></td>
  </tr>
  <tr>
    <td><code>
      <a name="unicode_measure_unit" href="#unicode_measure_unit">unicode_measure_unit</a>
    </code></td>
    <td><code>
      = alphanum{3,8}<br> (sep alphanum{3,8})* ;
    </code></td>
    <td><code>
      = 3*8alphanum<br> *(sep 3*8alphanum)
    </code></td>
    <td><code>
      <a href='#Validity_Data'>validity</a><br>
      <a
      href='http://unicode.org/cldr/latest/common/validity/unit.xml'>latest-data</a>
    </code></td>
  </tr>
  <tr>
    <td><code>tlang</code></td>
    <td><code>
      = unicode_language_subtag<br> (sep unicode_script_subtag)?<br> (sep unicode_region_subtag)?<br> (sep unicode_variant_subtag)* ; </code></td>
    <td><code>
      = unicode_language_subtag <br> [sep unicode_script_subtag] <br> [sep unicode_region_subtag] <br>
      *(sep unicode_variant_subtag) </code></td>
  </tr>
  <tr>
    <td><code>tfield</code></td>
    <td><code>
      = tkey tvalue ;
    </code></td>
    <td><code>
      = tkey tvalue
    </code></td>
    <td><code>
      <a href="#BCP47_T_Extension">validity</a><br>
      <a
      href='http://unicode.org/cldr/latest/common/bcp47'>latest-data</a>
    </code></td>
  </tr>
  <tr>
    <td><code>
      tkey
    </code></td>
    <td><code>
      = alpha digit ;
    </code></td>
    <td><code>= ALPHA DIGIT</code></td>
  </tr>
  <tr>
    <td><code>
      tvalue
    </code></td>
    <td><code>= (sep alphanum{3,8})+ ;</code></td>
    <td><code>= 1*(sep 3*8alphanum)</code></td>
  </tr>
</table>

<p>
  For historical reasons, this is called a Unicode locale identifier.
  However, it really functions (with few exceptions) as a <span
  class="st">language</span> identifier, and accesses <span class="st">language</span>-based
  data.
  Except where it would be unclear, this document uses the term
  "locale" data loosely to encompass both types of data; for
  more information, see <i><a href="#Language_and_Locale_IDs">Section
  3.10 Language and Locale IDs</a></i>.
</p>
<p>As of the release of this specification, there were no <code>other_extensions</code> defined. The <code>other_extensions</code> are present in the syntax so that implementations can preserve any such extensions they encounter. An identifier must not contain more than one extension with the same singleton (-u-, -t-, ...). The private use extension must come after all other extensions.
</p>
<p>As for terminology, the term <i>code</i> may also be used instead of
  "subtag", and "territory" instead of
  "region". The primary language subtag is also called the <i>base
  language code</i>. For example, the base language code for
  "en-US" (American English) is "en" (English). The
  <i>type</i> may also be referred to as a <i>value</i> or <i>key-value</i>.
</p>
<p>
  The identifiers can vary in case and in the separator characters. The
  "-" and "_" separators are treated as equivalent, although "-" is preferred.</p>
<p>All identifier field values are case-insensitive. Although case
  distinctions do not carry any special meaning, an implementation of
  LDML should use the casing recommendations in [<a href="#BCP47">BCP47</a>],
  especially when a Unicode locale identifier is used for locale data
  exchange in software protocols.</p>
<p>The canonical form of a <code><a href="#unicode_locale_id">unicode_locale_id</a></code> has:</p>
<ul>
  <li>a language subtag (identifiers that begin with a script subtag are for specialized use only)</li>
  <li>any script subtag in title case (e.g., Hant)</li>
  <li>any region subtag in uppercase (e.g., DE)</li>
  <li>all other subtags in lowercase (e.g., en)</li>
  <li>any variants in alphabetical order (e.g., en-fonipa-scouse, not en-scouse-fonipa)</li>
  <li>any extensions in alphabetical order by their singleton (e.g., en-t-xxx-u-yyy, not en-u-yyy-t-xxx)</li>
</ul>
<p>
  <b>Note:</b> The current version of CLDR data uses some non-preferred forms for backward compatibility. This might be changed in future CLDR releases.</p>
<ul>
  <li>It uses uppercase letters for
    variant subtags, while the preferred forms are all lowercase.</li>
  <li>It uses "_" as the separator, while the preferred form of the separator is "-".</li>
  <li>It uses "root", while the preferred form is "und".</li>
</ul>
<h3>
  <a name="BCP_47_Conformance" href="#BCP_47_Conformance">3.3 BCP
  47 Conformance</a>
</h3>
<p>
  Unicode language and locale identifiers inherit the design and the
  repertoire of subtags from [<a href="#BCP47">BCP47</a>] Language
  Tags.
  There are some extensions and restrictions made for the use of
  the Unicode locale identifier in CLDR:
</p>
<ul>
  <li>It does not allow for the full syntax of [<a href="#BCP47">BCP47</a>]:
    <ul>
      <li>No extlang subtags are allowed (as in the BCP 47 canonical form, see BCP 47 <a href="https://tools.ietf.org/html/bcp47#section-4.5">Section 4.5</a> and <a href="https://tools.ietf.org/html/bcp47#section-3.1.7" target="_blank" >Section 3.1.7</a>)</li>
      <li>No irregular BCP 47 grandfathered tags are allowed (these are all deprecated in BCP 47)</li>
      <li>A tag must not start with the subtag "x": thus a <em>privateuse</em> sequence (e.g., x-abc) can only come after a language subtag, like "und"</li>
    </ul>
  </li>
  <li>It allows for certain semantic additions and constraints:
    <ul>
      <li>Certain codes that are private-use in BCP 47 and ISO are given semantics by LDML</li>
      <li>Each macrolanguage has an identified primary encompassed language, which is treated as an alias for the macrolanguage, and thus is replaced when canonicalizing (as allowed by BCP 47, see <a href="https://tools.ietf.org/html/bcp47#section-4.1.2">Section 4.1.2</a>)</li>
    </ul>
  </li>
  <li>It allows certain syntax for backwards compatibility (not BCP 47-compatible):
    <ul>
      <li>The "_" character as a field separator, in addition to the "-" used in [<a href="#BCP47">BCP47</a>]
        (however, the canonical form uses "-")</li>
      <li>The subtag "root" to indicate the generic locale used as the parent
        of all languages in the CLDR data model ("und" can be used instead)</li>
      <li>The language tag may begin with a script subtag rather than a language subtag. This is specialized use only, and not required for CLDR conformance.</li>
    </ul>
  </li>
</ul>
<p>There are thus two subtypes of Unicode locale identifiers:</p>
<ul>
  <li>the term <em>Unicode CLDR locale identifier</em> applies where the backwards compatibility syntax is used.</li>
  <li>the term <em>Unicode BCP 47 locale identifier</em> applies otherwise. A <em>Unicode BCP 47 locale identifier</em> is also a valid BCP 47 language tag.</li>
</ul>
<h4>
  <a name="BCP_47_Language_Tag_Conversion"
  href="#BCP_47_Language_Tag_Conversion">3.3.1 BCP 47 Language Tag
  Conversion</a>
</h4>
<p>The different identifiers can be converted to one another as described in this section.</p>
<h5>
  <a name="Language_Tag_to_Locale_Identifier"
  href="#Language_Tag_to_Locale_Identifier">BCP 47 Language Tag to Unicode BCP 47 Locale Identifier</a>
</h5>
<p>A valid [<a href="#BCP47">BCP47</a>] language tag can be converted
  to a valid Unicode BCP 47 locale identifier by performing the
  following transformation.</p>
<ol>
  <li>Canonicalize the language tag (afterwards, there will be no
    extlang subtags).</li>
  <li>If the BCP 47 primary language subtag matches the <i>type</i>
    attribute of a <i>languageAlias</i> element in <a
    href="tr35-info.html#Supplemental_Data">Supplemental Data</a>,
    replace the language subtag with the <i>replacement</i> value.
    <ol>
      <li>If there are additional subtags in the <i>replacement</i>
        value, add them to the result, but only if there is no
        corresponding subtag already in the tag.
      </li>
    </ol>
  </li>
  <li>If the BCP 47 region subtag matches the <i>type</i>
    attribute of a <i>territoryAlias</i> element in <a
    href="tr35-info.html#Supplemental_Data">Supplemental Data</a>,
    replace the region subtag with the <i>replacement</i> value, as
    follows:
    <ol>
      <li>If there is a single territory in the replacement, use it.</li>
      <li>If there are multiple territories:
        <ol>
          <li>Look up the most likely territory for the base language
            code (and script, if there is one).</li>
          <li>If that likely territory is in the list, use it.</li>
          <li>Otherwise, use the first territory in the list.</li>
        </ol>
      </li>
    </ol>
  </li>
  <li>If the tag is one of the five deprecated grandfathered tags (cel-gaulish, i-default, i-enochian, i-mingo, zh-min) remaining after step #1, prefix it with "und-x-".</li>
  <li>If the first subtag is "x", prefix the tag with "und-".</li>
</ol>
<p>The result is a Unicode BCP 47 locale identifier, in canonical form. It is both a BCP 47 language tag and a Unicode locale identifier.
Because the process maps from all BCP 47 language tags into a subset of BCP 47 language tags, the format changes are not reversible, much as a lowercase transformation of the string “McGowan” is not reversible.</p>
<br>
<p><em>Examples</em></p>
<table>
  <tr>
    <th style='width:10em'>BCP 47 language tag</th>
    <th style='width:10em'>Unicode BCP 47 locale identifier</th>
    <th>Comments</th>
  </tr>
  <tr>
    <td><code>en-US</code></td>
    <td><code>en-US</code></td>
    <td>no changes</td>
  </tr>
  <tr>
    <td><code>iw-FX</code></td>
    <td><code>he-FR</code></td>
    <td>BCP 47 canonicalization [1]</td>
  </tr>
  <tr>
    <td><code>cmn-TW</code></td>
    <td><code>zh-TW</code></td>
    <td>language alias [2]</td>
  </tr>
  <tr>
    <td><code>zh-cmn-TW</code></td>
    <td><code>zh-TW</code></td>
    <td>BCP 47 canonicalization [1], then language alias [2]</td>
  </tr>
  <tr>
    <td><code>sr-CS</code></td>
    <td><code>sr-RS</code></td>
    <td>territory alias [3]</td>
  </tr>
  <tr>
    <td><code>sh</code></td>
    <td><code>sr-Latn</code></td>
    <td>multiple replacement subtags [2.1]</td>
  </tr>
  <tr>
    <td><code>sh-Cyrl</code></td>
    <td><code>sr-Cyrl</code></td>
    <td>language alias [2]; the script subtag from the replacement is not added, because the tag already has one [2.1]</td>
  </tr>
  <tr>
    <td><code>hy-SU</code></td>
    <td><code>hy-AM</code></td>
    <td>multiple territory values [3.2]<br> <code>&lt;territoryAlias
      type="SU" replacement="RU AM AZ BY EE GE KZ KG LV
      LT MD TJ TM UA UZ" …/&gt;</code></td>
  </tr>
  <tr>
    <td><code>i-enochian</code></td>
    <td><code>und-x-i-enochian</code></td>
    <td>prefix any remaining grandfathered tag with "und-x-" [4]</td>
  </tr>
  <tr>
    <td><code>x-abc</code></td>
    <td><code>und-x-abc</code></td>
    <td>prefix with "und-", so that there is always a base language subtag [5]</td>
  </tr>
</table>
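<p>The conversion steps above can be sketched in Python as follows. This is a minimal, non-normative illustration that assumes step 1 (BCP 47 canonicalization, such as <code>iw-FX</code> to <code>he-FR</code> or the removal of extlang subtags) has already been performed by a BCP 47 library. The names <code>LANGUAGE_ALIAS</code>, <code>TERRITORY_ALIAS</code>, <code>LIKELY_REGION</code>, and <code>to_unicode_locale_id</code> are hypothetical; the tables are tiny hand-written excerpts chosen to cover the examples above, not the actual CLDR supplemental data. The likely-territory lookup in step 3 is simplified to use only the base language code, ignoring any script.</p>

```python
"""Sketch of steps 2-5: BCP 47 tag to Unicode BCP 47 locale identifier.

Assumes step 1 (BCP 47 canonicalization) was already done elsewhere.
All tables below are hypothetical excerpts, not real CLDR data.
"""

# Hypothetical excerpt of <languageAlias type="..." replacement="..."/>.
LANGUAGE_ALIAS = {"cmn": "zh", "sh": "sr-Latn"}

# Hypothetical excerpt of <territoryAlias type="..." replacement="..."/>.
TERRITORY_ALIAS = {
    "CS": ["RS"],
    "SU": "RU AM AZ BY EE GE KZ KG LV LT MD TJ TM UA UZ".split(),
}

# Hypothetical excerpt of likely-subtags data: base language to region.
LIKELY_REGION = {"hy": "AM", "sr": "RS"}

GRANDFATHERED = {"cel-gaulish", "i-default", "i-enochian", "i-mingo", "zh-min"}


def is_script(subtag: str) -> bool:
    return len(subtag) == 4 and subtag.isalpha()


def is_region(subtag: str) -> bool:
    return (len(subtag) == 2 and subtag.isalpha()) or (
        len(subtag) == 3 and subtag.isdigit()
    )


def to_unicode_locale_id(tag: str) -> str:
    # Step 4 (checked first in this sketch for simplicity): a remaining
    # grandfathered tag becomes private use under "und".
    if tag.lower() in GRANDFATHERED:
        return "und-x-" + tag.lower()
    subtags = tag.split("-")
    # Step 5: a tag starting with "x" gets an explicit "und" base language.
    if subtags[0].lower() == "x":
        return "und-" + tag
    language, rest = subtags[0], subtags[1:]
    script = rest.pop(0) if rest and is_script(rest[0]) else None
    region = rest.pop(0) if rest and is_region(rest[0]) else None

    # Step 2: language alias; extra replacement subtags only fill gaps (2.1).
    if language in LANGUAGE_ALIAS:
        replacement = LANGUAGE_ALIAS[language].split("-")
        language = replacement[0]
        for extra in replacement[1:]:
            if is_script(extra) and script is None:
                script = extra
            elif is_region(extra) and region is None:
                region = extra

    # Step 3: territory alias; prefer the likely region if it is listed.
    if region in TERRITORY_ALIAS:
        candidates = TERRITORY_ALIAS[region]
        likely = LIKELY_REGION.get(language)
        region = likely if likely in candidates else candidates[0]

    # Variants and extensions in `rest` are passed through unchanged.
    parts = [language] + [p for p in (script, region) if p] + rest
    return "-".join(parts)
```

<p>With these excerpts, the function reproduces the rows of the examples table that do not depend on step 1; for instance, <code>hy-SU</code> becomes <code>hy-AM</code> because the likely territory for <code>hy</code> appears in the replacement list. A real implementation would load the full <i>languageAlias</i>, <i>territoryAlias</i>, and likely subtags data instead of these excerpts.</p>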
<h5>
  <a name="Unicode_Locale_Identifier_CLDR_to_BCP_47"
  href="#Unicode_Locale_Identifier_CLDR_to_BCP_47">Unicode Locale Identifier: CLDR to BCP 47</a>
</h5>

<p>A Unicode CLDR locale identifier can be converted to a valid [<a
  href="#BCP47">BCP47</a>] language tag (which is also a Unicode BCP 47 locale identifier) by performing the following
  transformation.</p>
<ol>
  <li>Replace the "_" separators with "-".</li>
  <li>Replace the special language identifier "root" with the BCP
    47 primary language subtag "und".</li>
  <li>Add an initial "und" primary language subtag if the first subtag is a script.</li>
</ol>
<p><em>Examples:</em></p>
<table>
  <tr>
    <th style='width:10em'>Unicode CLDR locale identifier</th>
    <th style='width:10em'>BCP 47 language tag</th>
    <th>Comments</th>
  </tr>
  <tr>
    <td><code>en_US</code></td>
    <td><code>en-US</code></td>
    <td>change separator [1]</td>
  </tr>
  <tr>
    <td><code>de_DE_u_co_phonebk</code></td>
    <td><code>de-DE-u-co-phonebk</code></td>
    <td>change separator [1]</td>
  </tr>
  <tr>
    <td><code>root</code></td>
    <td><code>und</code></td>
    <td>change to "und" [2]</td>
  </tr>
  <tr>
    <td><code>root_u_cu_usd</code></td>
    <td><code>und-u-cu-usd</code></td>
    <td>change to "und" [1, 2]</td>
  </tr>
  <tr>
    <td><code>Latn_DE</code></td>
    <td><code>und-Latn-DE</code></td>
    <td>add "und" [1, 3]</td>
  </tr>
</table><br>
<h5>
  <a name="Unicode_Locale_Identifier_BCP_47_to_CLDR"
  href="#Unicode_Locale_Identifier_BCP_47_to_CLDR">Unicode Locale Identifier: BCP 47 to CLDR</a>
</h5>

<p>A Unicode BCP 47 locale identifier can be transformed into a Unicode CLDR locale identifier by performing the following transformation.</p>
<ol>
  <li>Change the separator to "_".</li>
  <li>Replace the primary language subtag "und" with "root"
    if no script, region, or variant subtags are present.</li>
</ol>
<p><em>Examples:</em></p>
<table>
  <tr>
    <th style='width:10em'>BCP 47 language tag</th>
    <th style='width:10em'>Unicode CLDR locale identifier</th>
    <th>Comments</th>
  </tr>
  <tr>
    <td><code>en-US</code></td>
    <td><code>en_US</code></td>
    <td>changes separator [1]</td>
  </tr>
  <tr>
    <td><code>und</code></td>
    <td><code>root</code></td>
    <td>changes to "root", because no script, region, or variant tag is
      present [2]</td>
  </tr>
  <tr>
    <td><code>und-US</code></td>
    <td><code>und_US</code></td>
    <td>no change to "und", because a region subtag is present [1]</td>
  </tr>
  <tr>
    <td nowrap><code>und-u-cu-USD</code></td>
    <td nowrap><code>root_u_cu_usd</code></td>
    <td>changes to "root", because no script, region, or variant tag is
      present [1, 2]</td>
  </tr>
</table>
<h3>
  <a name="Field_Definitions" href="#Field_Definitions">3.4
  Language Identifier Field Definitions </a>
</h3>
<p>
  Unicode language and locale identifier field values are provided in
  the following table. Note that some private-use BCP 47 field values
  are given specific meanings in CLDR. While field values are based on
  [<a href="#BCP47">BCP47</a>] subtag values, their validity status in
  CLDR is specified by means of machine-readable files in the <a
  href='http://unicode.org/repos/cldr/tags/latest/common/validity/'>common/validity/</a>
  subdirectory, such as language.xml. For the format of those files and
  more information, see <em><a href='#Validity_Data'>Section
  3.11 Validity Data</a></em>.
</p>
<table>
  <caption>
    <a name="Language_Locale_Field_Definitions"
    href="#Language_Locale_Field_Definitions">Language Identifier
    Field Definitions </a>
  </caption>
  <tr>
    <th>Field</th>
    <th>Valid values</th>
  </tr>
  <tr>
    <td><a href="#unicode_language_subtag_validity"
      name="unicode_language_subtag_validity">unicode_language_subtag</a>
      <p>
        (also known as a <i>Unicode base language code</i>)
      </p></td>
    <td>Subtags in the language.xml file (see <em>Section 3.11
      <a href="#Validity_Data">Validity Data</a></em>). These are based on [<a href="#BCP47">BCP47</a>] subtag values
      marked as <b>Type: language</b>.
      <p>ISO 639-3 introduces the notion of
        "macrolanguages", where certain ISO 639-1 or ISO 639-2
        codes are given broad semantics, and additional codes are given
        for the narrower semantics. For backwards compatibility, Unicode
        language identifiers retain use of the narrower semantics for
        these codes. For example:</p>
      <table border="1" cellspacing="0" cellpadding="2"
        style="margin: 0.5em">
        <tr>
          <th>For</th>
          <th>Use</th>
          <th><i>Not</i></th>
        </tr>
        <tr>
          <td>Standard Chinese (Mandarin)</td>
          <td><code>zh</code></td>
          <td><code>cmn</code></td>
        </tr>
        <tr>
          <td>Standard Arabic</td>
          <td><code>ar</code></td>
          <td><code>arb</code></td>
        </tr>
        <tr>
          <td>Standard Malay</td>
          <td><code>ms</code></td>
          <td><code>zsm</code></td>
        </tr>
        <tr>
          <td>Standard Swahili</td>
          <td><code>sw</code></td>
          <td><code>swh</code></td>
        </tr>
        <tr>
          <td>Standard Uzbek</td>
          <td><code>uz</code></td>
          <td><code>uzn</code></td>
        </tr>
        <tr>
          <td>Standard Konkani</td>
          <td><code>kok</code></td>
          <td><code>knn</code></td>
        </tr>
        <tr>
          <td>Northern Kurdish</td>
          <td><code>ku</code></td>
          <td><code>kmr</code></td>
        </tr>
      </table>
      <p>
        If a language subtag matches the type attribute of a languageAlias
        element, then the replacement value is used instead. For example,
        because "swh" occurs in
        <tt>&lt;languageAlias type="swh" replacement="sw"/&gt;</tt>,
        "sw" must be used instead of "swh". Thus Unicode language
        identifiers use "ar-EG" for Standard Arabic (Egypt), not
        "arb-EG"; they use "zh-TW" for Mandarin
        Chinese (Taiwan), not "cmn-TW".
      </p>
      <p>
        The private use codes listed as <strong>excluded</strong>
        in <em>Section 3.5.3 <a href="#Private_Use">Private Use Codes</a></em>
        will never be given specific semantics in Unicode identifiers, and
        are thus safe for use for other purposes by other applications.
      </p>
      <p>CLDR provides data for normalizing language/locale
        codes, including mapping overlong codes like "eng-840"
        or "eng-USA" to the correct code "en-US";
        see the
        <strong><a href="https://www.unicode.org/cldr/charts/latest/supplemental/aliases.html">Aliases</a></strong>
        Chart.</p>
      <p>The following are special language subtags:</p>
      <table class="simple" border="1" cellspacing="0" cellpadding="2">
        <tr>
          <td> </td>
          <td><strong>Name</strong></td>
          <td><strong>Comment</strong></td>
        </tr>
        <tr>
          <td><code>mis</code></td>
          <td>Uncoded languages</td>
          <td>The content is in a language that doesn't yet have an ISO 639 code.</td>
        </tr>
        <tr>
          <td><code>mul</code></td>
          <td>Multiple languages</td>
          <td>The content contains more than one language or text that is simultaneously in multiple languages (such as brand names).</td>
        </tr>
        <tr>
          <td><code>zxx</code></td>
          <td>No linguistic content</td>
          <td>The content is not in any particular language (such as images or symbols).</td>
        </tr>
      </table></td>
  </tr>
  <tr>
    <td><a href="#unicode_script_subtag_validity"
      name="unicode_script_subtag_validity">unicode_script_subtag</a>
      <p>
        (also known as a <i>Unicode script code</i>)
      </p></td>
    <td>Subtags in the script.xml file (see <em>Section 3.11 <a
      href="#Validity_Data">Validity Data</a></em>). These are based on [<a
      href="#BCP47">BCP47</a>] subtag values marked as <b>Type:
      script</b>.
      <p>In most cases the script is not necessary, since most
        languages are customarily written in only a single script.
        Examples of cases where it is used are:</p>
      <table border="1" cellspacing="0" cellpadding="2"
        style="margin: 0.5em">
        <tr>
          <td><code>az_Arab</code></td>
          <td>Azerbaijani in Arabic script</td>
        </tr>
        <tr>
          <td><code>az_Cyrl</code></td>
          <td>Azerbaijani in Cyrillic script</td>
        </tr>
        <tr>
          <td><code>az_Latn</code></td>
          <td>Azerbaijani in Latin script</td>
        </tr>
        <tr>
          <td><code>zh_Hans</code></td>
          <td>Chinese, in simplified script (=zh, zh-Hans, zh-CN,
            zh-Hans-CN)</td>
        </tr>
        <tr>
          <td><code>zh_Hant</code></td>
          <td>Chinese, in traditional script</td>
        </tr>
      </table>
      <p>
        Unicode identifiers give specific semantics to certain Unicode Script values. For more information, see also [<a
        href="http://www.unicode.org/reports/tr41/#UAX24">UAX24</a>]:
      </p>
      <table cellspacing="0" cellpadding="2" border="1"
        style="margin: 0.5em">
        <tr>
          <td><code>Qaag</code></td>
          <td>Zawgyi</td>
          <td colspan="2">Qaag is a special script code for identifying the non-standard use of Myanmar characters for display with the Zawgyi font. The purpose of the code is to enable migration to standard, interoperable use of Unicode by providing an identifier for Zawgyi for tagging text, applications, input methods, font tables, transformations, and other mechanisms used for migration.</td>
        </tr>
        <tr>
          <td><code>Qaai</code></td>
          <td>Inherited</td>
          <td colspan="2"><strong>deprecated</strong>: the <em>canonicalized</em>
            form is Zinh</td>
        </tr>
        <tr>
          <td><code>Zinh</code></td>
          <td>Inherited</td>
          <td colspan="2"> </td>
        </tr>
        <tr>
          <td><code>Zsye</code></td>
          <td>Emoji Style</td>
          <td colspan="2">Prefer emoji style for characters that have both text
            and emoji styles available.</td>
        </tr>
        <tr>
          <td><code>Zsym</code></td>
          <td>Text Style</td>
          <td colspan="2">Prefer text style for characters that have both text and
            emoji styles available.</td>
        </tr>
        <tr>
          <td rowspan="7"><code>Zxxx</code></td>
          <td rowspan="7">Unwritten</td>
          <td colspan="2">Indicates spoken or otherwise unwritten content. For example:</td>
        </tr>
        <tr>
          <th>Sample(s)</th>
          <th>Description</th>
        </tr>
        <tr>
          <td>uz</td>
          <td>either written or spoken content</td>
        </tr>
        <tr>
          <td>uz-Latn <em>or</em> uz-Arab</td>
          <td>written-only content (particular script)</td>
        </tr>
        <tr>
          <td>uz-Zyyy</td>
          <td>written-only content (unspecified script)</td>
        </tr>
        <tr>
          <td>uz-Zxxx</td>
          <td>spoken-only content</td>
        </tr>
        <tr>
          <td>uz-Latn, uz-Zxxx</td>
          <td>both specific written and spoken content (using a <em>language list</em>)</td>
        </tr>
        <tr>
          <td><code>Zyyy</code></td>
          <td>Common</td>
          <td colspan="2"> </td>
        </tr>
        <tr>
          <td><code>Zzzz</code></td>
          <td>Unknown</td>
          <td colspan="2"> </td>
        </tr>
      </table>
      <p>The private use subtags listed as <strong>excluded</strong> in <em>Section 3.5.3 <a href="#Private_Use">Private Use Codes</a></em> will never be given
        specific semantics in Unicode identifiers, and are thus safe for
        use for other purposes by other applications.</p></td>
  </tr>
  <tr>
    <td><a href="#unicode_region_subtag_validity"
      name="unicode_region_subtag_validity">unicode_region_subtag</a>
      <p>
        (also known as a <i>Unicode region code</i> or a <i>Unicode
        territory code</i>)
      </p></td>
    <td>Subtags in the region.xml file (see <em>Section 3.11 <a
      href="#Validity_Data">Validity Data</a></em>).
These are based on [<a 1646 href="#BCP47">BCP47</a>] subtag values marked as <b>Type: 1647 region</b> 1648 <p>Unicode identifiers give specific semantics to the following 1649 subtags:</p> 1650 <table border="1" cellspacing="0" cellpadding="2"> 1651 <tr> 1652 <td> </td> 1653 <td><strong>Name</strong></td> 1654 <td><strong>Comment</strong></td> 1655 <td><strong> ISO 3166-1 status</strong></td> 1656 </tr> 1657 <tr> 1658 <td><code>QO</code></td> 1659 <td>Outlying Oceania</td> 1660 <td>countries in Oceania [009] that do not have a <a 1661 href="http://www.unicode.org/cldr/charts/latest/supplemental/territory_containment_un_m_49.html">subcontinent</a>. 1662 </td> 1663 <td>private use</td> 1664 </tr> 1665 <tr> 1666 <td><code>QU</code></td> 1667 <td>European Union</td> 1668 <td><strong>deprecated</strong>: the <em>canonicalized</em> 1669 form is EU</td> 1670 <td>private use</td> 1671 </tr> 1672 <tr> 1673 <td><code>UK</code></td> 1674 <td>United Kingdom</td> 1675 <td><strong>deprecated</strong>: the <em>canonicalized</em> 1676 form is GB</td> 1677 <td>exceptionally reserved</td> 1678 </tr> 1679 <tr> 1680 <td><code>XA</code></td> 1681 <td>Pseudo-Accents</td> 1682 <td>special code indicating derived testing locale with English + added accents and lengthened</td> 1683 <td>private use</td> 1684 </tr> 1685 <tr> 1686 <td><code>XB</code></td> 1687 <td>Pseudo-Bidi</td> 1688 <td>special code indicating derived testing locale with forced RTL English</td> 1689 <td>private use</td> 1690 </tr> 1691 <tr> 1692 <td><code>XK</code></td> 1693 <td>Kosovo</td> 1694 <td>industry practice</td> 1695 <td>private use</td> 1696 </tr> 1697 <tr> 1698 <td><code>ZZ</code></td> 1699 <td>Unknown or Invalid Territory</td> 1700 <td>used in APIs or as replacement for invalid code</td> 1701 <td>private use</td> 1702 </tr> 1703 </table> 1704 <p>The private use subtags listed as <strong>excluded</strong> in <em>Section 3.5.3 <a href="#Private_Use">Private Use Codes</a></em> will normally never be 1705 given 
specific semantics in Unicode identifiers, and are thus safe 1706 for use for other purposes by other applications. However, LDML 1707 may follow widespread industry practice in the use of some of 1708 these codes, such as for XK.</p> 1709 <p>The CLDR provides data for normalizing territory/region 1710 codes, including mapping overlong codes like "eng-840" 1711 or "eng-USA" to the correct code "en-US".</p> 1712 <p>Special Codes:</p> 1713 <ul> 1714 <li>The territory code 'UK' has a special status in ISO, and 1715 is used for the domain name instead of GB. It is thus recognized 1716 by CLDR as being an alternate (unnormalized) form of 'GB'.</li> 1717 <li>The territory code '001' (the World) is used to indicate 1718 a standardized form, such as "ar-001" for Modern 1719 Standard Arabic.</li> 1720 </ul></td> 1721 </tr> 1722 <tr> 1723 <td><a href="#unicode_variant_subtag_validity" 1724 name="unicode_variant_subtag_validity">unicode_variant_subtag</a> 1725 <p> 1726 (also known as a <i>Unicode language variant code)</i> 1727 </p></td> 1728 <td>Subtags in the variant.xml file (see<em> Section 3.11 1729 <a href="#Validity_Data">Validity Data</a> 1730 </em>). These are based on [<a href="#BCP47">BCP47</a>] subtag values 1731 marked as <b>Type: variant</b> 1732 <p> 1733 CLDR provides data for normalizing variant codes. About handling 1734 of the "POSIX" variant see <i>Section 3.8.2, <a 1735 href="#Legacy_Variants">Legacy Variants</a></i>. 1736 </p></td> 1737 </tr> 1738 </table> 1739 <p> 1740 <i>Examples:</i> 1741 </p> 1742 <blockquote> 1743 <pre>en 1744fr_BE 1745zh-Hant-HK</pre> 1746 </blockquote> 1747 <p> 1748 <em>Deprecated</em> codes—such as QU above—are valid, but strongly 1749 discouraged. 1750 </p> 1751 <p> 1752 A locale that only has a language subtag (and optionally a script 1753 subtag) is called a <i>language locale</i>; one with both language 1754 and territory subtag is called a <i>territory locale</i> (or <i>country 1755 locale</i>). 
</p>
<h3>
<a name="Special_Codes" href="#Special_Codes">3.5 Special Codes</a>
</h3>

<h4>
<a name="Unknown_or_Invalid_Identifiers"
href="#Unknown_or_Invalid_Identifiers">3.5.1 Unknown or Invalid
Identifiers</a>
</h4>
<p>The following identifiers are used to indicate an unknown or
invalid code in Unicode language and locale identifiers. For Unicode
identifiers, the region code uses a private use ISO 3166 code, and the
time zone code uses an additional code; the others are defined by the
relevant standards. When these codes are used in APIs connected with
Unicode identifiers, they mean either that no
identifier was available, or that at some point an input identifier value
was determined to be invalid or ill-formed.</p>
<table border="1" cellspacing="0" cellpadding="4"
style="margin-top: 0.5em; margin-bottom: 0.5em" id="table4">
<tr>
<th>Code Type</th>
<th>Value</th>
<th>Description in Referenced Standards</th>
</tr>
<tr>
<td>Language</td>
<td><code>und</code></td>
<td>Undetermined language, also used for “root”</td>
</tr>
<tr>
<td>Script</td>
<td><code>Zzzz</code></td>
<td>Code for uncoded script, Unknown [<a
href="http://www.unicode.org/reports/tr41/#UAX24">UAX24</a>]
</td>
</tr>
<tr>
<td>Region</td>
<td><code>ZZ</code></td>
<td>Unknown or Invalid Territory</td>
</tr>
<tr>
<td>Currency</td>
<td><code>XXX</code></td>
<td>The codes assigned for transactions where no currency is
involved</td>
</tr>
<tr>
<td>Time Zone</td>
<td><code>unk</code></td>
<td>Unknown or Invalid Time Zone</td>
</tr>
<tr>
<td>Subdivision</td>
<td><em>&lt;region&gt;</em>zzzz</td>
<td>Unknown or Invalid Subdivision</td>
</tr>
</table>
<p>When only the script or region is known, a locale ID will
use "und" as the language subtag. Thus the locale
tag "und_Grek" represents the Greek script, and
"und_US" represents the US territory.</p>
<h4>
<a name="Numeric_Codes" href="#Numeric_Codes">3.5.2 Numeric Codes</a>
</h4>
<p>For region codes, ISO and the UN establish a mapping to
three-letter codes and numeric codes. However, this does not extend
to the private use codes, which are the numeric codes 900-999 (total: 100)
and the 3-letter codes AAA, QMA-QZZ, XAA-XZZ, and ZZZ (total: 1042). Unicode identifiers
supply a standard mapping to these: for the numeric codes, they use
the top of the numeric private use range; for the 3-letter codes, they
double the final letter. These are the resulting mappings for all of
the private use region codes:</p>
<table border="1" cellspacing="0" cellpadding="4"
style="margin-top: 0.5em; margin-bottom: 0.5em" id="table19">
<tr>
<th>Region</th>
<th>UN/ISO Numeric</th>
<th>ISO 3-Letter</th>
</tr>
<tr>
<td><code>AA</code></td>
<td><code>958</code></td>
<td><code>AAA</code></td>
</tr>
<tr>
<td><code>QM..QZ</code></td>
<td><code>959..972</code></td>
<td><code>QMM..QZZ</code></td>
</tr>
<tr>
<td><code>XA..XZ</code></td>
<td><code>973..998</code></td>
<td><code>XAA..XZZ</code></td>
</tr>
<tr>
<td><code>ZZ</code></td>
<td><code>999</code></td>
<td><code>ZZZ</code></td>
</tr>
</table>
<p>For script codes, ISO 15924 supplies a mapping (however, the
numeric codes are not in common use):</p>
<table border="1" cellspacing="0" cellpadding="4"
style="margin-top: 0.5em; margin-bottom: 0.5em" id="table21">
<tr>
<th>Script</th>
<th>Numeric</th>
</tr>
<tr>
<td><code>Qaaa..Qabx</code></td>
<td><code>900..949</code></td>
</tr>
</table>
<h4>
3.5.3 <a name="Private_Use"
href="#Private_Use">Private Use Codes</a>
</h4>
<p>Private use codes fall into three groups.</p>
<ul>
<li><strong>defined:</strong> those that currently have particular
semantics in CLDR</li>
<li><strong>reserved:</strong> those that may be given
particular semantics in future versions of CLDR</li>
<li><strong>excluded:</strong> those that will never be given
particular CLDR semantics in the future, and thus can normally be
used by applications without worrying about collisions. However,
CLDR may follow widespread industry practice in the use of some of
these codes, such as for XA, XB, and XK.</li>
</ul>
<table>
<caption>
<a name="Private_Use_CLDR" href="#Private_Use_CLDR">Private Use
Codes in CLDR</a>
</caption>
<tr>
<th>category</th>
<th>status</th>
<th>codes</th>
</tr>
<tr>
<td rowspan="3">base language</td>
<td>defined</td>
<td>none</td>
</tr>
<tr>
<td>reserved</td>
<td>qaa..qfy</td>
</tr>
<tr>
<td>excluded</td>
<td>qfz..qtz</td>
</tr>
<tr>
<td rowspan="3">script</td>
<td>defined</td>
<td>Qaai (obsolete), Qaag</td>
</tr>
<tr>
<td>reserved</td>
<td>Qaaa..Qaaf, Qaah, Qaaj..Qaap</td>
</tr>
<tr>
<td>excluded</td>
<td>Qaaq..Qabx</td>
</tr>
<tr>
<td rowspan="3">region</td>
<td>defined</td>
<td>QO, QU, UK, XA, XB, XK, ZZ</td>
</tr>
<tr>
<td>reserved</td>
<td>AA, QM..QN, QP..QT, QV..QZ</td>
</tr>
<tr>
<td>excluded</td>
<td>XC..XJ, XL..XZ</td>
</tr>
<tr>
<td rowspan="3">timezone</td>
<td>defined</td>
<td>IANA: Etc/Unknown<br>
bcp47: as listed in bcp47/timezone.xml
</td>
</tr>
<tr>
<td>reserved</td>
<td>bcp47: all non-5-letter codes not starting with x</td>
</tr>
<tr>
<td>excluded</td>
<td>bcp47:
all non-5-letter codes starting with x</td>
</tr>
</table>
<p>
See also <em>Section 3.5.1 <a
href="#Unknown_or_Invalid_Identifiers">Unknown or Invalid
Identifiers</a></em>.
</p>
<h3>
<a name="Locale_Extension_Key_and_Type_Data"></a><a
name="u_Extension" href="#u_Extension">3.6 Unicode BCP 47 U
Extension</a>
</h3>
<p>
[<a href="#BCP47">BCP47</a>] Language Tags provides a mechanism for
extending language tags for use in various applications by means of extension
subtags. Each extension is identified by a single alphanumeric
character subtag (a singleton) assigned by IANA.
</p>
<p>
The Unicode Consortium has registered and is the maintaining
authority for two BCP 47 language tag extensions: the extension 'u'
for Unicode locale extensions [<a href="#RFC6067">RFC6067</a>] and the
extension 't' for transformed content [<a href="#RFC6497">RFC6497</a>].
The Unicode BCP 47 extension data defines the complete list of valid
subtags.
</p>

<p>
These subtags are all in lowercase (the canonical casing for
these subtags); however, subtags are case-insensitive, and casing does
not carry any specific meaning. All subtags within the Unicode
extensions consist of two to eight alphanumeric characters and
satisfy the rule
<code>extension</code>
in [<a href="#BCP47">BCP47</a>].
</p>
<p>
<strong>The -u- Extension.</strong> The syntax of 'u' extension
subtags is defined by the rule
<code>unicode_locale_extensions</code>
in <a href="#Unicode_locale_identifier">Section 3.2 Unicode
locale identifier</a>, except that the subtag separator
<code>sep</code>
must always be a hyphen '-' when the extension is used as part of a BCP
47 language tag.
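</p>
<p>The parsing and canonicalization rules for the 'u' extension described in this section can be sketched in Python. This is a minimal illustration, not a conformant implementation. <code>canonical_u_extension</code> is a hypothetical helper that assumes the extension has already been split out of the full language tag. As a design choice of the sketch, it rejects repeated keys, and it does not map key or type aliases to the canonical names given in the U extension data files.</p>

```python
import re

KEY = re.compile(r"[0-9a-z][a-z]")    # key = alphanum alpha
VALUE = re.compile(r"[0-9a-z]{3,8}")  # attribute or type subtag = alphanum{3,8}


def canonical_u_extension(extension):
    """Return the canonical form of a 'u' extension such as 'u-foo-ca-buddhist'."""
    subtags = extension.lower().split("-")  # casing carries no meaning; '-' is the only separator
    if subtags[0] != "u":
        raise ValueError("not a 'u' extension: " + extension)
    attributes, keywords, current_key = [], {}, None
    for subtag in subtags[1:]:
        if KEY.fullmatch(subtag):
            if subtag in keywords:
                raise ValueError("duplicate key: " + subtag)  # design choice of this sketch
            current_key = subtag
            keywords[subtag] = []
        elif not VALUE.fullmatch(subtag):
            raise ValueError("ill-formed subtag: " + subtag)
        elif current_key is None:
            attributes.append(subtag)  # attributes come before the first key
        else:
            keywords[current_key].append(subtag)  # a type may span several subtags
    result = ["u"] + sorted(attributes)
    for key in sorted(keywords):
        result.append(key)
        if keywords[key] != ["true"]:  # an explicit "true" type is dropped
            result.extend(keywords[key])
    return "-".join(result)


print(canonical_u_extension("u-foo-bar-nu-thai-ca-buddhist-kk-true"))
# u-bar-foo-ca-buddhist-kk-nu-thai
```

<p>A keyword whose type is omitted keeps only its key, so "kk" and "kk-true" produce the same canonical form. Because the hyphen is the only separator, an underscore-separated string such as "u_ca_buddhist" is rejected, which matches the rule above for extensions embedded in BCP 47 tags.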
</p>
<p>
A 'u' extension may contain multiple
<code>attribute</code>s or
<code>keyword</code>s, as defined in <a href="#Unicode_locale_identifier">Section 3.2
Unicode locale identifier</a>. Although the order of
<code>attribute</code>s or
<code>keyword</code>s does not matter, this specification defines the canonical form as
follows:
</p>
<ul>
<li>All attributes are sorted in alphabetical order.</li>
<li>All keywords are sorted in alphabetical order of their keys.</li>
<li>All keywords are in lowercase.</li>
<li>All keys and types use the canonical form (from the name
attribute; see <a href="#Unicode_Locale_Extension_Data_Files">Section
3.6.4 U Extension Data Files</a>).
</li>
<li>A type value of "true" is removed.</li>
</ul>
<p>For example, the canonical form of the 'u' extension
"u-foo-bar-nu-thai-ca-buddhist-kk-true" is
"u-bar-foo-ca-buddhist-kk-nu-thai". The attributes "foo" and "bar" in
this example are provided only for illustration; no attribute subtags
are defined by the current CLDR specification.</p>
<p>
<em>See also <a
href="http://cldr.unicode.org/index/bcp47-extension">Unicode
Extensions for BCP 47</a> on the CLDR site.
</em>
</p>
<h4>
<a href="#Key_And_Type_Definitions_" name="Key_And_Type_Definitions_">3.6.1
Key And Type Definitions</a>
</h4>
<p>The following chart contains a set of U extension keys
that are currently available, with a description or sample of the U
extension type values for each. Each category is associated with an XML file
in the bcp47 directory.</p>
<p>
For the complete list of valid keys and types defined for Unicode
locale extensions, see <a href="#Unicode_Locale_Extension_Data_Files">Section
3.6.4 U Extension Data Files</a>.
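</p>
<p>Several keys in the table that follows override a locale default. The Python sketch below shows how an implementation might honor two of them. It assumes the keywords have already been parsed into a dictionary. The helper names <code>preferred_hour_symbol</code> and <code>preferred_first_day</code>, and the default values passed to them, are hypothetical. The "hc" types map to the pattern symbols listed in the table. For "fw", the sketch includes only the types the table lists explicitly, because the others are elided there.</p>

```python
# "hc" types and their pattern symbols, copied from the table.
HOUR_CYCLE_SYMBOL = {"h12": "h", "h23": "H", "h11": "K", "h24": "k"}
# Only the "fw" types the table lists explicitly; the others are elided there.
FIRST_DAY = {"sun": "Sunday", "mon": "Monday", "sat": "Saturday"}


def preferred_hour_symbol(keywords, locale_default):
    """Pattern symbol for hours: an "hc" keyword overrides the locale's time data."""
    # An absent or unrecognized type falls back to the default instead of failing.
    return HOUR_CYCLE_SYMBOL.get(keywords.get("hc"), locale_default)


def preferred_first_day(keywords, locale_default):
    """First day of the week: an "fw" keyword overrides the locale's week data."""
    return FIRST_DAY.get(keywords.get("fw"), locale_default)


prefs = {"fw": "mon", "hc": "h23"}  # e.g. the keywords of "en-US-u-fw-mon-hc-h23"
print(preferred_hour_symbol(prefs, "h"))          # H
print(preferred_first_day(prefs, "Sunday"))       # Monday
print(preferred_hour_symbol({"hc": "h99"}, "h"))  # h
```

<p>Falling back silently is one way to remain robust when newer key or type values appear, in line with the recommendation for LDML implementations after the table.</p>
<p>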
For information on the process for
adding new <i>key</i>/<i>type</i> values, see [<a href="#localeProject">LocaleProject</a>].
</p>
<p>
Most type values are represented by a single subtag in the current
version of CLDR. There are exceptions, such as types used for the keys
"ca" (calendar) and "kr" (collation reordering). If the type is
omitted, the type value "true" is assumed. Note that the
default for a key with a possible "true" value is often
"false", but not always. Note also that
"true"/"True" is not a valid script code, since <a
href="http://www.unicode.org/iso15924/codelists.html">the ISO
15924 Registration Authority has exceptionally reserved it</a>, which
means that it will not be assigned for any purpose.
</p>
<p>The BCP 47 form for keys and types is the canonical and
recommended form. Other aliases are included for backward compatibility.
</p>
<table>
<caption>
<a name="Key_Type_Definitions" href="#Key_Type_Definitions">Key/Type
Definitions</a>
</caption>
<tr>
<th>key<br> (old key name)
</th>
<th>key description</th>
<th>example type<br> (old type name)
</th>
<th>type description</th>
</tr>
<tr>
<td colspan="4"><strong>A <a
href="#UnicodeCalendarIdentifier" name="UnicodeCalendarIdentifier">Unicode
Calendar Identifier</a> defines a type of calendar. The valid values
are those <em>name</em> attribute values in the <em>type</em>
elements of key name="ca" in bcp47/<a target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/calendar.xml">calendar.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="10">"ca"<br> (calendar)
</td>
<td rowspan="10">Calendar algorithm<br> <br> <i>(For
information on the calendar algorithms associated with the data
used with these, see [<a href="#Calendars">Calendars</a>].)
</i></td>
<td>"buddhist"</td>
<td>Thai Buddhist calendar (same as Gregorian except for the
year)</td>
</tr>
<tr>
<td>"chinese"</td>
<td>Traditional Chinese calendar</td>
</tr>
<tr>
<td colspan="2">…</td>
</tr>
<tr>
<td>"gregory"<br> (gregorian)
</td>
<td>Gregorian calendar</td>
</tr>
<tr>
<td colspan="2">…</td>
</tr>
<tr>
<td>"islamic"</td>
<td>Islamic calendar</td>
</tr>
<tr>
<td>"islamic-civil"</td>
<td>Islamic calendar, tabular (intercalary years
[2,5,7,10,13,16,18,21,24,26,29] - civil epoch)</td>
</tr>
<tr>
<td>"islamic-umalqura"</td>
<td>Islamic calendar, Umm al-Qura</td>
</tr>
<tr>
<td colspan="2">…</td>
</tr>
<tr>
<td colspan="2"><b>Note:</b> <i>Some calendar types are
represented by two subtags. In such cases, the first subtag
specifies a generic calendar type and the second subtag specifies
a calendar algorithm variant. The CLDR uses generic calendar types
(single subtag types) for tagging data when calendar algorithm
variations within a generic calendar type are irrelevant. For
example, type "islamic" is used for specifying Islamic calendar
formatting data for all Islamic calendar types, including
"islamic-civil" and "islamic-umalqura".</i></td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeCurrencyFormatIdentifier"
name="UnicodeCurrencyFormatIdentifier">Unicode Currency Format
Identifier</a> defines a style for currency formatting.
The valid
values are those <em>name</em> attribute values in the <em>type</em>
elements of key name="cf" in bcp47/<a target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/currency.xml">currency.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="2">"cf"</td>
<td rowspan="2">Currency Format style</td>
<td>"standard"</td>
<td>Negative numbers use the minusSign symbol (the default).</td>
</tr>
<tr>
<td>"account"</td>
<td>Negative numbers use parentheses or equivalent.</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeCollationIdentifier"
name="UnicodeCollationIdentifier">Unicode Collation Identifier</a>
defines a type of collation (sort order). The valid values are
those <em>name</em> attribute values in the <em>type</em> elements
of bcp47/<a target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/collation.xml">collation.xml</a></strong>.</td>
</tr>
<tr>
<td colspan="4"><i>For information on each collation
setting parameter, from <strong>ka</strong> to <strong>vt</strong>,
see <a href="tr35-collation.html#Setting_Options">Setting
Options</a>
</i></td>
</tr>
<tr>
<td rowspan="9">"co"<br> (collation)
</td>
<td rowspan="9">Collation type</td>
<td>"standard"</td>
<td>The default ordering for each language. For root it is
based on the [<a href="#DUCET">DUCET</a>] (Default Unicode
Collation Element Table): see <em><a
href="tr35-collation.html#Root_Collation">Root Collation</a></em>. Each
other locale is based on that, except for appropriate modifications
to certain characters for that language.
</td>
</tr>

<tr>
<td>"search"</td>
<td>A special collation type dedicated to string search: it is
not used to determine the relative order of two strings, but only
to determine whether they should be considered equivalent for the
specified strength, using the string search matching rules
appropriate for the language. Compared to the normal collator for
the language, this may add or remove primary equivalences, may make
additional characters ignorable or change secondary equivalences,
and may modify contractions to allow matching within them,
depending on the desired behavior. For example, in Czech, the
distinction between ‘a’ and ‘á’ is secondary for normal collation,
but primary for search; a search for ‘a’ should never match ‘á’ and
vice versa. A search collator is normally used with strength set to
PRIMARY or SECONDARY (it should be SECONDARY if using “asymmetric”
search as described in the [<a
href="http://www.unicode.org/reports/tr41/#UTS10">UCA</a>] section
Asymmetric Search). The search collator in root supplies matching
rules that are appropriate for most languages (and which are
different from the root collation behavior); language-specific
search collators may be provided to override the matching rules for
a given language as necessary.
</td>
</tr>
<tr>
<td colspan="2"><p>
Other keywords provide additional choices for certain locales; <i>they
only have an effect in those locales.</i>
</p></td>
</tr>
<tr>
<td colspan="2">…</td>
</tr>
<tr>
<td>"phonetic"</td>
<td>Requests a phonetic variant if available, where text is
sorted based on pronunciation.
It may interleave different scripts
if multiple scripts are in common use.</td>
</tr>
<tr>
<td>"pinyin"</td>
<td>Pinyin ordering for Latin and for CJK characters; that is,
an ordering for CJK characters based on a character-by-character
transliteration into pinyin (used in Chinese).</td>
</tr>
<tr>
<td>"reformed"</td>
<td>Reformed collation (such as in Swedish)</td>
</tr>
<tr>
<td>"searchjl"</td>
<td>Special collation type for a modified string search in
which a pattern consisting of a sequence of Hangul initial
consonants (jamo lead consonants) will match a sequence of Hangul
syllable characters whose initial consonants match the pattern. The
jamo lead consonants can be represented using conjoining or
compatibility jamo. This search collator is best used at SECONDARY
strength with an "asymmetric" search as described in the [<a
href="http://www.unicode.org/reports/tr41/#UTS10">UCA</a>] section
Asymmetric Search, which can be obtained, for example, by using ICU4C's usearch
facility with attribute USEARCH_ELEMENT_COMPARISON set to the value
USEARCH_PATTERN_BASE_WEIGHT_IS_WILDCARD. This ensures that a full
Hangul syllable in the search pattern will only match the same
syllable in the searched text (instead of matching any syllable
with the same initial consonant), while a Hangul initial consonant
in the search pattern will match any Hangul syllable in the
searched text with the same initial consonant.
</td>
</tr>
<tr>
<td colspan="2">…</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeCurrencyIdentifier" name="UnicodeCurrencyIdentifier">Unicode
Currency Identifier</a> defines a type of currency.
The valid values
are those <em>name</em> attribute values in the <em>type</em>
elements of key name="cu" in bcp47/<a target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/currency.xml">currency.xml</a>.
</strong></td>
</tr>
<tr>
<td>"cu"<br> (currency)
</td>
<td>Currency type</td>
<td><i>ISO 4217 code,</i>
<p>
<i>plus others in common use</i>
</p></td>
<td><p>
Codes consisting of 3 ASCII letters that are or have been valid in
ISO 4217, plus certain additional codes that are or have been in
common use. The list of countries and time periods associated with
each currency value is available in <a
href="tr35-numbers.html#Supplemental_Currency_Data">Supplemental
Currency Data</a>, along with the default number of decimals.
</p>
<p>
The XXX code is given a broader interpretation as <em>Unknown
or Invalid Currency</em>.
</p></td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeEmojiPresentationStyleIdentifier" name="UnicodeEmojiPresentationStyleIdentifier">Unicode
Emoji Presentation Style Identifier</a> specifies a request for
the preferred emoji presentation style. This can be used as part of
the value for an HTML lang attribute, for example
<code>&lt;html lang="sr-Latn-u-em-emoji"&gt;</code>.
The valid values are those <em>name</em> attribute values
in the <em>type</em> elements of key name="em" in bcp47/<a
target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/variant.xml">variant.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="3">"em"</td>
<td rowspan="3">Emoji presentation style</td>
<td>"emoji"</td>
<td>Use an emoji presentation for emoji characters if possible.</td>
</tr>
<tr>
<td>"text"</td>
<td>Use a text presentation for emoji characters if possible.</td>
</tr>
<tr>
<td>"default"</td>
<td>Use the default presentation for emoji characters as specified in UTR #51 Section 4,
<a href="http://www.unicode.org/reports/tr51/#Presentation_Style">Presentation Style</a>.</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeFirstDayIdentifier" name="UnicodeFirstDayIdentifier">Unicode
First Day Identifier</a> defines the preferred first day of the week
for calendar display. Specifying "fw" in a locale identifier
overrides the default value specified by supplemental week data
(see Part 4 Dates, section 4.3 <a href="tr35-dates.html#Week_Data">Week
Data</a>).
The valid values are those <em>name</em> attribute values
in the <em>type</em> elements of key name="fw" in bcp47/<a
target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/calendar.xml">calendar.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="4">"fw"</td>
<td rowspan="4">First day of week</td>
<td>"sun"</td>
<td>Sunday</td>
</tr>
<tr>
<td>"mon"</td>
<td>Monday</td>
</tr>
<tr>
<td colspan="2">…</td>
</tr>
<tr>
<td>"sat"</td>
<td>Saturday</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeHourCycleIdentifier"
name="UnicodeHourCycleIdentifier">Unicode Hour Cycle
Identifier</a> defines the preferred time cycle. Specifying "hc" in a
locale identifier overrides the default value specified by
supplemental time data (see Part 4 Dates, section 4.4 <a
href="tr35-dates.html#Time_Data">Time Data</a>). The valid values
are those <em>name</em> attribute values in the <em>type</em>
elements of key name="hc" in bcp47/<a target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/calendar.xml">calendar.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="4">"hc"</td>
<td rowspan="4">Hour cycle</td>
<td>"h12"</td>
<td>Hour system using 1–12; corresponds to 'h' in patterns</td>
</tr>
<tr>
<td>"h23"</td>
<td>Hour system using 0–23; corresponds to 'H' in patterns</td>
</tr>
<tr>
<td>"h11"</td>
<td>Hour system using 0–11; corresponds to 'K' in patterns</td>
</tr>
<tr>
<td>"h24"</td>
<td>Hour system using 1–24; corresponds to 'k' in patterns</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeLineBreakStyleIdentifier"
name="UnicodeLineBreakStyleIdentifier">Unicode Line Break
Style Identifier</a> defines a preferred line break style
corresponding to the CSS level 3 <a
href="https://drafts.csswg.org/css-text/#line-break-property">line-break
option</a>. Specifying "lb" in a locale identifier overrides the
locale’s default style (which may correspond to "normal" or
"strict"). The valid values are those <em>name</em> attribute
values in the <em>type</em> elements of key name="lb" in bcp47/<a
target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/segmentation.xml">segmentation.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="3">"lb"</td>
<td rowspan="3">Line break style</td>
<td>"strict"</td>
<td>CSS level 3 line-break=strict, e.g. treat CJ as NS</td>
</tr>
<tr>
<td>"normal"</td>
<td>CSS level 3 line-break=normal, e.g. treat CJ as ID, break
before hyphens for ja, zh</td>
</tr>
<tr>
<td>"loose"</td>
<td>CSS level 3 line-break=loose</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeLineBreakWordIdentifier"
name="UnicodeLineBreakWordIdentifier">Unicode Line Break Word
Identifier</a> defines preferred line break word handling behavior
corresponding to the CSS level 3 <a
href="https://drafts.csswg.org/css-text/#word-break-property">word-break
option</a>.
The valid values are those <em>name</em> attribute values
in the <em>type</em> elements of key name="lw" in bcp47/<a
target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/segmentation.xml">segmentation.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="3">"lw"</td>
<td rowspan="3">Line break word handling</td>
<td>"normal"</td>
<td>CSS level 3 word-break=normal, normal script/language
behavior for midword breaks</td>
</tr>
<tr>
<td>"breakall"</td>
<td>CSS level 3 word-break=break-all, allow midword breaks
unless forbidden by lb setting</td>
</tr>
<tr>
<td>"keepall"</td>
<td>CSS level 3 word-break=keep-all, prohibit midword breaks
except for dictionary breaks</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeMeasurementSystemIdentifier"
name="UnicodeMeasurementSystemIdentifier">Unicode Measurement
System Identifier</a> defines a preferred measurement system.
Specifying "ms" in a locale identifier overrides the default value
specified by supplemental measurement system data (see Part 2
General, section 5 <a
href="tr35-general.html#Measurement_System_Data">Measurement
System Data</a>).
The valid values are those <em>name</em> attribute
values in the <em>type</em> elements of key name="ms" in bcp47/<a
target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/measure.xml">measure.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="3">"ms"</td>
<td rowspan="3">Measurement system</td>
<td>"metric"</td>
<td>Metric System</td>
</tr>
<tr>
<td>"ussystem"</td>
<td>US System of measurement: feet, pints, etc.; pints are 16 fl oz</td>
</tr>
<tr>
<td>"uksystem"</td>
<td>UK System of measurement: feet, pints, etc.; pints are 20 fl oz</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeNumberSystemIdentifier"
name="UnicodeNumberSystemIdentifier">Unicode Number System
Identifier</a> defines a type of number system. The valid values are
those <em>name</em> attribute values in the <em>type</em> elements
of bcp47/<a target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/number.xml">number.xml</a>.
</strong></td>
</tr>
<tr>
<td rowspan="7">"nu"<br> (numbers)
</td>
<td rowspan="7">Numbering system</td>
<td><i>Unicode script subtag</i></td>
<td><p>
Four-letter types indicating the primary numbering system for the
corresponding script represented in Unicode. Unless otherwise
specified, it is a decimal numbering system using digits
[:GeneralCategory=Nd:]. For example, "latn" refers to
the ASCII / Western digits 0-9, while "taml" is an
algorithmic (non-decimal) numbering system. (The code "tamldec"
indicates the "modern Tamil decimal digits".)
</p>
<p class="note">
For more information, see <a
href="tr35-numbers.html#Numbering_Systems">Numbering Systems</a>.
</p></td>
</tr>
<tr>
<td>"arabext"</td>
<td>Extended Arabic-Indic digits ("arab" means the base
Arabic-Indic digits)</td>
</tr>
<tr>
<td>"armnlow"</td>
<td>Armenian lowercase numerals</td>
</tr>
<tr>
<td colspan="2">…</td>
</tr>
<tr>
<td>"roman"</td>
<td>Roman numerals</td>
</tr>
<tr>
<td>"romanlow"</td>
<td>Roman lowercase numerals</td>
</tr>
<tr>
<td>"tamldec"</td>
<td>Modern Tamil decimal digits</td>
</tr>

<tr>
<td colspan="4"><strong>A <a href="#RegionOverride"
name="RegionOverride">Region Override</a> specifies an alternate
region to use for obtaining certain region-specific default values
(those specified by the <a href="tr35-info.html#rgScope">&lt;rgScope&gt;</a>
element), instead of using the region specified by the <a
href="#unicode_region_subtag">unicode_region_subtag</a> in the
Unicode Language Identifier (or inferred from the <a
href="#unicode_language_subtag">unicode_language_subtag</a>).
</strong></td>
</tr>
<tr>
<td rowspan="2">"rg"</td>
<td rowspan="2">Region Override</td>
<td>"uszzzz"</td>
<td rowspan="2">The value is a <a href="#unicode_region_subtag">unicode_region_subtag</a>
for a regular region (not a macroregion), suffixed by "ZZZZ" (case
is not significant). For example, “en-GB-u-rg-uszzzz” represents a
locale for British English but with region-specific defaults set to
US for items such as default currency, default calendar and week
data, default time cycle, and default measurement system and unit
preferences.
</td>
</tr>
<tr>
<td>…</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
name="unicode_subdivision_subtag_validity"></a><a
href="#UnicodeSubdivisionIdentifier"
name="UnicodeSubdivisionIdentifier">Unicode Subdivision
Identifier</a> defines a regional subdivision used for locales. The
valid values are based on the <em>subdivisionContainment</em>
element as described in <em>Section <a
href="#Unicode_Subdivision_Codes">3.6.5 Subdivision Codes</a></em>.
</strong></td>
</tr>
<tr>
<td rowspan="2">"sd"</td>
<td rowspan="2">Regional Subdivision</td>
<td>"gbsct"</td>
<td rowspan="2">A <a href="#unicode_subdivision_id">unicode_subdivision_id</a>, which is
a <a href="#unicode_region_subtag">unicode_region_subtag</a> concatenated
with a unicode_subdivision_suffix.<br> For example, <em>gbsct</em> is “gb”+“sct” (where sct
represents the subdivision code for Scotland). Thus
“en-GB-u-sd-gbsct” represents the language variant “English as used
in Scotland”, and both “en-u-sd-usca” and “en-US-u-sd-usca”
represent “English as used in California”. See
<strong><em><a href="#Unicode_Subdivision_Codes">3.6.5
Subdivision Codes</a></em></strong>.
</td>
</tr>
<tr>
<td>…</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeSentenceBreakSuppressionsIdentifier"
name="UnicodeSentenceBreakSuppressionsIdentifier">Unicode
Sentence Break Suppressions Identifier</a> defines a set of data to
be used for suppressing certain sentence breaks that would
otherwise be found by UAX #29 rules.
The valid values are those <em>name</em>
attribute values in the <em>type</em> elements of key name="ss" in
bcp47/<a target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/segmentation.xml">segmentation.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="2">"ss"</td>
<td rowspan="2">Sentence break suppressions</td>
<td>"none"</td>
<td>Don’t use sentence break suppressions data (the default).</td>
</tr>
<tr>
<td>"standard"</td>
<td>Use sentence break suppressions data of type "standard"</td>
</tr>

<tr>
<td colspan="4"><strong>A <a
href="#UnicodeTimezoneIdentifier" name="UnicodeTimezoneIdentifier">Unicode
Timezone Identifier</a> defines a time zone. The valid values are
those <em>name</em> attribute values in the <em>type</em> elements of
bcp47/<a target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/timezone.xml">timezone.xml</a>.
</strong></td>
</tr>
<tr>
<td>"tz"<br> (timezone)
</td>
<td>Time zone</td>
<td><i>Unicode short time zone IDs</i></td>
<td><p>
Short identifiers, defined in the file common/bcp47/timezone.xml in terms of
TZ time zone database [<a href="#Olson">Olson</a>] identifiers, plus a few extra values.
</p>
<p>
For more information, see <a href="#Time_Zone_Identifiers">Section
3.6.3 Time Zone Identifiers</a>.
</p>
<p>CLDR provides data for normalizing time zone codes.</p></td>
</tr>
<tr>
<td colspan="4"><strong>A <a
href="#UnicodeVariantIdentifier" name="UnicodeVariantIdentifier">Unicode
Variant Identifier</a> defines a special variant used for locales.
The valid values are those <em>name</em> attribute values in the <em>type</em>
elements of bcp47/<a target="_blank"
href="http://www.unicode.org/repos/cldr/tags/latest/common/bcp47/variant.xml">variant.xml</a>.
</strong></td>
</tr>
<tr>
<td>"va"</td>
<td>Common variant type</td>
<td>"posix"</td>
<td>POSIX style locale variant. For handling of the "POSIX"
variant, see <i>Section 3.8.2, <a href="#Legacy_Variants">Legacy
Variants</a></i>.
</td>
</tr>
</table>
<p>
For more information on the allowed keys and types, see the specific
elements below, and <a href="#Unicode_Locale_Extension_Data_Files">Section
3.6.4 U Extension Data Files</a>.
</p>
<p>Additional keys or types might be added in future versions.
Implementations of LDML should be robust enough to handle any syntactically
valid key or type value.</p>
<h4>
<a href="#Numbering System Data" name="Numbering System Data">3.6.2
Numbering System Data </a>
</h4>
<p>
LDML supports multiple numbering systems. The identifiers for those
numbering systems are defined in the file <strong>bcp47/number.xml</strong>.
For the latest release of the data, see <a
href="http://unicode.org/repos/cldr/tags/latest/common/bcp47/number.xml">bcp47/number.xml</a>.
</p>
<p>
Details about those numbering systems are defined in <strong>supplemental/numberingSystems.xml</strong>.
For the latest release of the data, see <a
href="http://unicode.org/repos/cldr/tags/latest/common/supplemental/numberingSystems.xml">supplemental/numberingSystems.xml</a>.
</p>
<p>
LDML makes certain stability guarantees on this data:
</p>
<ol>
<li>Like other BCP 47 identifiers, once a numeric identifier is
added to <strong>bcp47/number.xml</strong> or <strong>numberingSystems.xml</strong>,
it will never be removed from either of those files.
</li>
<li>If an identifier has type="numeric" in numberingSystems.xml,
then
<ol>
<li>It is a decimal, positional numbering system with an
attribute digits=X, where X is a string with the 10 digits in the
order used by the numbering system.</li>
<li>The values of the type and digits will never change.</li>
</ol>
</li>
</ol>
<h4>
<a href="#Time_Zone_Identifiers" name="Time_Zone_Identifiers">3.6.3
Time Zone Identifiers</a>
</h4>
<p>
LDML inherits time zone IDs from the tz database [<a href="#Olson">Olson</a>].
Because these IDs from the tz database do not satisfy the BCP 47
language subtag syntax requirements, CLDR defines short identifiers
for use in the Unicode locale extension. The short identifiers
are defined in the file <strong>common/bcp47/timezone.xml</strong>.
</p>
<p>
The short identifiers use UN/LOCODE [<a href="#LOCODE">LOCODE</a>]
codes (lowercased, with the space character removed) where possible. For example, the
short identifier for "America/Los_Angeles" is "uslax" (the LOCODE for
Los Angeles, US is "US LAX"). Identifiers whose length is not 5
are used where there is no corresponding UN/LOCODE, such as
"usnavajo" for "America/Shiprock", or "utcw01" for "Etc/GMT+1", so
that they do not collide with future UN/LOCODE codes.
</p>
<p>Although the first two letters of a short identifier may match
an ISO 3166 two-letter country code, a user should not assume that
the time zone belongs to that country. The first two letters of an
identifier whose length is not 5 have no meaning. Also, the
identifiers are stable: they will not change, no
matter what changes happen in the base standard.
For example, if Hawaii were to leave
the US and join Canada as a new province, the short time zone
identifier "ushnl" would not change in CLDR, even if the UN/LOCODE
changed to "cahnl" or something else.</p>
<p>There is a special code "unk" for an unknown or invalid time
zone. This can be expressed in the tz database style ID
"Etc/Unknown", although it is not defined in the tz database.</p>
<p>
<b>Stability of Time Zone Identifiers</b>
</p>
<p>
Although the short time zone identifiers are guaranteed to be stable,
the preferred IDs in the tz database (such as those found in the <strong>zone.tab</strong>
file) may change from time to time. For example, "Asia/Calcutta" was
replaced with "Asia/Kolkata" and moved to the <strong>backward</strong>
file in the tz database. Because CLDR contains locale data keyed by
tz database time zone IDs, the stability of those IDs is critical.
</p>
<p>
To maintain the stability of "long" IDs (those inherited from the
tz database), a special rule is applied to the <i>alias</i> attribute in
the <type> element for "tz": the first "long" ID is the CLDR
canonical "long" time zone ID.
</p>
<p>For example:</p>
<blockquote><type name="inccu" alias="Asia/Calcutta
Asia/Kolkata" description="Kolkata, India"/></blockquote>
<p>
The <type> element above defines the short time zone ID "inccu"
(for use in the Unicode locale extension), the corresponding <em>CLDR
canonical "long" ID</em> "Asia/Calcutta", and an alias "Asia/Kolkata".
</p>
<h4>
<a href="#Unicode_Locale_Extension_Data_Files"
name="Unicode_Locale_Extension_Data_Files">3.6.4 U Extension
Data Files</a>
</h4>
<p>
The 'u' extension data is stored in multiple XML files located under
the common/bcp47 directory in CLDR.
Each file contains the locale
extension key/type values and their backward compatibility mappings
appropriate for a particular domain. For example, <a
href="http://unicode.org/repos/cldr/tags/latest/common/bcp47/collation.xml">common/bcp47/collation.xml</a>
contains key/type values for collation, including optional collation
parameters and valid type values for each key.
</p>
<p>
The 't' extension data is stored in <a
href="http://unicode.org/repos/cldr/tags/latest/common/bcp47/transform.xml">common/bcp47/transform.xml</a>.
</p>
<p class="dtd"><!ELEMENT keyword ( key* )></p>
<p class="dtd">
<!ELEMENT key ( type* )><br> <!ATTLIST key extension
NMTOKEN #IMPLIED><br> <!ATTLIST key name NMTOKEN
#REQUIRED><br> <!ATTLIST key description CDATA
#IMPLIED><br> <!ATTLIST key deprecated ( true | false )
"false"><br> <!ATTLIST key preferred NMTOKEN #IMPLIED><br>
<!ATTLIST key alias NMTOKEN #IMPLIED><br> <!ATTLIST key valueType (single | multiple
| incremental | any) #IMPLIED ><br> <!ATTLIST key since
CDATA #IMPLIED>
</p>
<p class="dtd">
<!ELEMENT type EMPTY><br> <!ATTLIST type name NMTOKEN
#REQUIRED><br> <!ATTLIST type description CDATA
#IMPLIED><br> <!ATTLIST type deprecated ( true | false )
"false"><br> <!ATTLIST type preferred NMTOKEN #IMPLIED><br>
<!ATTLIST type alias CDATA #IMPLIED><br> <!ATTLIST type
since CDATA #IMPLIED>
</p>
<p class="dtd">
<!ELEMENT attribute EMPTY><br> <!ATTLIST attribute name
NMTOKEN #REQUIRED><br> <!ATTLIST attribute description
CDATA #IMPLIED><br> <!ATTLIST attribute deprecated ( true
| false ) "false"><br> <!ATTLIST attribute preferred
NMTOKEN #IMPLIED><br> <!ATTLIST attribute since CDATA
#IMPLIED>
</p>
<p>The extension attribute in the <key> element specifies the
BCP 47 language tag extension type.
The default value of the
extension attribute is "u" (Unicode locale extension). The
<type> element is only applicable to the enclosing <key>.
</p>
<p>
In the Unicode locale extension 'u' and
't' data files, the common attributes for the <key>,
<type> and <attribute> elements are as follows:
</p>
<dl>
<dt>
<b>name</b>
</dt>
<dd>
<p>
The key or type name used by the Unicode locale extension with the <a
href="#Unicode_locale_identifier">'u' extension syntax</a>, or by the 't' extension syntax. When <i>alias</i>
below is absent, this name can also be used with the old style <a
href="#Old_Locale_Extension_Syntax"> "@key=type" syntax</a>.
</p>
<p>
Most type names are <strong>literal type names</strong>, which
match only that exact value. All of these have at least one
lowercase letter, such as "buddhist". There is a small
number of <strong>indirect type names</strong>, such as
"RG_KEY_VALUE". These have no lowercase letters. The
interpretation of each one is listed below.
</p>
<h5>
<a name="CODEPOINTS" href="#CODEPOINTS">CODEPOINTS</a>
</h5>
<p>
The type name <strong>"CODEPOINTS"</strong> is reserved for a
variable representing Unicode code point(s). The syntax is:
</p>
<table border="0">
<tr>
<th> </th>
<th><div align="center">EBNF</div></th>
<th><div align="center">ABNF</div></th>
</tr>
<tr>
<td><pre>codepoints</pre></td>
<td><pre>= codepoint (sep codepoint)*</pre></td>
<td><pre>= codepoint *(sep codepoint)</pre></td>
</tr>
<tr>
<td><pre>codepoint</pre></td>
<td><pre>= [0-9 A-F a-f]{4,6}</pre></td>
<td><pre>= 4*6HEXDIG</pre></td>
</tr>
</table>
<p>In addition, no codepoint may exceed 10FFFF.
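</p>
<p>The CODEPOINTS syntax and the 10FFFF limit are simple enough to check mechanically. The following Python sketch is illustrative only: <code>is_valid_codepoints</code> and <code>codepoints_to_string</code> are hypothetical helper names, not part of CLDR or of any library, and the sketch assumes the hyphen separator used in BCP 47 tags.</p>

```python
import re

# A single codepoint: 4 to 6 hex digits in either case (ABNF: 4*6HEXDIG).
_CODEPOINT = re.compile(r"[0-9A-Fa-f]{4,6}")


def is_valid_codepoints(value, sep="-"):
    """Return True if value matches codepoint *(sep codepoint)
    and no codepoint exceeds U+10FFFF."""
    for part in value.split(sep):
        if _CODEPOINT.fullmatch(part) is None:
            return False
        if int(part, 16) > 0x10FFFF:
            return False
    return True


def codepoints_to_string(value, sep="-"):
    """Decode a CODEPOINTS value such as "0061-0062" into the string "ab"."""
    if not is_valid_codepoints(value, sep):
        raise ValueError(f"not a valid CODEPOINTS value: {value!r}")
    # Order is preserved, so "0061-0062" and "0062-0061" decode differently.
    return "".join(chr(int(part, 16)) for part in value.split(sep))
```

<p>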
For example,
"00A0", "300b", "10D40C" and "00C1-00E1" are valid, but "A0",
"U060C" and "110000" are not.</p>
<p>In the current version of CLDR, the type "CODEPOINTS" is only
used for the deprecated locale extension key "vt" (variableTop).
The subtags forming the type for "vt" represent an arbitrary string
of characters. There is no formal limit on the number of
characters, although in practice anything longer than 1 will be rare, and
anything longer than 4 is unlikely to be useful. Repetition is allowed; for
example, 0061-0061 ("aa") is a valid type value for "vt", since the
sequence may be a collating element. Order is significant: 0061-0062
("ab") is different from 0062-0061 ("ba"). Note that for
variableTop, any character sequence must be a contraction that
yields exactly one primary weight.</p>
<p>For example,</p>
<blockquote>
<p>
<strong>en-u-vt-00A4</strong>: this indicates English, with any
characters sorting at or below "¤" (at the primary level)
considered variable.
</p>
</blockquote>
<p>
By default in UCA, variable characters are ignored in sorting at the
primary, secondary, and tertiary levels. In CLDR, however, they are not
ignorable by default. For more information, see <a
href="tr35-collation.html#Setting_Options">Collation: Section
3.3 <em>Setting Options</em>
</a>.
</p>

<h5>
<a name="REORDER_CODE" href="#REORDER_CODE">REORDER_CODE</a>
</h5>
<p>
The type name <strong>"REORDER_CODE"</strong> is reserved for
reordering block names (e.g. "latn", "digit" and "others") defined
in the <i><a href="tr35-collation.html#Root_Collation">Root
Collation</a></i>. The type "REORDER_CODE" is used for the locale extension
key "kr" (colReorder). The type value for "kr" consists of
one or more reordering block names, such as "latn-digit".
For more
information, see <a href="tr35-collation.html#Script_Reordering">Collation:
Section 3.12 <em>Collation Reordering</em>
</a>.
</p>
<h5>
<a name="RG_KEY_VALUE" href="#RG_KEY_VALUE">RG_KEY_VALUE</a>
</h5>
<p>
The type name <strong>"RG_KEY_VALUE"</strong> is reserved for
region codes in the format required by the "rg" key; this is a
region code from the idValidity data in common/validity/region.xml
(with certain exclusions, listed below) followed by "zzzz". The
excluded region codes are those with idStatus='unknown' or
'macroregion'; region codes with idStatus='deprecated' should not
be generated, and those with idStatus='private_use' are only to be
used with prior agreement. Thus the value for the "rg" key will
normally be a region code with idStatus='regular' followed by
"zzzz"; this set of values is the same as the set of subdivision codes
with idStatus='unknown' in the idValidity data in
common/validity/subdivision.xml.
</p>
<h5>
<a name="SUBDIVISION_CODE" href="#SUBDIVISION_CODE">SUBDIVISION_CODE</a>
</h5>
<p>
The type name <strong>"SUBDIVISION_CODE"</strong> is reserved for
subdivision codes in the format required by the "sd" key; this is a
subdivision code from the idValidity data in
common/validity/subdivision.xml, excluding those with
idStatus='unknown'. Codes with idStatus='deprecated' should not be
generated, and those with idStatus='private_use' are only to be
used with prior agreement.
</p>
<h5>
<a name="PRIVATE_USE" href="#PRIVATE_USE">PRIVATE_USE</a>
</h5>
<p>
The type name <strong>"PRIVATE_USE"</strong> is reserved for
private use types. A valid type value is composed of one or more
subtags separated by hyphens, each consisting of three to
eight ASCII alphanumeric characters.
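</p>
<p>The RG_KEY_VALUE and PRIVATE_USE descriptions above fix a purely syntactic shape that can be tested before any lookup in the validity data. The Python sketch below checks only that shape; the helper names are hypothetical, and <code>looks_like_rg_value</code> deliberately accepts only a two-letter region and does not read common/validity/region.xml, so a real implementation would still need that lookup to exclude unknown, macroregion and deprecated codes.</p>

```python
import re

# Shape of an "rg" value: a region code followed by "zzzz". This sketch only
# accepts a two-letter region and does NOT consult common/validity/region.xml.
_RG_VALUE = re.compile(r"[a-z]{2}zzzz")

# One PRIVATE_USE subtag: three to eight ASCII alphanumerics.
_PRIVATE_USE_SUBTAG = re.compile(r"[0-9a-z]{3,8}")


def looks_like_rg_value(value):
    """True for ASCII values shaped like "uszzzz" (case-insensitive)."""
    return value.isascii() and _RG_VALUE.fullmatch(value.lower()) is not None


def is_private_use_type(value):
    """True if value is one or more hyphen-separated 3-8 character subtags."""
    if not value.isascii():
        return False
    subtags = value.lower().split("-")
    return all(_PRIVATE_USE_SUBTAG.fullmatch(s) is not None for s in subtags)
```

<p>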
In the current version of
CLDR, <strong>"PRIVATE_USE"</strong> is only used for the transform
extension key "x0".
</p>

</dd>
<dt>
<b>valueType</b>
</dt>
<dd>
<p>The valueType attribute indicates how many
subtags are valid for a given key:</p>
<table class='simple' width="100%" border="1">
<tbody>
<tr>
<th>single</th>
<td>Either exactly one type value, or no type value (but only if the value of "true" would be valid). This is the default
if no valueType attribute is present.</td>
</tr>
<tr>
<th>incremental</th>
<td>Multiple type values are allowed, but only if a prefix
is also present, and the sequence is explicitly listed. Each
successive type value indicates a refinement of its prefix. For
example:<br> <key name="ca"
description="Calendar algorithm key"<strong>
valueType="incremental"</strong>> <br> <type
name="islamic" description="Islamic
calendar"/><br> <type
name="islamic-umalqura" description="Islamic
calendar, Umm al-Qura"/><br> Thus <em>ca-islamic-umalqura</em>
is valid. However, <em>ca-gregory-japanese</em> is not valid,
because "gregory-japanese" is not listed as a type.
</td>
</tr>
<tr>
<th>multiple</th>
<td>Multiple type values are allowed, but each may only
occur once. For example:<br><key name="kr"
description="Collation reorder codes" <strong>valueType="multiple"</strong>><br>
<type name="REORDER_CODE" …/>
</td>
</tr>
<tr>
<th>any</th>
<td>Any number of type values are allowed, with none of the
above restrictions.
For example:<br> <key
extension="t" name="x0" description="Private
use transform type key."<strong>
valueType="any"</strong>><br> <type
name="PRIVATE_USE" …/>
</td>
</tr>
</tbody>
</table>
</dd>
<dt>
<b>description</b>
</dt>
<dd>
<p>
The description of the key, type or attribute element. There is
also some informative text about certain keys and types in
Section 3.5 <a href="#Key_And_Type_Definitions_">Key And Type
Definitions</a>.
</p>
</dd>
<dt>
<b>deprecated</b>
</dt>
<dd>
<p>The deprecation status of the key, type or attribute element.
The value "true" indicates the element is deprecated and no longer
used in this version of CLDR. The default value is "false".</p>
</dd>
<dt>
<b>preferred</b>
</dt>
<dd>
<p>The preferred value of the deprecated key, type or attribute
element. When a key, type or attribute element is deprecated, this
attribute specifies the new canonical form, if one is available.</p>
</dd>
<dt>
<b>alias</b> (Not applicable to <attribute>)
</dt>
<dd>
<p>The BCP 47 form is the canonical form and is recommended. Other
aliases are included only for backwards compatibility.</p>
</dd>
<dd>
<em>Example:</em>
</dd>
<dd>
<p>
<type name="phonebk" <strong>alias="phonebook"</strong>
description="Phonebook style ordering (such as in German)"/>
</p>
The preferred term, and the only one to be used in BCP 47, is the
name: in this example, "phonebk".
</dd>
<dd>
<p>
The alias is a key or type name used by Unicode locale extensions
with the old <a href="#Old_Locale_Extension_Syntax">"@key=type"
syntax</a>. The attribute value for type may contain multiple names
delimited by ASCII space characters.
Of those aliases, the first
name is the preferred value.
</p>
</dd>
<dt>
<b>since</b>
</dt>
<dd>The version of CLDR in which this key or type was
introduced. Absence of this attribute value implies that the key or type
was already available in CLDR 1.7.2.</dd>
</dl>
<p>
<em>Note: There are no values defined for the locale extension
attribute in the current CLDR release.</em>
</p>
<p>For example,</p>
<pre>
<key name="co" alias="collation" description="Collation type key">
  <type name="pinyin" description="Pinyin ordering for Latin and for CJK characters (used in Chinese)"/>
</key>

<key name="ka" alias="colAlternate" description="Collation parameter key for alternate handling">
  <type name="noignore" alias="non-ignorable" description="Variable collation elements are not reset to ignorable"/>
  <type name="shifted" description="Variable collation elements are reset to zero at levels one through three"/>
</key>

<key name="tz" alias="timezone">
  ...
  <type name="aumel" alias="Australia/Melbourne Australia/Victoria" description="Melbourne, Australia"/>
  <type name="aumqi" alias="Antarctica/Macquarie" description="Macquarie Island Station, Macquarie Island" since="1.8.1"/>
  ...
</key>
</pre>
The data above indicates:
<ul>
<li>type "pinyin" is valid for key "co", thus "u-co-pinyin" is a
valid Unicode locale extension.</li>
<li>type "pinyin" is not valid for key "ka", thus "u-ka-pinyin"
is not a valid Unicode locale extension.</li>
<li>type "pinyin" has no <i>alias</i>, so "zh@collation=pinyin"
is a valid Unicode locale identifier according to the old syntax.
</li>
<li>type "noignore" has an alias attribute, so
"en@colAlternate=noignore" is not a valid Unicode locale identifier
according to the old syntax.</li>
<li>type "aumel" is valid for key "tz", supported by CLDR 1.7.2
(the default value) and later versions.</li>
<li>type "aumqi" is valid for key "tz", supported by CLDR 1.8.1
and later versions.</li>
</ul>
<p>It is strongly recommended that all API methods accept all
possible aliases for keywords and types, but generate the canonical
form. For example, "ar-u-ca-islamicc" would be equivalent
to "ar-u-ca-islamic-civil" on input, but the latter should
be output. The one exception is where an alias would only be
well-formed with the old syntax, such as "gregorian" (for
"gregory").</p>
<h4>
<a href="#Unicode_Subdivision_Codes" name="Unicode_Subdivision_Codes">3.6.5
Subdivision Codes</a>
</h4>
<p>
The subdivision codes designate a
subdivision of a country or region. They go by various names,
such as a <em>state</em> in the United States or a <em>province</em>
in Canada. The codes in CLDR
are based on ISO 3166-2 subdivision codes. The
ISO codes have a region code followed by a hyphen, then a suffix
consisting of 1..3 ASCII letters or digits.
</p>
<p>
The CLDR codes are designed to work in a
<a href='#unicode_locale_id'>unicode_locale_id</a> (BCP 47), and are
thus all lowercase, with no hyphen.
For example, the following are valid, and mean “English as used in
California, USA”.
</p>
<ul>
<li>en-u-sd-<strong>usca</strong></li>
<li>en-US-u-sd-<strong>usca</strong></li>
</ul>
<p>CLDR has additional subdivision codes. These
may start with a 3-digit region code or use a suffix of 4 ASCII
letters or digits, so they will not collide with the ISO codes.
Subdivision codes for unknown values are the region code plus
"zzzz", such as "uszzzz" for an unknown
subdivision of the US. Other codes may be added for stability.</p>
<p>
Like BCP 47, CLDR requires stable codes, which are not guaranteed for
ISO 3166-2 (nor have the ISO 3166-2
codes been stable in the past). If an ISO 3166-2 code is removed, it
remains valid (though marked as deprecated) in CLDR. If an ISO 3166-2
code is reused (for the same region), then CLDR will define a new
equivalent code using a 4-character suffix.
</p>
<h5>
<a name="Validity" href="#Validity">3.6.5.1 Validity</a>
</h5>
<p>
A <a href="#unicode_subdivision_id">unicode_subdivision_id</a>
is only valid when it is present in the
subdivision.xml file as described in <em>Section 3.11 <a
href="#Validity_Data">Validity Data</a></em>.
The data is in a compressed form and thus needs to be expanded
before such a test is made.
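</p>
<p>Once the validity data has been expanded into a set of ids, the test reduces to set membership. The Python sketch below assumes such an expanded set; <code>SAMPLE_SUBDIVISIONS</code> is a hypothetical four-entry stand-in rather than CLDR data, and the helper names are illustrative only. It also converts ISO 3166-2 codes to the lowercase, hyphen-free CLDR form and applies the region-prefix rule given in the paragraphs that follow.</p>

```python
import re

# Hypothetical sample of ids as they might look after expanding
# subdivision.xml; real code must load and expand the CLDR file instead.
SAMPLE_SUBDIVISIONS = {"usca", "ustx", "gbsct", "gbwls"}

# unicode_region_subtag (2 letters or 3 digits) + a 1-4 character suffix.
_SUBDIVISION_SYNTAX = re.compile(r"(?:[a-z]{2}|[0-9]{3})[0-9a-z]{1,4}")


def iso_to_cldr(iso_code):
    """Map an ISO 3166-2 code such as "US-CA" to the CLDR form "usca"."""
    return iso_code.replace("-", "").lower()


def is_valid_subdivision(sd, valid_ids, region=None):
    """Check an "sd" value against an expanded set of valid ids.

    If region is given (the locale id's unicode_region_subtag), the
    subdivision must also start with it, compared case-insensitively.
    """
    sd = sd.lower()
    if _SUBDIVISION_SYNTAX.fullmatch(sd) is None or sd not in valid_ids:
        return False
    if region is not None and not sd.startswith(region.lower()):
        return False
    return True
```

<p>With this sample set, <code>is_valid_subdivision("usca", SAMPLE_SUBDIVISIONS, region="US")</code> returns True, whereas <code>is_valid_subdivision("gbsct", SAMPLE_SUBDIVISIONS, region="CA")</code> returns False.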
</p>
<p>
<em>Examples:</em>
</p>
<ul>
<li><strong>usca</strong> is valid — there is an <strong>id</strong>
element <code><id type="subdivision"…>… usca
…</id></code></li>
<li><strong>ussct</strong> is invalid — there is no <strong>id</strong>
element <code><id type="subdivision"…>… ussct
…</id></code></li>
</ul>
<p>If a <a href='#unicode_locale_id'>unicode_locale_id</a> contains both a <a
href="#unicode_region_subtag">unicode_region_subtag</a> and a <a
href="#unicode_subdivision_id">unicode_subdivision_id</a>, it is only valid if the <a
href="#unicode_subdivision_id">unicode_subdivision_id</a> starts with the <a
href="#unicode_region_subtag">unicode_region_subtag</a> (case-insensitively).
</p>
<p>It is recommended that a <a href='#unicode_locale_id'>unicode_locale_id</a> contain a <a
href="#unicode_region_subtag">unicode_region_subtag</a> if it contains a <a
href="#unicode_subdivision_id">unicode_subdivision_id</a> and the region would not be added by adding likely subtags. This produces better behavior if the <a
href="#unicode_subdivision_id">unicode_subdivision_id</a> is ignored by an implementation or if the language tag is truncated.</p>
<p>
Examples:
</p>
<ul>
<li>en-<strong>US</strong>-u-sd-<strong>us</strong>ca
is valid — the region "US" matches
the first part of "usca".</li>
<li>en-u-sd-<strong>us</strong>ca is valid — it still works after adding likely subtags.</li>
<li>en-<strong>CA</strong>-u-sd-<strong>gb</strong>sct is
invalid — the region "CA" does not match the first part of "gbsct".
An implementation should disregard the subdivision id (or return an error).</li>
<li>en-u-sd-<strong>gb</strong>sct is valid but not recommended — an implementation that ignores the <a
href="#unicode_subdivision_id">unicode_subdivision_id</a> can get the wrong fallback behavior, or could add likely subtags and produce the invalid en<strong>-Latn-US</strong>-u-sd-<strong>gb</strong>sct.</li>
</ul>
<p>
In version 28.0, the subdivisions in the
validity files used the ISO format (uppercase, with a hyphen separating the two
components) instead of the BCP 47 format.
</p>
<h3>
<a name="t_Extension"></a><a name="BCP47_T_Extension"
href="#BCP47_T_Extension">3.7 Unicode BCP 47 T Extension</a>
</h3>
<p>
The Unicode Consortium has registered and is the maintaining
authority for two BCP 47 language tag extensions: the extension 'u'
for the Unicode locale extension [<a href="#RFC6067">RFC6067</a>] and the
extension 't' for transformed content [<a href="#RFC6497">RFC6497</a>].
The Unicode BCP 47 extension data defines the complete list of valid
subtags.
While the title of the RFC is “Transformed Content”, the abstract makes it clear that the scope is broader than the term "transformed" might indicate to a casual reader: “including content that has been transliterated, transcribed, or
translated, or <em>in some other way influenced by the source. It also provides for additional information used for identification.</em>”</p>
<p>
<strong>The -t- Extension.</strong> The syntax of 't' extension
subtags is defined by the rule
<code>unicode_locale_extensions</code>
in <a href="#Unicode_locale_identifier"><em>Section 3.2
Unicode locale identifier</em></a>, except that the subtag separator
<code>sep</code>
must always be a hyphen '-' when the extension is used as part of a BCP
47 language tag.
For information about the registration process,
meaning, and usage of the 't' extension, see [<a href="#RFC6497">RFC6497</a>].
</p>
<p>
These subtags are all in lowercase (the canonical casing for
these subtags); however, subtags are case-insensitive, and casing does
not carry any specific meaning. All subtags within the Unicode
extensions consist of two to eight alphanumeric characters and
meet the rule
<code>extension</code>
in [<a href="#BCP47">BCP47</a>].</p>
<p>The following keys are defined for the -t- extension:</p>
<table class='simple'>
<tbody>
<tr>
<th>Keys</th>
<th>Description</th>
<th>Values in latest release</th>
</tr>
<tr>
<td>m0</td>
<td><strong>Transform extension mechanism:</strong> Used to reference an authority or rules for a type of transformation.</td>
<td><a href="http://unicode.org/repos/cldr/tags/latest/common/bcp47/transform.xml">transform.xml</a></td>
</tr>
<tr>
<td nowrap>s0, d0 </td>
<td><strong>Transform source/destination:</strong> Used for non-languages/scripts, such as fullwidth-halfwidth conversion.</td>
<td><a href="http://unicode.org/repos/cldr/tags/latest/common/bcp47/transform-destination.xml">transform-destination.xml</a></td>
</tr>
<tr>
<td>i0</td>
<td><strong>Input Method Engine transform:</strong> Used to indicate an input method transformation, such as one used by
a client-side input method. The first subfield in a sequence would
typically be a 'platform' or vendor designation.</td>
<td><a href="http://unicode.org/repos/cldr/tags/latest/common/bcp47/transform_ime.xml">transform_ime.xml</a></td>
</tr>
<tr>
<td>k0</td>
<td><strong>Keyboard transform:</strong> Used to indicate a keyboard transformation, such as one used by a client-side virtual keyboard.
The first subfield in a sequence would typically be a 'platform' designation, representing the platform that the keyboard is intended for. The keyboard might or might not correspond to a keyboard mapping shipped by the vendor for the platform. One or more subsequent fields may occur, but are only added where needed to distinguish the keyboard from others.</td>
<td><a href="http://unicode.org/repos/cldr/tags/latest/common/bcp47/transform_keyboard.xml">transform_keyboard.xml</a></td>
</tr>
<tr>
<td>t0</td>
<td><strong>Machine Translation:</strong> Used to indicate content that has been machine translated, or a request for a particular type of machine translation of content. The first subfield in a sequence would typically be a 'platform' or vendor designation.</td>
<td><a href="http://unicode.org/repos/cldr/tags/latest/common/bcp47/transform_mt.xml">transform_mt.xml</a></td>
</tr>
<tr>
<td nowrap>h0</td>
<td><strong>Hybrid Locale Identifiers:</strong> h0 with the value 'hybrid' indicates that the -t- value is a language that is mixed into the main language tag to form a hybrid.
For more information and examples, see <em>Section 3.10.2 <a href="#Hybrid_Locale">Hybrid Locale Identifiers</a>.</em></td>
<td><a href="http://unicode.org/repos/cldr/tags/latest/common/bcp47/transform_hybrid.xml">transform_hybrid.xml</a></td>
</tr>
<tr>
<td>x0</td>
<td><strong>Private use transform</strong></td>
<td><a href="http://unicode.org/repos/cldr/tags/latest/common/bcp47/transform_private_use.xml">transform_private_use.xml</a></td>
</tr>
</tbody>
</table>
<h4>
<a href="#Transformed_Content_Data_File"
name="Transformed_Content_Data_File">3.7.1 T Extension Data
Files</a>
</h4>
<p>The overall structure of the data files is similar to that of the U
Extension, with the following exceptions.</p>
<p>In the transformed content 't' data file, the name attribute in
a <key> element defines a valid field separator subtag. The
name attribute in an enclosed <type> element defines a valid
field subtag for the field separator subtag. For example:</p>
<pre>
<key extension="t" name="m0"
    description="Transform extension mechanism">
  <type name="ungegn"
      description="United Nations Group of Experts on Geographical Names"
      since="21"/>
</key>
</pre>
The data above indicates:
<ul>
<li>"m0" is a valid field separator for the transformed content
extension 't'.</li>
<li>field subtag "ungegn" is valid for field separator "m0".</li>
<li>field subtag "ungegn" was introduced in CLDR 21.</li>
</ul>
<p>The attributes are:</p>
<dl>
<dt>
<b>name</b>
</dt>
<dd>
The name of the mechanism, limited to 3-8 characters (or sequences
of them). Any indirect type names are
listed in 3.6.4 <a href="#Unicode_Locale_Extension_Data_Files">U
Extension Data Files</a>.
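<p>To illustrate how field separators such as "m0" and 3-8 character field subtags fit together, the following Python sketch groups the field subtags of a -t- extension. It is a hypothetical helper, not a CLDR API; it assumes that any source language subtags have already been removed and that each field separator is one letter followed by one digit, as with the keys listed above. It checks only the shape of the subtags, not whether they appear in transform.xml.</p>

```python
import re

# A field separator: one ASCII letter followed by one digit, like "m0" or "k0".
_TKEY = re.compile(r"[a-z][0-9]")
# A field subtag: three to eight ASCII alphanumerics, like "ungegn".
_TVALUE = re.compile(r"[0-9a-z]{3,8}")


def parse_tfields(subtags):
    """Group t-extension field subtags as {separator: [field subtags]}.

    subtags: the subtags after "-t-" with any source language already
    removed, e.g. ["m0", "ungegn"]. Raises ValueError when ill-formed.
    """
    fields = {}
    current = None
    for subtag in (s.lower() for s in subtags):
        if _TKEY.fullmatch(subtag):
            if subtag in fields:
                # Treated as an error here; see RFC 6497 for the normative rule.
                raise ValueError(f"repeated field separator {subtag!r}")
            current = subtag
            fields[current] = []
        elif current is not None and _TVALUE.fullmatch(subtag):
            fields[current].append(subtag)
        else:
            raise ValueError(f"unexpected subtag {subtag!r}")
    # Every separator must be followed by at least one field subtag.
    for key, values in fields.items():
        if not values:
            raise ValueError(f"field {key!r} has no field subtags")
    return fields
```

<p>For example, <code>parse_tfields(["m0", "ungegn"])</code> returns <code>{"m0": ["ungegn"]}</code>, corresponding to the "m0"/"ungegn" data shown above.</p>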
</dd>
<dt>
<b>description</b>
</dt>
<dd>A description of the name, with all and only that
information necessary to distinguish one name from
others with which it might be confused. Descriptions are not
intended to provide general background information.</dd>
<dt>
<b>since</b>
</dt>
<dd>Indicates the first version of CLDR in which the name appears.
(Required for new items.)</dd>
<dt>
<b>alias</b>
</dt>
<dd>
An alternative name, not limited in number of characters. Aliases are
intended for compatibility, not to provide all possible alternate
names or designations. <em>(Optional)</em>
</dd>
</dl>
<h3>
<a name="Compatibility_with_Older_Identifiers"
href="#Compatibility_with_Older_Identifiers">3.8 Compatibility
with Older Identifiers</a>
</h3>
<p>LDML versions before 1.7.2 used a slightly different syntax for
variant subtags and locale extensions. Implementations of LDML may
provide backward-compatible identifier support as described in the
following sections.</p>

<h4>
<a name="Old_Locale_Extension_Syntax"
href="#Old_Locale_Extension_Syntax">3.8.1 Old Locale Extension
Syntax </a>
</h4>
<p>The LDML 1.7 and older specifications used a different syntax for
representing Unicode locale extensions.
The previous definition of
Unicode locale extensions had the following structure:</p>
<table border="0">
<tr>
<th> </th>
<th><div align="center">EBNF</div></th>
<th><div align="center">ABNF</div></th>
</tr>
<tr>
<td>old_unicode_locale_extensions</td>
<td><pre>= "@" old_key "=" old_type
(";" old_key "=" old_type)*</pre></td>
<td><pre>= "@" old_key "=" old_type
*(";" old_key "=" old_type)</pre></td>
</tr>
</table>
<p>The new specification requires keys to be two alphanumeric
characters and types to be three to eight alphanumeric characters. As
a result, new codes were assigned to all existing keys and to some
types. For example, the new key "co" replaced the previous key
"collation", and the new type "phonebk" replaced the previous type
"phonebook". However, the existing collation type "big5han" already
satisfied the new requirement, so no new code was assigned to
that type. All new keys and types introduced after LDML 1.7 satisfy
the new requirement, so, except for time zone types, they do not have
aliases dedicated to the old syntax. The conversion between old types
and new types can be done regardless of key, with one known exception
(the old type "traditional" is mapped to the new type "trad" for collation
and to "traditio" for numbering systems), and this relationship will be
maintained in future versions unless otherwise noted.</p>
<p>
The new specification introduced a new field
<code>attribute</code>
in addition to key/type pairs in the Unicode locale extension. When
it is necessary to map a new Unicode locale identifier with an
<code>attribute</code>
field to a well-formed old locale identifier, a special key name <i>attribute</i>
is used, whose value is the entire sequence of
<code>attribute</code>
subtags in the new identifier.
For example, a new identifier
<code>ja-u-xxx-yyy-ca-japanese</code>
is mapped to an old identifier
<code>ja@attribute=xxx-yyy;calendar=japanese</code>.
</p>
<p>The table below shows some example mappings between the new
syntax and the old syntax.</p>

<table>
<caption>
<a name="Locale_Extension_Mappings"
href="#Locale_Extension_Mappings">Locale Extension Mappings</a>
</caption>
<tr>
<th>Old (LDML 1.7 or older)</th>
<th>New</th>
</tr>
<tr>
<td>de_DE@collation=phonebook</td>
<td>de_DE_u_co_phonebk</td>
</tr>
<tr>
<td>zh_Hant_TW@collation=big5han</td>
<td>zh_Hant_TW_u_co_big5han</td>
</tr>
<tr>
<td>th_TH@calendar=gregorian;numbers=thai</td>
<td>th_TH_u_ca_gregory_nu_thai</td>
</tr>
<tr>
<td>en_US_POSIX@timezone=America/Los_Angeles</td>
<td>en_US_u_tz_uslax_va_posix</td>
</tr>
</table>

<p>Where an API designed for the old syntax is supplied a BCP 47
language tag, or vice versa, the recommendation is to:</p>
<ol>
<li>Have all methods that take the old syntax also take the new
syntax, interpreted correctly. For example,
"zh-TW-u-co-pinyin" and "zh_TW@collation=pinyin"
would both be interpreted as having the same meaning.</li>
<li>Have all methods (both for old and new syntax) accept all
possible aliases for keywords and types. For example,
"ar-u-ca-islamicc" would be equivalent to
"ar-u-ca-islamic-civil".
<ul>
<li>The one exception is where an alias would only be
well-formed with the old syntax, such as "gregorian"
(for "gregory").</li>
</ul>
</li>
<li>Where an API cannot successfully accept the alternate
syntax, throw an exception (or otherwise indicate an error) so that
people can detect that they are using the wrong method (or wrong
input).</li>
<li>Provide a method that tests a purported locale ID string to
determine its status:
<ol>
<li><strong>well-formed</strong> - syntactically correct</li>
<li><strong>valid</strong> - well-formed and only uses
registered language subtags, extensions, keywords, types...</li>
<li><strong>canonical</strong> - valid, with no deprecated codes
or structure.</li>
</ol>
</li>
</ol>

<h4>
<a name="Legacy_Variants" href="#Legacy_Variants">3.8.2 Legacy
Variants</a>
</h4>
<p>
Older LDML specifications allowed codes other than registered [<a
href="#BCP47">BCP47</a>] variant subtags to be used in Unicode language
and locale identifiers to represent variations of locale data.
Unicode locale identifiers that include such variant codes can be
converted to the new [<a href="#BCP47">BCP47</a>]-compatible
identifiers as described below:
</p>
<table>
<caption>
<a name="Legacy_Variant_Mappings" href="#Legacy_Variant_Mappings">Legacy
Variant Mappings</a>
</caption>
<tr>
<th>Variant Code</th>
<th>Description</th>
</tr>

<tr>
<td>AALAND</td>
<td>Åland, variant of "sv" Swedish used in Finland. Use "sv_AX"
to indicate this.</td>
</tr>

<tr>
<td>BOKMAL</td>
<td>Bokmål, variant of "no" Norwegian. Use primary language
subtag "nb" to indicate this.</td>
</tr>

<tr>
<td>NYNORSK</td>
<td>Nynorsk, variant of "no" Norwegian. Use primary language
subtag "nn" to indicate this.</td>
</tr>

<tr>
<td>POSIX</td>
<td>POSIX variation of locale data. Use Unicode locale
extension "-u-va-posix" to indicate this.</td>
</tr>

<tr>
<td>POLYTONI</td>
<td>Polytonic, variant of "el" Greek. Use [<a href="#BCP47">BCP47</a>]
variant subtag "polyton" to indicate this.
</td>
</tr>

<tr>
<td>SAAHO</td>
<td>The Saaho variant of Afar. Use primary language subtag
"ssy" to indicate this.</td>
</tr>
</table>
<p>
When converting to old syntax, the Unicode locale extension
"-u-va-posix" should be converted to the "POSIX" variant, <i>not</i>
to old extension syntax like "@va=posix". This is an exception: the
other mappings above should not be reversed.
</p>

<p>Examples:</p>
<ul>
<li>en_US_POSIX ↔ en-US-u-va-posix</li>
<li>en_US_POSIX@colNumeric=yes ↔ en-US-u-kn-va-posix</li>
<li>en-US-POSIX-u-kn-true → en-US-u-kn-va-posix</li>
<li>en-US-POSIX-u-kn-va-posix → en-US-u-kn-va-posix</li>
</ul>

<h4>
<a name="Relation_to_OpenI18n" href="#Relation_to_OpenI18n">3.8.3
Relation to OpenI18n</a>
</h4>
<p>
The locale id format generally follows the description in the <i>OpenI18N
Locale Naming Guideline</i> [<a href="#NamingGuideline">NamingGuideline</a>],
with some enhancements. The main differences from those
guidelines are that the locale id:
</p>
<ol type="a">
<li style="margin-top: 0.5em; margin-bottom: 0.5em">does not
include a charset (since the data in LDML format always provides a
representation of all Unicode characters.
The repository is stored
in UTF-8, although it can be transcoded to other encodings as
well.),</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">adds the
ability to have a variant, as in Java,</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">adds the
ability to distinguish the written language by script (or script
variant), and</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">is a
superset of [<a href="#BCP47">BCP47</a>] codes.
</li>
</ol>
<h3>
<a name="Transmitting_Locale_Information"
href="#Transmitting_Locale_Information">3.9 Transmitting Locale
Information</a>
</h3>
<p>
In a world of on-demand software components, with arbitrary
connections between those components, it is important to get a sense
of where localization should be done, and how to transmit enough
information so that it can be done at the appropriate place.
End-users need to get messages localized to their languages: messages
that not only contain a translation of text, but also contain
variables such as dates, times, numbers, and currencies
formatted according to the users' conventions. The strategy for
doing so-called <i>JIT localization</i> is made up of two parts:
</p>
<ol>
<li>Store and transmit <i>neutral-format</i> data wherever
possible.
<ul>
<li>Neutral-format data is data that is kept in a standard
format, no matter what the local user's environment is.
Neutral-format is also (loosely) called <i>binary data</i>, even
though it actually could be represented in many different ways,
including a textual representation such as in XML.
</li>
<li>Such data should use accepted standards where possible,
such as for currency codes.</li>
<li>Textual data should also be in a uniform character set
(Unicode/10646) to avoid possible data corruption problems when
converting between encodings.</li>
</ul>
</li>
<li>Localize that data as "<i>close</i>" to the
end-user as possible.
</li>
</ol>
<p>There are a number of advantages to this strategy. The longer
the data is kept in a neutral format, the more flexible the entire
system is. On a practical level, if transmitted data is
neutral-format, then it is much easier to manipulate the data, debug
the processing of the data, and maintain the software connections
between components.</p>
<p>Once data has been localized into a given language, it can be
quite difficult to programmatically convert that data into another
format, if required. This is especially true if the data contains a
mixture of translated text and formatted variables. Once information
has been localized into, say, Romanian, it is much more difficult to
localize that data into, say, French. Parsing is more difficult than
formatting, and may run up against ambiguities in
interpreting text that has been localized, even if the original
translated message text is available (which it may not be).</p>
<p>Moreover, the closer we are to the end-user, the more we know about
that user's preferred formats. If we format dates, for example,
on the user's machine, then it can easily take into account any
customizations that the user has specified. If the formatting is done
elsewhere, either we have to transmit whatever user customizations
are in play, or we only transmit the user's locale code, which
may only approximate the desired format.
Thus the closer the
localization is to the end user, the less we need to ship all of the
user's preferences around to all the places where localization
could possibly need to be done.</p>
<p>Even though localization should be done as close to the
end-user as possible, there will be cases where different components
need to be aware of whatever settings are appropriate for doing the
localization. Thus information such as a locale code or time zone
needs to be communicated between different components.</p>
<h4>
<a name="Message_Formatting_and_Exceptions"
href="#Message_Formatting_and_Exceptions">3.9.1 Message
Formatting and Exceptions</a>
</h4>
<p>
Windows (<a
href="http://msdn.microsoft.com/en-us/library/ms679351.aspx">FormatMessage</a>,
<a href="http://msdn.microsoft.com/en-us/library/aa331875.aspx">String.Format</a>),
Java (<a
href="http://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html">MessageFormat</a>)
and ICU (<a
href="http://www.icu-project.org/apiref/icu4c/classMessageFormat.html">MessageFormat</a>,
<a href="http://www.icu-project.org/apiref/icu4c/umsg_8h.html">umsg</a>)
all provide methods of formatting variables (dates, times, etc.) and
inserting them at arbitrary positions in a string. This avoids the
manual string concatenation that causes severe problems for
localization. The question is where to do this. It is especially
important because the code that originates a particular
message may be far down in the bowels of a component, with the message
passed up to the top of the component in an exception. So we will take that
case as representative of this class of issues.
</p>
<p>There are circumstances where the message can be communicated
with a language-neutral code, such as a numeric error code or
mnemonic string key, that is understood outside of the component.
If
there are arguments that need to accompany that message, such as a
number of files or a datetime, those arguments need to accompany the
code so that when the localization is finally done, the full
information can be presented to the end-user. This is the best case
for localization.</p>
<p>More often, the exact messages that could originate from within
the component are not known outside of the component itself; or at
least they may not be known by the component that is finally
displaying text to the user. In such a case, the information as to
the user's locale needs to be communicated in some way to the
component that is doing the localization. That locale information
does not necessarily need to be communicated deep within the
component; ideally, any exceptions should bundle up some
language-neutral message ID, plus the arguments needed to format the
message (for example, a datetime), but not do the localization at the
throw site. This approach has the advantages noted above for JIT
localization.</p>
<p>In addition, exceptions are often caught at a higher level and
never end up being displayed to any end-user at all. By
avoiding the localization at the throw site, a program avoids the cost of
formatting when that formatting is not actually needed. In fact, in
many running programs most of the exceptions that are thrown at a low
level never end up being presented to an end-user, so this can have
considerable performance benefits.</p>
<h3>
<a name="Language_and_Locale_IDs" href="#Language_and_Locale_IDs">3.10
Unicode Language and Locale IDs</a>
</h3>
<p>People have very slippery notions of what distinguishes a
language code from a locale code.
The problem is that both are
somewhat nebulous concepts.</p>
<p>
In practice, many people use [<a href="#BCP47">BCP47</a>] codes to
mean locale codes instead of strictly language codes. It is easy to
see why this came about: because [<a href="#BCP47">BCP47</a>]
includes an explicit region (territory) code, for most people it was
sufficient for use as a locale code as well. For example, when
typical web software receives a [<a href="#BCP47">BCP47</a>] code,
it will use it as a locale code. Other typical software will do the
same: in practice, language codes and locale codes are treated
interchangeably. Some people recommend distinguishing on the basis of
"-" versus "_" (for example, <i>zh-TW</i> for
language code, <i>zh_TW</i> for locale code), but in practice that
does not work because of the free variation out in the world in the
use of these separators. Notice that Windows, for example, uses
"-" as a separator in its locale codes. So pragmatically
one is forced to treat "-" and "_" as equivalent
when interpreting either one on input.
</p>
<p>
Another reason for the conflation of these codes is that <i>very</i>
little data in most systems is distinguished by region alone;
currency codes and measurement systems are among the few.
Sometimes date or number formats are mentioned as regional, but that
really does not make much sense. If people see the sentence "You
will have to adjust the value to १,२३४.५६७ from ૭૧,૨૩૪.૫૬"
(using Indic digits), they would say that sentence is simply not
English. Number format is far more closely associated with language
than it is with region. The same is true for date formats: people
would never expect to see a date in the format
"2003年4月1日" (using Kanji) intermixed in text purporting to be purely
English.
There are regional differences in date and number format —
differences which can be important — but those are different in kind
from other language differences between regions.
</p>
<p>
As far as we are concerned — <i>as a completely practical matter</i>
— two languages are different if they require substantially different
localized resources. Distinctions according to spoken form are
important in some contexts, but the written form is by far
the most important issue for data interchange. Unfortunately, this is
not the principle used in [<a href="#ISO639">ISO639</a>], which has
the fairly unproductive notion (for data interchange) that only
spoken language matters (it is also not completely consistent about
this, however).
</p>
<p>
[<a href="#BCP47">BCP47</a>] <i><b>can</b></i> express a difference
if the use of written languages happens to correspond to region
boundaries expressed as [<a href="#ISO3166">ISO3166</a>] region
codes, and has recently added codes that allow it to express some
important cases that are not distinguished by [<a href="#ISO3166">ISO3166</a>]
codes. These written languages include simplified and traditional
Chinese (both used in Hong Kong S.A.R.), Serbian in Latin script,
Azerbaijani in Arabic script, and so on.
</p>
<p>
Notice also that <i>currency codes</i> are different from <i>currency
localizations</i>. The currency localizations should largely be in the
language-based resource bundles, not in the territory-based resource
bundles. Thus, the resource bundle <i>en</i> contains the localized
mappings in English for a range of different currency codes: USD →
US$, RUR → Rub, AUD → $A and so on. Of course, some currency symbols
are used for more than one currency, and in such cases
specializations appear in the territory-based bundles.
Continuing the
example, <i>en_US</i> would have USD → $, while <i>en_AU</i> would
have AUD → $. (In protocols, the currency codes should always
accompany any currency amounts; otherwise the data is ambiguous, and
software is forced to use the user's territory to guess at the
currency. For some informal discussion of this, see <a
href="http://source.icu-project.org/repos/icu/icuhtml/trunk/design/jit_localization.html">JIT
Localization</a>.)
</p>
<h4>
<a name="Written_Language" href="#Written_Language">3.10.1
Written Language</a>
</h4>
<p>
Criteria for what makes a written language should be purely
pragmatic: <i>what would copy editors say?</i> If one gave them text
like the following, they would respond that it is far from acceptable
English for publication, and ask for it to be redone:
</p>
<ol>
<li type="A">"Theatre Center News: The date of the last
version of this document was 2003年3月20日. A copy can be obtained for
$50,0 or 1.234,57 грн. We would like to acknowledge contributions by
the following authors (in alphabetical order): Alaa Ghoneim, Behdad
Esfahbod, Ahmed Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and
Doug Felt."</li>
</ol>
<p>So one would change it to either B or C below, depending on
which orthographic variant of English was the target for the
publication:</p>
<ol type="A" start="2">
<li>"Theater Center News: The date of the last version of
this document was 3/20/2003. A copy can be obtained for $50.00 or
1,234.57 Ukrainian Hryvni. We would like to acknowledge
contributions by the following authors (in alphabetical order): Alaa
Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod,
Doug Felt, Eric Mader."</li>
<li>"Theatre Centre News: The date of the last version of
this document was 20/3/2003.
A copy can be obtained for $50.00 or
1,234.57 Ukrainian Hryvni. We would like to acknowledge
contributions by the following authors (in alphabetical order): Alaa
Ghoneim, Ahmed Talaat, Asmus Freytag, Avery Bishop, Behdad Esfahbod,
Doug Felt, Eric Mader."</li>
</ol>
<p>
Clearly there are many acceptable variations on this text. For
example, copy editors might still quibble with the use of first
versus last name sorting in the list, but clearly the first list was
<i>not</i> acceptable English alphabetical order. And in quoting a
name, like "Theatre Centre News", one may leave it in the
source orthography even if it differs from the publication target
orthography. And so on. However, just as clearly, there are limits on
what is acceptable English, and "2003年3月20日", for example,
is <i>not</i>.
</p>
<p>Note that the language of locale data may differ from the
language of localized software or web sites, when the latter are
not localized into the user's preferred language. In such cases,
the kind of incongruous juxtapositions described above may well
appear, but this situation is usually preferable to forcing
unfamiliar date or number formats on the user as well.</p>
<h4>
<a name="Hybrid_Locale" href="#Hybrid_Locale">3.10.2
Hybrid Locale Identifiers</a>
</h4>
<p>Hybrid locales have intermixed content from two (or more) languages, often with one language's grammatical structure applied to words in another. These are commonly referred to with portmanteau words such as <em>Franglais</em>, <em><a href="https://en.oxforddictionaries.com/definition/spanglish">Spanglish</a></em>, or <em>Denglish</em>.
Hybrid locales do <em>not</em> refer to text that simply contains two languages: a book of parallel text containing English and French, such as the following, is not Franglais:</p>
<table style='margin-left:2em; margin-right:2em'>
<tbody>
<tr>
<td width='50%' style='font-family:serif'>On the 24th of May, 1863, my uncle, Professor Liedenbrock, rushed into his little house, No. 19 Königstrasse, one of the oldest streets in the oldest portion of the city of Hamburg…</td>
<td style='font-family:serif'>Le 24 mai 1863, un dimanche, mon oncle, le professeur Lidenbrock, revint précipitamment vers sa petite maison située au numéro 19 de Königstrasse, l’une des plus anciennes rues du vieux quartier de Hambourg…</td>
</tr>
</tbody>
</table>
<p>While text in a document can be tagged as partly in one language and partly in another, that is not the same as having a hybrid locale. There is a difference between a Spanglish document and a Spanish document that has some passages quoted in English. Fine-grained tagging doesn't handle grammatical combinations like the Denglish “<a href="http://www.duden.de/rechtschreibung/downloaden">gedownloadet</a>”, which is neither English nor German, or similarly the Franglais “<a href='http://www.le-dictionnaire.com/definition.php?mot=downloader'>downloadé</a>”. More importantly, it doesn’t work for the very common use case for a <a href="#unicode_locale_id">unicode_locale_id</a>: <i>locale selection</i>. </p>
<p>Locales are used to communicate requests for localized content and internationalization services. When people pick a language from a menu, internally they are picking a locale (en-GB, es-419, etc.). To allow an application to support Spanglish or Hinglish locale selection, <a href="#unicode_locale_id">unicode_locale_id</a>s can represent hybrid locales using the T extension key-value 'h0-hybrid'.
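</p>
<p>As a rough illustration only, the following Python sketch builds and
inspects identifiers of this shape. The function names make_hybrid and
parse_hybrid are hypothetical (they are not part of CLDR or of any
library), and the parser assumes the simple form used in the examples of
this section: a scaffold language, then a single T extension whose value
is a language ID followed by the field h0-hybrid.</p>
<pre>
```python
def make_hybrid(scaffold: str, admixture: str) -> str:
    """Build a hybrid locale ID such as "hi-t-en-h0-hybrid"."""
    return f"{scaffold}-t-{admixture}-h0-hybrid"


def parse_hybrid(locale_id: str):
    """Return (scaffold, admixture) for a hybrid locale ID, else None.

    Assumes a single -t- extension whose value ends with the h0-hybrid
    field; "-" and "_" are both accepted as separators, and the result
    is lowercased because BCP 47 subtags are case-insensitive.
    """
    subtags = locale_id.replace("_", "-").lower().split("-")
    if "t" not in subtags:
        return None
    t_index = subtags.index("t")
    scaffold = "-".join(subtags[:t_index])
    rest = subtags[t_index + 1:]
    # The T extension value is a language ID followed by key/value fields;
    # this sketch recognizes only a trailing "h0-hybrid" field.
    if rest[-2:] != ["h0", "hybrid"]:
        return None
    admixture = "-".join(rest[:-2])
    if not scaffold or not admixture:
        return None
    return scaffold, admixture


if __name__ == "__main__":
    print(make_hybrid("hi", "en"))                    # hi-t-en-h0-hybrid
    print(parse_hybrid("ru-t-en-Latn-GB-h0-hybrid"))  # ('ru', 'en-latn-gb')
```
</pre>
<p>In this sketch, the subtags before -t- give the scaffold language and
the T extension value gives the admixture language; a complete
implementation would instead parse the full BCP 47 and T extension
grammars.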
(For more information on the T extension, see <em>Section 3.7 <a href="#t_Extension">Unicode BCP 47 T Extension</a>.</em>)
</p>
<p>Examples:</p>
<table class='simple'>
<tbody>
<tr>
<td>hi-t-<u>en-h0-hybrid</u></td>
<td>Hinglish</td>
<td>Hindi-English hybrid locale</td>
</tr>
<tr>
<td>ta-t-<u>en-h0-hybrid</u></td>
<td>Tanglish</td>
<td>Tamil-English hybrid locale</td>
</tr>
<tr>
<td>bn-t-<u>en-h0-hybrid</u></td>
<td>Banglish</td>
<td>Bangla-English hybrid locale</td>
</tr>
<tr><td colspan="3">…</td></tr>
<tr>
<td>en-t-<u>hi-h0-hybrid</u></td>
<td>Hinglish</td>
<td>English-Hindi hybrid locale</td>
</tr>
<tr>
<td>en-t-<u>zh-h0-hybrid</u></td>
<td>Chinglish</td>
<td>English-Chinese hybrid locale</td>
</tr>
<tr><td colspan="3">…</td></tr>
</tbody>
</table>
<blockquote>
<p><em>Note: The <a href="#unicode_language_id">unicode_language_id</a> should be the language used as the ‘scaffold’: the fallback locale for internationalization services, typically the language that supplies more of the core vocabulary and structure of the content. Thus Hinglish should be represented as hi-t-en-h0-hybrid where Hindi is the scaffold, and as en-t-hi-h0-hybrid where English is.</em></p>
</blockquote>
<p>The value of -t- is a full <em><a href="#unicode_language_id">unicode_language_id</a></em>, and can contain subtags for script or region where it is important to include them, as in the following.
Including the script can be useful to emphasize it, even where it is the default script for the language, if it is not the same as the script of the main language tag.</p>
<table class='simple'>
<tbody>
<tr>
<td>ru-t-<u>en-latn-gb-h0-hybrid</u></td>
<td>Runglish</td>
<td>Russian with an admixture of British English in Latin script</td>
</tr>
<tr>
<td>ru-t-<u>en-cyrl-gb-h0-hybrid</u></td>
<td>Runglish</td>
<td>Russian with an admixture of British English in Cyrillic script</td>
</tr>
</tbody>
</table>
<p>Should there ever be a strong need for hybrids of more than two languages, or for other purposes such as hybrid languages as the source of translated content, additional structure could be added.</p>
<h3>
<a name="Validity_Data" href="#Validity_Data">3.11 Validity Data</a>
</h3>
<p class='dtd'>
<!ELEMENT idValidity (id*) ><br> <!ELEMENT id ( #PCDATA
) ><br> <!ATTLIST id type NMTOKEN #REQUIRED > <br>
<!ATTLIST id idStatus NMTOKEN #REQUIRED >
</p>
<p>
The directory <a
href='http://unicode.org/repos/cldr/tags/latest/common/validity/'>common/validity</a>
contains machine-readable data for validating the language, region,
script, and variant subtags, as well as currencies, subdivisions, and
measure units. Each file contains a number of subtags with the
following <strong>idStatus</strong> values:
</p>
<ul>
<li><strong>regular</strong> — the standard codes used for the
specific type of subtag</li>
<li><strong>special</strong> — certain
exceptional language codes like 'mul'<em> (languages only)</em></li>
<li><strong>unknown</strong> — the code used to indicate the
"unknown", "undetermined" or "invalid"
values.
For more information, see <em>Section 3.5.1 <a
href="#Unknown_or_Invalid_Identifiers">Unknown or Invalid
Identifiers</a></em>.</li>
<li><strong>macroregion</strong> — the standard codes that are
macroregions<em> (for regions only).</em>
<ul>
<li>Note that some two-letter region codes are macroregions,
and (in the future) some three-digit codes may be regular codes.</li>
<li>For details as to which regions are contained within which
macroregions, see the <strong><containment></strong> element
of the supplemental data.
</li>
</ul></li>
<li><strong>deprecated</strong> — codes that should not be used.
The <strong><alias></strong> element in the supplementalMeta
file contains more information about these codes, and which codes
should be used instead.</li>
<li><strong>private_use</strong> — codes that, for CLDR, are
considered private use. Note that some BCP 47 private-use codes have
defined CLDR semantics, and are considered regular codes. For more
information, see <em>Section 3.5.3 <a href="#Private_Use">Private
Use Codes</a>.
</em></li>
</ul>
<p>
The list of subtags for each idStatus uses a compact format: a
space-delimited list of StringRanges, as defined in <em>Section
<a href="#String_Range">5.3.4 String Range</a>.
</em> The separator within each StringRange is "~".
</p>
<p>Each measure unit is a sequence of subtags, such as
“angle-arc-minute”. The first subtag provides a general “category” of
the unit.</p>
<p>
In version 28.0, the subdivisions in the
validity files used the ISO format (uppercase, with a hyphen separating the two
components) instead of the BCP 47 format.
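</p>
<p>The compact StringRange lists used by the validity files can be expanded
mechanically. The Python sketch below is illustrative only: the function
names are hypothetical, and it assumes the reading of <em>Section 5.3.4</em>
in which, for a range X~Y, the first (length of X minus length of Y)
characters of X form a fixed prefix and each remaining position runs
independently from the character in X to the corresponding character in Y.
Consult that section for the normative definition.</p>
<pre>
```python
from itertools import product


def expand_string_range(item: str, sep: str = "~") -> list:
    """Expand one StringRange, e.g. "AC~G" -> AC, AD, AE, AF, AG."""
    if sep not in item:
        return [item]  # a single code, not a range
    start, end = item.split(sep)
    if not end or len(end) > len(start):
        raise ValueError(f"malformed StringRange: {item!r}")
    prefix_len = len(start) - len(end)
    prefix, tail = start[:prefix_len], start[prefix_len:]
    # One list of candidate characters per varying position.
    positions = [
        [chr(c) for c in range(ord(first), ord(last) + 1)]
        for first, last in zip(tail, end)
    ]
    # Nested-loop expansion: "ab~cd" -> ab ac ad bb bc bd cb cc cd.
    return [prefix + "".join(chars) for chars in product(*positions)]


def expand_validity_list(text: str) -> list:
    """Expand a space-delimited list of StringRanges from a validity file."""
    codes = []
    for item in text.split():
        codes.extend(expand_string_range(item))
    return codes


if __name__ == "__main__":
    print(expand_validity_list("AC~D AI AL~M"))  # ['AC', 'AD', 'AI', 'AL', 'AM']
```
</pre>
<p>Applied to a region entry such as "AC~D AI", the sketch yields AC, AD, and AI.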
</p>
<h2>
<a name="Locale_Inheritance" href="#Locale_Inheritance">4 Locale
Inheritance and Matching</a>
</h2>
<p>
The XML format relies on an inheritance model, whereby the resources
are collected into <i>bundles</i>, and the bundles are organized into a
tree. Data for the many Spanish locales does not need to be
duplicated across all of the countries having Spanish as a national
language. Instead, common data is collected in the Spanish language
locale, and territory locales only need to supply differences. The
parent of all of the language locales is a generic locale known as <i>root</i>.
Wherever possible, the resources in the root are language- and
territory-neutral. For example, the collation (sorting) order in the
root is based on the [<a href="#DUCET">DUCET</a>] (see <em><a
href="tr35-collation.html#Root_Collation">Root Collation</a></em>). Since
English language collation has the same ordering as the root locale,
the 'en' locale data does not need to supply any collation
data, nor do 'en_US', 'en_GB', or any of the
various other locales that use English.
</p>
<p>Given a particular locale id "en_US_someVariant", the
search chain for a particular resource is the following.</p>
<blockquote>
<pre>en_US_someVariant
en_US
en
root</pre>
</blockquote>
<p>
<em>The inheritance is often not simple truncation, as will be
seen later in this section.</em>
</p>
<p>If a type and key are supplied in the locale id, then logically
the chain from that id to the root is searched for a resource tag
with the given type, all the way up to root. If no resource is found
with that tag and type, then the chain is searched again without the
type.</p>
<p>
Thus the data for any given locale will only contain resources that
are different from the parent locale.
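</p>
<p>A minimal Python sketch of this naive truncation chain, and of an item
lookup that walks it, follows. It is illustrative only: the function names
and the dictionary-of-dictionaries bundle layout are assumptions, and as
noted above, real CLDR inheritance is often not simple truncation (parent
locales and aliases can change the chain).</p>
<pre>
```python
def truncation_chain(locale_id: str) -> list:
    """Naive fallback chain: drop trailing "_" fields, then end at root."""
    fields = locale_id.split("_")
    chain = ["_".join(fields[:n]) for n in range(len(fields), 0, -1)]
    return chain + ["root"]


def lookup(bundles: dict, locale_id: str, path: str):
    """Return (bundle_id, value) from the first bundle defining path, else None."""
    for bundle_id in truncation_chain(locale_id):
        value = bundles.get(bundle_id, {}).get(path)
        if value is not None:
            return bundle_id, value
    return None


if __name__ == "__main__":
    print(truncation_chain("en_US_someVariant"))
    # Hypothetical bundles: each stores only what differs from its parent.
    bundles = {
        "root": {"collation": "root order"},
        "en": {"territory.CN": "China"},
        "en_IE": {"note": "Irish-only item"},
    }
    print(lookup(bundles, "en_IE", "territory.CN"))  # ('en', 'China')
    print(lookup(bundles, "en_IE", "collation"))     # ('root', 'root order')
```
</pre>
<p>Under such a scheme, each bundle needs to store only the items in which it differs from its parent.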
For example, most territory
locales will inherit the bulk of their data from the language locale:
"en" will contain the bulk of the data, while "en_IE"
will only contain a few items, such as currency. All data that is
inherited from a parent is presumed to be valid, just as valid as if
it were physically present in the file. This provides for much
smaller resource bundles, and much simpler (and less error-prone)
maintenance. At the script or region level, the "primary"
child locale will be empty, since its parent will contain all of the
appropriate resources for it. For more information, see <i>CLDR
Information: Section 9.3 <a href="tr35-info.html#Default_Content">Default
Content</a>.
</i>
</p>

<p>
Certain data items depend only on the region specified in a locale id
(by a <a
href="#unicode_region_subtag_validity">unicode_region_subtag</a> or
an “rg” <a href="#RegionOverride">Region Override</a> key),
and are obtained from supplemental data rather than through locale
resources. For example:
</p>
<ul>
<li>The currency for the specified region (see <a
href="tr35-numbers.html#Supplemental_Currency_Data">Supplemental
Currency Data</a>)
</li>
<li>The measurement system for the specified region (see <a
href="tr35-general.html#Measurement_System_Data">Measurement
System Data</a>)
</li>
<li>The week conventions for the specified region (see <a
href="tr35-dates.html#Week_Data">Week Data</a>)
</li>
</ul>
<p>
(For more information on the specific
items handled this way, see <a
href="tr35-info.html#Territory_Based_Preferences">Territory-Based
Preferences</a>.)
These items will be correct for the specified region regardless of
whether a locale bundle actually exists with the same combination of
language and region as in the locale id.
For example, suppose data is
requested for the locale id "fr_US" and there is no bundle for that
combination. Data obtained via locale inheritance, such as currency
patterns and currency symbols, will be obtained from the parent
locale "fr". However, currency amounts would be formatted by default
using US dollars, just displayed in the manner governed by the locale
"fr". When a locale id does not specify a region, the region-specific
items such as those above are obtained from the likely region for the
locale (obtained via <a href="#Likely_Subtags">Likely Subtags</a>).</p>
<p>For the relationship between Inheritance, DefaultContent, LikelySubtags, and LocaleMatching, see Section 4.2.6 <a
href="tr35.html#Inheritance_vs_Related">Inheritance vs Related Information</a>.</p>
<h3>
<a href="#Lookup" name="Lookup">4.1 Lookup</a>
</h3>

<p>If a language has more than one script in customary modern use,
then the CLDR file structure in common/main follows this
model:</p>
<blockquote>
<p>
lang<br> lang_script<br> lang_script_region<br>
lang_region<i> (aliases to lang_script_region)</i>
</p>
</blockquote>
<h4>
<a href="#Bundle_vs_Item_Lookup" name="Bundle_vs_Item_Lookup">4.1.1
Bundle vs Item Lookup</a>
</h4>
<p>
There are actually two different kinds of inheritance fallback: <em>resource bundle lookup</em>
and <em>resource item lookup</em>. For the former, a
process is looking for the first, best resource bundle it can find;
for the latter, it falls back within bundles on individual
items, such as the translated name for the region "CN" in
Breton.
</p>
<p>
These are closely related, but distinct, processes. They are
illustrated in the table <a href="#Lookup-Differences">Lookup
Differences</a>, where "key" stands for zero or more key/type
pairs.
Logically speaking, when looking up an item for a given
locale, you first do a resource bundle lookup to find the best bundle
for the locale, then you do an inherited item lookup starting with
that resource bundle.
</p>
<p>
The table <a href="#Lookup-Differences">Lookup Differences</a> uses
the naïve resource bundle lookup for illustration. More sophisticated
systems will get far better results for resource bundle lookup if
they use the algorithm described in <em>Section 4.4 <a
href="#LanguageMatching">Language Matching</a></em>. That algorithm takes
into account both the user’s desired locale(s) and the application’s
supported locales, in order to get the best match.
</p>
<p>
If the naïve resource bundle lookup is used, the desired locale needs
to be canonicalized using Section 4.3 <a href="#Likely_Subtags">Likely
Subtags</a> and the supplemental alias information, so that locales that
CLDR considers identical are treated as such. Thus eng-Latn-GB should
be mapped to en-GB, and cmn-TW mapped to zh-Hant-TW.
</p>
<p>For the purposes of CLDR, everything with the &lt;ldml&gt; DTD
is treated logically as if it is one resource bundle, even if the
implementation separates data into separate physical resource
bundles. For example, suppose that there is a main XML file for Nama
(naq), but there are no &lt;unit&gt; elements for it because the
units are all inherited from root. If the &lt;unit&gt; elements are
separated into a separate data tree for modularity in the
implementation, the Nama &lt;unit&gt; resource bundle would be empty.
However, resource bundle lookup still stops at naq.xml.</p>

<div id="iqaw2" style="margin-top: 0px; margin-bottom: 0px;">
<table class='simple' id="a1bn" border="1" cellpadding="3" cellspacing="0">
<caption>
<a href="#Lookup-Differences" name="Lookup-Differences">Lookup
Differences</a>
</caption>
<tbody id="iqaw3">
<tr id="x40y0">
<th id="x40y1" style="vertical-align: top;" nowrap>Lookup
Type</th>
<th id="x40y3" style="vertical-align: top;" nowrap>Example</th>
<th id="x40y5" style="vertical-align: top;">Comments</th>
</tr>
<tr id="iqaw4">
<td id="iqaw5" style="vertical-align: top;" nowrap>
<p id="rkc40">
<strong>Resource bundle</strong> lookup
</p>
</td>
<td id="iqaw7" style="vertical-align: top;" nowrap>
<p>se-FI →</p>
<p>se →</p>
<p>
<em>default-locale* →</em>
</p>
<p>root</p>
</td>
<td id="rkc41" style="vertical-align: top;">
<p>* The default-locale may have its own inheritance chain;
for example, it may be "en-GB → en". In that
case, the chain is expanded by inserting that chain, resulting
in:</p>
<blockquote>
<p>se-FI →</p>
<p>se →</p>
<p>fi →</p>
<p>
<em>en-GB →</em>
</p>
<p>
<em>en →</em>
</p>
<p>root</p>
</blockquote>
</td>
</tr>
<tr id="iqaw9">
<td id="iqaw10" style="vertical-align: top;" nowrap>
<p>
<strong>Inherited item</strong> lookup
</p>
</td>
<td id="iqaw12" style="vertical-align: top;" nowrap>
<p>se-FI+key →</p>
<p>se+key →</p>
<p>
<em>root_alias*+key </em>
</p>
<p>→ root+key</p>
</td>
<td id="rkc43" style="vertical-align: top;">
<p>* If there is a root_alias to another key or locale, then
insert that entire chain.
For example, suppose that months for
another calendar system have a root alias to Gregorian months.
In that case, the root alias changes the key, and the lookup starts
again from se-FI. This can happen multiple times.</p>
<blockquote>
<p>se-FI+key →</p>
<p>se+key →</p>
<p>root_alias*+key →</p>
<p>
<em>se-FI+key2 →</em>
</p>
<p>
<em>se+key2 →</em>
</p>
<p>root_alias*+key2 →</p>
<p>root+key2</p>
</blockquote>
</td>
</tr>
</tbody>
</table>
</div>
<p>Both the resource bundle inheritance and the inherited item
inheritance use the parentLocale data, where available, instead of
simple truncation.</p>
<p>The fallback is a bit different for these two cases: internal
aliases and keys are not involved in the bundle lookup, and the
default locale is not involved in the item lookup. If the
default locale were used in the resource item lookup, strange
results would occur. For example, suppose that the default locale is
Swedish, and there is a Nama locale but no specific inherited item
for collation. If the default locale were used in resource item
lookup, Swedish collation rules would be applied to Nama, producing odd
and unexpected results for Nama sorting.
</p>
<p>Even in resource bundle inheritance, the default locale is not
always used.
For the following services, the fallback is always
directly to the root locale rather than through the default locale.</p>
<ul>
<li>collation</li>
<li>break iteration</li>
<li>case mapping</li>
<li>transliteration
<ul>
<li>The lookup for transliteration is yet more complicated
because of the interplay of source and target locales: see <em>Part
2: General, Section 10.1 <a
href="http://www.unicode.org/reports/tr35/tr35-general.html#Inheritance">Inheritance.</a>
</em>
</li>
</ul>
</li>
</ul>
<p>
Thus if there is no Akan locale, for example, and the default locale is
Swedish, asking for a collation for Akan should produce the root
collation, <em>not the Swedish collation.</em>
</p>
<p>The inherited item lookup must remain stable, because the
resources are built with a certain fallback in mind; changing the
core fallback order can render the bundle structure incoherent.</p>
<p>
Resource bundle lookup, on the other hand, is more flexible; changes
in the view of the "best" match between the input request
and the output bundle are tolerable when they represent overall
improvements for users. For more information, see <i> <a
href="#Fallback_Elements">A.1 Element fallback</a></i>.
</p>
<p>
Where the LDML inheritance relationship does not match a target
system, such as POSIX, the data should logically be fully resolved when
converting to a format for use by that system, by adding <i>all</i>
inherited data to each locale data set.
</p>
<p>
For a more complete description of how inheritance applies to data,
and the use of keywords, see <i><a
href="#Inheritance_and_Validity">Section 4.2 Inheritance </a></i>.
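</p>
<p>The two lookup processes of Section 4.1.1 can be sketched in Python. This is a minimal, non-normative sketch under stated assumptions: locale IDs use underscores, locale data is a hypothetical in-memory dictionary from locale ID to items, root aliases and <a href="#LanguageMatching">Language Matching</a> are not modeled, and every function and variable name is illustrative rather than part of a CLDR or ICU API. The bundle lookup may pass through the default locale's chain (except for services such as collation), while the item lookup never consults the default locale.</p>

```python
"""Hypothetical sketch of CLDR resource bundle lookup vs. inherited item lookup.

Assumptions (not from the spec): underscore locale IDs, data held in plain
dicts, no root aliases, no Language Matching, no parentLocale cycles.
"""
from typing import Dict, List, Optional, Set

# Services whose bundle lookup falls back directly to root,
# never through the default locale.
ROOT_ONLY_SERVICES = {"collation", "break iteration", "case mapping", "transliteration"}


def locale_chain(locale: str, parents: Dict[str, str]) -> List[str]:
    """Return [locale, ..., 'root'], preferring parentLocale data over truncation."""
    chain = [locale]
    while chain[-1] != "root":
        current = chain[-1]
        if current in parents:
            chain.append(parents[current])  # explicit parentLocale override
        elif "_" in current:
            chain.append(current.rsplit("_", 1)[0])  # simple truncation
        else:
            chain.append("root")
    return chain


def bundle_lookup(locale: str, available: Set[str], parents: Dict[str, str],
                  default_locale: str, service: str) -> str:
    """Naive resource bundle lookup: the first available bundle wins."""
    candidates = locale_chain(locale, parents)[:-1]  # everything except root
    if service not in ROOT_ONLY_SERVICES:
        # Insert the default locale's own chain just before root.
        candidates += locale_chain(default_locale, parents)[:-1]
    for candidate in candidates:
        if candidate in available:
            return candidate
    return "root"


def item_lookup(bundle: str, key: str, data: Dict[str, Dict[str, str]],
                parents: Dict[str, str]) -> Optional[str]:
    """Inherited item lookup: walk from the bundle up to root; no default locale."""
    for loc in locale_chain(bundle, parents):
        value = data.get(loc, {}).get(key)
        if value is not None:
            return value
    return None
```

<p>For example, with "sv" (Swedish) as the default locale and no "ak" (Akan) bundle, <code>bundle_lookup("ak", available, {}, "sv", "collation")</code> returns "root", as in the Akan example above, while a service such as date formatting falls through to "sv". A parents map such as <code>{"es_AR": "es_419"}</code> stands in for parentLocale data.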
</p>
<p>
The locale data does not contain general character properties that
are derived from the <i>Unicode Character Database</i> [<a
href="http://unicode.org/reports/tr41/#UAX44">UAX44</a>]. Because that data
is common across locales, it is not duplicated in the bundles.
Constructing a POSIX locale from the CLDR data therefore requires use of UCD
data. In addition, POSIX locales may specify the character
encoding, which requires the data to be transformed into that target
encoding.
</p>
<p>
<b>Warning: </b>If a locale has a different script than its parent
(for example, sr_Latn), then special attention must be paid to make
sure that all inheritance is covered. For example, auxiliary exemplar
characters may need to be empty ("[]") to block
inheritance.
</p>
<p>
<strong>Empty Override:</strong> There is one special value reserved
in LDML to indicate that a child locale is to have no value for a
path, even if the parent locale has a value for that path. That value
is "∅∅∅". For example, if there is no phrase for "two
days ago" in a language, that can be indicated with:
</p>
<pre>&lt;field type="day"&gt;
    &lt;relative type="-2"&gt;∅∅∅&lt;/relative&gt;
&lt;/field&gt;</pre>
<h4>
<a name="Multiple_Inheritance"></a><a name="Lateral_Inheritance"
href="#Lateral_Inheritance">4.1.2 Lateral Inheritance </a>
</h4>
<p>
In clearly specified instances, resources may inherit from within the
same locale. For example, currency format symbols inherit from the
number format symbols; the Buddhist calendar inherits from the
Gregorian calendar. This <i>only</i> happens where documented in this
specification. In these special cases, the inheritance functions as
normal, up to the root. If the data is not found along that path,
then a second search is made, logically changing the
element/attribute to the alternate values.
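</p>
<p>The two-pass search can be sketched as follows. This is a minimal, hypothetical Python sketch, not a normative algorithm: the <code>LATERAL</code> table holds only the Buddhist-to-Gregorian calendar fallback named above, paths are plain strings, locale data is a hypothetical dictionary from locale ID to path/value pairs, and the locale chain is passed in already computed. The original path is searched all the way to root before the alternate path is tried; if the table held further documented fallbacks, the substitution would chain.</p>

```python
"""Hypothetical sketch of lateral inheritance (Section 4.1.2).

The LATERAL table and the dict-based data model are illustrative
assumptions; only fallbacks documented in the spec belong in such a table.
"""
from typing import Dict, List, Optional

# Path fragment -> fragment it laterally inherits from (within the same locale).
LATERAL: Dict[str, str] = {
    'calendar[@type="buddhist"]': 'calendar[@type="gregorian"]',
}


def candidate_paths(path: str) -> List[str]:
    """Return the original path followed by its lateral alternates, in order."""
    paths = [path]
    while True:
        current = paths[-1]
        alternate = None
        for source, target in LATERAL.items():
            if source in current:
                alternate = current.replace(source, target)
                break
        if alternate is None or alternate in paths:  # no fallback, or a cycle
            return paths
        paths.append(alternate)


def lateral_lookup(path: str, chain: List[str],
                   data: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Search the whole locale chain for each candidate path in turn."""
    for candidate in candidate_paths(path):
        for locale in chain:  # e.g. ["en_US", "en", "root"]
            value = data.get(locale, {}).get(candidate)
            if value is not None:
                return value
    return None
```

<p>With month data for the Buddhist calendar present only in "root", the lookup for "en_US" returns that root value rather than the Gregorian value in "en", because the original path is exhausted before the alternate path is tried.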
</p>
<p>
For example, for the locale "en_US", the month data in
&lt;calendar type="<span style="color: blue">buddhist</span>"&gt;
inherits first from &lt;calendar type="<span
style="color: blue">buddhist</span>"&gt; in "en",
then in "root". If not found there, then it inherits from
&lt;calendar type="<span style="color: blue">gregorian</span>"&gt;
in "en_US", then "en", then "root".
</p>
<p>There is one special case, for items with a "count"
attribute (used to select a plural form). In that case, the
inheritance works as follows:</p>
<p>If there is no value for a path, and that path has a
[@count="x"] attribute and value, then:</p>
<ol>
<li>If "x" is anything but "other", it falls
back to [@count="other"], within the same locale.</li>
<li>In the special case of currencies, if the
[@count="other"] value is missing, it falls back to the
path that has no count attribute at all.</li>
<li>If there is no value within the same locale, the same
process is used in the parent locale, and so on.</li>
</ol>
<p>
<em>Examples:</em>
</p>
<table class='simple' border="1" cellpadding="3" cellspacing="0" id="a1bn3">
<caption>
<a name="Count_Fallback_normal" href="#Count_Fallback_normal">Count
Fallback: normal</a>
</caption>
<tbody>
<tr>
<th nowrap style="vertical-align: top;">Locale</th>
<th nowrap style="vertical-align: top;">Path</th>
</tr>
<tr>
<td nowrap style="vertical-align: top;">fr-CA</td>
<td nowrap id="iqaw" style="vertical-align: top;"><code>
//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="x"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">fr-CA</td>
<td nowrap id="iqaw16" style="vertical-align: top;"><code>
//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="other"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">fr</td>
<td nowrap id="iqaw19" style="vertical-align: top;"><code>
//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="x"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">fr</td>
<td nowrap id="iqaw18" style="vertical-align: top;"><code>
//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="other"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">root</td>
<td nowrap id="iqaw21" style="vertical-align: top;"><code>
//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="x"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">root</td>
<td nowrap id="iqaw20" style="vertical-align: top;"><code>
//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="other"]</strong>
</code></td>
</tr>
</tbody>
</table>
<p>Note that there may be an alias in root that changes the path
and starts again from the requested locale, such as:</p>
<p>
<code>
&lt;unitLength type="<strong>narrow</strong>"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;alias source="locale"
path="../unitLength[@type='<strong>short</strong>']"/&gt;<br>
&lt;/unitLength&gt;
</code>
</p>
<table class='simple' border="1" cellpadding="3" cellspacing="0" id="a1bn2">
<caption>
<a name="Count_Fallback_currency" href="#Count_Fallback_currency">Count
Fallback: currency</a>
</caption>
<tbody>
<tr>
<th nowrap style="vertical-align: top;">Locale</th>
<th nowrap
style="vertical-align: top;">Path</th>
</tr>
<tr>
<td nowrap style="vertical-align: top;">fr-CA</td>
<td nowrap id="iqaw11" style="vertical-align: top;"><code>
//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="x"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">fr-CA</td>
<td nowrap id="iqaw6" style="vertical-align: top;"><code>
//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="other"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">fr-CA</td>
<td nowrap id="iqaw8" style="vertical-align: top;"><code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">fr</td>
<td nowrap id="iqaw15" style="vertical-align: top;"><code>
//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="x"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">fr</td>
<td nowrap id="iqaw14" style="vertical-align: top;"><code>
//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="other"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">fr</td>
<td nowrap id="iqaw13" style="vertical-align: top;"><code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">root</td>
<td nowrap id="iqaw25" style="vertical-align: top;"><code>
//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="x"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">root</td>
<td nowrap id="iqaw24" style="vertical-align: top;"><code>
//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="other"]</strong>
</code></td>
</tr>
<tr>
<td nowrap style="vertical-align: top;">root</td>
<td nowrap id="iqaw23" style="vertical-align: top;"><code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName</code></td>
</tr>
</tbody>
</table>
<br>
<h4>
<a name="Parent_Locales" href="#Parent_Locales">4.1.3 Parent
Locales</a>
</h4>
<p class="dtd">
&lt;!ELEMENT parentLocales ( parentLocale* ) &gt;<br>
&lt;!ELEMENT parentLocale EMPTY &gt;<br>
&lt;!ATTLIST parentLocale parent NMTOKEN #REQUIRED &gt;<br>
&lt;!ATTLIST parentLocale locales NMTOKENS #REQUIRED &gt;
</p>
<p>In some cases, the normal truncation inheritance does not
function well. This happens when:</p>
<ol>
<li>The child locale is of a different script. In this case,
mixing elements from the parent into the child data results in a
mishmash of scripts.</li>
<li>A large number of child locales behave similarly to each other, and
differently from the truncation parent.</li>
</ol>
<p>
The <span class="element">parentLocale</span> element is used to
override the normal inheritance when accessing CLDR data.
</p>
<p>For case 1, the children are script locales, and the parent is
"root". For example:</p>
<pre>&lt;parentLocale parent="root" locales="az_Cyrl ha_Arab … zh_Hant"/&gt;</pre>
<p>For case 2, the children and parent share the same primary
language, but the region is changed. For example:</p>
<pre>&lt;parentLocale parent="es_419" locales="es_AR es_BO … es_UY es_VE"/&gt;</pre>
<p>Collation data, however, is an exception. Since collation rules
do not truly inherit data from the parent, the parentLocale element
is not necessary and is not used for collation.
Thus, for a locale like
zh_Hant in the example above, the parentLocale element would dictate
the parent as "root" when referring to main locale data,
but for collation data, the parent locale would still be
"zh", even though the parentLocale element is present for
that locale.</p>
<p>
Since parentLocale information is not localizable on a per-locale
basis, the parentLocale information is contained in CLDR’s <a
href="tr35-info.html">supplemental data.</a>
</p>
<p>
When a <span class="element">parentLocale</span> element is used to
override normal inheritance, the following invariants must always be
true:
</p>
<ol>
<li>If X is the parentLocale of Y, then either X is the root
locale, or X has the same base language code as Y. For example, the
parent of "en" cannot be "fr", and the parent of
"en_YY" cannot be "fr" or "fr_XX".</li>
<li>If X is the parentLocale of Y, Y must not be a base language
locale. For example, the parent of "en" cannot be
"en_XX".</li>
<li>There can never be cycles, such as: X parent of Y ... parent
of X.</li>
</ol>
<h3>
<a name="Inheritance_and_Validity" href="#Inheritance_and_Validity">4.2
Inheritance and Validity</a>
</h3>
<p>The following describes in more detail how to determine the
exact inheritance of elements, and the validity of a given element in
LDML.</p>
<h4>
<a name="Definitions" href="#Definitions">4.2.1 Definitions</a>
</h4>
<p>
<i>Blocking</i> elements are those whose subelements do not inherit
from parent locales. For example, a &lt;collation&gt; element is a
blocking element: everything in a &lt;collation&gt; element is
treated as a single lump of data, as far as inheritance is concerned.
For more information, see <a href="#Valid_Attribute_Values">Section
5.5 Valid Attribute Values</a>.
</p>
<p>
Attributes that serve to distinguish multiple elements at the same
level are called <i>distinguishing</i> attributes. For example, the <i>type</i>
attribute distinguishes different elements in lists of translations,
such as:
</p>
<pre>&lt;language type="aa"&gt;Afar&lt;/language&gt;
&lt;language type="ab"&gt;Abkhazian&lt;/language&gt;</pre>
<p>
Distinguishing attributes affect inheritance; two elements with
different distinguishing attributes are treated as different for
purposes of inheritance. For more information, see <a
href="#Valid_Attribute_Values">Section 5.5 Valid Attribute
Values</a>. Other attributes are called nondistinguishing (or
informational) attributes. These carry separate information, and do
not affect inheritance.
</p>
<p>
For any element in an XML file, an <i>element chain</i> is a resolved
[<a href="#XPath">XPath</a>] leading from the root to an element,
with attributes on each element in alphabetical order. For example, in <a
href="http://unicode.org/cldr/data/common/main/el.xml">http://unicode.org/cldr/data/common/main/el.xml</a>
we may have:
</p>
<pre>&lt;ldml&gt;
    &lt;identity&gt;
        &lt;version number="1.1" /&gt;
        &lt;language type="el" /&gt;
    &lt;/identity&gt;
    &lt;localeDisplayNames&gt;
        &lt;languages&gt;
            &lt;language type="ar"&gt;Αραβικά&lt;/language&gt;
...</pre>
<p>This gives the following element chains (among others):</p>
<ul>
<li>//ldml/identity/version[@number="1.1"]</li>
<li>//ldml/localeDisplayNames/languages/language[@type="ar"]</li>
</ul>
<p>
An element chain A is an <i>extension</i> of an element chain B if B
is equivalent to an initial portion of A. For example, #2 below is an
extension of #1. (Depending on the tree, "equivalent" may not mean
"identical to". See below for an example.)
</p>
<ol>
<li>//ldml/localeDisplayNames</li>
<li>//ldml/localeDisplayNames/languages/language[@type="ar"]</li>
</ol>
<p>
An LDML file can be thought of as an ordered list of <i>element
pairs</i>: &lt;element chain, data&gt;, where the element chains are all
the chains for the end-nodes. (This works because of restrictions on
the structure of LDML, including that it does not allow mixed
content.) The ordering is the order in which the element chains are
found in the file, and is thus determined by the DTD.
</p>
<p>For example, some of those pairs would be the following. Notice
that the first has the empty string as element contents.</p>
<ul>
<li><b>&lt;</b>//ldml/identity/version[@number="1.1"]<b>,
</b>""<b>&gt;</b></li>
<li><b>&lt;</b>//ldml/localeDisplayNames/languages/language[@type="ar"]<b>,
</b>"Αραβικά"<b>&gt;</b></li>
</ul>
<blockquote>
<p>
<b>Note: </b>There are two exceptions to this:
</p>
<ol>
<li>Blocking nodes and their contents are treated as a single
end node.</li>
<li>In terms of computing inheritance, the first part of the element pair
consists of the element chain with only its distinguishing attributes;
the value consists of the value (if any) plus any nondistinguishing
attributes.</li>
</ol>
<blockquote>
<p>Thus instead of the element pair being (a) below, it is (b):</p>
<ol type="a">
<li><b>&lt;</b>//ldml/dates/calendars/calendar[@type='gregorian']/week/weekendStart[@day='sun'][@time='00:00']<b>,</b><br>
<b>""&gt;</b></li>
<li><b>&lt;</b>//ldml/dates/calendars/calendar[@type='gregorian']/week/weekendStart<b>,</b><br>
[@day='sun'][@time='00:00']<b>&gt;</b></li>
</ol>
</blockquote>
</blockquote>
<p>
Two LDML element chains are <i>equivalent</i> when they would be
identical if all attributes and their values were removed, except
for distinguishing attributes.
Thus the following are equivalent:
</p>
<ul>
<li><code>//ldml/localeDisplayNames/languages/language[@type="ar"]</code></li>
<li><code>//ldml/localeDisplayNames/languages/language[@type="ar"][@draft="unconfirmed"]</code></li>
</ul>
<p>
For any locale ID, a <i>locale chain</i> is an ordered list starting
with the root and leading down to the ID. For example:
</p>
<blockquote>
<p>&lt;root, de, de_DE, de_DE_xxx&gt;</p>
</blockquote>
<h4>
<a name="Resolved_Data_File" href="#Resolved_Data_File">4.2.2
Resolved Data File</a>
</h4>
<p>To produce a fully resolved locale data file from CLDR for a
locale ID L, you start with L, and successively add unique items from
the parent locales until you get up to root. More formally, this can
be expressed as the following procedure.</p>
<ol>
<li>Let Result be initially L.</li>
<li>For each Li in the locale chain for L, starting at L and
going up to root:
<ol>
<li>Let Temp be a copy of the pairs in the LDML file for Li.</li>
<li>Replace each alias in Temp by the resolved list of pairs
it points to.
<ol>
<li>The resolved list of pairs is obtained by recursively
applying this procedure.</li>
<li>That alias now blocks any inheritance from the parent.
(See <i><a href="#Common_Elements">Section 5.1 Common
Elements</a></i> for an example.)
</li>
</ol>
</li>
<li>For each element pair P in Temp:
<ol>
<li>If P does not contain a blocking element, and Result
does not have an element pair Q with an equivalent element
chain, add P to Result.</li>
</ol>
</li>
</ol>
</li>
</ol>
<p>
<b>Notes:</b>
</p>
<ul>
<li>When adding an element pair to a result, it has to go in the
right order for it to be valid according to the DTD.</li>
<li>The identity element and its children are unaffected by
resolution.</li>
<li>The LDML data must be constructed so as to avoid circularity
in step 2.2.</li>
</ul>
<h4>
<a name="Valid_Data" href="#Valid_Data">4.2.3 Valid Data</a>
</h4>
<p>
The attribute <i>draft="x" </i>in LDML means that the data
has not been approved by the subcommittee. (For more information, see
<a href="http://cldr.unicode.org/index/process">Process</a>.)
However, some data that is not explicitly marked as <i>draft </i>may
be implicitly <i>draft</i>, either because it inherits that status from a
parent or because it inherits it from an enclosing element.
</p>
<p>
<b>Example 2. </b>Suppose that new locale data is added for af
(Afrikaans). To indicate that all of the data is <i>unconfirmed</i>,
the attribute can be added to the top level.
</p>
<p>
<code>
&lt;ldml version="1.1" draft="unconfirmed"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;identity&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;version number="1.1" /&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;language type="af" /&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;/identity&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;characters&gt;...&lt;/characters&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;localeDisplayNames&gt;...&lt;/localeDisplayNames&gt;<br>
&lt;/ldml&gt;
</code>
</p>
<p>
Any data can be added to that file, and all of it will have the status draft=<i>unconfirmed</i>.
Once an item is vetted (<i>whether it is inherited or explicitly
in the file</i>), its status can be changed to <i>approved</i>.
This
can be done by leaving draft="unconfirmed" on the
enclosing element and marking the child with
draft="approved", for example:
</p>
<p>
<code>
&lt;ldml version="1.1" draft="unconfirmed"&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;identity&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;version number="1.1" /&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;language type="af" /&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;/identity&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;characters draft="approved"&gt;...&lt;/characters&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;localeDisplayNames&gt;...&lt;/localeDisplayNames&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;dates/&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;numbers/&gt;<br>
&nbsp;&nbsp;&nbsp;&nbsp;&lt;collations/&gt;<br>
&lt;/ldml&gt;
</code>
</p>
<p>
However, normally the draft attributes should be canonicalized, which
means they are pushed down to leaf nodes as described in <i><a
href="#Canonical_Form">Section 5.6 Canonical Form</a></i>. If an LDML
file does have draft attributes that are not on leaf nodes, the file
should be interpreted as if it were the canonicalized version of that
file.
</p>
<p>More formally, here is how to determine whether data for an
element chain E is implicitly or explicitly draft, given a locale L.
Steps 1, 2, and 4 are simply formalizations of what is in LDML
already. Step 3 adds the new element.</p>
<h4>
<a name="Checking_for_Draft_Status" href="#Checking_for_Draft_Status">4.2.4
Checking for Draft Status</a>
</h4>
<ol>
<li><b>Parent Locale Inheritance</b>
<ol>
<li>Walk through the locale chain until you find a locale ID
L' with a data file D. (L' may equal L.)</li>
<li>Produce the fully resolved data file D' for D.</li>
<li>In D', find the first element pair whose element chain
E' is either equivalent to or an extension of E.</li>
<li>If there is no such E', return <i>true</i>.</li>
<li>If E' is not equivalent to E, truncate E' to the
length of E.</li>
</ol></li>
<li><b>Enclosing Element Inheritance</b>
<ol>
<li>Walk through the elements in E', from back to front.
<ol>
<li>If you ever encounter draft=<i>x</i>, return <i>x</i>.</li>
</ol>
</li>
<li>If L' = L, return <i>false</i>.</li>
</ol></li>
<li><b>Missing File Inheritance</b>
<ol>
<li>Otherwise, walk again through the elements in E', from
back to front.
<ol>
<li>If you encounter a validSubLocales attribute
(deprecated):
<ol>
<li>If L is in the attribute value, return <i>false</i>.</li>
<li>Otherwise, return <i>true</i>.</li>
</ol>
</li>
</ol>
</li>
</ol></li>
<li><b>Otherwise</b>
<ol>
<li>Return <i>true</i>.</li>
</ol></li>
</ol>
<p>The validSubLocales in the most specific (farthest from the root
file) locale file "wins" through the full resolution step
(data from more specific files replacing data from less specific
ones).</p>
<h4>
<a name="Keyword_and_Default_Resolution"
href="#Keyword_and_Default_Resolution">4.2.5 Keyword and Default
Resolution</a>
</h4>
<p>When accessing data based on keywords, the following process is
used.
Consider the following example:</p>
<ul>
<li>The locale 'de' has collation types A, B, C, and no
&lt;default&gt; element</li>
<li>The locale 'de_CH' has &lt;default
type='B'&gt;</li>
</ul>
<p>Here are the searches for various combinations.</p>
<table class='simple' border="1" cellpadding="0" cellspacing="0">
<tr>
<td><strong>User Input</strong></td>
<td><strong>Lookup in Locale</strong></td>
<td><strong>For</strong></td>
<td><strong>Comment</strong></td>
</tr>
<tr>
<td rowspan="3">de_CH<br> <em>no keyword</em></td>
<td>de_CH</td>
<td>default collation type</td>
<td>finds "B"</td>
</tr>
<tr>
<td>de_CH</td>
<td>collation type=B</td>
<td>not found</td>
</tr>
<tr>
<td>de</td>
<td>collation type=B</td>
<td><em>found</em></td>
</tr>
<tr>
<td rowspan="4">de<br> <em>no keyword</em></td>
<td>de</td>
<td>default collation type</td>
<td>not found</td>
</tr>
<tr>
<td>root</td>
<td>default collation type</td>
<td>finds "standard"</td>
</tr>
<tr>
<td>de</td>
<td>collation type=standard</td>
<td>not found</td>
</tr>
<tr>
<td>root</td>
<td>collation type=standard</td>
<td><i>found</i></td>
</tr>
<tr>
<td>de_u_co_A</td>
<td>de</td>
<td>collation type=A</td>
<td><i>found</i></td>
</tr>
<tr>
<td rowspan="2">de_u_co_standard</td>
<td>de</td>
<td>collation type=standard</td>
<td>not found</td>
</tr>
<tr>
<td>root</td>
<td>collation type=standard</td>
<td><i>found</i></td>
</tr>
<tr>
<td rowspan="6">de_u_co_foobar</td>
<td>de</td>
<td>collation type=foobar</td>
<td>not found</td>
</tr>
<tr>
<td>root</td>
<td>collation type=foobar</td>
<td>not found, starts looking for default</td>
</tr>
<tr>
<td>de</td>
<td>default collation type</td>
<td>not found</td>
</tr>
<tr>
<td>root</td>
<td>default collation type</td>
<td>finds "standard"</td>
</tr>
<tr>
<td>de</td>
<td>collation type=standard</td>
<td>not found</td>
</tr>
<tr>
<td>root</td>
<td>collation type=standard</td>
<td><i>found</i></td>
</tr>
</table>
<p>Examples of "search" collator lookup; 'de' has a
language-specific version, but 'en' does not:</p>
<table class='simple' border="1" cellpadding="0" cellspacing="0">
<tr>
<td><strong>User Input</strong></td>
<td><strong>Lookup in Locale</strong></td>
<td><strong>For</strong></td>
<td><strong>Comment</strong></td>
</tr>
<tr>
<td rowspan="2">de_CH_u_co_search</td>
<td>de_CH</td>
<td>collation type=search</td>
<td>not found</td>
</tr>
<tr>
<td>de</td>
<td>collation type=search</td>
<td><i>found</i></td>
</tr>
<tr>
<td rowspan="3">en_US_u_co_search</td>
<td>en_US</td>
<td>collation type=search</td>
<td>not found</td>
</tr>
<tr>
<td>en</td>
<td>collation type=search</td>
<td>not found</td>
</tr>
<tr>
<td>root</td>
<td>collation type=search</td>
<td><i>found</i></td>
</tr>
</table>
<p>Examples of lookup for Chinese collation types. Note:</p>
<ul>
<li>All of the Chinese-specific collation types are provided in
the 'zh' locale.</li>
<li>For 'zh' the &lt;default&gt; element specifies
"pinyin"; for 'zh_Hant' the &lt;default&gt; element
specifies "stroke".
However, any of the available Chinese
collation types can be explicitly requested for any Chinese locale.</li>
</ul>
<table class='simple' border="1" cellpadding="0" cellspacing="0">
<tr>
<td><strong>User Input</strong></td>
<td><strong>Lookup in Locale</strong></td>
<td><strong>For</strong></td>
<td><strong>Comment</strong></td>
</tr>
<tr>
<td rowspan="3">zh_Hant<br> <em>no keyword</em></td>
<td>zh_Hant</td>
<td>default collation type</td>
<td>finds "stroke"</td>
</tr>
<tr>
<td>zh_Hant</td>
<td>collation type=stroke</td>
<td>not found</td>
</tr>
<tr>
<td>zh</td>
<td>collation type=stroke</td>
<td><i>found</i></td>
</tr>
<tr>
<td rowspan="3">zh_Hant_HK_u_co_pinyin</td>
<td>zh_Hant_HK</td>
<td>collation type=pinyin</td>
<td>not found</td>
</tr>
<tr>
<td>zh_Hant</td>
<td>collation type=pinyin</td>
<td>not found</td>
</tr>
<tr>
<td>zh</td>
<td>collation type=pinyin</td>
<td><i>found</i></td>
</tr>
<tr>
<td rowspan="2">zh<br> <em>no keyword</em></td>
<td>zh</td>
<td>default collation type</td>
<td>finds "pinyin"</td>
</tr>
<tr>
<td>zh</td>
<td>collation type=pinyin</td>
<td><i>found</i></td>
</tr>
</table>
<blockquote>
<p>
<b>Note: </b>It is an invariant that the default in root for a given
element must always be a value that exists in root.
So you
cannot have the following in root:
</p>
</blockquote>
<p>
<code>
<someElements><br> <default
type='a'/><br> <someElement
type='b'>...</someElement><br>
<someElement type='c'>...</someElement><br>
<b> <!-- no 'a' --></b><br>
</someElements>
</code>
</p>
<p>For identifiers, such as language codes, script codes, region
codes, variant codes, types, keywords, currency symbols, or currency
display names, the default value is the identifier itself whenever
no value is found in the root. Thus, if there is no display name for
the region code 'QA' in root, then the display name is simply
'QA'.</p>

<h4>
<a name="Inheritance_vs_Related" href="#Inheritance_vs_Related">4.2.6 Inheritance vs Related Information</a>
</h4>
<p>There are related types of data and processing that are easy to confuse:</p>
<table class='simple'>
<tr>
<td rowspan="4"><p><strong>Inheritance</strong></p></td>
<td colspan="2">Part of the internal mechanism used by CLDR to organize and manage locale data.
This is used to share common resources, ease maintenance, and provide the best fallback behavior in the absence of data. <em>Should not be used for locale matching or likely subtags.</em></td>
</tr>
<tr>
<td><em>Example:</em></td>
<td>parent(en_AU) ⇒ en_001<br>
parent(en_001) ⇒ en<br>
parent(en) ⇒ root</td>
</tr>
<tr>
<td><em>Data: </em></td>
<td>supplementalData.xml <parentLocale></td>
</tr>
<tr>
<td><em>Spec:</em></td>
<td><strong>Section <a href="#Inheritance_and_Validity">4.2 Inheritance and Validity</a></strong></td>
</tr>
<tr>
<td rowspan="4"><strong>DefaultContent</strong></td>
<td colspan="2">Part of the internal mechanism used by CLDR to manage locale data.
A particular sublocale is designated the defaultContent for a parent, so that the parent exhibits consistent behavior. <em>Should not be used for locale matching or likely subtags.</em></td>
</tr>
<tr>
<td><em>Example:</em></td>
<td>addLikelySubtags(sr-ME) ⇒ sr-Latn-ME, minimize(de-Latn-DE) ⇒ de</td>
</tr>
<tr>
<td><em>Data: </em></td>
<td>supplementalMetadata.xml <defaultContent></td>
</tr>
<tr>
<td><em>Spec:</em></td>
<td><strong>Part 6: Section 9.3 <a href="tr35-info.html#Default_Content">Default Content</a>
</strong></td>
</tr>
<tr>
<td rowspan="4"><strong>LikelySubtags</strong></td>
<td colspan="2">Provides the most likely full subtags (script and region) in the absence of other information. A core component of LocaleMatching.</td>
</tr>
<tr>
<td><em>Example:</em></td>
<td>addLikelySubtags(zh) ⇒ zh-Hans-CN<br>
addLikelySubtags(zh-TW) ⇒ zh-Hant-TW<br>
minimize(zh-Hant, favorRegion) ⇒ zh-TW</td>
</tr>
<tr>
<td><em>Data: </em></td>
<td>likelySubtags.xml <likelySubtags></td>
</tr>
<tr>
<td><em>Spec:</em></td>
<td><strong>Section <a href="#Likely_Subtags">4.3 Likely
Subtags</a></strong></td>
</tr>
<tr>
<td rowspan="4"><strong>LocaleMatching</strong></td>
<td colspan="2">Provides the best match for the user’s language(s) among an application’s supported languages.
</td>
</tr>
<tr>
<td><em>Example:</em></td>
<td>bestLocale(userLangs=<en, fr>, appLangs=<fr-CA, ru>) ⇒ fr-CA</td>
</tr>
<tr>
<td><em>Data: </em></td>
<td>languageInfo.xml <languageMatching></td>
</tr>
<tr>
<td><em>Spec:</em></td>
<td><strong>Section
<a href="#LanguageMatching">4.4 Language Matching</a></strong></td>
</tr>
</table>


<h3>
<a name="Likely_Subtags" href="#Likely_Subtags">4.3 Likely
Subtags</a>
</h3>
<p class="dtd">
<!ELEMENT likelySubtag EMPTY ><br> <!ATTLIST
likelySubtag from NMTOKEN #REQUIRED><br> <!ATTLIST
likelySubtag to NMTOKEN #REQUIRED>
</p>
<p>There are a number of situations where it is useful to be able
to find the most likely language, script, or region. For example,
given the language "zh" and the region "TW", what
is the most likely script? Given the script "Thai", what is
the most likely language or region? Given the region TW, what is the
most likely language and script?</p>
<p>Conversely, given a locale, it is useful to find out which
fields (language, script, or region) may be superfluous, in the sense
that they contain the likely tags. For example, "en_Latn"
can be simplified down to "en" since "Latn" is
the likely script for "en"; "ja_Jpan_JP" can be
simplified down to "ja".</p>
<p>
The <i>likelySubtag</i> supplemental data provides default
information for computing these values. This data is based on the
default content data, the population data, and the
suppress-script data in [<a href="#BCP47">BCP47</a>]. It is
heuristically derived and may change over time.
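</p>
<p>As a minimal sketch in Python (not part of CLDR or of any particular library), the likelySubtag entries can be read into a dictionary keyed by the <b>from</b> value, which supports the exact-match table lookup described below. The sample reuses the Chinese entries shown later in this section, wrapped in a likelySubtags element with the rest of the real supplemental file omitted; the function names are hypothetical.</p>
<pre>
```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# The Chinese entries shown in this section, wrapped in a likelySubtags
# element; the rest of the real supplemental file is omitted.
SAMPLE = """<likelySubtags>
  <likelySubtag from="zh" to="zh_Hans_CN"/>
  <likelySubtag from="zh_HK" to="zh_Hant_HK"/>
  <likelySubtag from="zh_Hani" to="zh_Hani_CN"/>
  <likelySubtag from="zh_Hant" to="zh_Hant_TW"/>
  <likelySubtag from="zh_MO" to="zh_Hant_MO"/>
  <likelySubtag from="zh_TW" to="zh_Hant_TW"/>
</likelySubtags>"""


def load_likely_subtags(xml_text: str) -> dict[str, str]:
    """Map each 'from' attribute value to its 'to' attribute value."""
    root = ET.fromstring(xml_text)
    return {el.get("from"): el.get("to") for el in root.iter("likelySubtag")}


def lookup(table: dict[str, str], locale: str) -> str | None:
    """Exact-match lookup only; returns None when the locale is not a key."""
    return table.get(locale)


table = load_likely_subtags(SAMPLE)
print(lookup(table, "zh_TW"))  # zh_Hant_TW
print(lookup(table, "zh"))     # zh_Hans_CN
```
</pre>
<p>This covers only the direct table lookup. Inputs that are not keys in the table, such as zh_SG, need the full Add Likely Subtags operation defined below, which tries several truncated keys and merges the match with the source subtags.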
</p>
<p>For the relationship between Inheritance, DefaultContent, LikelySubtags, and LocaleMatching, see <strong><em>Section 4.2.6 <a
href="tr35.html#Inheritance_vs_Related">Inheritance vs Related Information</a></em></strong>.</p>
<p>
To look up data in the table, see if a locale matches one of the <b>from</b>
attribute values. If so, fetch the corresponding <b>to</b> attribute
value. For example, the Chinese data looks like the following:
</p>
<blockquote>
<p class="example">
<likelySubtag from="zh" to="zh_Hans_CN"/><br>
<likelySubtag from="zh_HK"
to="zh_Hant_HK"/><br> <likelySubtag
from="zh_Hani" to="zh_Hani_CN"/><br>
<likelySubtag from="zh_Hant"
to="zh_Hant_TW"/><br> <likelySubtag
from="zh_MO" to="zh_Hant_MO"/><br>
<likelySubtag from="zh_TW"
to="zh_Hant_TW"/>
</p>
</blockquote>
<p>So looking up "zh_TW" returns "zh_Hant_TW",
while looking up "zh" returns "zh_Hans_CN".</p>
<p>In more detail, the data is designed to be used in the
following operations.</p>
<p>
Note that as of CLDR v24, any field present in the 'from' value is
also present in the 'to' value, so an input field will not change in
the "Add Likely Subtags" operation. The data and operations can
also be used with language tags using [<a href="#BCP47">BCP47</a>]
syntax, with the appropriate changes. In addition, certain common
'denormalized' language subtags such as 'iw' (for 'he') may occur in
both the 'from' and 'to' fields. This allows implementations that
use those denormalized subtags to use the data with only minor
changes to the operations.
</p>
<p>
<i><b>Add Likely Subtags: </b></i><em>Given a source locale X,
return a locale Y where the empty subtags have been filled in by
the most likely subtags.</em> This is written as X ⇒ Y ("X maximizes
to Y").
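</p>
<p>The following Python sketch follows the Canonicalize, Lookup, and Return steps given below, and also implements the reverse Remove Likely Subtags operation defined later in this section. It is an illustration under stated assumptions, not a reference implementation: canonicalization is simplified (alias replacement, grandfathered tags, variants, and extensions are not handled), the table holds only the Chinese entries shown above plus und_TW (whose result is stated in the text), and the function names are hypothetical.</p>
<pre>
```python
from __future__ import annotations

import re

# Hypothetical sample table: the Chinese entries shown above plus "und_TW",
# whose result (zh_Hant_TW) is stated in the text. Real data is far larger.
LIKELY = {
    "zh": "zh_Hans_CN",
    "zh_HK": "zh_Hant_HK",
    "zh_Hani": "zh_Hani_CN",
    "zh_Hant": "zh_Hant_TW",
    "zh_MO": "zh_Hant_MO",
    "zh_TW": "zh_Hant_TW",
    "und_TW": "zh_Hant_TW",
}


def canonicalize(tag: str) -> tuple[str, str, str]:
    """Simplified Canonicalize step: normalize separators and casing, then
    drop Zzzz and ZZ. Aliases, grandfathered tags, variants and extensions
    are not handled in this sketch."""
    parts = re.split(r"[-_]", tag)
    language, script, region = parts[0].lower(), "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()
    if script == "Zzzz":
        script = ""
    if region == "ZZ":
        region = ""
    return language, script, region


def join(*fields: str) -> str:
    """Join the non-empty fields with '_'."""
    return "_".join(f for f in fields if f)


def add_likely_subtags(tag: str, table: dict[str, str] = LIKELY) -> str | None:
    lang, script, region = canonicalize(tag)
    # Lookup order from the Lookup step; the first match wins.
    for key in (join(lang, script, region), join(lang, region),
                join(lang, script), join(lang), join("und", script)):
        if key in table:
            m_lang, m_script, m_region = canonicalize(table[key])
            # x_r = x_s if x_s is non-empty, else x_m ("und" counts as empty).
            return join(lang if lang != "und" else m_lang,
                        script or m_script, region or m_region)
    return None  # the "error value" option of the Return step


def remove_likely_subtags(tag: str, favor_script: bool = False,
                          table: dict[str, str] = LIKELY) -> str | None:
    maximal = add_likely_subtags(tag, table)
    if maximal is None:
        return None
    lang, script, region = canonicalize(maximal)
    trials = [lang, join(lang, region), join(lang, script)]
    if favor_script:  # the variant that prefers language_script
        trials[1], trials[2] = trials[2], trials[1]
    for trial in trials:
        if add_likely_subtags(trial, table) == maximal:
            return trial
    return maximal
```
</pre>
<p>With this sample data, add_likely_subtags("ZH-ZZZZ-SG") returns "zh_Hans_SG", matching the first worked example below, and remove_likely_subtags("zh_Hant") returns "zh_TW" (or "zh_Hant" with favor_script=True). A production implementation would use the full likelySubtag data and the complete canonicalization step.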
</p>
<p>
A subtag is called <em>empty</em> if it is a missing script or region
subtag, or it is a base language subtag with the value
"und". In the description below, a subscript on a subtag <em>x</em>
indicates which tag it is from: <em>x<sub>s</sub></em> is in the
source, <em>x<sub>m</sub></em> is in a match, and <em>x<sub>r</sub></em>
is in the final result.
</p>
<p>This operation is performed in the following way.</p>
<ol>
<li style="margin-top: 0.5em; margin-bottom: 0.5em"><strong>Canonicalize.</strong>
<ol>
<li>Make sure the input locale is in canonical form: it uses the
right separator and has the right casing.</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">Replace
any deprecated subtags with their canonical values using the
<alias> data in supplemental metadata. Use the first value
in the replacement list, if it exists. Language tag replacements
may have multiple parts, such as "sh" ➞
"sr_Latn" or "mo" ➞ "ro_MD". In such a
case, the original script and/or region are retained if present.
Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not
"sr_Latn_AQ".</li>
<li>If the tag is grandfathered (see <variable
id="$grandfathered" type="choice"> in the
supplemental data), then return it.</li>
<li>Remove the script code 'Zzzz' and the region code
'ZZ' if they occur.</li>
<li>Get the components of the cleaned-up source tag (<em>language<sub>s</sub></em>,
<em>script<sub>s</sub></em>, and <em>region<sub>s</sub></em>), plus any variants and extensions.
</li>
</ol></li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em"><strong>Lookup.
</strong> Look up each of the following in order, and stop on the first match:
<ol>
<li style="margin-top: 0.5em; margin-bottom: 0.5em"><em>language<sub>s</sub>_script<sub>s</sub>_region<sub>s</sub></em></li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em"><em>language<sub>s</sub>_region<sub>s</sub></em></li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em"><em>language<sub>s</sub>_script<sub>s</sub></em></li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em"><em>language<sub>s</sub></em></li>
<li>und<em>_script<sub>s</sub></em></li>
</ol></li>
<li><strong>Return</strong>
<ol>
<li>If there is no match, either return
<ol>
<li>an error value, or</li>
<li>the match for "und" (in APIs where a valid
language tag is required).</li>
</ol>
</li>
<li>Otherwise, there is a match <em>language<sub>m</sub>_script<sub>m</sub>_region<sub>m</sub></em>.</li>
<li>Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is not
empty, and x<sub>m</sub> otherwise.
</li>
<li>Return the language tag composed of <em>language<sub>r</sub> _
script<sub>r</sub> _ region<sub>r</sub></em> + variants + extensions.
</li>
</ol></li>
</ol>
<p>The lookup can be optimized. For example, if any of the tags in
Step 2 are the same as previous ones in that list, they do not need
to be tested.</p>
<p>
<i>Example 1:</i>
</p>
<ul>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">
<p>Input is ZH-ZZZZ-SG.</p>
</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">
<p>Canonicalize to zh_SG (the script code Zzzz is removed).</p>
</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">
<p>Look up zh_SG in the table.
No match.</p>
</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">
<p>Look up zh, and get the match (zh_Hans_CN). Keep the source region SG in place of CN, and
return zh_Hans_SG.</p>
</li>
</ul>
<p>To find the most likely language for a country, or the most likely language for
a script, use "und" as the language subtag. For example,
looking up "und_TW" returns zh_Hant_TW.</p>
<p>A goal of the algorithm is that if X ⇒ Y, and X' results from
replacing an empty subtag in X by the corresponding subtag in Y,
then X' ⇒ Y. For example, if und_AF ⇒ fa_Arab_AF, then:</p>
<ul>
<li>fa_Arab_AF ⇒ fa_Arab_AF</li>
<li>und_Arab_AF ⇒ fa_Arab_AF</li>
<li>fa_AF ⇒ fa_Arab_AF</li>
</ul>
<p>There is a small number of exceptions to this goal in the
current data, where X ∈ {und_Bopo, und_Brai, und_Cakm, und_Limb,
und_Shaw}.</p>
<p>
<i><b>Remove Likely Subtags: </b>Given a locale,
remove any fields that Add Likely Subtags would add.</i>
</p>
<p>The reverse operation removes fields that would be added by the
first operation.</p>
<ol>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">First get
max = AddLikelySubtags(inputLocale). If an error is signaled, return
it.</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">Remove the
variants from max.</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">Then for <i>trial</i>
in {language, language _ region, language _ script}
<ul>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">If
AddLikelySubtags(<i>trial</i>) = max, then return <i>trial</i> +
variants.
</li>
</ul>
</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">If you do
not get a match, return max + variants.</li>
</ol>
<p>Example:</p>
<ul>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">
<p>Input is zh_Hant.
Maximize to get zh_Hant_TW.</p>
</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">
<p>zh ⇒ zh_Hans_CN. No match, so continue.</p>
</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em">
<p>zh_TW ⇒ zh_Hant_TW. Matches, so return zh_TW.</p>
</li>
</ul>
<p>A variant of this favors the script over the region, thus using
{language, language_script, language_region} in the above. If that
variant is used, then the result in this example would be zh_Hant
instead of zh_TW.</p>
<h3>
<a name="LanguageMatching" href="#LanguageMatching">4.4 Language
Matching</a>
</h3>
<p class="dtd">
<!ELEMENT languageMatching ( languageMatches* ) ><br>
<!ELEMENT languageMatches ( paradigmLocales*, matchVariable*, languageMatch* ) ><br>
<!ATTLIST languageMatches type NMTOKEN #REQUIRED ></p>
<p class="dtd"><!ELEMENT languageMatch EMPTY ><br> <!ATTLIST
languageMatch desired CDATA #REQUIRED ><br> <!ATTLIST
languageMatch supported CDATA #REQUIRED ><br> <!ATTLIST
languageMatch percent NMTOKEN #REQUIRED ><br>
<!ATTLIST languageMatch distance NMTOKEN #IMPLIED ><br>
<!ATTLIST languageMatch oneway ( true | false ) #IMPLIED ></p>
<p class="dtd"><!ELEMENT paradigmLocales EMPTY ><br>
<!ATTLIST paradigmLocales locales NMTOKENS #REQUIRED >
</p>
<p>
Implementers are often faced with the issue of how to match the
user's requested languages with their product's supported languages.
For example, suppose that a product supports {ja-JP, de, zh-TW}.
If
the user understands written American English, German, French, Swiss
German, and Italian, then <strong>de</strong> would be the best
match; if s/he understands only Chinese (zh), then zh-TW would be the
best match.
</p>
<p>The standard truncation-fallback algorithm does not work well
when faced with the complexities of natural language. The language
matching data is designed to fill that gap. Stated in those terms,
language matching can have the effect of a more complex fallback,
such as:</p>
<p>
sr-Cyrl-RS<br> sr-Cyrl<br> sr-Latn-RS<br> sr-Latn<br>
sr<br> hr-Latn<br> hr
</p>
<p>Language matching is used to find the best supported locale ID
given a requested list of languages. The requested list could come
from different sources, such as the user's list of preferred
languages in the OS Settings or a browser Accept-Language list.
For example, suppose my native tongue is English, I can understand Swiss
German and German, my French is rusty but usable, and my Italian is basic.
Ideally, an implementation would allow me to select {gsw, de, fr} as
my preferred list of languages, skipping Italian because my
comprehension is not good enough for arbitrary content.</p>
<p>Language matching can also be used to get fallback data elements. In
many cases, there may not be full data for a particular locale. For
example, for a Breton speaker, the best fallback if data is
unavailable might be French. That is, suppose we have found a Breton
bundle, but it does not contain a translation for the key "CN"
(for the country China). It is best to return "Chine",
rather than falling back to a default language such as Russian
and getting "Китай". The language matching data can be
used to get the closest fallback locales (of those supported) to a
given language.
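</p>
<p>Ordering supported locales by closeness relies on the matching distance MD defined later in this section. The following Python sketch of that computation is illustrative only. It assumes both tags are already maximized (see Section 4.3), treats a rule pattern as matching when it has the same number of fields as the truncated tags and each field is "*" or equal, and uses a rule list made of the three default rules shown later in this section plus an nn/nb rule whose 96% value is assumed from the worked nn-DE/nb-FR example. Names such as LanguageMatch and closest_supported are hypothetical.</p>
<pre>
```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageMatch:
    desired: str        # pattern such as "nn", "*-*", or "es-*-ES"
    supported: str
    percent: int
    oneway: bool = False


# Ordered rule list: the first matching rule wins. The nn/nb entry is an
# assumption; the three defaults must stay at the end.
RULES = [
    LanguageMatch("nn", "nb", 96),
    LanguageMatch("*-*-*", "*-*-*", 96),
    LanguageMatch("*-*", "*-*", 20),
    LanguageMatch("*", "*", 1),
]


def _fits(pattern: str, fields: tuple[str, ...]) -> bool:
    parts = pattern.split("-")
    return len(parts) == len(fields) and all(
        p == "*" or p == f for p, f in zip(parts, fields))


def _applies(rule: LanguageMatch, d: tuple[str, ...], s: tuple[str, ...]) -> bool:
    if _fits(rule.desired, d) and _fits(rule.supported, s):
        return True
    # A rule that is not one-way also matches with the roles swapped.
    return not rule.oneway and _fits(rule.desired, s) and _fits(rule.supported, d)


def match_distance(desired: str, supported: str, rules=RULES) -> int:
    """MD for two maximized tags of the form language-script-region."""
    d, s = desired.split("-"), supported.split("-")
    md = 0
    for n in (3, 2, 1):  # region, then script, then base language
        if d[n - 1] == s[n - 1]:
            continue  # identical subtags add nothing
        # The trailing default rules guarantee that some rule matches.
        rule = next(r for r in rules if _applies(r, tuple(d[:n]), tuple(s[:n])))
        md += 100 - rule.percent
    return md


def closest_supported(desired: str, supported: list[str], rules=RULES) -> list[str]:
    """Supported tags ordered from closest to farthest."""
    return sorted(supported, key=lambda s: match_distance(desired, s, rules))


print(match_distance("nn-Latn-DE", "nb-Latn-FR"))  # 8, i.e. a 92% match
```
</pre>
<p>With these rules, closest_supported("nn-Latn-DE", ["en-Latn-US", "nb-Latn-FR", "nn-Latn-NO"]) orders the candidates nn-Latn-NO, nb-Latn-FR, en-Latn-US, which is the kind of ordered fallback list used in the nb-NO lookup described next. The field-count interpretation of patterns is an assumption of this sketch; check the actual languageMatching data and its rule order before relying on specific distances.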
</p>
<p>For the relationship between Inheritance, DefaultContent, LikelySubtags, and LocaleMatching, see <strong><em>Section 4.2.6 <a
href="tr35.html#Inheritance_vs_Related">Inheritance vs Related Information</a></em></strong>.</p> <p>
When such fallback is used for inherited item lookup, the normal
order of inheritance is used, except that
before using any data from <strong>root</strong>, the data for the
fallback locales is used if available. Language matching does
not interact with the fallback of resources <em>within the
locale-parent chain</em>. For example, suppose that we are looking for
the value for a particular path <strong>P</strong> in <strong>nb-NO</strong>.
In the absence of aliases, normally the following lookup is used.
</p>
<blockquote>
<p>
<strong>nb-NO</strong> → <strong>nb</strong> → <strong>root</strong>
</p>
</blockquote>
<p>
That is, we first look in <strong>nb-NO</strong>. If there is no
value for <strong>P</strong> there, then we look in <strong>nb</strong>.
If there is no value for <strong>P</strong> there, we return the
value for <strong>P</strong> in root (or a code value, if there is
nothing there). Remember that if there is an alias element along this
path, then the lookup may restart with a different path in <strong>nb-NO</strong>
(or another locale).
</p>
<p>
However, suppose that <strong>nb-NO</strong> has the fallback values
<strong>[nn da sv en]</strong>, derived from language matching. In
that case, an implementation <em>may</em> progressively look up each
of the listed locales, with the appropriate substitutions, returning
the first value that is found somewhere other than <strong>root</strong>.
This
roughly follows this pseudocode:
</p>
<ul>
<li>value = lookup(P, nb-NO); if (locationFound != root) return
value;</li>
<li>value = lookup(P, nn-NO); if (locationFound != root) return
value;</li>
<li>value = lookup(P, da-NO); if (locationFound != root) return
value;</li>
<li>value = lookup(P, sv-NO); if (locationFound != root) return
value;</li>
<li>value = lookup(P, en-NO); return value;</li>
</ul>
<p>
The locales in the fallback list are not used recursively. For
example, for the lookup of a path in nb-NO, if <strong>fr</strong>
were a fallback value for <strong>da</strong>, it would not matter
for the above process. Only the fallback list of the original locale matters.
</p>
<p>The language matching data is intended to be used according to
the following algorithm. This is a logical description, and can be
optimized for production in many ways. In this algorithm, the
languageMatching data is interpreted as an ordered list.</p>
<p>The language matching algorithm takes a list of a user’s
desired languages and a list of the application’s supported
languages.</p>
<ul>
<li>Set the best weighted distance BWD to ∞</li>
<li>Set the best supported language BS to null</li>
<li>For each desired language D
<ul>
<li>Compute a discount factor F, based on the position of D in the
list.
<ul>
<li>This discount factor is up to the implementation, but is
typically a positive value that increases according to how far D
is from the start of the desired language list.</li>
</ul>
</li>
<li>For each supported language S
<ul>
<li>Find the matching distance MD between D and S as described below.</li>
<li>Compute the weighted distance WD = F + MD</li>
<li>If WD < BWD
<ul>
<li>BWD = WD</li>
<li>BS = S</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>If BWD is less than a threshold, return BS.
<ul>
<li>The threshold is implementation-defined, typically set to a value
greater than the default region difference and less than the default
script difference.</li>
</ul>
</li>
<li>Otherwise, return a default supported language (such as
English).</li>
</ul>
<p>To find the matching distance MD between any two languages,
perform the following steps.</p>
<ol>
<li>Maximize each language using Section 4.3 <a
href="#Likely_Subtags">Likely Subtags</a>.
<ul>
<li>und is a special case: see below.</li>
</ul>
</li>
<li>Set the match-distance MD to 0</li>
<li>For each subtag in the list, starting from the end: region,
script, base-language
<ol>
<li>If respective subtags in each language tag are identical,
remove the subtag from each (logically) and continue.</li>
<li>Traverse the languageMatching data until a match is found.
<ul>
<li>* matches any field.</li>
<li>If the oneway flag is false, then the match is
symmetric.</li>
</ul>
</li>
<li>Add 100 minus the <strong>percent</strong> attribute value
to MD.
</li>
<li>Remove the subtag from each (logically)</li>
</ol>
</li>
<li>Return MD</li>
</ol>
<p>
It is typically useful to set the discount factor between successive
elements of the desired languages list to be slightly greater than
the default region difference. That avoids the following problem:
</p>
<p>
<em>Supported languages:</em> "de, fr, ja"
</p>
<p>
<em>User's desired languages:</em> "de-AT, fr"
</p>
<p>This user would expect to get "de", not "fr". In practice, when
a user selects a list of preferred languages, they don't include all
the regional variants ahead of their second base language. Although
the user's list of desired languages doesn't really tell us the priority
ranking among their languages, the fall-off between the
user's languages is normally substantially greater than the difference between regional variants.
Unless F is greater than the distance between de-AT and de-DE,
the user’s second-choice language would be returned.</p>
<p>The base language subtag "und" is a special case.
Suppose we have the following situation:</p>
<ul>
<li>desired languages: {und, it}</li>
<li>supported languages: {en, it}</li>
<li>resulting language: en
</li>
</ul>
<p>Part of this is because 'und' has a special function in BCP 47;
it stands in for 'no supplied base language'. To prevent this from
happening, if the desired base language is und, the language matcher
should not apply likely subtags to it.</p>
<p>Examples:</p>
<p>Suppose that nn-DE and nb-FR are being compared.
They are first maximized to nn-Latn-DE and nb-Latn-FR, respectively.
The list is searched. The first match is with "*-*-*", for
a match of 96%. The languages are truncated to nn-Latn and nb-Latn
(whose identical scripts add nothing), then to nn and nb.
The first match is also for a value of 96%, so the total distance is
(100 − 96) + (100 − 96) = 8, for a result of 92%.</p>
<p>Note that language matching is orthogonal to how closely
two languages are related linguistically. For example, Breton is more
closely related to Welsh than to French, but French is the better
match (because it is more likely that a Breton reader will understand
French than Welsh). This also illustrates that the matches are often
asymmetric: it is not likely that a French reader will understand
Breton.</p>
<p>The "*" acts as a wildcard, as shown in the
following example:</p>
<p class="example">
<languageMatch desired="es-*-ES"
supported="es-*-ES" percent="100"/><br>
<!-- Latin American Spanishes are closer to each other.
Approximate by having es-ES be further from everything else.-->
</p>
<p class="example"><languageMatch desired="es-*-ES"
supported="es-*-*" percent="93"/></p>
<p class="example">
<br> <languageMatch desired="*"
supported="*" percent="1"/><br> <!--
[Default value - must be at end!] Normally there is no comprehension
of different languages.-->
</p>
<p class="example">
<br> <languageMatch desired="*-*"
supported="*-*" percent="20"/><br>
<!-- [Default value - must be at end!] Normally there is little
comprehension of different scripts.-->
</p>
<p class="example">
<br> <languageMatch desired="*-*-*"
supported="*-*-*" percent="96"/><br>
<!-- [Default value - must be at end!] Normally there are small
differences across regions.-->
</p>
<p>When the language+region is not matched, and there is otherwise
no reason to pick among the supported regions for that language, then
some measure of geographic "closeness" can be used. The
results may be more understandable by users.
Looking for en-SK, for
example, should fall back to something within Europe (e.g., en-GB) in
preference to something far away and unrelated (e.g., en-SG). Such a
closeness metric does not need to be exact; a small amount of data
can be used to give an approximate distance between any two regions.
However, any such data must be used carefully; although Hong Kong is
closer to India than to the UK, it is unlikely that en-IN would be a
better match to en-HK than en-GB would.</p>

<h4><a name="EnhancedLanguageMatching" href="#EnhancedLanguageMatching">4.4.1 Enhanced Language Matching</a></h4>
<p>The enhanced format for language matching adds structure to enable better matching of languages. It is distinguished by having a suffix "_new" on the type, as in the example below. The extended structure allows matching to take into account broad similarities that would give better results. For example, for English the regions that are, or inherit from, US (AS|GU|MH|MP|PR|UM|VI|US) form a “cluster”. The regions in that cluster should be closer to each other than to any region outside it, and a region outside the cluster should be closer to another region outside that cluster than to one inside.
This issue arises with the “world languages” such as English, Spanish, Portuguese, and Arabic.</p>
<p><em>Example:</em></p>
<pre> <languageMatches type="written_new"><br> <paradigmLocales locales="en en-GB es es-419 pt-BR pt-PT"/><br> <matchVariable id="$enUS" value="AS+GU+MH+MP+PR+UM+US+VI"/><br> <matchVariable id="$cnsar" value="HK+MO"/><br> <matchVariable id="$americas" value="019"/><br> <matchVariable id="$maghreb" value="MA+DZ+TN+LY+MR+EH"/><br> <languageMatch desired="no" supported="nb" distance="1"/><!-- no ⇒ nb --><br>…
<languageMatch desired="ar_*_$maghreb" supported="ar_*_$maghreb" distance="4"/>
<!-- ar; *; $maghreb ⇒ ar; *; $maghreb -->
<languageMatch desired="ar_*_$!maghreb" supported="ar_*_$!maghreb" distance="4"/>
<!-- ar; *; $!maghreb ⇒ ar; *; $!maghreb --><br>…</pre>
<p>The <strong>matchVariable</strong> allows a rule to match multiple regions, as illustrated by <strong>$maghreb</strong>. The syntax is simple: it uses + for <em>union</em> and - for <em>set difference</em>, with no precedence; operators are applied from left to right. So A+B-A+D is interpreted as (((A+B)-A)+D), not as (A+B)-(A+D). The variable <strong>id</strong> has a value of the form [$][a-zA-Z0-9]+. If $X is defined, then $!X automatically means all those regions that are not in $X.</p>
<p dir="ltr">When the set is interpreted, macroregions are (logically) transformed into a list of their contents, so “053+GB” → “AU+GB+NF+NZ”. This is done recursively, so 009 → “053+054+057+061+QO” → “AU+NF+NZ+FJ+NC+PG+SB+VU...”. Note that we use 019 for all of the Americas in the variables above, because en-US should be in the same cluster as es-419 and its contents.
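</p>
<p>A minimal Python sketch of how a matchVariable value might be evaluated: operators are applied strictly left to right, macroregions are expanded recursively, and $!X is handled as a membership test at match time, so no list of all regions is needed. The containment map holds only the 053 expansion given above; a real implementation would use CLDR's territory containment data. The function names are hypothetical.</p>
<pre>
```python
from __future__ import annotations

import re

# Abbreviated containment taken from the "053+GB" example above; real data
# would come from CLDR's territory containment.
CONTAINMENT = {"053": ["AU", "NF", "NZ"]}


def expand(code: str, containment: dict[str, list[str]]) -> set[str]:
    """Recursively replace a macroregion by the regions it contains."""
    if code not in containment:
        return {code}
    regions: set[str] = set()
    for child in containment[code]:
        regions |= expand(child, containment)
    return regions


def evaluate(value: str, containment: dict[str, list[str]] = CONTAINMENT) -> set[str]:
    """Apply + (union) and - (set difference) strictly left to right."""
    tokens = re.findall(r"[+-]|[^+-]+", value)
    result = expand(tokens[0], containment)
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        regions = expand(operand, containment)
        result = result | regions if op == "+" else result - regions
    return result


def region_matches(region: str, pattern: str, variables: dict[str, set[str]]) -> bool:
    """Match one region field of a rule: '*', '$X', '$!X', or a literal code."""
    if pattern == "*":
        return True
    if pattern.startswith("$!"):
        return region not in variables["$" + pattern[2:]]
    if pattern.startswith("$"):
        return region in variables[pattern]
    return region == pattern


variables = {"$maghreb": evaluate("MA+DZ+TN+LY+MR+EH")}
print(sorted(evaluate("053+GB")))                     # ['AU', 'GB', 'NF', 'NZ']
print(sorted(evaluate("A+B-A+D", {})))                # ['B', 'D']
print(region_matches("FR", "$!maghreb", variables))   # True
```
</pre>
<p>Treating $!X as "not a member of $X" when a rule is tested avoids building an explicit complement set. A rule such as ar_*_$maghreb would apply region_matches to its region field only, with the language and script fields matched separately.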
</p>
<p>In the rules, the percent value (100..0) is replaced by a <strong>distance</strong> value, which is its complement (0..100): for example, a distance of 4 corresponds to a percent value of 96.</p>
<p dir="ltr">These new variables and rules divide up the world into clusters, where items in the same cluster (for specific languages) get the normal regional difference, and items in different clusters get different weights.</p>
<p dir="ltr">Each cluster can have one or more associated <strong>paradigmLocales</strong>. These are locales that are preferred within a cluster. So when matching desired=[en-SA] against [en-GU en en-IN en-GB], the value en-GB is returned. Both en-GU and en are in a different cluster. Although en-IN and en-GB are in the same cluster and at the same distance from en-SA, the preference is given to en-GB because it is in the paradigm locales. It would be possible to express this in rules, but this mechanism handles these very common cases without bulking up the tables.
</p>
<p dir="ltr">The <strong>paradigmLocales</strong> also allow matching to macroregions. For example, desired=[es-419] should match {es-MX} more closely than {es}, and vice versa: {es-MX} should match {es-419} more closely than {es}. But es-MX should match more closely to es-419 than to any of the other es-419 sublocales. In general, in the absence of other distance data, there is a ‘paradigm’ in each cluster that the others should match more closely to: en(-US), en-GB, es(-ES), es-419, ru(-RU)...</p>

<h2>
<a name="XML_Format" href="#XML_Format">5 XML Format</a>
</h2>
<p>There are two kinds of data that can be expressed in LDML:
language-dependent data and supplemental data.
In either case, data
can be split across multiple files, which can be in multiple
directory trees.</p>
<p>For example, the language-dependent data for Japanese in CLDR
is present in the following files:</p>
<ul>
<li>common/collation/ja.xml</li>
<li>common/main/ja.xml</li>
<li>common/rbnf/ja.xml</li>
<li>common/segmentations/ja.xml</li>
</ul>
<p>Data for cased languages such as French is in files like:</p>
<ul>
<li>common/casing/fr.xml</li>
</ul>
<p>The status of the data is the same, whether or not data is
split. That is, for the purpose of validation and lookup, all of the
data for the above ja.xml files is treated as if it were in a single
file. These files have the <ldml> root element and use
ldml.dtd. The file name must match the identity element. For example,
the <ldml> file pa_Arab_PK.xml must contain the following
elements:</p>
<pre>
<strong><ldml></strong><br> <identity><br> …<br> <strong><language type="pa"/><br> <script type="Arab"/><br> <territory type="PK"/></strong><br> </identity>
…</pre>
<p>Supplemental data can have different root elements, currently:
ldmlBCP47, supplementalData, keyboard, and platform. Keyboard and
platform files are considered distinct. The ldmlBCP47 files and
supplementalData files that have the same root are all logically part
of the same file; they are simply split into separate files for
convenience. Implementations may split the files in different ways,
also for their convenience.
The files in /properties are also
supplemental data files, but are structured like UCD properties.</p>

<p>For example, supplemental data relating to Japan or the
Japanese writing system is in:</p>
<ul>
<li>common/supplemental/ (in many files, such as
supplementalData.xml)</li>
<li>common/transforms/Hiragana-Katakana.xml</li>
<li>common/transforms/Hiragana-Latin.xml</li>
<li>common/properties/scriptMetadata.txt</li>
<li>common/bcp47/calendar.xml</li>
<li>uca/allkeys_CLDR.txt (sorting)</li>
<li>/keyboards/chromeos/ja-t-k0-chromeos.xml</li>
<li>...</li>
</ul>
<p>Like the &lt;ldml&gt; files, the keyboard file names must match
internal data: in particular, the locale attribute on the keyboard
element must have a value that corresponds to the file name, such as
&lt;keyboard locale="af-t-k0-android"&gt; for the file
af-t-k0-android.xml.</p>
<p>
The following sections describe the structure of the XML format for
language-dependent data. The more precise syntax is in the ldml.dtd
file; <i>however, the DTD does not describe all the constraints
on the structure.</i>
</p>
<p>To start with, the root element is &lt;ldml&gt;, with the
following DTD entry:</p>
<p class='dtd'>
&lt;!ELEMENT ldml
(identity,(alias|(fallback*,localeDisplayNames?,layout?,contextTransforms?,characters?,<br>
delimiters?,measurement?,dates?,numbers?,units?,listPatterns?,collations?,posix?,<br>
segmentations?,rbnf?,annotations?,metadata?,references?,special*)))&gt;
</p>

<p>The XML structure is stable over releases. Elements and
attributes may be deprecated: they are retained in the DTD but their
usage is strongly discouraged. In most cases, an alternate structure
is provided for expressing the information.
There is only one
exception: newer DTDs cannot be used with version 1.1 files without
some modification.</p>
<p>In general, all translatable text in this format is in element
contents, while attributes are reserved for types and non-translated
information (such as numbers or dates). The reason that attributes
are not used for translatable text is that spaces are not preserved,
and we cannot predict where spaces may be significant in translated
material.</p>
<p>
There are two kinds of elements in LDML: <i>rule</i> elements and <i>structure</i>
elements. For structure elements, there are restrictions to allow for
effective inheritance and processing:
</p>
<ol>
<li>There is no "mixed" content: if an element has
textual content, then it cannot contain any elements.</li>
<li>The [<a href="#XPath">XPath</a>] leading to the content is
unique; no two different pieces of textual content have the same [<a
href="#XPath">XPath</a>].
</li>
</ol>
<p>
Rule elements do not have this restriction, but also do not inherit,
except as an entire block. The rule elements are listed in
serialElements in the supplemental metadata. See also <i><a
href="#Inheritance_and_Validity">Section 4.2 Inheritance and
Validity</a></i>. For more technical details, see <a
href="http://cldr.unicode.org/development/updating-dtds">Updating-DTDs</a>.
</p>
<p>
Note that the data in the examples given below is purely illustrative,
and does not match any particular language. For a more detailed
example of this format, see [<a href="#LDML">Example</a>]. There is
also a DTD for this format, but <i>remember that the DTD alone is
not sufficient to understand the semantics, the constraints,
or the interrelationships between the different elements and
attributes</i>.
You may wish to have copies of each of these to hand as
you proceed through the rest of this document.
</p>
<p>In particular, all elements allow for draft versions to coexist
in the file at the same time. Thus most elements are marked in the
DTD as allowing multiple instances. However, unless an element is
listed as a serialElement, or has a distinguishing attribute, it can
only occur once as a subelement of a given element. Thus, for
example, the following is illegal even though allowed by the DTD:</p>
<p>
&lt;languages&gt;<br>&nbsp;&nbsp;&lt;language type="aa"&gt;...&lt;/language&gt;<br>
&nbsp;&nbsp;&lt;language type="aa"&gt;...&lt;/language&gt;<br>&lt;/languages&gt;
</p>
<p>There must be only one instance of these per parent, unless
there are other distinguishing attributes (such as an alt attribute).</p>
<p>In general, LDML data should be in NFC format. However, certain
elements may need to contain characters that are not in NFC,
including exemplars, transforms, segmentations, and
p/s/t/i/pc/sc/tc/ic rules in collation. These elements must not be
normalized (either to NFC or NFD), or their meaning may be changed.
Thus LDML documents must not be normalized as a whole. To prevent
problems with normalization, no element value can start with a
combining slash (U+0338 COMBINING LONG SOLIDUS OVERLAY).</p>
<p>
Lists, such as <span class="attribute">singleCountries</span>, are
space-delimited.
That is, the items in such a list are separated by one or more
XML whitespace characters. Such lists include:
</p>
<ul>
<li>singleCountries</li>
<li>preferenceOrdering</li>
<li>references</li>
</ul>
<h3>
<a name="Common_Elements" href="#Common_Elements">5.1 Common
Elements</a>
</h3>
<p>At any level in any element, two special elements are allowed.</p>
<h4>
<a name="special" href="#special">5.1.1 Element special</a>
</h4>
<p>
This element is designed to allow for arbitrary additional annotation
and data that is product-specific. It has one required attribute <span
class="attribute">xmlns</span>, which specifies the XML <a
href="http://www.w3.org/TR/REC-xml-names/">namespace</a> of the
special data. For example, the following uses the version 1.0 POSIX
special element.
</p>
<pre>&lt;!DOCTYPE ldml SYSTEM "<span style="color: blue">http://unicode.org/cldr/dtd/1.0/ldml.dtd</span>" [
  &lt;!ENTITY % posix SYSTEM "<span style="color: blue">http://unicode.org/cldr/dtd/1.0/ldmlPOSIX.dtd</span>"&gt;
<span style="color: blue">%posix;</span>
]&gt;
&lt;ldml&gt;
...
&lt;special xmlns:posix="<span style="color: blue">http://www.opengroup.org/regproducts/xu.htm</span>"&gt;
  <span style="color: green">&lt;!-- old abbreviations for pre-GUI days --&gt;</span>
  &lt;posix:messages&gt;
    &lt;posix:yesstr&gt;<span style="color: blue">Yes</span>&lt;/posix:yesstr&gt;
    &lt;posix:nostr&gt;<span style="color: blue">No</span>&lt;/posix:nostr&gt;
    &lt;posix:yesexpr&gt;<span style="color: blue">^[Yy].*</span>&lt;/posix:yesexpr&gt;
    &lt;posix:noexpr&gt;<span style="color: blue">^[Nn].*</span>&lt;/posix:noexpr&gt;
  &lt;/posix:messages&gt;
&lt;/special&gt;
&lt;/ldml&gt;
</pre>
<h5>
<a name="Sample_Special_Elements" href="#Sample_Special_Elements">5.1.1.1
Sample Special Elements</a>
</h5>
<p>
The elements in this section are <i><b>not</b></i> part of the Locale
Data Markup Language 1.0 specification.
Instead, they are special
elements used for application-specific data to be stored in the
Common Locale Data Repository. They may change or be removed in future
versions of this document, and are present here more as examples of
how to extend the format. (Some of these items may move into a future
version of the Locale Data Markup Language specification.)
</p>
<ul>
<li><a href="http://unicode.org/cldr/dtd/1.1/ldmlICU.dtd">http://unicode.org/cldr/dtd/1.1/ldmlICU.dtd</a></li>
<li><a href="http://unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd">http://unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd</a></li>
</ul>
<p>The above examples are old versions: consult the documentation
for the specific application to see which should be used.</p>
<p>These DTDs use namespaces and the special element. To include
one or more, use the following pattern to import the special DTDs
that are used in the file:</p>
<pre>&lt;?xml version="<span style="color: blue">1.0</span>" encoding="<span style="color: blue">UTF-8</span>" ?&gt;
&lt;!DOCTYPE ldml SYSTEM "<span style="color: blue">http://unicode.org/cldr/dtd/1.1/ldml.dtd</span>" [
  &lt;!ENTITY % <span style="color: blue">icu</span> SYSTEM "<span style="color: blue">http://unicode.org/cldr/dtd/1.1/ldmlICU.dtd</span>"&gt;
  &lt;!ENTITY % <span style="color: blue">openOffice</span> SYSTEM "<span style="color: blue">http://unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd</span>"&gt;
<span style="color: blue">%icu;
%openOffice;
</span>]&gt;</pre>
<p>Thus to include just the ICU DTD, one uses:</p>
<pre>&lt;?xml version="<span style="color: blue">1.0</span>" encoding="<span style="color: blue">UTF-8</span>" ?&gt;
&lt;!DOCTYPE ldml SYSTEM "<span style="color: blue">http://unicode.org/cldr/dtd/1.1/ldml.dtd</span>" [
  &lt;!ENTITY % icu SYSTEM "<span style="color: blue">http://unicode.org/cldr/dtd/1.1/ldmlICU.dtd</span>"&gt;
<span style="color: blue">%icu;
</span>]&gt;</pre>
<blockquote>
<p>
<b>Note: </b>A previous version of this document contained a special
element for <a
href="http://www.open-std.org/jtc1/sc22/wg20/docs/n897-14652w25.pdf">ISO
TR 14652</a> compatibility data. That element has been withdrawn,
pending further investigation, since 14652 is a Type 1
TR: "when the required support cannot be obtained for the
publication of an International Standard, despite repeated
effort". See the ballot comments on <a
href="http://www.open-std.org/jtc1/sc22/wg20/docs/n948-J1N6769-14652.pdf">14652
Comments</a> for details on the 14652 defects. For example, most of
these patterns make little provision for substantial changes in
format when elements are empty, so they are not particularly useful in
practice. Compare, for example, the mail-merge capabilities of
production software such as Microsoft Word or OpenOffice.
</p>
<p>
<b>Note: </b>While the CLDR specification guarantees backwards
compatibility, the definition of specials is up to other
organizations. Any assurance of backwards compatibility is up to
those organizations.
</p>
</blockquote>
<p>
A number of the elements above can have extra information for <a
name="OpenOffice" href="#OpenOffice">openoffice.org</a>, such as the
following example:
</p>
<pre>  &lt;special xmlns:openOffice="<span style="color: blue">http://www.openoffice.org</span>"&gt;
    &lt;openOffice:search&gt;
      &lt;openOffice:searchOptions&gt;
        &lt;openOffice:transliterationModules&gt;<span style="color: blue">IGNORE_CASE</span>&lt;/openOffice:transliterationModules&gt;
      &lt;/openOffice:searchOptions&gt;
    &lt;/openOffice:search&gt;
  &lt;/special&gt;
</pre>
<h4>
<a name="Alias_Elements" href="#Alias_Elements">5.1.2 Element
alias</a>
</h4>
<p class="dtd">
&lt;!ELEMENT alias (special*) &gt;<br>&lt;!ATTLIST alias source
NMTOKEN #REQUIRED &gt;<br>&lt;!ATTLIST alias path CDATA
#IMPLIED &gt;
</p>
<p>The contents of any element in root can be replaced by an
alias, which points to the path where the data can be found.</p>
<p>Aliases will only ever appear in root with the form
//ldml/.../alias[@source="locale"][@path="..."].</p>
<p>Consider the following example in root:</p>
<pre>
&lt;calendar type="gregorian"&gt;
  &lt;months&gt;
    &lt;default choice="format"/&gt;
    &lt;monthContext type="format"&gt;
      &lt;default choice="wide"/&gt;
      &lt;monthWidth type="abbreviated"&gt;
        <strong>&lt;alias source="locale" path="../monthWidth[@type='wide']"/&gt;</strong>
      &lt;/monthWidth&gt;</pre>
<p>
If the locale "de_DE" is being accessed for a month name
for format/abbreviated, then a resource bundle at "de_DE"
will be searched for a resource element at that path. If it is not
found there, then the resource bundle at "de" will be
searched, and so on. When the alias is found in root, the search
is restarted, but searching for the format/<strong>wide</strong> element
instead of format/abbreviated.
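</p>
<p>The following Python sketch shows one way this lookup could be implemented, under simplifying assumptions: paths are abbreviated to slash-separated strings such as months/format/abbreviated instead of full XPaths, locale data is held in plain dictionaries, and the parent chain is computed only by truncating at underscores (real CLDR inheritance also consults parentLocales data and lateral inheritance, both omitted here). The <code>Alias</code> class and the <code>parent_chain</code> and <code>resolve</code> functions are hypothetical names. When an alias is found, its relative path is resolved against the path of the element that holds it, and the search restarts from the original locale; meeting the same path twice signals a circular chain of aliases.</p>
<pre>
```python
import posixpath


class Alias:
    """Hypothetical stand-in for an alias element with source="locale"."""

    def __init__(self, rel_path):
        self.rel_path = rel_path


def parent_chain(locale_id):
    """Simplified truncation fallback: de_DE -> de -> root."""
    chain = []
    parts = locale_id.split("_")
    while parts:
        chain.append("_".join(parts))
        parts.pop()
    chain.append("root")
    return chain


def resolve(data, locale_id, path):
    """Look up path, following aliases and restarting from locale_id."""
    seen = set()
    while True:
        if path in seen:
            raise ValueError("circular alias chain at " + path)
        seen.add(path)
        for loc in parent_chain(locale_id):
            entry = data.get(loc, {}).get(path)
            if entry is None:
                continue  # not here: fall back to the parent locale
            if isinstance(entry, Alias):
                # source="locale": rewrite the path relative to the element
                # holding the alias, then restart from the original locale.
                path = posixpath.normpath(posixpath.join(path, entry.rel_path))
                break
            return entry
        else:
            raise KeyError(path)  # no locale in the chain has a value


# Hypothetical sample data mirroring the root example above.
data = {
    "root": {
        "months/format/abbreviated": Alias("../wide"),
        "months/format/wide": "January",
    },
    "de": {"months/format/wide": "Januar"},
}
print(resolve(data, "de_DE", "months/format/abbreviated"))  # Januar
```
</pre>
<p>Here the lookup for de_DE walks de_DE → de → root, finds the alias in root, and restarts with months/format/wide, which de supplies as "Januar".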
</p>
<p>
If the <b>path</b> attribute is present, then its value is an [<a
href="#XPath">XPath</a>] that points to a different node in the
tree. For example:
</p>
<pre>&lt;alias source="locale" path="../monthWidth[@type='wide']"/&gt;</pre>
<p>
If the path attribute is not present, the default is the same position in
the tree. All of the attributes in the [<a href="#XPath">XPath</a>]
must be <i>distinguishing</i> attributes. For more details, see <a
href="#Inheritance_and_Validity">Section 4.2 Inheritance and
Validity</a>.
</p>
<p>
There is a special value for the source attribute, the constant <b>source="locale"</b>.
This special value is equivalent to the locale being resolved. For
example, consider the following, where locale data for
'de' is being resolved:
</p>
<div align="center">
<center>
<table border="1" cellpadding="0" cellspacing="1">
<caption>
<a name="Inheritance_with_source_locale_"
href="#Inheritance_with_source_locale_">Inheritance with
source="locale"</a>
</caption>
<tr>
<th>Root</th>
<th>de</th>
<th bgcolor="#C0C0C0">Resolved</th>
</tr>
<tr>
<td><code>
&lt;x&gt;<br>&nbsp;&nbsp;&lt;a&gt;1&lt;/a&gt;<br>
&nbsp;&nbsp;&lt;b&gt;2&lt;/b&gt;<br>&nbsp;&nbsp;&lt;c&gt;3&lt;/c&gt;<br>
<br>&lt;/x&gt;
</code></td>
<td><code>
&lt;x&gt;<br>&nbsp;&nbsp;&lt;a&gt;11&lt;/a&gt;<br>
&nbsp;&nbsp;&lt;b&gt;12&lt;/b&gt;<br><br>
&nbsp;&nbsp;&lt;d&gt;14&lt;/d&gt;<br>&lt;/x&gt;
</code></td>
<td bgcolor="#C0C0C0"><code>
&lt;x&gt;<br>&nbsp;&nbsp;&lt;a&gt;11&lt;/a&gt;<br>
&nbsp;&nbsp;&lt;b&gt;12&lt;/b&gt;<br>&nbsp;&nbsp;<span
style="background-color: #FFFF00"><span
class="inherited"><span style="font-weight: 400;">&lt;c&gt;3&lt;/c&gt;</span></span></span><br>
&nbsp;&nbsp;&lt;d&gt;14&lt;/d&gt;<br>&lt;/x&gt;
</code></td>
</tr>
<tr>
<td><code>
&lt;y&gt;<br>&nbsp;&nbsp;&lt;alias source="locale"
path="../x"/&gt;<br>&lt;/y&gt;
</code></td>
<td><code>
&lt;y&gt;<br><br>&nbsp;&nbsp;&lt;b&gt;22&lt;/b&gt;<br>
<br><br>&nbsp;&nbsp;&lt;e&gt;25&lt;/e&gt;<br>
&lt;/y&gt;
</code></td>
<td bgcolor="#C0C0C0"><code>
&lt;y&gt;<br>&nbsp;&nbsp;<span
style="background-color: #FFFF00"><span 5875 class="inherited"><span style="font-weight: 400;"><a>11</a></span></span></span><br> 5876 <b>22</b><br> <span 5877 style="background-color: #FFFF00"><span 5878 class="inherited"><span style="font-weight: 400;"><c>3</c></span></span></span><br> 5879 <span style="background-color: #FFFF00"><span 5880 class="inherited"><span style="font-weight: 400;"><d>14</d></span></span></span><br> 5881 <e>25</e><br> </y> 5882 </code></td> 5883 </tr> 5884 </table> 5885 </center> 5886 </div> 5887 <p>The first row shows the inheritance within the <x> 5888 element, whereby <c> is inherited from root. The second shows 5889 the inheritance within the <y> element, whereby <a>, 5890 <c>, and <d> are inherited also from root, but from an 5891 alias there. The alias in root is logically replaced not by the 5892 elements in root itself, but by elements in the 'target' 5893 locale.</p> 5894 <p> 5895 For more details on data resolution, see <a 5896 href="#Inheritance_and_Validity">Section 4.2 Inheritance and 5897 Validity</a>. 5898 </p> 5899 <p> 5900 Aliases must be resolved recursively. An alias may point to another 5901 path that results in another alias being found, and so on. 
For
example, looking up Thai Buddhist abbreviated months for the locale <strong>xx-YY</strong>
may result in the following chain of aliases being followed:
</p>
<blockquote>
<p>../../calendar[@type="buddhist"]/months/monthContext[@type="format"]/monthWidth[@type="abbreviated"]
</p>
<p>xx-YY → xx → root // finds alias that changes path to:</p>
<p>../../calendar[@type="gregorian"]/months/monthContext[@type="format"]/monthWidth[@type="abbreviated"]
</p>
<p>xx-YY → xx → root // finds alias that changes path to:</p>
<p>../../calendar[@type="gregorian"]/months/monthContext[@type="format"]/monthWidth[@type="wide"]
</p>
<p>xx-YY → xx // finds value here</p>
</blockquote>
<p>It is an error to have a circular chain of aliases. That is, a
collection of LDML XML documents must not have situations where a
sequence of alias lookups (including inheritance and lateral
inheritance) can be followed indefinitely without terminating.</p>
<h4>
<a name="Element_displayName" href="#Element_displayName">5.1.3
Element displayName</a>
</h4>
<p>Many elements can have a display name. This is a translated
name that can be presented to users when discussing the particular
service. For example, a number format, used to format numbers using
the conventions of that locale, can have a translated name for
presentation in GUIs.</p>
<pre>  &lt;numberFormat&gt;
    &lt;displayName&gt;<span style="color: blue">Prozentformat</span>&lt;/displayName&gt;
...
  &lt;/numberFormat&gt;</pre>
<p>
Where present, the display names must be unique; that is, two
distinct codes must not have the same display name. (There is
one exception to this: in time zones, where parsing results would
give the same GMT offset, the standard and daylight display names can
be the same across different time zone IDs.)
Any translations should
follow customary practice for the locale in question. For more
information, see [<a href="#DataFormats">Data Formats</a>].
</p>
<h4>
<a name="Escaping_Characters" href="#Escaping_Characters">5.1.4
Escaping Characters</a>
</h4>
<p>Unfortunately, XML cannot contain all Unicode code points (for
example, XML 1.0 excludes most C0 control characters). Because of this,
in certain instances extra syntax is required to represent those code
points that cannot otherwise be represented in element content. The
escaping syntax is only defined for a few types of elements, such as
collation rules or exemplar sets, and uses the appropriate syntax for
that type.</p>
<p>The element &lt;cp&gt;, which was formerly used for this
purpose, has been deprecated.</p>

<h3>
<a name="Common_Attributes" href="#Common_Attributes">5.2 Common
Attributes</a>
</h3>
<h4>
<a name="Attribute_type" href="#Attribute_type">5.2.1 Attribute
type</a>
</h4>
<p>
The attribute <i>type</i> is also used to indicate an alternate
resource that can be selected with a matching type=option in the
locale id modifiers, or be referenced by a default element. For
example:
</p>
<pre>&lt;ldml&gt;
  ...
  &lt;currencies&gt;
    &lt;currency&gt;<span style="color: blue">...</span>&lt;/currency&gt;
    &lt;currency type="<span style="color: blue">preEuro</span>"&gt;<span style="color: blue">...</span>&lt;/currency&gt;
  &lt;/currencies&gt;
&lt;/ldml&gt;</pre>
<h4>
<a name="Attribute_draft" href="#Attribute_draft">5.2.2 Attribute
draft</a>
</h4>
<p>
If this attribute is present, it indicates the status of all the data
in this element and any subelements (unless they have a contrary <i>draft</i>
value), as per the following:
</p>
<ul>
<li style="margin-top: 0.5em; margin-bottom: 0.5em"><i>approved</i>:
fully approved by the technical committee (equals the CLDR 1.3 value
of <i>false</i>, or an absent <i>draft</i> attribute). This does not
mean that the data is guaranteed to be error-free; it represents the best
judgment of the committee.</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em"><i>contributed</i>:
partially approved by the technical committee.</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em"><i>provisional</i>:
partially confirmed. Implementations may choose to accept the
provisional data, especially if there is no translated alternative.</li>
<li style="margin-top: 0.5em; margin-bottom: 0.5em"><i>unconfirmed</i>:
no confirmation available.</li>
</ul>
<p>
For more information on precisely how these values are computed for
any given release, see <a
href="http://cldr.unicode.org/index/process#TOC-Data-Submission-and-Vetting-Process">Data
Submission and Vetting Process</a> on the CLDR website.
</p>
<p>
The draft attribute should only occur on "leaf" elements, and is deprecated elsewhere. For a more
formal description of how elements are inherited, and what their
draft status is, see <i><a href="#Inheritance_and_Validity">Section
4.2 Inheritance and Validity</a></i>.
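</p>
<p>As an illustration only, the Python sketch below computes an effective draft value for each leaf element and filters by a minimum level. It assumes that the four values are ordered from approved (most confirmed) to unconfirmed, that the nearest explicit draft value on an element or its ancestors applies, and that an absent value at the top means approved. The functions <code>leaf_drafts</code>, <code>accepted</code>, and <code>build_example</code> and the sample data are hypothetical; the tree is built with Python's xml.etree.ElementTree rather than read from real CLDR files, and the example places draft on a non-leaf element only to show the inheritance rule (current data should put it on leaf elements).</p>
<pre>
```python
import xml.etree.ElementTree as ET

# Ordered from most to least confirmed, following the list of values above.
DRAFT_LEVELS = ["approved", "contributed", "provisional", "unconfirmed"]


def leaf_drafts(elem, inherited="approved", path=""):
    """Yield (path, text, effective draft) for every leaf under elem.

    An absent draft attribute inherits the nearest ancestor's value;
    at the top level an absent attribute means approved.
    """
    draft = elem.get("draft", inherited)
    step = elem.tag
    if elem.get("type") is not None:
        step += "[@type='%s']" % elem.get("type")
    here = path + "/" + step
    children = list(elem)
    if not children:
        yield here, elem.text, draft
    for child in children:
        yield from leaf_drafts(child, draft, here)


def accepted(draft, minimum):
    """True if draft is at least as confirmed as minimum."""
    return DRAFT_LEVELS.index(draft) <= DRAFT_LEVELS.index(minimum)


def build_example():
    """Hypothetical data: the parent is contributed, two leaves override it."""
    months = ET.Element("months", draft="contributed")
    for type_, text, draft in [("1", "Januar", None),
                               ("2", "Februar", "provisional"),
                               ("3", "März", "unconfirmed")]:
        month = ET.SubElement(months, "month", type=type_)
        if draft is not None:
            month.set("draft", draft)
        month.text = text
    return months


for path, text, draft in leaf_drafts(build_example()):
    if accepted(draft, "contributed"):
        print(path, text, draft)  # only month 1 passes this threshold
```
</pre>
<p>With a minimum of contributed, only the first month is printed; a consumer willing to use provisional data, as the list above permits, would pass "provisional" instead and would also receive Februar.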
</p>
<h4>
<a name="alt_attribute" href="#alt_attribute">5.2.3 Attribute alt</a>
</h4>
<p>
This attribute labels an alternative value for an element. The value
is a <i>descriptor</i> that indicates what kind of alternative it is, and
takes one of the following forms:
</p>
<ul>
<li><i>variantname</i>, meaning that the value is a variant of
the normal value, and may be used in its place in certain
circumstances. If a variant value is absent for a particular locale,
the normal value is used. The variant mechanism should only be used
when such a fallback is acceptable.</li>
<li><span style="color: blue">proposed</span>, optionally
followed by a number, indicating that the value is a proposed
replacement for an existing value.</li>
<li><i>variantname</i><span style="color: blue">-proposed</span>,
optionally followed by a number, indicating that the value is a
proposed replacement variant value.</li>
</ul>
<p>
"<span style="color: blue">proposed</span>" should only be
present if the draft status is not "approved". It indicates
that the data is proposed replacement data that has been added
provisionally until the differences between it and the other data can
be vetted. For example, suppose that the translation for September
for some language is "Settembru", and a bug report is filed
saying that it should be "Settembro". The new data can be
entered, but marked as <i>alt="proposed"</i> until it is
vetted.
</p>
<pre>...
&lt;month type="9"&gt;Settembru&lt;/month&gt;
&lt;month type="9" draft="unconfirmed" alt="proposed"&gt;Settembro&lt;/month&gt;
&lt;month type="10"&gt;...</pre>
<p>Now assume another bug report comes in, saying that the correct
form is actually "Settembre". Another alternative can be
added:</p>
<pre>...
&lt;month type="9" draft="unconfirmed" alt="proposed2"&gt;Settembre&lt;/month&gt;
...</pre>
<p>
The values for <i>variantname</i> at this time include "<span
style="color: blue">variant</span>", "<span
style="color: blue">list</span>", "<span
style="color: blue">email</span>", "<span
style="color: blue">www</span>", "<span
class="attributeValue">short</span>", and "<span
style="color: blue">secondary</span>".
</p>
<p>
For a more complete description of how draft applies to data, see <i><a
href="#Inheritance_and_Validity">Section 4.2 Inheritance and
Validity</a></i>.
</p>
<p class="element2">
Attribute <a name="references_attribute" href="#references_attribute">references</a>
</p>
<p>The value of this attribute is a token representing a reference
for the information in the element, including standards that it may
conform to. The token refers to a reference defined in the
&lt;references&gt; element. (In older versions of CLDR, the value
of the attribute was freeform text. That format is deprecated.)</p>
<p>
<i>Example:</i>
</p>
<p class="example">&lt;territory type="UM"
references="R222"&gt;USAs yttre öar&lt;/territory&gt;</p>
<p>The reference element may be inherited. Thus, for example, R222
may be used in sv_SE.xml even though it is not defined there, if it
is defined in sv.xml.</p>
<p>&lt;... allow="verbatim" ...&gt; (deprecated)</p>
<p>This attribute was originally intended for use in marking
display names whose capitalization differed from what was indicated
by the now-deprecated &lt;inText&gt; element (perhaps, for example,
because the names included a proper noun).
It was never supported in
the DTD and is not needed for use with the new
&lt;contextTransforms&gt; element.</p>
<h3>
<a name="Common_Structures" href="#Common_Structures">5.3 Common
Structures</a>
</h3>
<h4>
<a name="Date_Ranges" href="#Date_Ranges">5.3.1 Date and Date
Ranges</a>
</h4>
<p>
When attributes specify date ranges, this is usually done with the
attributes <i>from</i> and <i>to</i>. The <i>from</i> attribute
specifies the starting point, and the <i>to</i> attribute specifies
the end point. The deprecated <i>time</i> attribute was formerly used
to specify time with the deprecated weekEndStart and weekEndEnd
elements, which were themselves inherently <i>from</i> or <i>to</i>.
</p>
<p>
The data format is a restricted ISO 8601 format, limited to the
fields <i>year, month, day, hour, minute,</i> and <i>second</i> in
that order, with "-" used as a separator between date
fields, a space used as the separator between the date and the time
fields, and ":" used as a separator between the time
fields. If the second, or both the minute and the second, are absent,
they are interpreted as zero. If the hour is also missing, then the time is
interpreted based on whether the attribute is <i>from</i> or <i>to</i>.
</p>
<ul>
<li>
<p class="note">
<i>from</i> defaults to "00:00:00" (midnight at the start
of the day).
</p>
</li>
<li>
<p class="note">
<i>to</i> defaults to "24:00:00" (midnight at the end of
the day).
</p>
</li>
</ul>
<p class="note">
That is, Friday at 24:00:00 is the same time as Saturday at 00:00:00.
Thus when the hour is missing, the <i>from</i> and <i>to</i> values are interpreted
inclusively: the range includes all of the day mentioned.
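</p>
<p>A minimal Python sketch of these defaulting rules follows. It assumes the value has already been checked against the restricted format, ignores time zones, and uses a hypothetical function name, <code>parse_ldml_date</code>, whose <code>is_to</code> flag says which attribute is being read. Adding the time fields to midnight as a timedelta lets "24:00:00" roll over to the following day, since Python's datetime does not accept an hour of 24 directly.</p>
<pre>
```python
from datetime import datetime, timedelta


def parse_ldml_date(value, is_to):
    """Parse a restricted ISO 8601 value such as '2006-04-02' or
    '1991-10-26 24:00:00'. Time zones are ignored in this sketch."""
    date_part, _, time_part = value.partition(" ")
    year, month, day = (int(f) for f in date_part.split("-"))
    if time_part:
        fields = [int(f) for f in time_part.split(":")]
        fields += [0] * (3 - len(fields))  # absent minute/second -> zero
        hour, minute, second = fields
    else:
        # No hour at all: 'from' is the start of the day, 'to' the end.
        hour, minute, second = (24, 0, 0) if is_to else (0, 0, 0)
    # Adding a timedelta lets 24:00:00 roll over to the next day, which
    # datetime(...) itself would reject as an hour value.
    return datetime(year, month, day) + timedelta(
        hours=hour, minutes=minute, seconds=second)


print(parse_ldml_date("1991-10-26 24:00:00", is_to=False))  # 1991-10-27 00:00:00
print(parse_ldml_date("2006-04-02", is_to=True))            # 2006-04-03 00:00:00
```
</pre>
<p class="note">Under these rules, from="1991-10-26 24:00:00" and from="1991-10-27" both denote midnight at the start of 27 October 1991, and to="2006-04-02" denotes midnight at the start of 3 April 2006.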
</p>
<p class="note">For example, the following are equivalent:</p>
<table style="margin-top: 0.5em; margin-bottom: 0.5em" id="table25">
<tr>
<td>&lt;usesMetazone from="1991-10-27"
to="2006-04-02" .../&gt;</td>
</tr>
<tr>
<td>&lt;usesMetazone from="1991-10-27 00:00:00"
to="2006-04-02 24:00:00" .../&gt;</td>
</tr>
<tr>
<td>&lt;usesMetazone from="1991-10-<font color="#FF0000"><b>26
24</b></font>:00:00" to="2006-04-<font color="#FF0000"><b>03
00</b></font>:00:00" .../&gt;
</td>
</tr>
</table>

<p>
If the <i>from</i> attribute is missing, the range is assumed to extend as far
backwards in time as there is data for; if the <i>to</i> attribute is
missing, then the range extends from the <i>from</i> point onwards, with no known end point.
</p>
<p>The dates and times are specified in local time, unless
otherwise noted. (In particular, the metazone values are in UTC, also
known as GMT.)</p>
<h4>
<a name="Text_Directionality" href="#Text_Directionality">5.3.2
Text Directionality</a>
</h4>
<p>The content of certain elements, such as date or number
formats, may consist of several sub-elements with an inherent order
(for example, the year, month, and day for dates). In some cases, the
order of these sub-elements may be changed depending on the
bidirectional context in which the element is embedded.</p>
<p>For example, short date formats in languages such as Arabic may
contain neutral or weak characters at the beginning or end of the
element content.
In such a case, the overall order of the
sub-elements may change depending on the surrounding text.</p>
<p>Element content whose display may be affected in this way
should include an explicit direction mark, such as U+200E
LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT MARK, at the beginning or
end of the element content, or both.</p>
<h4>
<a name="Unicode_Sets" href="#Unicode_Sets">5.3.3 Unicode Sets</a>
</h4>
<p>
Some attribute values or element contents use <em>UnicodeSet</em>
notation. A UnicodeSet represents a finite set of Unicode code points
and strings, and is defined by lists of code points and strings,
Unicode property sets, and set operators, all bounded by square
brackets. In this context, a code point means a string consisting of
exactly one code point.
</p>
<p>
A UnicodeSet implements the semantics in <i>UTS
#18: Unicode Regular Expressions</i> [<a
href="http://www.unicode.org/reports/tr41/#UTS18">UTS18</a>] Levels 1 &amp; 2 that are relevant to determining sets of characters. Note, however, that it may deviate from the syntax provided in [<a
href="http://www.unicode.org/reports/tr41/#UTS18">UTS18</a>], which is illustrative rather than a requirement. There is one exception to the supported semantics: Section <a href="http://unicode.org/reports/tr18/#RL2.6">RL2.6</a> <em>Wildcards in Property Values</em>. That feature can be supported in clients such as ICU by implementing a “hook”, as is done in the <a href="https://unicode.org/cldr/utility/list-unicodeset.jsp?a=\p{name=/APPLE/}">online UnicodeSet utilities</a>.</p>
<p>A UnicodeSet may be cited in specifications
outside of the domain of LDML.
In such a case, the specification may
specify a subset of the syntax provided here.</p>
<p>The following provides EBNF syntax for a UnicodeSet:</p>
<div align='center'>
<table class='simple'>
<tr>
<th>Symbol</th>
<th>Expression</th>
<th>Examples</th>
</tr>
<tr><th>root</th>
<td><code>= prop <br>| '[-]' <br>| '[' [\-\^]? s seq+ ']'</code></td>
<td>\p{x=y},<br>
[abc]</td>
</tr>
<tr><th>seq</th>
<td><code>= root (s [\&amp;\-] s root)* s <br>| range s</code></td>
<td>[abc]-[cde], a</td>
</tr>
<tr><th>range</th>
<td><code>= char ('-' char)? <br>| '{' (s char)+ s '}'</code></td>
<td>a, a-c, {abc}</td>
</tr>
<tr><th>prop</th>
<td><code>= '\\' [pP] '{' propName ([≠=] s value1+)? '}' <br>| '[:' '^'? propName ([≠=] s value2+)? ':]'</code></td>
<td>\p{x=y}, [:x=y:]</td>
</tr>
<tr><th>propName</th>
<td><code>= s [A-Za-z0-9] [A-Za-z0-9_\x20]* s</code></td>
<td>General_Category,<br>
General Category</td>
</tr>
<tr><th>value1</th>
<td><code>= [^\}] <br>
| '\\' quoted </code></td>
<td>Lm,<br>
\n,<br>
\}</td>
</tr>
<tr><th>value2</th>
<td><code>= [^:] <br>
| '\\' quoted</code></td>
<td>Lm,<br>
\n,<br>
\:</td>
</tr>
<tr><th>char</th>
<td><code>= [^\&amp; \- \[ \] \\ \} \{ [:Pat_WS:]] <br>
| '\\' quoted</code></td>
<td>a, b, c, \n</td>
</tr>
<tr><th>quoted</th>
<td><code>= 'u' (hex{4} | bracketedHex) <br>
| 'x' (hex{2} | bracketedHex) <br>| 'U00' ('0' hex{5} | '10' hex{4}) <br>| 'N{' propName '}' <br>| [\u0000-\U0010FFFF]</code></td>
<td>&nbsp;</td>
</tr>
<tr><th>bracketedHex</th>
<td><code>= '{' s hexCodePoint (s hexCodePoint)* s '}'</code></td>
<td>{61 2019 62}</td>
</tr>
<tr><th>hexCodePoint</th>
<td><code>= hex{1,5} | '10' hex{4}</code></td>
<td>&nbsp;</td>
</tr>
<tr><th>hex</th>
<td><code>=
[0-9A-Fa-f]</code></td>
<td>&nbsp;</td>
</tr>
<tr><th>s</th>
<td><code>= [:Pattern_White_Space:]*</code></td>
<td>optional whitespace</td>
</tr>
</table>
</div>
<p>Some constraints on UnicodeSet syntax are not captured by this EBNF. Notably, property names and values are restricted to those supported by the implementation.</p>
<p>The syntax characters are listed in the table below:</p>
<table>
<tbody>
<tr>
<th>Char</th>
<th>Hex</th>
<th>Name</th>
<th>Usage</th>
</tr>
<tr>
<td>$</td>
<td>U+0024</td>
<td>DOLLAR SIGN</td>
<td>Equivalent of \uFFFF (for implementations that return \uFFFF when accessing before the first or after the last character)</td>
</tr>
<tr>
<td>&amp;</td>
<td>U+0026</td>
<td>AMPERSAND</td>
<td>Intersecting UnicodeSets</td>
</tr>
<tr>
<td>-</td>
<td>U+002D</td>
<td>HYPHEN-MINUS</td>
<td>Ranges of characters; also set difference</td>
</tr>
<tr>
<td>:</td>
<td>U+003A</td>
<td>COLON</td>
<td>POSIX-style property syntax</td>
</tr>
<tr>
<td>[</td>
<td>U+005B</td>
<td>LEFT SQUARE BRACKET</td>
<td>Grouping; POSIX property syntax</td>
</tr>
<tr>
<td>]</td>
<td>U+005D</td>
<td>RIGHT SQUARE BRACKET</td>
<td>Grouping; POSIX property syntax</td>
</tr>
<tr>
<td>\</td>
<td>U+005C</td>
<td>REVERSE SOLIDUS</td>
<td>Escaping</td>
</tr>
<tr>
<td>^</td>
<td>U+005E</td>
<td>CIRCUMFLEX ACCENT</td>
<td>POSIX negation syntax</td>
</tr>
<tr>
<td>{</td>
<td>U+007B</td>
<td>LEFT CURLY BRACKET</td>
<td>Strings in set; Perl property syntax</td>
</tr>
<tr>
<td>}</td>
<td>U+007D</td>
<td>RIGHT CURLY BRACKET</td>
<td>Strings in set; Perl property syntax</td>
</tr>
<tr>
<td>&nbsp;</td>
      <td>U+0020 U+0009..U+000D U+0085<br>
      U+200E U+200F<br>
      U+2028 U+2029</td>
      <td>ASCII whitespace, NEL,<br>
      LRM, RLM,<br>
      LINE/PARAGRAPH SEPARATOR</td>
      <td>Ignored except when escaped</td>
    </tr>
  </tbody>
</table>
<br>
<h5>
  <a href="#Lists_of_Code_Points" name="Lists_of_Code_Points">5.3.3.1
    Lists of Code Points</a>
</h5>
<p>
  Lists are a sequence of strings that may include ranges, which are
  indicated by a '-' between two code points, as in
  "a-z". The sequence<em> start-end</em> specifies the range
  of all code points from start to end, inclusive, in Unicode
  order. For example, <b>[a c d-f m]</b> is equivalent to <b>[a c d
  e f m]</b>. Whitespace can be freely used for clarity, as <b>[a c
  d-f m]</b> means the same as <b>[acd-fm]</b>.
</p>
<p>
  A string with multiple code points is represented in a list by being
  surrounded by curly braces, such as in <strong>[a-z {ch}]</strong>.
  It can be used with the range notation, as described in <em>Section
  <a href="#String_Range">5.3.4 String Range</a>
  </em>. There is an additional restriction on string ranges in a
  UnicodeSet: the number of code points in the first string of the range
  must equal the number in the second. Thus [{ab}-{c}] and
  [{ab}-c] are invalid.
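</p>
<p>The list and range rules above can be illustrated with a short Python sketch. The names <code>expand_flat_list</code>, <code>tokenize</code>, and <code>expand_string_range</code> are hypothetical helpers, not part of CLDR or ICU. The sketch handles only flat lists made of bare code points, {…} strings, '-' ranges, and whitespace. It omits nesting, escapes, properties, and set operators, and uses Python's <code>str.isspace()</code> as a stand-in for Pattern_White_Space. Ranges are expanded with the general String Range interpretation of Section 5.3.4. On top of that, the sketch applies the UnicodeSet restriction that both ends of a range have the same number of code points.</p>

```python
from itertools import product

HYPHEN = object()  # sentinel token for the '-' range operator


def expand_string_range(x, y):
    """Expand a String Range x-y; len() counts code points in Python 3."""
    if not len(x) >= len(y) > 0:
        raise ValueError(f"invalid string range {x!r}-{y!r}")
    cut = len(x) - len(y)
    prefix, suffix = x[:cut], x[cut:]  # P and S, where len(S) == len(Y)
    ranges = []
    for s, e in zip(suffix, y):
        if ord(s) > ord(e):
            raise ValueError(f"reversed range {s!r}-{e!r}")
        ranges.append(range(ord(s), ord(e) + 1))
    # Every combination P + (s0..y0) + (s1..y1) + ... in code point order.
    return [prefix + "".join(map(chr, combo)) for combo in product(*ranges)]


def tokenize(body):
    """Split the inside of a flat list such as 'a c d-f m' or 'a-z {ch}'."""
    tokens, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch.isspace():
            i += 1  # whitespace outside {...} is not significant
        elif ch == "{":
            end = body.index("}", i)  # raises ValueError if unterminated
            string = "".join(body[i + 1:end].split())  # drop inner whitespace
            if not string:
                raise ValueError("empty {} string")
            tokens.append(string)
            i = end + 1
        elif ch == "-":
            tokens.append(HYPHEN)
            i += 1
        else:
            tokens.append(ch)  # a single code point
            i += 1
    return tokens


def expand_flat_list(body):
    """Return the set of strings denoted by a flat (unnested) list body."""
    tokens, result, i = tokenize(body), set(), 0
    while i < len(tokens):
        start = tokens[i]
        if start is HYPHEN:
            raise ValueError("'-' must follow an item")
        if i + 1 < len(tokens) and tokens[i + 1] is HYPHEN:
            if i + 2 >= len(tokens) or tokens[i + 2] is HYPHEN:
                raise ValueError("'-' must be followed by an item")
            end = tokens[i + 2]
            # UnicodeSet-specific restriction: both ends need equal lengths.
            if len(start) != len(end):
                raise ValueError(f"unequal lengths in {start!r}-{end!r}")
            result.update(expand_string_range(start, end))
            i += 3
        else:
            result.add(start)
            i += 1
    return result
```

<p>For example, <code>expand_flat_list("a c d-f m")</code> and <code>expand_flat_list("acd-fm")</code> both return the six strings a, c, d, e, f, and m. By contrast, <code>expand_flat_list("{ab}-c")</code> raises <code>ValueError</code>.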
</p>
<p>In UnicodeSets, there are two ways to quote syntax code points:
</p>
<p>
  <a name="Backslash_Escapes"></a>Outside of single quotes, certain
  backslashed code point sequences can be used to quote code points:
</p>
<table class='simple'>
  <tr>
    <td>\x{h...h}<br>
    \u{h...h}</td>
    <td>list of one or more code points, each written as 1-6 hex digits ([0-9A-Fa-f]) and separated by spaces</td>
  </tr>
  <tr>
    <td>\xhh</td>
    <td>1-2 hex digits</td>
  </tr>
  <tr>
    <td>\uhhhh</td>
    <td>Exactly 4 hex digits</td>
  </tr>
  <tr>
    <td>\Uhhhhhhhh</td>
    <td>Exactly 8 hex digits</td>
  </tr>
  <tr>
    <td>\a</td>
    <td>U+0007 (BEL / ALERT)</td>
  </tr>
  <tr>
    <td>\b</td>
    <td>U+0008 (BACKSPACE)</td>
  </tr>
  <tr>
    <td>\t</td>
    <td>U+0009 (TAB / CHARACTER TABULATION)</td>
  </tr>
  <tr>
    <td>\n</td>
    <td>U+000A (LINE FEED)</td>
  </tr>
  <tr>
    <td>\v</td>
    <td>U+000B (LINE TABULATION)</td>
  </tr>
  <tr>
    <td>\f</td>
    <td>U+000C (FORM FEED)</td>
  </tr>
  <tr>
    <td>\r</td>
    <td>U+000D (CARRIAGE RETURN)</td>
  </tr>
  <tr>
    <td>\\</td>
    <td>U+005C (BACKSLASH / REVERSE SOLIDUS)</td>
  </tr>
  <tr>
    <td>\N{name}</td>
    <td>The Unicode code point named "name".</td>
  </tr>
  <tr>
    <td>\p{…},\P{…}</td>
    <td>Unicode property (see below)</td>
  </tr>
</table><br>
<p>Any other character following a backslash is mapped to itself, except
  in the property syntax described below, or in an environment where it is
  defined to have some special meaning. </p>
<p>
  Any code point formed as the result of a backslash escape loses any
  special meaning and is treated as a literal. In particular, note that
  \x, \u and \U escapes create literal code points.
(In contrast, Java
  treats Unicode escapes as just a way to represent arbitrary code
  points in an ASCII source file, and any resulting code points are <i><b>not</b></i>
  tagged as literals.)
</p>
<p>
  Unicode property sets are defined as described in <i>UTS
  #18: Unicode Regular Expressions</i> [<a
  href="http://www.unicode.org/reports/tr41/#UTS18">UTS18</a>], Level
  1 and RL2.5, including the syntax where given. For an example of a
  concrete implementation of this, see [<a href="#ICUUnicodeSet">ICUUnicodeSet</a>].
</p>
<h5>
  <a href="#Unicode_Properties" name="Unicode_Properties">5.3.3.2
    Unicode Properties</a>
</h5>

<p>
  Briefly, Unicode property sets are specified by any Unicode property
  and a value of that property, such as <b>[:General_Category=Letter:]</b>
  for Unicode letters, or <b>\p{uppercase}</b> for the set of uppercase
  characters in Unicode. The property names are defined by the
  PropertyAliases.txt file and the property values by the
  PropertyValueAliases.txt file. For more information, see [<a
  href="http://unicode.org/reports/tr41/#UAX44">UAX44</a>]. The syntax
  for specifying the property sets is an extension of either POSIX or
  Perl syntax, by the addition of "=<value>". For
  example, you can match letters by using the POSIX-style syntax:
</p>
<p>
  <b>[:General_Category=Letter:]</b>
</p>
<p>or by using the Perl-style syntax</p>
<p>
  <b>\p{General_Category=Letter}</b>.
</p>
<p>
  Property names and values are case-insensitive, and whitespace,
  "-", and "_" are ignored. The property name can
  be omitted for the <strong>General_Category</strong> and <strong>Script</strong>
  properties, but is required for other properties. If the property
  value is omitted, it is assumed to represent a boolean property with
  the value "true".
Thus <b>[:Letter:]</b> is equivalent to <b>[:General_Category=Letter:]</b>,
  and <b>[:Wh-ite-s pa_ce:]</b> is equivalent to <b>[:Whitespace=true:]</b>.
</p>
<p>
  The table below shows the two kinds of syntax: POSIX style and Perl style.
  It also shows the "Negative" form of each, which matches all code
  points that do not have the given property value. For example,
  <b>[:^Letter:]</b> matches all code points that are not <b>[:Letter:]</b>.
</p>
<table>
  <tr>
    <th> </th>
    <th>Positive</th>
    <th>Negative</th>
  </tr>
  <tr>
    <td>POSIX-style Syntax</td>
    <td>[:type=value:]</td>
    <td>[:^type=value:]</td>
  </tr>
  <tr>
    <td>Perl-style Syntax</td>
    <td>\p{type=value}</td>
    <td>\P{type=value}</td>
  </tr>
</table>
<h5>
  <a href="#Boolean_Operations" name="Boolean_Operations">5.3.3.3
    Boolean Operations</a>
</h5>

<p>The low-level lists or properties can then be freely combined
  with the normal set operations (union, inverse, difference, and
  intersection):</p>
<ul>
  <li>To union two sets, simply concatenate them. For example, <b>[[:letter:]
    [:number:]]</b></li>
  <li>To intersect two sets, use the '&' operator. For
    example, <b>[[:letter:] & [a-z]] </b>
  </li>
  <li>To take the set-difference of two sets, use the '-'
    operator. For example, <b>[[:letter:] - [a-z]]</b>
  </li>
  <li>To invert a set, place a '^' immediately after the
    opening '['. For example, <b>[^a-z]</b>. In any other
    location, the '^' does not have a special meaning. The
    inversion [^X] is equivalent to [[\x{0}-\x{10FFFF}]-[X]]. Thus
    multi-code point strings are discarded.
  </li>
  <li>Symmetric difference (~) is not supported.</li>
</ul>
<p>
  The binary operators '&', '-', and the implicit
  union have equal precedence and bind left-to-right.
Thus <b>[[:letter:]-[a-z]-[\u0100-\u01FF]]</b>
  is equal to <b>[[[:letter:]-[a-z]]-[\u0100-\u01FF]]</b>. Another
  example is the set <b>[[ace][bdf] - [abc][def]]</b>, which is not the
  empty set, but instead equal to <b>[[[[ace] [bdf]] - [abc]]
  [def]]</b>, which equals <b>[[[abcdef] - [abc]] [def]]</b>, which equals
  <b>[[def] [def]]</b>, which equals <b>[def]</b>.
</p>
<p>
  <strong>One caution:</strong> the '&' and '-'
  operators operate between sets. That is, they must be immediately
  preceded and immediately followed by a set. For example, the pattern
  <b>[[:Lu:]-A]</b> is illegal, since it is interpreted as the set <b>[:Lu:]</b>
  followed by the incomplete range <b>-A</b>. To specify the set of
  upper case letters except for 'A', enclose the 'A' in
  brackets: <b>[[:Lu:]-[A]]</b>.
</p>
<h5>
  <a href="#UnicodeSet_Examples" name="UnicodeSet_Examples">5.3.3.4
    UnicodeSet Examples</a>
</h5>
<p>The following table summarizes the syntax that can be used.</p>
<table style="margin-top: 0.5em; margin-bottom: 0.5em" id="table18">
  <tr>
    <th>Example</th>
    <th>Description</th>
  </tr>
  <tr>
    <td nowrap>[a]</td>
    <td>The set containing 'a' alone</td>
  </tr>
  <tr>
    <td nowrap>[a-z]</td>
    <td>The set containing 'a' through 'z' and all
      letters in between, in Unicode order.<br> Thus it is the same
      as [\u0061-\u007A].
    </td>
  </tr>
  <tr>
    <td nowrap>[^a-z]</td>
    <td>The set containing all code points but 'a' through
      'z'.<br> Thus it is the same as [\u0000-\u0060
      \u007B-\x{10FFFF}].
    </td>
  </tr>
  <tr>
    <td nowrap>[[pat1][pat2]]</td>
    <td>The union of sets specified by pat1 and pat2</td>
  </tr>
  <tr>
    <td nowrap>[[pat1]&[pat2]]</td>
    <td>The intersection of sets specified by pat1 and pat2</td>
  </tr>
  <tr>
    <td nowrap>[[pat1]-[pat2]]</td>
    <td>The asymmetric difference of sets specified by pat1 and
      pat2</td>
  </tr>
  <tr>
    <td nowrap>[a {ab} {ac}]</td>
    <td>The code point 'a' and the multi-code point strings
      "ab" and "ac"</td>
  </tr>
  <tr>
    <td nowrap>[x\u{61 2019 62}y]</td>
    <td>Equivalent to [x\u0061\u2019\u0062y] (= [xa’by])</td>
  </tr>
  <tr>
    <td nowrap>[{ax}-{bz}]</td>
    <td>The set containing [{ax} {ay} {az} {bx} {by} {bz}], using
      the range syntax to get all the strings from {ax} to {bz} as
      described in <em>Section <a href="#String_Range">5.3.4
      String Range</a></em>.
    </td>
  </tr>
  <tr>
    <td nowrap>[:Lu:]</td>
    <td>The set of code points with a given property value, as
      defined by PropertyValueAliases.txt. In this case, these are the
      Unicode upper case letters. The long form for this is <b>[:General_Category=Uppercase_Letter:]</b>.
    </td>
  </tr>
  <tr>
    <td nowrap>[:L:]</td>
    <td>The set of code points belonging to all Unicode categories
      starting with 'L', that is, <b>[[:Lu:][:Ll:][:Lt:][:Lm:][:Lo:]]</b>.
      The long form for this is <b>[:General_Category=Letter:]</b>.
    </td>
  </tr>
</table>
<br>
<h4>
  <a name="String_Range" href="#String_Range">5.3.4 String Range</a>
</h4>
<p>A String Range is a compact format for specifying a list of
  strings.</p>
<p>
  <strong>Syntax:<br>
  </strong>
</p>
<blockquote>
  <p>
    X <em>sep</em> Y<br>
  </p>
</blockquote>
<p>The separator and the format of strings X, Y may vary depending
  on the domain.
For example,</p>
<ul>
  <li>for the validity files the separator is ~,</li>
  <li>for UnicodeSet the separator is
    -, and any multi-code point string is
    enclosed in {…}.
  </li>
</ul>
<p>
  <strong>Validity: <br>
  </strong>
</p>
<blockquote>
  <p>
    A string range X <em>sep</em> Y is valid iff len(X) ≥ len(Y) > 0,
    where len(X) is the length of X in code points.
  </p>
  <p>
    <em>There may be additional, domain-specific requirements for
    validity of the expansion of the string range.</em>
  </p>
</blockquote>
<p>
  <strong>Interpretation:<br>
  </strong>
</p>
<ol>
  <li>Break X into a prefix P and a suffix S, where len(S) = len(Y)
    <ul>
      <li>Note that P will be an empty string if the lengths of X
        and Y are equal.</li>
    </ul>
  </li>
  <li>Form the combinations of all P+(s₀..y₀)+(s₁..y₁)+...+(sₙ..yₙ)
    <ul>
      <li>s₀ is the first code point in S, y₀ is the first code point in Y, and so on.</li>
    </ul>
  </li>
</ol>
<p>
  <strong>Examples:</strong>
</p>
<table>
  <tbody>
    <tr>
      <td>ab-ad</td>
      <td>→</td>
      <td>ab ac ad</td>
    </tr>
    <tr>
      <td>ab-d</td>
      <td>→</td>
      <td>ab ac ad</td>
    </tr>
    <tr>
      <td>ab-cd</td>
      <td>→</td>
      <td>ab ac ad bb bc bd cb cc cd</td>
    </tr>
    <tr>
      <td>-</td>
      <td>→</td>
      <td> </td>
    </tr>
    <tr>
      <td>-</td>
      <td>→</td>
      <td> </td>
    </tr>
  </tbody>
</table>
<br>
<h3>
  <a name="Identity_Elements" href="#Identity_Elements">5.4
    Identity Elements</a>
</h3>
<p class="dtd"><!ELEMENT identity (alias | (version,
  generation?, language, script?, territory?, variant?, special*) )
  ></p>
<p>The identity element contains information identifying the
  target locale for this data, and general information about the
  version of this data.</p>
<p class="element2">
  <version
number="<u>$</u>Revision: 1.227 <u>$</u>"> 6730 </p> 6731 <p>The version element provides, in an attribute, the version of 6732 this file. The contents of the element can contain textual 6733 notes about the changes between this version and the last. For 6734 example:</p> 6735 <blockquote> 6736 <pre><version number="<span style="color: blue">1.1</span>"><span 6737 style="color: blue">Various notes and changes in version 1.1</span></version></pre> 6738 <p>This is not to be confused with the version attribute on the 6739 ldml element, which tracks the dtd version.</p> 6740 </blockquote> 6741 <p class="element2"> 6742 <generation date="<u>$</u>Date: 2007/07/17 23:41:16 <u>$</u>" 6743 /> 6744 </p> 6745 <p>The generation element is now deprecated. It was used to 6746 contain the last modified date for the data. This could be in two 6747 formats: ISO 8601 format, or CVS format (illustrated by the example 6748 above).</p> 6749 <p class="element2"> 6750 <language type="<span style="color: blue">en</span>"/> 6751 </p> 6752 <p>The language code is the primary part of the specification of 6753 the locale id, with values as described above.</p> 6754 <p class="element2"> 6755 <script type="<span style="color: blue">Latn</span>" 6756 /> 6757 </p> 6758 <p>The script code may be used in the identification of written 6759 languages, with values described above.</p> 6760 <p class="element2"> 6761 <territory type="<span style="color: blue">US</span>"/> 6762 </p> 6763 <p>The territory code is a common part of the specification of the 6764 locale id, with values as described above.</p> 6765 <p class="element2"> 6766 <variant type="<span class="attributeValue">NYNORSK</span>"/> 6767 </p> 6768 <p>The variant code is the tertiary part of the specification of 6769 the locale id, with values as described above.</p> 6770 6771 <p> 6772 When combined according to the rules described in <i> <a 6773 href="#Unicode_Language_and_Locale_Identifiers">Section 3, 6774 Unicode Language and Locale 
Identifiers</a></i>, the language element,
  along with any of the optional script, territory, and variant
  elements, must identify a known, stable locale identifier. Otherwise,
  it is an error.
</p>
<h3>
  <a name="Valid_Attribute_Values" href="#Valid_Attribute_Values">5.5
    Valid Attribute Values</a>
</h3>
<p>The valid attribute values, as well as other validity
  information, are contained in the supplementalMetadata.xml file. (Some,
  but not all, of this information could have been represented in XML
  Schema or a DTD.) Most of this information is intended primarily for internal tool use.</p>

<p>The <elementOrder> and <attributeOrder> elements
  are now deprecated, since the information regarding element and
  attribute ordering is now contained in the DTD.</p>
<p>
  <i>The suppress elements are those that are suppressed in
  canonicalization.</i>
</p>
<p>
  <i>The serialElements are those that do not inherit and that may have an
  ordering.</i>
</p>
<blockquote>
  <pre><serialElements>attributeValues base comment extend first_non_ignorable first_primary_ignorable
first_secondary_ignorable first_tertiary_ignorable first_trailing first_variable i ic languagePopulation
last_non_ignorable last_primary_ignorable last_secondary_ignorable last_tertiary_ignorable last_trailing
last_variable optimize p pc reset rules s sc settings suppress_contractions t tRule tc variable x
</serialElements></pre>
</blockquote>
<p>
  <i>The validity elements give the possible attribute values. They
  are in the format of a series of variables, followed by
  attributeValues.
</i>
</p>
<blockquote>
  <pre><variable id="$calendar" type="choice">
buddhist coptic ethiopic ethiopic-amete-alem chinese gregorian hebrew indian islamic islamic-civil
japanese arabic civil-arabic thai-buddhist persian roc</variable></pre>
</blockquote>
<p>The types indicate the style of match:</p>
<ul>
  <li>choice: for a list of possible values</li>
  <li>regex: for a regular expression match</li>
  <li>notDoneYet: for items without matching criteria</li>
  <li>locale: for locale IDs</li>
  <li>list: for a space-delimited list of values</li>
  <li>path: for a valid [<a href="#XPath">XPath</a>]
  </li>
</ul>
<p>If the attribute order="given" is supplied, it
  indicates the order of elements when canonicalizing (see below).</p>
<p>The variable values are intended for internal testing, and the
  definition and usage may change between releases. They do not
  necessarily include all valid elements. For example, for primary
  language codes, they include the subset that occurs in CLDR locale
  data. They are intended for a particular version of CLDR, and may
  omit codes that were present in earlier versions, such as deprecated
  codes.</p>
<p>The <deprecated> element lists elements, attributes, and
  attribute values that are deprecated. If any deprecatedItems element
  contains more than one attribute, then only the listed combinations
  are deprecated. Thus the following means not that the draft attribute
  is deprecated, but that the true and false values for that attribute
  are:</p>
<blockquote>
  <pre><deprecatedItems attributes="draft" values="true false"/> </pre>
</blockquote>
<p>
  Similarly, the following means that the <i>type</i> attribute is
  deprecated, but only for the listed elements:
</p>
<blockquote>
  <pre><deprecatedItems elements="abbreviationFallback default ...
preferenceOrdering" attributes="type"/> </pre> 6850 </blockquote> 6851 <p class="dtd"> 6852 <!ELEMENT blockingItems EMPTY ><br> <!ATTLIST 6853 blockingItems elements NMTOKENS #IMPLIED > 6854 </p> 6855 <p> 6856 The blockingItems were used to indicate which elements (and their child elements) 6857 do not inherit. For example, because supplementalData is a blocking 6858 item, all paths containing the element <span class="element">supplementalData</span> 6859 do not inherit. However, <strong>the <blockingItems> element is now deprecated,</strong> 6860 having been replaced by the annotations in the DTD and the DTDData classes in CLDR tooling. 6861 </p> 6862 <pre class="dtd"><!ELEMENT distinguishingItems EMPTY > 6863<!ATTLIST distinguishingItems exclude ( true | false ) #IMPLIED > 6864<!ATTLIST distinguishingItems elements NMTOKENS #IMPLIED > 6865<!ATTLIST distinguishingItems attributes NMTOKENS #IMPLIED ></pre> 6866 <p> 6867 The distinguishing items were used to indicate which combinations of elements and 6868 attributes (in unblocked environments) are <i>distinguishing</i> in 6869 performing inheritance. For example, the attribute type is 6870 distinguishing <i>except</i> in combination with certain elements, 6871 such as in the following. However, <strong>the <distinguishingItems> element is now deprecated,</strong> 6872 having been replaced by the annotations in the DTD and the DTDData classes in CLDR tooling. 6873 </p> 6874 <pre><distinguishingItems 6875 exclude="true" 6876 elements="default measurementSystem mapping abbreviationFallback preferenceOrdering" 6877 attributes="type"/> 6878</pre> 6879 <h3> 6880 <a name="Canonical_Form" href="#Canonical_Form">5.6 Canonical 6881 Form</a> 6882 </h3> 6883 <p>The following are restrictions on the format of LDML files to 6884 allow for easier parsing and comparison of files.</p> 6885 <p>Peer elements have consistent order. 
That is, if the DTD or
  this specification requires the following order in an element foo:</p>
<pre><foo>
	<pattern>
	<somethingElse>
</foo></pre>
<p>It can never require the reverse order in a different element
  bar.</p>
<pre><bar>
	<somethingElse>
	<pattern>
</bar></pre>
<p>Note that there was one case that had to be corrected in order
  to make this true. For that reason, pattern occurs twice under
  currency:</p>
<pre class="dtd"><!ELEMENT currency (alias | (pattern*, displayName?, symbol?, pattern*,
decimal?, group?, special*)) ></pre>
<p>
  <a href="http://www.w3.org/TR/REC-xml/">XML</a> files can have a wide
  variation in textual form while representing precisely the same
  data. Putting the LDML files in the repository into a canonical
  form allows the simple, widely used diff tools (including those in
  CVS) to detect differences when vetting changes without being confused
  by such variation. This is not a requirement on other uses of LDML; it is
  simply a way to manage repository data more easily.
</p>
<h4>
  <a name="Content" href="#Content">5.6.1 Content</a>
</h4>
<ol>
  <li>All start elements are on their own line, indented by <i>depth</i>
    tabs.
  </li>
  <li>All end elements (except for leaf nodes) are on their own
    line, indented by <i>depth</i> tabs.
  </li>
  <li>Any leaf node with empty content is in the form
    <foo/>.</li>
  <li>There are no blank lines except within comments or content.</li>
  <li>Spaces are used within a start element. There are no extra
    spaces within elements.
    <ul>
      <li><code><version number="1.2"/></code>, not
        <code><version number = "1.2" /></code></li>
      <li><code></identity></code>, not <code></identity
        ></code></li>
    </ul>
  </li>
  <li>All attribute values use double quote ("), not single
    (').</li>
  <li>There are no CDATA sections, and no escapes except those
    absolutely required.
    <ul>
      <li>no &apos; since it is not necessary</li>
      <li>no '&#x61;', it would be just 'a'</li>
    </ul>
  </li>
  <li>All attributes with defaulted values are suppressed.</li>
  <li>The draft and alt="proposed.*" attributes are only
    on leaf elements.</li>
  <li>The tzids are canonicalized in the following way:
    <ol>
      <li type="a">All tzids as of CLDR 1.1 (2004.06.08) in
        zone.tab are canonical.</li>
      <li>After that point, the first time a tzid is introduced,
        that is the canonical form.</li>
    </ol>
    <p>
      That is, new IDs are added, but existing ones keep the original
      form. The <i>TZ</i> timezone database keeps a set of equivalences
      in the "backward" file. These are used to map other tzids
      to the canonical form. For example, when
      <code>America/Argentina/Catamarca</code>
      was introduced as the new name for the previous
      <code>America/Catamarca</code>, a link was added in the backward file.
    </p>
    <p>
      <code>Link America/Argentina/Catamarca America/Catamarca</code>
    </p>
  </li>
</ol>
<p>
  <i>Example:</i>
</p>
<pre><ldml draft="unconfirmed">
	<identity>
		<version number="1.2"/>
		<language type="en"/>
		<territory type="AS"/>
	</identity>
	<numbers>
		<currencyFormats>
			<currencyFormatLength>
				<currencyFormat>
					<pattern>¤#,##0.00;(¤#,##0.00)</pattern>
				</currencyFormat>
			</currencyFormatLength>
		</currencyFormats>
	</numbers>
</ldml></pre>
<h4>
  <a name="Ordering" href="#Ordering">5.6.2 Ordering</a>
</h4>
<p>An element is ordered first by the element name, and then if
  the element names are identical, by the sorted set of attribute-value
  pairs. For the latter, compare the first pair in each (in sorted
  order by attribute pair). If not identical, go to the second pair,
  and so on.</p>
<p>Elements and attributes are ordered according to their order in
  the respective DTDs. Attribute value comparison is a bit more
  complicated, and may depend on the attribute and type. This is
  currently done with specific ordering tables.</p>
<p>
  Any future additions to the DTD must be structured so as to allow
  compatibility with this ordering. See also <a
    href="#Valid_Attribute_Values">Section 5.5 Valid Attribute
    Values.</a>
</p>

<h4>
  <a name="Comments" href="#Comments">5.6.3 Comments</a>
</h4>
<ol>
  <li>Comments are of the form <!-- <i>stuff</i> -->.
  </li>
  <li>They are logically attached to a node. There are 4 kinds:
    <ol>
      <li>Inline comments always appear after a leaf node, on the same line
        at the end.
These are a single line.</li>
      <li>Preblock comments always precede the attachment node, and
        are indented on the same level.</li>
      <li>Postblock comments always follow the attachment node, and
        are indented on the same level.</li>
      <li>Final comment, after </ldml></li>
    </ol>
  </li>
  <li>Multiline comments (except the final comment) have each line
    after the first indented to one deeper level.</li>
</ol>
<p>
  <b>Examples:</b>
</p>
<pre><eraAbbr>
	<era type="0">BC</era> <!-- might add alternate BDE in the future -->
...
<timeZoneNames>
	<!-- Note: zones that do not use daylight time need further work -->
	<zone type="America/Los_Angeles">
	...
	<!-- Note: the following is known to be sparse,
		and needs to be improved in the future -->
	<zone type="Asia/Jerusalem"></pre>

<h3>
  <a name="DTD_Annotations" href="#DTD_Annotations">5.7 DTD Annotations</a>
</h3>
<p>The information in a standard DTD is insufficient for use in CLDR. To make up for that, DTD annotations are added. These are of the form<br>
  <!--@...--><br>
  and are included below the !ELEMENT or !ATTLIST line that they apply to. The current annotations are:</p>
<table>
  <tr><th>Type</th><th>Description</th></tr>
  <tr>
    <td><!--@VALUE--></td>
    <td>The attribute is not distinguishing, and is treated like an element value</td></tr>
  <tr>
    <td><!--@METADATA--></td>
    <td>The attribute is a “comment” on the data, like the draft status.
It is not typically used in implementations.</td>
  </tr>
  <tr>
    <td><!--@ORDERED--></td>
    <td>The element's children are ordered, and do not inherit.</td>
  </tr>
  <tr>
    <td><!--@DEPRECATED--></td>
    <td>The element or attribute is deprecated, and should not be used.</td>
  </tr>
  <tr>
    <td><!--@DEPRECATED: attribute-value1, attribute-value2--></td>
    <td>The attribute values are deprecated, and should not be used. Spaces
      between tokens are not significant.</td>
  </tr>
</table>

<p> There is additional information in the attributeValueValidity.xml
  file that is used internally for testing. For example, the following
  line indicates that the 'type' attribute of the 'currency' element in the ldml DTD must have
  values from the bcp47 'cu' type.</p>
<p class='example'> <attributeValues dtds='ldml' elements='currency'
  attributes='type'>$_bcp47_cu</attributeValues></p>
<p>The element values may be literals, regular expressions, or variables
  (some of which are set programmatically according to other CLDR data,
  such as the above). However, the information at this point does not
  cover all attribute values, is used only for testing, and should not
  be used in implementations since the structure may change without
  notice.</p>

<h2>
  <a name="Property_Data" href="#Property_Data">6 Property Data</a>
</h2>
<p>Some data in CLDR does not use an XML format, but rather a
  semicolon-delimited format derived from that of the Unicode Character
  Database. That is because the data is more likely to be parsed by
  implementations that already parse UCD data.
Those files are present
  in the common/properties directory.</p>
<p>Each file has a header that explains the format and usage of
  the data.</p>
<h3><a name="Script_Metadata" href="#Script_Metadata">6.1 Script Metadata</a></h3>
<p><code>scriptMetadata.txt</code>: </p>
<p>This file provides general information about scripts that may be useful to implementations processing text. The information is the best currently available, and may change between versions of CLDR. The format is similar to that of Unicode Character Database property files, and is documented in the header of the data file.</p>
<h3><a name="Extended_Pictographic" href="#Extended_Pictographic">6.2 Extended Pictographic</a> </h3>
<p><code>ExtendedPictographic.txt</code></p>
<p>This file was used to define the ExtendedPictographic data used for “future-proofing” emoji behavior, especially in segmentation. As of Emoji version 11.0, the Extended_Pictographic property is incorporated into the emoji data files found at <a href="https://unicode.org/Public/emoji/">unicode.org/Public/emoji/</a>.</p>

<h3><a name="Labels.txt" href="#Labels.txt">6.3 Labels.txt</a> </h3>
<p><code>labels.txt</code>: </p>
<p>This file provides general information about associations of labels to characters that may be useful to implementations of character-picking applications. The information is the best currently available, and may change between versions of CLDR. The format is similar to that of Unicode Character Database property files, and is documented in the header of the data file.</p>
<p>Initially, the contents are focused on emoji, but may be expanded in the future to other types of characters.
Note that a character may have multiple labels.</p>

<h2>
  <a name="Format_Parse_Issues" href="#Format_Parse_Issues">7
    Issues in Formatting and Parsing</a>
</h2>
<h3>
  <a name="Lenient_Parsing" href="#Lenient_Parsing">7.1 Lenient Parsing</a>
</h3>
<h4>
  <a name="Motivation" href="#Motivation">7.1.1 Motivation</a>
</h4>
<p>User input is frequently messy. Attempting to parse it by
  matching it exactly against a pattern is likely to be unsuccessful,
  even when the meaning of the input is clear to a human being. For
  example, for a date pattern of "MM/dd/yy", the input
  "June 1, 2006" will fail.</p>
<p>The goal of lenient parsing is to accept user input whenever it
  is possible to decipher what the user intended. Doing so requires
  using patterns as data to guide the parsing process, rather than an
  exact template that must be matched. This informative section
  suggests some heuristics that may be useful for lenient parsing of
  dates, times, and numbers.</p>
<h4>
  <a name="Loose_Matching" href="#Loose_Matching">7.1.2 Loose Matching</a>
</h4>
<p>Loose matching ignores attributes of the strings being compared
  that are not important to matching. It involves the following steps:</p>
<ul>
  <li>Remove "." from currency symbols and other fields
    used for matching, and also from the input string unless:
    <ul>
      <li>"." is in the decimal set, and</li>
      <li>its position in the input string is immediately before a
        decimal digit</li>
    </ul>
  </li>
  <li>Ignore all format characters: in particular, ignore any
    RLM, LRM or ALM used to control BIDI formatting.</li>
  <li>Ignore all characters in [:Zs:] unless they occur between
    letters.
(In the heuristics below, even those between letters are 7159 ignored except to delimit fields)</li> 7160 <li>Map all characters in [:Dash:] to U+002D HYPHEN-MINUS</li> 7161 <li>Use the data in the <character-fallback> element to 7162 map equivalent characters (for example, curly to straight 7163 apostrophes). Other apostrophe-like characters should also be 7164 treated as equivalent, especially if the character actually used in 7165 a format may be unavailable on some keyboards. For example: 7166 <ul> 7167 <li>U+02BB MODIFIER LETTER TURNED COMMA (ʻ) might be typed 7168 instead as U+2018 LEFT SINGLE QUOTATION MARK (‘).</li> 7169 <li>U+02BC MODIFIER LETTER APOSTROPHE (ʼ) might be typed 7170 instead as U+2019 RIGHT SINGLE QUOTATION MARK (’), U+0027 7171 APOSTROPHE, etc.</li> 7172 <li>U+05F3 HEBREW PUNCTUATION GERESH (׳) might be typed 7173 instead as U+0027 APOSTROPHE.</li> 7174 </ul> 7175 </li> 7176 <li>Apply mappings particular to the domain (i.e., for dates or 7177 for numbers, discussed in more detail below)</li> 7178 <li>Apply case folding (possibly including language-specific 7179 mappings such as Turkish i)</li> 7180 <li>Normalize to NFKC; thus <i>no-break space</i> will map to <i> 7181 space</i>; half-width <i>katakana</i> will map to full-width. 7182 </li> 7183 </ul> 7184 <p>Loose matching involves (logically) applying the above 7185 transform to both the input text and to each of the field elements 7186 used in matching, before applying the specific heuristics below. For 7187 example, if the input number text is " - NA f. 1,000.00", 7188 then it is mapped to "-naf1,000.00" before processing. The 7189 currency signs are also transformed, so "NA f." is 7190 converted to "naf" for purposes of matching. 
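</p>
<p>The transform above can be illustrated with a short Python sketch. It is not a reference implementation, and it rests on several assumptions:</p>
<ul>
  <li>The function <code>loose_key</code> and its <code>keep_spaces_between_letters</code> parameter are hypothetical.</li>
  <li>[:Dash:] is approximated, because Python's <code>unicodedata</code> module does not expose the Dash property.</li>
  <li>"." is assumed to be in the decimal set.</li>
  <li>The <character-fallback> and domain-specific mappings are omitted.</li>
</ul>
<p>Passing <code>keep_spaces_between_letters=False</code> follows the heuristics, which ignore even spaces between letters, and reproduces the example above.</p>

```python
import unicodedata

# Approximation of [:Dash:]: general category Pd plus a few Dash characters
# from other categories. Python's unicodedata has no Dash property lookup.
_EXTRA_DASHES = {"\u2212", "\u207B", "\u208B"}  # MINUS SIGN, SUPERSCRIPT/SUBSCRIPT MINUS


def _is_dash(ch):
    return unicodedata.category(ch) == "Pd" or ch in _EXTRA_DASHES


def loose_key(text, keep_spaces_between_letters=True):
    """Apply a simplified version of the loose-matching transform (hypothetical helper)."""
    # 1. Drop "." unless it immediately precedes a decimal digit
    #    (assumes "." is in the locale's decimal set).
    chars = []
    for i, ch in enumerate(text):
        if ch == ".":
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if not nxt or unicodedata.category(nxt) != "Nd":
                continue
        chars.append(ch)

    # 2. Drop format characters (Cf), which include LRM, RLM, and ALM.
    chars = [ch for ch in chars if unicodedata.category(ch) != "Cf"]

    # 3. Drop space separators (Zs), optionally keeping those between letters.
    kept = []
    for i, ch in enumerate(chars):
        if unicodedata.category(ch) == "Zs":
            prev = chars[i - 1] if i > 0 else ""
            nxt = chars[i + 1] if i + 1 < len(chars) else ""
            if not (keep_spaces_between_letters and prev.isalpha() and nxt.isalpha()):
                continue
        kept.append(ch)

    # 4. Map dashes to U+002D HYPHEN-MINUS.
    #    (<character-fallback> and domain-specific mappings are omitted here.)
    s = "".join("-" if _is_dash(ch) else ch for ch in kept)

    # 5. Case fold (no Turkish-specific handling), then normalize to NFKC.
    return unicodedata.normalize("NFKC", s.casefold())


print(loose_key(" - NA f. 1,000.00", keep_spaces_between_letters=False))  # -naf1,000.00
print(loose_key("NA f.", keep_spaces_between_letters=False))  # naf
```

<p>Because the same function is applied to both the input text and the currency symbol, the two keys can then be compared directly.</p>
<p>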
As with other
  Unicode algorithms, this is a logical statement of the process;
  actual implementations can optimize, such as by applying the
  transform incrementally during matching.</p>
<h3>
  <a name="Invalid_Patterns" href="#Invalid_Patterns">7.2 Handling
    Invalid Patterns</a>
</h3>
<p>Processes sometimes encounter invalid number or
  date patterns, such as a number pattern with “¤¤¤¤¤” (valid pattern
  character but invalid length in current CLDR), a date pattern with
  “nn” (invalid pattern character in current CLDR), or a date pattern
  with “MMMMMM” (invalid length in current CLDR). The recommended
  behavior for handling such an invalid pattern field is:</p>
<ul>
  <li>For a field using a currently-invalid length for a valid
    pattern character:
    <ul>
      <li>In <strong>formatting</strong>, emit U+FFFD REPLACEMENT
        CHARACTER for the invalid field.
      </li>
      <li>In <strong>parsing</strong>, the field may be parsed as if
        it had a valid length.
      </li>
    </ul>
  </li>
  <li>For a pattern that contains a currently-invalid pattern
    character (applies only to date patterns, for which A-Za-z are
    reserved as pattern characters but not all defined as valid):
    <ul>
      <li>Produce an error (set an error code or throw an exception)
        when an attempt is made to create a formatter with such a pattern
        or to apply such a pattern to an existing formatter.</li>
    </ul>
  </li>
</ul>
<h2>
  <a name="Deprecated_Structure" href="#Deprecated_Structure">Annex A
    Deprecated Structure</a>
</h2>
<p>The deprecated elements, attributes, and values are listed in
  the supplementalMetadata.xml file, under <deprecatedItems>.
  Although deprecated structure is still valid LDML, its use is strongly
  discouraged, and it is no longer used in CLDR.</p>
<p>The remainder of this section describes selected cases of
  deprecated structure that were present in previous versions of CLDR.
</p>
<h3>
  <a name="Fallback_Elements" href="#Fallback_Elements">A.1 Element
    fallback</a>
</h3>
<p class="dtd"><!ELEMENT fallback (#PCDATA) ></p>
<p>
  The fallback element is deprecated. Implementations should instead
  use the information in <em><a href="#LanguageMatching">Section
  4.4 Language Matching</a></em> for language fallback.
</p>
<h3>
  <a name="BCP47_Keyword_Mapping" href="#BCP47_Keyword_Mapping">A.2
    BCP 47 Keyword Mapping</a>
</h3>

<p>
  <b>Note:</b> <i>This structure is deprecated and replaced with <a
  href="#Unicode_Locale_Extension_Data_Files">Section 3.6.4 U
  Extension Data Files</a>.
  </i>
</p>

<p class="dtd">
  <!ELEMENT bcp47KeywordMappings ( mapKeys?, mapTypes* ) ><br>
  <!ELEMENT mapKeys ( keyMap* ) ><br>
  <!ELEMENT keyMap EMPTY ><br>
  <!ATTLIST keyMap type NMTOKEN #REQUIRED ><br>
  <!ATTLIST keyMap bcp47 NMTOKEN #REQUIRED ><br>
  <!ELEMENT mapTypes ( typeMap* ) ><br>
  <!ATTLIST mapTypes type NMTOKEN #REQUIRED ><br>
  <!ELEMENT typeMap EMPTY ><br>
  <!ATTLIST typeMap type CDATA #REQUIRED ><br>
  <!ATTLIST typeMap bcp47 NMTOKEN #REQUIRED ><br>
</p>
<p>
  This section defines mappings between old Unicode locale identifier
  key/type values and their BCP 47 'u' extension subtag
  representations. The 'u' extension syntax described in <a
  href="#u_Extension">Section 3.6 Unicode BCP 47 U Extension</a>
  restricts a key to two ASCII alphanumerics and a type to three to
  eight ASCII alphanumerics. A key or type that does not meet that
  syntax requirement is converted according to the mapping data defined
  by the mapKeys or mapTypes elements.
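</p>
<p>The conversion rule just described can be sketched in Python as follows. This is not a CLDR API. The function <code>to_u_subtags</code> and the tables <code>KEY_MAP</code> and <code>TYPE_MAP</code> are hypothetical, and the tables hold only the "collation"/"phonebook" entries used in the example in this section. Keys and types that already satisfy the 'u' extension syntax are passed through unchanged. A mapTypes table is selected by the old (long) key name, as in the DTD.</p>

```python
import re

# Hypothetical stand-ins for <keyMap> and <typeMap> data; only the
# entries from the example in this section are included.
KEY_MAP = {"collation": "co"}
TYPE_MAP = {"collation": {"phonebook": "phonebk"}}  # keyed by <mapTypes type="...">

# 'u' extension syntax: key = 2 ASCII alphanumerics, type = 3-8 ASCII alphanumerics.
KEY_SYNTAX = re.compile(r"[0-9A-Za-z]{2}")
TYPE_SYNTAX = re.compile(r"[0-9A-Za-z]{3,8}")


def to_u_subtags(key, type_):
    """Convert an old key/type pair such as collation=phonebook to 'u' subtags."""
    if KEY_SYNTAX.fullmatch(key):
        bcp_key = key
    elif key in KEY_MAP:
        bcp_key = KEY_MAP[key]
    else:
        raise ValueError(f"no BCP 47 mapping for key {key!r}")

    if TYPE_SYNTAX.fullmatch(type_):
        bcp_type = type_
    elif type_ in TYPE_MAP.get(key, {}):
        bcp_type = TYPE_MAP[key][type_]
    else:
        raise ValueError(f"no BCP 47 mapping for {key}={type_}")

    # BCP 47 subtags are case-insensitive; lowercase is the conventional form.
    return f"{bcp_key}-{bcp_type}".lower()


print(to_u_subtags("collation", "phonebook"))  # co-phonebk
```

<p>A real converter would load the complete key and type tables rather than these illustrative entries.</p>
<p>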
For example, a keyword
  "collation=phonebook" is converted to BCP 47 'u' extension subtags
  "co-phonebk" by the mapping data below:
</p>
<pre> <mapKeys>
   ...
   <keyMap type="collation" bcp47="co"/>
   ...
 </mapKeys>
 <mapTypes type="collation">
   ...
   <typeMap type="phonebook" bcp47="phonebk"/>
   ...
 </mapTypes>
</pre>
<h3>
  <a name="Choice_Patterns" href="#Choice_Patterns">A.3 Choice
    Patterns</a>
</h3>
<p>
  <b>Note:</b> <i>This structure is deprecated and replaced with
  count attributes.</i>
</p>
<p>A choice pattern is a string that chooses among a number of
  strings, based on a numeric value. It has the following form:</p>
<p>
  <choice_pattern> = <choice> ( '|' <choice> )*<br>
  <choice> = <number><relation><string><br>
  <number> = ('+' | '-')? (<font size="3">'∞'</font> | [0-9]+ ('.' [0-9]+)?)<br>
  <relation> = '<' | '<span style="color: blue">≤</span>'
</p>
<p>Given a number N, the pattern is scanned from right to left,
  evaluating <number> <relation> N for each choice. The string of the
  first choice that matches is the result. If no choice matches,
  then the first string is used.
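</p>
<p>The right-to-left evaluation rule can be sketched in Python. The function names <code>parse_choice_pattern</code> and <code>choose</code> are hypothetical. The sketch does not handle quoting, so it assumes that no string contains '|'. The loop at the end evaluates the pattern used in the example table of this section.</p>

```python
import math
import re

# One choice: optional sign, a number or '∞', a relation ('<' or '≤'), then the string.
_CHOICE = re.compile(r"([+-]?)(∞|[0-9]+(?:\.[0-9]+)?)([<≤])(.*)", re.DOTALL)


def parse_choice_pattern(pattern):
    """Split a choice pattern into (limit, relation, string) tuples.

    Simplification: quoting with ' is not supported, so no string may contain '|'.
    """
    choices = []
    for part in pattern.split("|"):
        m = _CHOICE.fullmatch(part)
        if m is None:
            raise ValueError(f"invalid choice: {part!r}")
        sign, number, relation, text = m.groups()
        limit = math.inf if number == "∞" else float(number)
        if sign == "-":
            limit = -limit
        choices.append((limit, relation, text))
    return choices


def choose(pattern, n):
    """Return the string that the choice pattern selects for the number n."""
    choices = parse_choice_pattern(pattern)
    # Scan right to left; the first choice where <number> <relation> n holds wins.
    for limit, relation, text in reversed(choices):
        if (limit < n) if relation == "<" else (limit <= n):
            return text
    # No choice matched: default to the first string.
    return choices[0][2]


for n in (-math.inf, -3, 0, 0.9999, 1, 1.00001, math.inf):
    print(n, choose("0≤Rf|1≤Ru|1<Re", n))
```

<p>The printed results agree with the rows of the example table for the same pattern.</p>
<p>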
For example:</p>
<table border="1" cellpadding="0" cellspacing="0">
  <tr>
    <td width="33%">Pattern</td>
    <td width="33%">N</td>
    <td width="34%">Result</td>
  </tr>
  <tr>
    <td width="33%" rowspan="4">0≤Rf|1≤Ru|1<Re</td>
    <td width="33%">-<font size="3">∞, </font>-3, -1, -0.000001
    </td>
    <td width="34%">Rf (defaulted to first string)</td>
  </tr>
  <tr>
    <td width="33%">0, 0.01, 0.9999</td>
    <td width="34%">Rf</td>
  </tr>
  <tr>
    <td width="33%">1</td>
    <td width="34%">Ru</td>
  </tr>
  <tr>
    <td width="33%">1.00001, 5, 99, <font size="3">∞</font></td>
    <td width="34%">Re</td>
  </tr>
</table>
<p>Quoting is done using ' characters, as in date or number
  formats.</p>
<h3>
  <a name="Element_default" href="#Element_default">A.4 Element
    default</a>
</h3>
<p>
  <b>Note:</b> <i>This structure is deprecated.</i> Use the replacement
  structure instead, for example:
</p>
<ul>
  <li>For <collations>, now use the <defaultCollation>
    element.</li>
  <li>For <calendars>, the default calendar type for a
    locale is now specified by <i><a
    href="tr35-dates.html#Calendar_Preference_Data">Calendar
    Preference Data</a></i>.
  </li>
</ul>
<p>In some cases, a number of elements are present. The default
  element can be used to indicate which of them is the default, in the
  absence of other information. The value of the choice attribute must
  match the value of the type attribute of the selected item.</p>
<pre><timeFormats>
  <default choice="<span style="color: red">medium</span>" />
  <timeFormatLength type="<span style="color: blue">full</span>">
    <timeFormat type="<span style="color: blue">standard</span>">
      <pattern type="<span style="color: blue">standard</span>"><span style="color: blue">h:mm:ss a z</span></pattern>
    </timeFormat>
  </timeFormatLength>
  <timeFormatLength type="<span style="color: blue">long</span>">
    <timeFormat type="<span style="color: blue">standard</span>">
      <pattern type="<span style="color: blue">standard</span>"><span style="color: blue">h:mm:ss a z</span></pattern>
    </timeFormat>
  </timeFormatLength>
  <timeFormatLength type="<span style="color: red">medium</span>">
    <timeFormat type="<span style="color: blue">standard</span>">
      <pattern type="<span style="color: blue">standard</span>"><span style="color: blue">h:mm:ss a</span></pattern>
    </timeFormat>
  </timeFormatLength>
...</pre>
<p>Like all other elements, the <default> element is
  inherited. Thus, it can also refer to inherited resources. For
  example, suppose that the above resources are present in fr, and that
  in fr_BE we have the following:</p>
<pre><timeFormats>
  <default choice="<span style="color: red">long</span>"/>
</timeFormats></pre>
<p>In that case, the default time format for fr_BE would be the
  inherited "long" resource from fr. Now suppose that we had the
  following in fr_CA:</p>
<pre>  <timeFormatLength type="<span style="color: red">medium</span>">
    <timeFormat type="<span style="color: blue">standard</span>">
      <pattern type="<span style="color: blue">standard</span>"><span style="color: blue">...</span></pattern>
    </timeFormat>
  </timeFormatLength>
</pre>
<p>In this case, the <default> is inherited from fr, and has
  the value "medium". It thus refers to this new
  "medium" pattern in this resource bundle.</p>
<h3>
  <a name="Deprecated_Common_Attributes"
    href="#Deprecated_Common_Attributes">A.5 Deprecated Common
    Attributes</a>
</h3>
<h4>
  <a name="Attribute_standard" href="#Attribute_standard">A.5.1 Attribute standard</a>
</h4>
<p class="element2">
  <b>Note: </b>This attribute is deprecated. Instead, use a reference
  element with the attribute standard="true".
</p>
<p>The value of this attribute is a list of strings representing
  standards: international, national, organization, or vendor
  standards. The presence of this attribute indicates that the data in
  this element is compliant with the indicated standards. Where
  possible, for uniqueness, the string should be a URL that represents
  that standard. The strings are separated by commas; leading or
  trailing spaces on each string are not significant. Examples:</p>
<p>
  <code>
    <collation standard="<span style="color: blue">MSA 200:2002</span>"><br>
    ...<br>
    <dateFormatStyle standard="http://www.iso.ch/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=26780&amp;ICS1=1&amp;ICS2=140&amp;ICS3=30">
  </code>
</p>

<h4>
  <a name="Attribute_draft_nonLeaf" href="#Attribute_draft_nonLeaf">A.5.2
    Attribute draft in non-leaf elements</a>
</h4>
<p>The draft attribute is deprecated except in
  leaf elements (elements that do not have any subelements).</p>

<h3>
  <a name="Element_base" href="#Element_base">A.6 Element base</a>
</h3>
<p>
  <b>Note:</b> <i>This element is deprecated.</i> Use the collation
  <import> element instead.
</p>
<p>
  The optional base element,
  <code>
    <base><span style="color: blue">...</span></base>
  </code>,
  contains an alias element that points to another data source that
  defines a <i>base</i> collation. If present, it indicates that the
  settings and rules in the collation are modifications applied on <i>top
  of the</i> respective elements in the base collation. That is, any
  successive settings, where present, override what is in the base as
  described in <a href="tr35-collation.html#Setting_Options">Setting
  Options</a>. Any successive rules are concatenated to the end of the
  rules in the base. The results of multiple rules applying to the same
  characters are covered in <a href="tr35-collation.html#Orderings">Orderings</a>.
</p>

<h3>
  <a name="Element_rules" href="#Element_rules">A.7 Element rules</a>
</h3>
<p>
  <b>Note:</b> <i>The XML collation syntax is deprecated; this
  includes the <rules> element and its subelements, except that
  the <import> element has been moved up to be a subelement of
  <collation>.</i> Use the basic collation syntax with the <a
  href="tr35-collation.html#Rules"><cr> element</a> instead.
</p>
<p class="dtd"><!ELEMENT rules (alias | ( ( reset | import ), ( reset | import | p | pc | s | sc | t | tc | i | ic | x)* )) ></p>

<h3>
  <a name="Deprecated_subelements_of_dates"
    href="#Deprecated_subelements_of_dates">A.8 Deprecated
    subelements of <dates></a>
</h3>
<ul>
  <li><localizedPatternChars></li>
  <li><dateRangePattern>, replaced by
    <intervalFormats>.</li>
</ul>

<h3>
  <a name="Deprecated_subelements_of_calendars"
    href="#Deprecated_subelements_of_calendars">A.9 Deprecated
    subelements of <calendars></a>
</h3>
<ul>
  <li><monthNames> and <monthAbbr>; month name forms
    are specified in the <months> element. The older monthNames and
    monthAbbr elements are equivalent to using the months element with the
    context type="<span style="color: blue">format</span>" and
    the width type="<span style="color: blue">wide</span>"
    (for ...Names) and type="<span style="color: blue">narrow</span>"
    (for ...Abbr), respectively.
  </li>
  <li><dayNames> and <dayAbbr>; weekday name forms are
    specified in the <days> element. The older dayNames and dayAbbr
    elements are equivalent to using the days element with the context
    type="<span style="color: blue">format</span>" and the
    width type="<span style="color: blue">wide</span>" (for
    ...Names) and type="<span style="color: blue">narrow</span>"
    (for ...Abbr), respectively.
  </li>
  <li><a name="week" href="#week"><week></a> is deprecated
    in the main LDML files, because the data is more appropriately
    organized as connected to territories, not to linguistic data. Use
    the supplemental <weekData> element instead.</li>
  <li><am> and <pm>; these are now included as part of
    the <dayPeriods> element.</li>
  <li><fields> is deprecated as a subelement of
    <calendars>; instead, a <fields> element should be
    located directly under a <dates> element. See <a
    href="tr35-dates.html#Calendar_Fields">Calendar Fields</a>.
  </li>
</ul>

<h3>
  <a name="Deprecated_subelements_of_timeZoneNames"
    href="#Deprecated_subelements_of_timeZoneNames">A.10 Deprecated
    subelements of <timeZoneNames></a>
</h3>
<ul>
  <li><hoursFormat>, e.g. "{0}/{1}" for
    "-0800/-0700"</li>
  <li><a name="fallbackRegionFormat" href="#fallbackRegionFormat"><fallbackRegionFormat></a>
    (deprecated), e.g. "{0} Time ({1})" for "United
    States Time (New York)"</li>
  <li><abbreviationFallback></li>
  <li><preferenceOrdering>, a preference ordering among
    modern zones; use metazones instead.</li>
  <li><singleCountries>; use <a
    href="tr35-dates.html#Primary_Zones">Primary Zones</a> instead.</li>
</ul>

<h3>
  <a name="Deprecated_subelements_of_zone_metazone"
    href="#Deprecated_subelements_of_zone_metazone">A.11 Deprecated
    subelements of <zone> and <metazone></a>
</h3>
<ul>
  <li><commonlyUsed>, formerly used to indicate whether a
    zone was commonly used in the locale.</li>
</ul>

<h3>
  <a name="Renamed_attribute_values_for_contextTransformUsage"
    href="#Renamed_attribute_values_for_contextTransformUsage">A.12
    Renamed attribute values for <contextTransformUsage> element</a>
</h3>
<p>
  The <contextTransformUsage> element was introduced in CLDR 21.
  The values for its <em>type</em> attribute are documented in <a
  href="tr35-general.html#contextTransformUsage_type_attribute_values">
  <contextTransformUsage> type attribute values</a>. In CLDR 25,
  some of these values were renamed from their previous values for
  improved clarity:
</p>
<ul>
  <li>"type" was renamed to "keyValue"</li>
  <li>"displayName" was renamed to "currencyName"</li>
  <li>"displayName-count" was renamed to "currencyName-count"</li>
  <li>"tense" was renamed to "relative"</li>
</ul>

<h3>
  <a name="Deprecated_subelements_of_segmentations"
    href="#Deprecated_subelements_of_segmentations">A.13 Deprecated
    subelements of <segmentations></a>
</h3>
<ul>
  <li><exceptions> and <exception> were deprecated
    and replaced with <suppressions> and <suppression>.</li>
</ul>
<h3>
  <a name="Element_cp" href="#Element_cp">A.14 Element cp</a>
</h3>
<p>The cp element was used to escape characters that cannot be
  represented in XML, even with NCRs. These escapes were only allowed
  in certain elements, according to the DTD.</p>
<p>However, this mechanism was very clumsy, and was replaced by
  specialized syntax.</p>
<table>
  <tr>
    <th>Code Point</th>
    <th>XML Example</th>
  </tr>
  <tr>
    <td><code>U+0000</code></td>
    <td><code><cp hex="0"></code></td>
  </tr>
</table>
<h3>
  <a name="validSubLocales" href="#validSubLocales">A.15 Attribute
    validSubLocales</a>
</h3>
<p>
  The attribute <i>validSubLocales</i> allowed sublocales in a given
  tree to be treated as though a file for them were present when there
  was not one. It only had an effect for locales that inherit from the
  current file where a file is missing.
</p>
<p>
  <b>Example 1.</b> Suppose that in a particular LDML tree there are
  no region locales for German; for example, there is a de.xml file,
  but no de_AT.xml, de_CH.xml, or de_DE.xml file. Then no elements
  are valid for any of those region locales. If we want to mark one of
  those locales as having valid elements, we introduce an empty
  file, such as the following.
</p>
<p>
  <code>
    <ldml version="1.1"><br>
    <identity><br>
    <version number="1.1" /> <br>
    <language type="de" /> <br>
    <territory type="AT" /> <br>
    </identity><br>
    </ldml>
  </code>
</p>
<p>
  With the <i>validSubLocales</i> attribute, instead of adding
  empty de_AT.xml, de_CH.xml, and de_DE.xml files, we could add to the
  parent locale's de file a list of the child locales that
  should behave as if files were present.
</p>
<p>
  <code>
    <ldml version="1.1" validSubLocales="de_AT de_CH de_DE"><br>
    <identity><br>
    <version number="1.1" /> <br>
    <language type="de" /> <br>
    </identity><br>
    ...<br>
    </ldml>
  </code>
</p>
<p>
  Now that the <i>validSubLocales</i> attribute has been deprecated, it
  is recommended to simply add empty files to specify which sublocales
  are valid. This convention is used throughout CLDR.
</p>
<h3>
  <a name="postCodeElements" href="#postCodeElements">A.16 Elements
    postalCodeData, postCodeRegex</a>
</h3>
<p>The postal code validation data has been deprecated. Please see
  other services that are kept up to date, such as:</p>
<ul>
  <li><a href="http://i18napis.appspot.com/address/data/US">http://i18napis.appspot.com/address/data/US</a></li>
  <li><a href="http://i18napis.appspot.com/address/data/CH">http://i18napis.appspot.com/address/data/CH</a></li>
  <li>...</li>
</ul>
<p>
  See <a href="tr35-info.html#Postal_Code_Validation">Postal Code
  Validation</a>.
</p>

<h3>
  <a name="telephoneCodeData" href="#telephoneCodeData">A.17 Element
    telephoneCodeData</a>
</h3>
<p>The element <telephoneCodeData> and its subelements have
  been deprecated and the data removed.</p>

<hr>
<h2>
  <a name="Links_to_Other_Parts" href="#Links_to_Other_Parts">Annex B
    Links to Other Parts</a>
</h2>
<p>
  The LDML specification is split into several <a href="#Parts">parts</a>
  by topic, with one HTML document per part. The following tables
  provide redirects for links to specific topics. Please update your
  links and bookmarks.
</p>

<p>Part 1 Links: Core (this document): No redirects needed.</p>

<table cellspacing="0" cellpadding="2" border="1" width="100%">
  <caption>
    <a href="#Part_2_Links" name="Part_2_Links">Part 2 Links</a>: <a
    href="tr35-general.html">General</a> (display names &
    transforms, etc.)
7685 </caption> 7686 <tr> 7687 <th>Old section</th> 7688 <th>Section in new part</th> 7689 </tr> 7690 <tr> 7691 <td>5.4 <a name="Display_Name_Elements" 7692 href="#Display_Name_Elements">Display Name Elements</a></td> 7693 <td>1 <a href="tr35-general.html#Display_Name_Elements">Display 7694 Name Elements</a></td> 7695 </tr> 7696 <tr> 7697 <td>5.5 <a name="Layout_Elements" href="#Layout_Elements">Layout 7698 Elements</a></td> 7699 <td>2 <a href="tr35-general.html#Layout_Elements">Layout 7700 Elements</a></td> 7701 </tr> 7702 <tr> 7703 <td>5.6 <a name="Character_Elements" href="#Character_Elements">Character 7704 Elements</a></td> 7705 <td>3 <a href="tr35-general.html#Character_Elements">Character 7706 Elements</a></td> 7707 </tr> 7708 <tr> 7709 <td>5.6.1 <a name="ExemplarSyntax" href="#ExemplarSyntax">Exemplar 7710 Syntax</a></td> 7711 <td>3.1 <a href="tr35-general.html#ExemplarSyntax">Exemplar 7712 Syntax</a></td> 7713 </tr> 7714 <tr> 7715 <td>5.6.2 Restrictions</td> 7716 <td>3.1 <a href="tr35-general.html#ExemplarSyntax">Exemplar 7717 Syntax</a></td> 7718 </tr> 7719 <tr> 7720 <td>5.6.3 Mapping</td> 7721 <td>3.2 <a href="tr35-general.html#Character_Mapping">Mapping</a></td> 7722 </tr> 7723 <tr> 7724 <td>5.6.4 <a name="IndexLabels" href="#IndexLabels">Index 7725 Labels</a></td> 7726 <td>3.3 <a href="tr35-general.html#IndexLabels">Index 7727 Labels</a></td> 7728 </tr> 7729 <tr> 7730 <td>5.6.5 Ellipsis</td> 7731 <td>3.4 <a href="tr35-general.html#Ellipsis">Ellipsis</a></td> 7732 </tr> 7733 <tr> 7734 <td>5.6.6 More Information</td> 7735 <td>3.5 <a href="tr35-general.html#Character_More_Info">More 7736 Information</a></td> 7737 </tr> 7738 <tr> 7739 <td>5.7 <a name="Delimiter_Elements" href="#Delimiter_Elements">Delimiter 7740 Elements</a></td> 7741 <td>4 <a href="tr35-general.html#Delimiter_Elements">Delimiter 7742 Elements</a></td> 7743 </tr> 7744 <tr> 7745 <td>C.6 <a name="Measurement_System_Data" 7746 href="#Measurement_System_Data">Measurement System Data</a></td> 
7747 <td>5 <a href="tr35-general.html#Measurement_System_Data">Measurement 7748 System Data</a></td> 7749 </tr> 7750 <tr> 7751 <td>5.8 <a name="Measurement_Elements" 7752 href="#Measurement_Elements">Measurement Elements (deprecated)</a></td> 7753 <td>5.1 <a href="tr35-general.html#Measurement_Elements">Measurement 7754 Elements (deprecated)</a></td> 7755 </tr> 7756 <tr> 7757 <td>5.11 <a name="Unit_Elements" href="#Unit_Elements">Unit 7758 Elements</a></td> 7759 <td>6 <a href="tr35-general.html#Unit_Elements">Unit 7760 Elements</a></td> 7761 </tr> 7762 <tr> 7763 <td>5.12 <a name="POSIX_Elements" href="#POSIX_Elements">POSIX 7764 Elements</a></td> 7765 <td>7 <a href="tr35-general.html#POSIX_Elements">POSIX 7766 Elements</a></td> 7767 </tr> 7768 <tr> 7769 <td>5.13 <a name="Reference_Elements" 7770 href="#Reference_Elements">Reference Element</a></td> 7771 <td>8 <a href="tr35-general.html#Reference_Elements">Reference 7772 Element</a></td> 7773 </tr> 7774 <tr> 7775 <td>5.15 <a name="Segmentations" href="#Segmentations">Segmentations</a></td> 7776 <td>9 <a href="tr35-general.html#Segmentations">Segmentations</a></td> 7777 </tr> 7778 <tr> 7779 <td>5.15.1 <a name="Segmentation_Inheritance" 7780 href="#Segmentation_Inheritance">Segmentation Inheritance</a></td> 7781 <td>9.1 <a href="tr35-general.html#Segmentation_Inheritance">Segmentation 7782 Inheritance</a></td> 7783 </tr> 7784 <tr> 7785 <td>5.16 <a name="Transforms" href="#Transforms">Transforms</a></td> 7786 <td>10 <a href="tr35-general.html#Transforms">Transforms</a></td> 7787 </tr> 7788 <tr> 7789 <td>N <a name="Transform_Rules" href="#Transform_Rules">Transform 7790 Rules</a></td> 7791 <td>10.3 <a href="tr35-general.html#Transform_Rules_Syntax">Transform 7792 Rules Syntax</a></td> 7793 </tr> 7794 <tr> 7795 <td>5.18 <a name="ListPatterns" href="#ListPatterns">List 7796 Patterns</a></td> 7797 <td>11 <a href="tr35-general.html#ListPatterns">List 7798 Patterns</a></td> 7799 </tr> 7800 <tr> 7801 <td>C.20 <a 
name="List_Gender" href="#List_Gender">Gender 7802 of Lists</a></td> 7803 <td>11.1 <a href="tr35-general.html#List_Gender">Gender of 7804 Lists</a></td> 7805 </tr> 7806 <tr> 7807 <td>5.19 <a name="Context_Transform_Elements" 7808 href="#Context_Transform_Elements">ContextTransform Elements</a></td> 7809 <td>12 <a href="tr35-general.html#Context_Transform_Elements">ContextTransform 7810 Elements</a></td> 7811 </tr> 7812 <tr> 7813 <td></td> 7814 <td><a href="tr35-general.html#"></a></td> 7815 </tr> 7816 </table> 7817 7818 7819 <table cellspacing="0" cellpadding="2" border="1" width="100%"> 7820 <caption> 7821 <a href="#Part_3_Links" name="Part_3_Links">Part 3 Links</a>: <a 7822 href="tr35-numbers.html">Numbers</a> (number & currency 7823 formatting) 7824 </caption> 7825 <tr> 7826 <th>Old section</th> 7827 <th>Section in new part</th> 7828 </tr> 7829 <tr> 7830 <td>C.13 <a name="Numbering_Systems" href="#Numbering_Systems">Numbering 7831 Systems</a></td> 7832 <td>1 <a href="tr35-numbers.html#Numbering_Systems">Numbering 7833 Systems</a></td> 7834 </tr> 7835 <tr> 7836 <td>5.10 <a name="Number_Elements" href="#Number_Elements">Number 7837 Elements</a></td> 7838 <td>2 <a href="tr35-numbers.html#Number_Elements">Number 7839 Elements</a></td> 7840 </tr> 7841 <tr> 7842 <td>5.10.1 <a name="Number_Symbols" href="#Number_Symbols">Number 7843 Symbols</a></td> 7844 <td>2.3 <a href="tr35-numbers.html#Number_Symbols">Number 7845 Symbols</a></td> 7846 </tr> 7847 <tr> 7848 <td>G <a name="Number_Format_Patterns" 7849 href="#Number_Format_Patterns">Number Format Patterns</a></td> 7850 <td>3 <a href="tr35-numbers.html#Number_Format_Patterns">Number 7851 Format Patterns</a></td> 7852 </tr> 7853 <tr> 7854 <td>5.10.2 <a name="Currencies" href="#Currencies">Currencies</a></td> 7855 <td>4 <a href="tr35-numbers.html#Currencies">Currencies</a></td> 7856 </tr> 7857 <tr> 7858 <td>C.1 <a name="Supplemental_Currency_Data" 7859 href="#Supplemental_Currency_Data">Supplemental Currency Data</a></td> 
7860 <td>4.1 <a href="tr35-numbers.html#Supplemental_Currency_Data">Supplemental 7861 Currency Data</a></td> 7862 </tr> 7863 <tr> 7864 <td>C.11 <a name="Language_Plural_Rules" 7865 href="#Language_Plural_Rules">Language Plural Rules</a></td> 7866 <td>5 <a href="tr35-numbers.html#Language_Plural_Rules">Language 7867 Plural Rules</a></td> 7868 </tr> 7869 <tr> 7870 <td>5.17 <a name="Rule-Based_Number_Formatting" 7871 href="#Rule-Based_Number_Formatting">Rule-Based Number 7872 Formatting</a></td> 7873 <td>6 <a href="tr35-numbers.html#Rule-Based_Number_Formatting">Rule-Based 7874 Number Formatting</a></td> 7875 </tr> 7876 </table> 7877 7878 7879 <table cellspacing="0" cellpadding="2" border="1" width="100%"> 7880 <caption> 7881 <a href="#Part_4_Links" name="Part_4_Links">Part 4 Links</a>: <a 7882 href="tr35-dates.html">Dates</a> (date, time, time zone formatting) 7883 </caption> 7884 <tr> 7885 <th>Old section</th> 7886 <th>Section in new part</th> 7887 </tr> 7888 <tr> 7889 <td><a name="Date_Elements" href="#Date_Elements">5.9 Date 7890 Elements</a></td> 7891 <td>1 <a 7892 href="tr35-dates.html#Overview_Dates_Element_Supplemental">Overview: 7893 Dates Element, Supplemental Date and Calendar Information</a></td> 7894 </tr> 7895 <tr> 7896 <td><a name="Calendar_Elements" href="#Calendar_Elements">5.9.1 7897 Calendar Elements</a></td> 7898 <td>2 <a href="tr35-dates.html#Calendar_Elements">Calendar 7899 Elements</a></td> 7900 </tr> 7901 <tr> 7902 <td><a name="months_days_quarters_eras" 7903 href="#months_days_quarters_eras">Elements months, days, 7904 quarters, eras</a></td> 7905 <td>2.1 <a href="tr35-dates.html#months_days_quarters_eras">Elements 7906 months, days, quarters, eras</a></td> 7907 </tr> 7908 <tr> 7909 <td><a name="monthPatterns_cyclicNameSets" 7910 href="#monthPatterns_cyclicNameSets">Elements monthPatterns, 7911 cyclicNameSets</a></td> 7912 <td>2.2 <a href="tr35-dates.html#monthPatterns_cyclicNameSets">Elements 7913 monthPatterns, cyclicNameSets</a></td> 7914 
</tr> 7915 <tr> 7916 <td><a name="dayPeriods" href="#dayPeriods">Element 7917 dayPeriods</a></td> 7918 <td>2.3 <a href="tr35-dates.html#dayPeriods">Element 7919 dayPeriods</a></td> 7920 </tr> 7921 <tr> 7922 <td><a name="dateFormats" href="#dateFormats">Element 7923 dateFormats</a></td> 7924 <td>2.4 <a href="tr35-dates.html#dateFormats">Element 7925 dateFormats</a></td> 7926 </tr> 7927 <tr> 7928 <td><a name="timeFormats" href="#timeFormats">Element 7929 timeFormats</a></td> 7930 <td>2.5 <a href="tr35-dates.html#timeFormats">Element 7931 timeFormats</a></td> 7932 </tr> 7933 <tr> 7934 <td><a name="dateTimeFormats" href="#dateTimeFormats">Element 7935 dateTimeFormats</a></td> 7936 <td>2.6 <a href="tr35-dates.html#dateTimeFormats">Element 7937 dateTimeFormats</a></td> 7938 </tr> 7939 <tr> 7940 <td><a name="Calendar_Fields" href="#Calendar_Fields">5.9.2 7941 Calendar Fields</a></td> 7942 <td>3 <a href="tr35-dates.html#Calendar_Fields">Calendar 7943 Fields</a></td> 7944 </tr> 7945 <tr> 7946 <td>5.9.3 <a name="Timezone_Names" href="#Timezone_Names">Time 7947 Zone Names</a></td> 7948 <td>5 <a href="tr35-dates.html#Time_Zone_Names">Time Zone 7949 Names</a></td> 7950 </tr> 7951 <tr> 7952 <td><a name="Supplemental_Calendar_Data" 7953 href="#Supplemental_Calendar_Data">C.5 Supplemental Calendar 7954 Data</a></td> 7955 <td>4 <a href="tr35-dates.html#Supplemental_Calendar_Data">Supplemental 7956 Calendar Data</a></td> 7957 </tr> 7958 <tr> 7959 <td><a name="Supplemental_Timezone_Data" 7960 href="#Supplemental_Timezone_Data">C.7 Supplemental Time Zone 7961 Data</a></td> 7962 <td>6 <a href="tr35-dates.html#Supplemental_Time_Zone_Data">Supplemental 7963 Time Zone Data</a></td> 7964 </tr> 7965 <tr> 7966 <td><a name="Calendar_Preference_Data" 7967 href="#Calendar_Preference_Data">C.15 Calendar Preference Data</a></td> 7968 <td>4.2 <a href="tr35-dates.html#Calendar_Preference_Data">Calendar 7969 Preference Data</a></td> 7970 </tr> 7971 <tr> 7972 <td><a name="DayPeriodRules" 
href="#DayPeriodRules">C.17 7973 DayPeriod Rules</a></td> 7974 <td>4.5 <a href="tr35-dates.html#Day_Period_Rules">Day 7975 Period Rules</a></td> 7976 </tr> 7977 <tr> 7978 <td><a name="Date_Format_Patterns" href="#Date_Format_Patterns">Appendix 7979 F: Date Format Patterns</a></td> 7980 <td>8 <a href="tr35-dates.html#Date_Format_Patterns">Date 7981 Format Patterns</a></td> 7982 </tr> 7983 <tr> 7984 <td><a name="Date_Field_Symbol_Table" 7985 href="#Date_Field_Symbol_Table">Date Field Symbol Table</a></td> 7986 <td><a href="tr35-dates.html#Date_Field_Symbol_Table">Date 7987 Field Symbol Table</a></td> 7988 </tr> 7989 <tr> 7990 <td><a name="Localized_Pattern_Characters" 7991 href="#Localized_Pattern_Characters">F.1 Localized Pattern 7992 Characters (deprecated)</a></td> 7993 <td>8.1 <a href="tr35-dates.html#Localized_Pattern_Characters">Localized 7994 Pattern Characters (deprecated)</a></td> 7995 </tr> 7996 <tr> 7997 <td><a name="Time_Zone_Fallback" href="#Time_Zone_Fallback">Appendix 7998 J: Time Zone Display Names</a></td> 7999 <td>7 <a href="tr35-dates.html#Using_Time_Zone_Names">Using 8000 Time Zone Names</a></td> 8001 </tr> 8002 <tr> 8003 <td><a name="fallbackFormat" href="#fallbackFormat"><b>fallbackFormat</b>:</a></td> 8004 <td><a href="tr35-dates.html#fallbackFormat"><b>fallbackFormat</b>:</a></td> 8005 </tr> 8006 <tr> 8007 <td>O.4 Parsing Dates and Times</td> 8008 <td>9 <a href="tr35-dates.html#Parsing_Dates_Times">Parsing 8009 Dates and Times</a></td> 8010 </tr> 8011 </table> 8012 8013 8014 <table cellspacing="0" cellpadding="2" border="1" width="100%"> 8015 <caption> 8016 <a href="#Part_5_Links" name="Part_5_Links">Part 5 Links</a>: <a 8017 href="tr35-collation.html">Collation</a> (sorting, searching, 8018 grouping) 8019 </caption> 8020 <tr> 8021 <th>Old section</th> 8022 <th>Section in new part</th> 8023 </tr> 8024 <tr> 8025 <td>5.14 <a name="Collation_Elements" 8026 href="#Collation_Elements">Collation Elements</a></td> 8027 <td>3 <a 
      href="tr35-collation.html#Collation_Tailorings">Collation Tailorings</a></td>
  </tr>
  <tr>
    <td>5.14.1 <a name="Collation_Version"
      href="#Collation_Version">Version</a></td>
    <td>3.1 <a href="tr35-collation.html#Collation_Version">Version</a></td>
  </tr>
  <tr>
    <td>5.14.2 <a name="Collation_Element"
      href="#Collation_Element">Collation Element</a></td>
    <td>3.2 <a href="tr35-collation.html#Collation_Element">Collation Element</a></td>
  </tr>
  <tr>
    <td>5.14.3 <a name="Setting_Options" href="#Setting_Options">Setting Options</a></td>
    <td>3.3 <a href="tr35-collation.html#Setting_Options">Setting Options</a></td>
  </tr>
  <tr>
    <td>Table <a name="Collation_Settings"
      href="#Collation_Settings">Collation Settings</a></td>
    <td>Table <a href="tr35-collation.html#Collation_Settings">Collation Settings</a></td>
  </tr>
  <tr>
    <td>5.14.4 <a name="Rules" href="#Rules">Collation Rule Syntax</a></td>
    <td>3.4 <a href="tr35-collation.html#Rules">Collation Rule Syntax</a></td>
  </tr>
  <tr>
    <td>5.14.5 <a name="Orderings" href="#Orderings">Orderings</a></td>
    <td>3.5 <a href="tr35-collation.html#Orderings">Orderings</a></td>
  </tr>
  <tr>
    <td>5.14.6 <a name="Contractions" href="#Contractions">Contractions</a></td>
    <td>3.6 <a href="tr35-collation.html#Contractions">Contractions</a></td>
  </tr>
  <tr>
    <td>5.14.7 <a name="Expansions" href="#Expansions">Expansions</a></td>
    <td>3.7 <a href="tr35-collation.html#Expansions">Expansions</a></td>
  </tr>
  <tr>
    <td>5.14.8 <a name="Context_Before" href="#Context_Before">Context Before</a></td>
    <td>3.8 <a href="tr35-collation.html#Context_Before">Context Before</a></td>
  </tr>
  <tr>
    <td>5.14.9 <a name="Placing_Characters_Before_Others"
      href="#Placing_Characters_Before_Others">Placing Characters Before Others</a></td>
    <td>3.9 <a
      href="tr35-collation.html#Placing_Characters_Before_Others">Placing Characters Before Others</a></td>
  </tr>
  <tr>
    <td>5.14.10 <a name="Logical_Reset_Positions"
      href="#Logical_Reset_Positions">Logical Reset Positions</a></td>
    <td>3.10 <a href="tr35-collation.html#Logical_Reset_Positions">Logical Reset Positions</a></td>
  </tr>
  <tr>
    <td>5.14.11 <a name="Special_Purpose_Commands"
      href="#Special_Purpose_Commands">Special-Purpose Commands</a></td>
    <td>3.11 <a href="tr35-collation.html#Special_Purpose_Commands">Special-Purpose Commands</a></td>
  </tr>
  <tr>
    <td>5.14.12 <a name="Script_Reordering"
      href="#Script_Reordering">Collation Reordering</a></td>
    <td>3.12 <a href="tr35-collation.html#Script_Reordering">Collation Reordering</a></td>
  </tr>
  <tr>
    <td>5.14.13 <a name="Case_Parameters" href="#Case_Parameters">Case Parameters</a></td>
    <td>3.13 <a href="tr35-collation.html#Case_Parameters">Case Parameters</a></td>
  </tr>
  <tr>
    <td>Definition: <a name="UncasedExceptions"
      href="#UncasedExceptions">UncasedExceptions</a></td>
    <td>removed: see 3.13 <a
      href="tr35-collation.html#Case_Parameters">Case Parameters</a></td>
  </tr>
  <tr>
    <td>Definition: <a name="LowerExceptions"
      href="#LowerExceptions">LowerExceptions</a></td>
    <td>removed: see 3.13 <a
      href="tr35-collation.html#Case_Parameters">Case Parameters</a></td>
  </tr>
  <tr>
    <td>Definition: <a name="UpperExceptions"
      href="#UpperExceptions">UpperExceptions</a></td>
    <td>removed: see 3.13 <a
      href="tr35-collation.html#Case_Parameters">Case Parameters</a></td>
  </tr>
  <tr>
    <td>5.14.14 <a name="Visibility" href="#Visibility">Visibility</a></td>
    <td>3.14 <a href="tr35-collation.html#Visibility">Visibility</a></td>
  </tr>
</table>

<table cellspacing="0" cellpadding="2" border="1"
    width="100%">
  <caption>
    <a href="#Part_6_Links" name="Part_6_Links">Part 6 Links</a>: <a
      href="tr35-info.html">Supplemental</a> (supplemental data)
  </caption>
  <tr>
    <th>Old section</th>
    <th>Section in new part</th>
  </tr>
  <tr>
    <td>C <a name="Supplemental_Data" href="#Supplemental_Data">Supplemental Data</a></td>
    <td>Introduction <a href="tr35-info.html#Supplemental_Data">Supplemental Data</a></td>
  </tr>
  <tr>
    <td>C.2 <a name="Supplemental_Territory_Containment"
      href="#Supplemental_Territory_Containment">Supplemental Territory Containment</a></td>
    <td>1.1 <a
      href="tr35-info.html#Supplemental_Territory_Containment">Supplemental Territory Containment</a></td>
  </tr>
  <tr>
    <td>C.4 <a name="Supplemental_Territory_Information"
      href="#Supplemental_Territory_Information">Supplemental Territory Information</a></td>
    <td>1.2 <a
      href="tr35-info.html#Supplemental_Territory_Information">Supplemental Territory Information</a></td>
  </tr>
  <tr>
    <td>C.3 <a name="Supplemental_Language_Data"
      href="#Supplemental_Language_Data">Supplemental Language Data</a></td>
    <td>2 <a href="tr35-info.html#Supplemental_Language_Data">Supplemental Language Data</a></td>
  </tr>
  <tr>
    <td>C.9 <a name="Supplemental_Code_Mapping"
      href="#Supplemental_Code_Mapping">Supplemental Code Mapping</a></td>
    <td>4 <a href="tr35-info.html#Supplemental_Code_Mapping">Supplemental Code Mapping</a></td>
  </tr>
  <tr>
    <td>C.12 <a name="Telephone_Code_Data"
      href="#Telephone_Code_Data">Telephone Code Data</a></td>
    <td>5 <a href="tr35-info.html#Telephone_Code_Data">Telephone Code Data</a></td>
  </tr>
  <tr>
    <td>C.14 <a name="Postal_Code_Validation"
      href="#Postal_Code_Validation">Postal Code Validation</a></td>
    <td>6 <a href="tr35-info.html#Postal_Code_Validation">Postal
      Code Validation</a></td>
  </tr>
  <tr>
    <td>C.8 <a name="Supplemental_Character_Fallback_Data"
      href="#Supplemental_Character_Fallback_Data">Supplemental Character Fallback Data</a></td>
    <td>7 <a
      href="tr35-info.html#Supplemental_Character_Fallback_Data">Supplemental Character Fallback Data</a></td>
  </tr>
  <tr>
    <td>M <a name="Coverage_Levels" href="#Coverage_Levels">Coverage Levels</a></td>
    <td>8 <a href="tr35-info.html#Coverage_Levels">Coverage Levels</a></td>
  </tr>
  <tr>
    <td>5.20 <a name="Metadata_Elements"
      href="tr35-info.html#Metadata_Elements">Metadata Elements</a></td>
    <td>10 <a href="tr35-info.html#Metadata_Elements">Locale Metadata Element</a></td>
  </tr>
  <tr>
    <td>P <a name="Appendix_Supplemental_Metadata"
      href="tr35-info.html#Appendix_Supplemental_Metadata">Supplemental Metadata</a><br>
      P.1 <a name="Supplemental_Alias_Information"
      href="tr35-info.html#Supplemental_Alias_Information">Supplemental Alias Information</a><br>
      P.2 <a name="Supplemental_Deprecated_Information"
      href="tr35-info.html#Supplemental_Deprecated_Information">Supplemental Deprecated Information</a><br>
      P.3 <a name="Default_Content"
      href="tr35-info.html#Default_Content">Default Content</a>
    </td>
    <td>9 <a href="tr35-info.html#Appendix_Supplemental_Metadata">Supplemental Metadata</a><br>
      9.1 <a href="tr35-info.html#Supplemental_Alias_Information">Supplemental Alias Information</a><br>
      9.2 <a href="tr35-info.html#Supplemental_Deprecated_Information">Supplemental Deprecated Information</a><br>
      9.3 <a href="tr35-info.html#Default_Content">Default Content</a>
    </td>
  </tr>
</table>

<table cellspacing="0" cellpadding="2" border="1" width="100%">
  <caption>
    <a href="#Part_7_Links" name="Part_7_Links">Part 7 Links</a>: <a
      href="tr35-keyboards.html">Keyboards</a> (keyboard
    mappings)
  </caption>
  <tr>
    <th>Old section</th>
    <th>Section in new part</th>
  </tr>
  <tr>
    <td>S <a name="Keyboards" href="#Keyboards">Keyboards</a></td>
    <td>1 <a href="tr35-keyboards.html#Keyboards">Keyboards</a></td>
  </tr>
  <tr>
    <td>S <a name="Goals_and_Nongoals" href="#Goals_and_Nongoals">Goals and Nongoals</a></td>
    <td><a href="tr35-keyboards.html#Goals_and_Nongoals">Goals and Nongoals</a></td>
  </tr>
  <tr>
    <td>S <a name="File_and_Dir_Structure"
      href="#File_and_Dir_Structure">File and Directory Structure</a></td>
    <td><a href="tr35-keyboards.html#File_and_Dir_Structure">File and Directory Structure</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_Heirarchy_Layout_File"
      href="#Element_Heirarchy_Layout_File">Element Hierarchy - Layout File</a></td>
    <td><a href="tr35-keyboards.html#Element_Heirarchy_Layout_File">Element Hierarchy - Layout File</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_Heirarchy_Platform_File"
      href="#Element_Heirarchy_Platform_File">Element Hierarchy - Platform File</a></td>
    <td><a
      href="tr35-keyboards.html#Element_Heirarchy_Platform_File">Element Hierarchy - Platform File</a></td>
  </tr>
  <tr>
    <td>S <a name="Invariants" href="#Invariants">Invariants</a></td>
    <td><a href="tr35-keyboards.html#Invariants">Invariants</a></td>
  </tr>
  <tr>
    <td>S <a name="Data_Sources" href="#Data_Sources">Data Sources</a></td>
    <td><a href="tr35-keyboards.html#Data_Sources">Data Sources</a></td>
  </tr>
  <tr>
    <td>S <a name="Keyboard_IDs" href="#Keyboard_IDs">Keyboard IDs</a></td>
    <td><a href="tr35-keyboards.html#Keyboard_IDs">Keyboard IDs</a></td>
  </tr>
  <tr>
    <td>S <a name="Platform_Behaviors_in_Edge_Cases"
      href="#Platform_Behaviors_in_Edge_Cases">Platform Behaviors in Edge
      Cases</a></td>
    <td><a
      href="tr35-keyboards.html#Platform_Behaviors_in_Edge_Cases">Platform Behaviors in Edge Cases</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_Keyboard" href="#Element_Keyboard">Element: keyboard</a></td>
    <td><a href="tr35-keyboards.html#Element_Keyboard">Element: keyboard</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_version" href="#Element_version">Element: version</a></td>
    <td><a href="tr35-keyboards.html#Element_version">Element: version</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_generation" href="#Element_generation">Element: generation</a></td>
    <td><a href="tr35-keyboards.html#Element_generation">Element: generation</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_names" href="#Element_names">Element: names</a></td>
    <td><a href="tr35-keyboards.html#Element_names">Element: names</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_name" href="#Element_name">Element: name</a></td>
    <td><a href="tr35-keyboards.html#Element_name">Element: name</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_settings" href="#Element_settings">Element: settings</a></td>
    <td><a href="tr35-keyboards.html#Element_settings">Element: settings</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_keyMap" href="#Element_keyMap">Element: keyMap</a></td>
    <td><a href="tr35-keyboards.html#Element_keyMap">Element: keyMap</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_map" href="#Element_map">Element: map</a></td>
    <td><a href="tr35-keyboards.html#Element_map">Element: map</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_transforms" href="#Element_transforms">Element: transforms</a></td>
    <td><a href="tr35-keyboards.html#Element_transforms">Element: transforms</a></td>
  </tr>
  <tr>
    <td>S <a
      name="Element_transform" href="#Element_transform">Element: transform</a></td>
    <td><a href="tr35-keyboards.html#Element_transform">Element: transform</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_platform" href="#Element_platform">Element: platform</a></td>
    <td><a href="tr35-keyboards.html#Element_platform">Element: platform</a></td>
  </tr>
  <tr>
    <td>S <a name="Element_hardwareMap" href="#Element_hardwareMap">Element: hardwareMap</a></td>
    <td><a href="tr35-keyboards.html#Element_hardwareMap">Element: hardwareMap</a></td>
  </tr>
  <tr>
    <td>S <a name="Principles_for_Keyboard_Ids"
      href="#Principles_for_Keyboard_Ids">Principles for Keyboard Ids</a></td>
    <td><a href="tr35-keyboards.html#Principles_for_Keyboard_Ids">Principles for Keyboard Ids</a></td>
  </tr>
</table>
<hr>
<h2>
  <a name="References" href="#References">References</a>
</h2>
<table cellpadding="4" cellspacing="0" class="noborder" border="0">
  <tr>
    <th class="noborder" width="148">Ancillary Information</th>
    <td class="noborder" width="730"><i>Properly localizing,
      parsing, and formatting data requires ancillary information that is
      not expressed in Locale Data Markup Language. Some of the formats
      for values used in Locale Data Markup Language are defined by
      external specifications.
      The sources for this data
      and/or formats include the following:<br>
    </i></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="Bugs" href="#Bugs">Bugs</a>]
    </td>
    <td class="noborder" width="730">CLDR Bug Reporting form<br>
      <a href="http://cldr.unicode.org/index/bug-reports">http://cldr.unicode.org/index/bug-reports</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="Charts"
      href="#Charts">Charts</a>]
    </td>
    <td class="noborder" width="730">The online code charts can be
      found at <a href="http://unicode.org/charts/">http://unicode.org/charts/</a>.
      An index to character names, with links to the corresponding charts,
      is found at <a href="http://unicode.org/charts/charindex.html">http://unicode.org/charts/charindex.html</a>.
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="DUCET" href="#DUCET">DUCET</a>]
    </td>
    <td class="noborder" width="730">The Default Unicode Collation
      Element Table (DUCET)<br> Used for the base-level collation, of
      which all the collation tables in this document are tailorings.<br>
      <a
      href="http://unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table">http://unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="FAQ" href="#FAQ">FAQ</a>]
    </td>
    <td class="noborder" valign="top" width="730">Unicode
      Frequently Asked Questions<br> <a
      href="http://unicode.org/faq/">http://unicode.org/faq/</a><br>
      <i>For answers to common questions on technical issues.</i>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="FCD" href="#FCD">FCD</a>]
    </td>
    <td class="noborder" width="730">As defined in UTN #5: Canonical
      Equivalences in Applications<br> <a
      href="http://unicode.org/notes/tn5/">http://unicode.org/notes/tn5/</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="Glossary"
      href="#Glossary">Glossary</a>]
    </td>
    <td class="noborder" width="730">Unicode Glossary<br>
      <a href="http://unicode.org/glossary/">http://unicode.org/glossary/</a><br>
      <i>For explanations of terminology used in this and other documents.</i></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="JavaChoice"
      href="#JavaChoice">JavaChoice</a>]
    </td>
    <td class="noborder" width="730">Java ChoiceFormat<br> <a
      href="http://docs.oracle.com/javase/7/docs/api/java/text/ChoiceFormat.html">http://docs.oracle.com/javase/7/docs/api/java/text/ChoiceFormat.html</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="Olson" href="#Olson">Olson</a>]
    </td>
    <td class="noborder" width="730">The <i>TZ</i>ID Database (also known as the
      Olson time zone database)<br> Time zone and daylight saving time
      information.<br> <a href="http://www.iana.org/time-zones">http://www.iana.org/time-zones</a><br>
      For archived data, see<br> <a
      href="ftp://ftp.iana.org/tz/releases/">ftp://ftp.iana.org/tz/releases/</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="Reports"
      href="#Reports">Reports</a>]
    </td>
    <td class="noborder" width="730">Unicode Technical Reports<br>
      <a href="http://unicode.org/reports/">http://unicode.org/reports/</a><br>
      <i>For information on the status and development process for
      technical reports, and for a list of technical reports.</i></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="Unicode"
      href="#Unicode">Unicode</a>]
    </td>
    <td class="noborder" width="730">The Unicode Consortium. <em>The
      Unicode Standard, Version 7.0.0</em> (Mountain View, CA: The
      Unicode Consortium, 2014.
      ISBN 978-1-936213-09-2)<br> <a
      href="http://www.unicode.org/versions/Unicode7.0.0/">http://www.unicode.org/versions/Unicode7.0.0/</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="Versions"
      href="#Versions">Versions</a>]
    </td>
    <td class="noborder" width="730">Versions of the Unicode
      Standard<br> <a href="http://www.unicode.org/versions/">http://www.unicode.org/versions/</a><br>
      <i>For information on version numbering, and on citing and referencing
      the Unicode Standard, the Unicode Character Database, and Unicode
      Technical Reports.</i>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="XPath" href="#XPath">XPath</a>]
    </td>
    <td class="noborder" width="730">XML Path Language (XPath)<br> <a
      href="http://www.w3.org/TR/xpath/">http://www.w3.org/TR/xpath/</a></td>
  </tr>
  <tr>
    <th class="noborder" width="148">Other Standards</th>
    <td class="noborder" width="730"><i>Various standards
      define codes that are used as keys or values in Locale Data Markup
      Language.
      These include:</i></td>
  </tr>
  <tr>
    <td class="noborder">[<a name="BCP47" href="#BCP47">BCP47</a>]
    </td>
    <td class="noborder"><a
      href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt">http://www.rfc-editor.org/rfc/bcp/bcp47.txt</a>
      <p>
        The IANA Language Subtag Registry<br> <a
        href="http://www.iana.org/assignments/language-subtag-registry">http://www.iana.org/assignments/language-subtag-registry</a>
      </p></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ISO639"
      href="#ISO639">ISO639</a>]
    </td>
    <td class="noborder" width="730">ISO Language Codes<br> <a
      href="http://www.loc.gov/standards/iso639-2/">http://www.loc.gov/standards/iso639-2/</a><br>
      Code List<br> <a
      href="http://www.loc.gov/standards/iso639-2/langcodes.html">http://www.loc.gov/standards/iso639-2/langcodes.html</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ISO1000"
      href="#ISO1000">ISO1000</a>]
    </td>
    <td class="noborder" width="730">ISO 1000: SI units and
      recommendations for the use of their multiples and of certain other
      units, International Organization for Standardization, 1992.<br>
      <a href="http://www.iso.org/iso/catalogue_detail?csnumber=5448">http://www.iso.org/iso/catalogue_detail?csnumber=5448</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ISO3166"
      href="#ISO3166">ISO3166</a>]
    </td>
    <td class="noborder" width="730">ISO Region Codes<br> <a
      href="http://www.iso.org/iso/country_codes">http://www.iso.org/iso/country_codes</a><br>
      Code List<br> <a
      href="http://www.iso.org/iso/country_names_and_code_elements">http://www.iso.org/iso/country_names_and_code_elements</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ISO4217"
      href="#ISO4217">ISO4217</a>]
    </td>
    <td class="noborder" width="730">ISO Currency Codes<br> <a
      href="http://www.iso.org/iso/home/standards/currency_codes.htm">http://www.iso.org/iso/home/standards/currency_codes.htm</a>
      <p>
        <i>(Note that, at the time of writing, this list has significant
        problems. The supplemental data file contains the best
        compendium of currency information available.)</i>
      </p>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ISO8601"
      href="#ISO8601">ISO8601</a>]
    </td>
    <td class="noborder" width="730">ISO Date and Time Format<br>
      <a href="http://www.iso.org/iso/iso8601">http://www.iso.org/iso/iso8601</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ISO15924"
      href="#ISO15924">ISO15924</a>]
    </td>
    <td class="noborder" width="730">ISO Script Codes<br> <a
      href="http://www.unicode.org/iso15924/standard/index.html">http://www.unicode.org/iso15924/standard/index.html</a><br>
      Code List<br> <a
      href="http://www.unicode.org/iso15924/codelists.html">http://www.unicode.org/iso15924/codelists.html</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="LOCODE"
      href="#LOCODE">LOCODE</a>]
    </td>
    <td class="noborder" width="730">United Nations Code for Trade
      and Transport Locations, commonly known as "UN/LOCODE"<br> <a
      href="http://www.unece.org/cefact/locode/welcome.html">http://www.unece.org/cefact/locode/welcome.html</a><br>
      Download at: <a
      href="http://www.unece.org/cefact/codesfortrade/codes_index.htm">http://www.unece.org/cefact/codesfortrade/codes_index.htm</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="RFC6067"
      href="#RFC6067">RFC6067</a>]
    </td>
    <td class="noborder" width="730">BCP 47 Extension U<br> <a
      href="http://www.ietf.org/rfc/rfc6067.txt">http://www.ietf.org/rfc/rfc6067.txt</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="RFC6497"
      href="#RFC6497">RFC6497</a>]
    </td>
    <td class="noborder" width="730">BCP 47 Extension T -
      Transformed Content<br> <a
      href="http://www.ietf.org/rfc/rfc6497.txt">http://www.ietf.org/rfc/rfc6497.txt</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="UNM49" href="#UNM49">UNM49</a>]
    </td>
    <td class="noborder" width="730">UN M.49: UN Statistics
      Division
      <p>
        Country or area &amp; region codes<br> <a
        href="http://unstats.un.org/unsd/methods/m49/m49.htm">http://unstats.un.org/unsd/methods/m49/m49.htm</a>
      </p>
      <p>
        Composition of macro geographical (continental) regions,
        geographical sub-regions, and selected economic and other
        groupings<br> <a
        href="http://unstats.un.org/unsd/methods/m49/m49regin.htm">http://unstats.un.org/unsd/methods/m49/m49regin.htm</a>
      </p>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="XMLSchema"
      href="#XMLSchema">XML Schema</a>]
    </td>
    <td class="noborder" width="730">W3C XML Schema<br> <a
      href="http://www.w3.org/XML/Schema">http://www.w3.org/XML/Schema</a></td>
  </tr>
  <tr>
    <th class="noborder" width="148">General</th>
    <td class="noborder" width="730"><i>The following are
      general references from the text:</i></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ByType"
      href="#ByType">ByType</a>]
    </td>
    <td class="noborder" width="730">CLDR Comparison Charts<br>
      <a href="http://www.unicode.org/cldr/comparison_charts.html">http://www.unicode.org/cldr/comparison_charts.html</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="Calendars"
      href="#Calendars">Calendars</a>]
    </td>
    <td class="noborder" width="730">Calendrical Calculations: The
      Millennium Edition by Edward M.
      Reingold, Nachum Dershowitz;
      Cambridge University Press; Book and CD-ROM edition (July 1, 2001);
      ISBN: 0521777526. Note that the algorithms given in this book are
      copyrighted.</td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="Comparisons"
      href="#Comparisons">Comparisons</a>]
    </td>
    <td class="noborder" width="730">Comparisons between locale
      data from different sources<br> <a
      href="http://unicode.org/cldr/data/diff/">http://unicode.org/cldr/data/diff/</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="CurrencyInfo"
      href="#CurrencyInfo">CurrencyInfo</a>]
    </td>
    <td class="noborder" width="730">UNECE Currency Data<br> <a
      href="http://www.currency-iso.org/en/home/tables.html">http://www.currency-iso.org/en/home/tables.html</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="DataFormats"
      href="#DataFormats">DataFormats</a>]
    </td>
    <td class="noborder" width="730">CLDR Translation Guidelines<br>
      <a href="http://cldr.unicode.org/translation">http://cldr.unicode.org/translation</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="LDML" href="#LDML">Example</a>]
    </td>
    <td class="noborder" width="730">A sample in Locale Data Markup
      Language<br> <a
      href="http://unicode.org/cldr/dtd/1.1/ldml-example.xml">http://unicode.org/cldr/dtd/1.1/ldml-example.xml</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ICUCollation"
      href="#ICUCollation">ICUCollation</a>]
    </td>
    <td class="noborder" width="730">ICU collation rule syntax<br> <a
      href="http://www.icu-project.org/userguide/Collate_Customization.html">http://www.icu-project.org/userguide/Collate_Customization.html</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ICUTransforms"
      href="#ICUTransforms">ICUTransforms</a>]
    </td>
    <td class="noborder"
      width="730">Transforms<br> <a
      href="http://www.icu-project.org/userguide/Transformations.html">http://www.icu-project.org/userguide/Transformations.html</a><br>
      Transforms Demo<br> <a
      href="http://demo.icu-project.org/icu-bin/translit/">http://demo.icu-project.org/icu-bin/translit/</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ICUUnicodeSet"
      href="#ICUUnicodeSet">ICUUnicodeSet</a>]
    </td>
    <td class="noborder" width="730">ICU UnicodeSet<br> <a
      href="http://www.icu-project.org/userguide/unicodeSet.html">http://www.icu-project.org/userguide/unicodeSet.html</a><br>
      API<br> <a
      href="http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html">http://www.icu-project.org/apiref/icu4j/com/ibm/icu/text/UnicodeSet.html</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="ITUE164"
      href="#ITUE164">ITUE164</a>]
    </td>
    <td class="noborder" width="730">International
      Telecommunication Union: List of ITU Recommendation E.164 Assigned
      Country Codes<br> Available at <a
      href="http://www.itu.int/opb/publications.aspx?parent=T-SP&amp;view=T-SP2">http://www.itu.int/opb/publications.aspx?parent=T-SP&amp;view=T-SP2</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="LocaleExplorer"
      href="#LocaleExplorer">LocaleExplorer</a>]
    </td>
    <td class="noborder" width="730">ICU Locale Explorer<br> <a
      href="http://demo.icu-project.org/icu-bin/locexp">http://demo.icu-project.org/icu-bin/locexp</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="localeProject"
      href="#localeProject">LocaleProject</a>]
    </td>
    <td class="noborder" width="730">Common Locale Data Repository
      Project<br> <a href="http://unicode.org/cldr/">http://unicode.org/cldr/</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="NamingGuideline"
      href="#NamingGuideline">NamingGuideline</a>]
    </td>
    <td class="noborder" width="730">OpenI18N Locale Naming
      Guideline<br> Formerly at
      http://www.openi18n.org/docs/text/LocNameGuide-V10.txt
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="RBNF" href="#RBNF">RBNF</a>]
    </td>
    <td class="noborder" width="730">Rule-Based Number Format<br>
      <a
      href="http://www.icu-project.org/apiref/icu4c/classRuleBasedNumberFormat.html#_details">http://www.icu-project.org/apiref/icu4c/classRuleBasedNumberFormat.html#_details</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="RBBI" href="#RBBI">RBBI</a>]
    </td>
    <td class="noborder" width="730">Rule-Based Break Iterator<br>
      <a
      href="http://www.icu-project.org/userguide/boundaryAnalysis.html">http://www.icu-project.org/userguide/boundaryAnalysis.html</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="RFC5234"
      href="#RFC5234">RFC5234</a>]
    </td>
    <td class="noborder" width="730">RFC 5234: Augmented BNF for
      Syntax Specifications: ABNF<br> <a
      href="http://www.ietf.org/rfc/rfc5234.txt">http://www.ietf.org/rfc/rfc5234.txt</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="UCAChart"
      href="#UCAChart">UCAChart</a>]
    </td>
    <td class="noborder" width="730">Collation Chart<br> <a
      href="http://unicode.org/charts/collation/">http://unicode.org/charts/collation/</a></td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="UTCInfo"
      href="#UTCInfo">UTCInfo</a>]
    </td>
    <td class="noborder" width="730">NIST Time and Frequency
      Division Home Page<br> <a href="http://tf.nist.gov/">http://tf.nist.gov/</a><br>
      U.S.
      Naval Observatory: What is Universal Time?<br> <a
      href="http://aa.usno.navy.mil/faq/docs/UT.php">http://aa.usno.navy.mil/faq/docs/UT.php</a>
    </td>
  </tr>
  <tr>
    <td class="noborder" width="148">[<a name="WindowsCulture"
      href="#WindowsCulture">WindowsCulture</a>]
    </td>
    <td class="noborder" width="730">Windows Culture Info
      (with mappings from [<a href="#BCP47">BCP47</a>]-style codes
      to LCIDs)<br> <a
      href="http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo(vs.71).aspx">http://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo(vs.71).aspx</a>
    </td>
  </tr>
</table>
<h2>
  <a name="Acknowledgments" href="#Acknowledgments">Acknowledgments</a>
</h2>
<p>Special thanks to the following people for their continuing
  overall contributions to the CLDR project, and for their specific
  contributions in the following areas. These descriptions only touch
  on the many contributions that they have made.</p>
<ul>
  <li><a
    href="https://plus.google.com/114199149796022210033?rel=author">Mark
    Davis</a> for creating the initial version of LDML; for adding to and
    maintaining this specification; and for his work on the LDML code
    and tests, much of the supplemental data and overall structure,
    transforms, and keyboards.</li>
  <li>John Emmons for the POSIX conversion tool and metazones.</li>
  <li>Deborah Goldsmith for her contributions to LDML architecture
    and this specification.</li>
  <li>Chris Hansten for coordinating and managing data submissions
    and vetting.</li>
  <li>Erkki Kolehmainen and his team for their work on Finnish.</li>
  <li>Steven R.
    Loomis for development of the Survey Tool and
    database management.</li>
  <li>Peter Nugent for his contributions to the POSIX tool and
    from Open Office, and for coordinating and managing data submissions
    and vetting.</li>
  <li>George Rhoten for his work on currencies.</li>
  <li>Roozbeh Pournader (روزبه پورنادر) for his work on South
    Asian countries.</li>
  <li>Ram Viswanadha (రఘురామ్ విశ్వనాధ) for all of his work on
    LDML code and data integration, and for coordinating and managing
    data submissions and vetting.</li>
  <li>Vladimir Weinstein (Владимир Вајнштајн) for his work on
    collation.</li>
  <li>Yoshito Umaoka (馬岡 由人) for his work on the time zone
    architecture.</li>
  <li>Rick McGowan for his work gathering language, script, and
    region data.</li>
  <li>Xiaomei Ji (吉晓梅) for her work on time intervals and plural
    formatting.</li>
  <li>David Bertoni for his contributions to the conversion tools.</li>
  <li>Mike Tardif for reviewing this specification and for
    coordinating and vetting data submissions.</li>
  <li>Peter Edberg for work on this specification, telephone code
    data, monthPatterns, cyclicNameSets, and contextTransforms.</li>
  <li>Raymond Wainman and Cibu Johny for their work on keyboards.</li>
  <li>Jennifer Chye for her contributions to the conversion tools.</li>
  <li><a
    href="https://plus.google.com/117587389715494866571?rel=author">Markus
    Scherer</a> for a major rewrite of Part 5, Collation.</li>
</ul>
<p>
  Other contributors to CLDR are listed on the <a
  href="http://www.unicode.org/cldr/">CLDR Project Page</a>.
  </p>

  <h2>
    <a name="Modifications" href="#Modifications">Modifications</a>
  </h2>

<p><b>Revision 53</b></p>
<p><strong>Part 1: <a href="tr35.html#Contents">Core</a> (languages,
    locales, basic structure)
  </strong></p>
<ul>
  <li><strong>Section 3.2 <a
    href="#Unicode_locale_identifier">Unicode Locale Identifier</a></strong>
[<a href="http://unicode.org/cldr/trac/ticket/11435">#11435</a>]
[<a href="http://unicode.org/cldr/trac/ticket/11434">#11434</a>]
<ul>
  <li>Fixed cases of "-" in the syntax that should have been <em>sep</em>, and noted that "-" is the canonical (preferred) form.</li>
  <li>Changed "u" and "t" in the syntax to [uU] and [tT], respectively, to reflect that case is ignored when parsing.</li>
  <li>Included specific syntax rather than just noting "Although not shown in the syntax above, Unicode locale identifiers may also have [BCP47] extensions (other than "u" and "t") and private use subtags."</li>
  <li>Reformatted and fleshed out the canonical form description; listed where CLDR uses non-canonical forms.</li>
  <li>Added missing details about how Unicode Locale Identifiers differ from BCP 47, and how to convert between them.</li>
</ul>
  </li>
  <li><strong>Section 3.3 <a href="#BCP_47_Conformance">BCP
    47 Conformance</a> </strong>
<ul>
  <li>Reorganized for clarity and introduced the new terms <em>Unicode BCP 47 locale identifier</em> and <em>Unicode CLDR locale identifier</em>.
[<a href="http://unicode.org/cldr/trac/ticket/11451">#11451</a>]</li>
</ul>
  </li>
  <li><strong>Section 3.3.1 <a href="http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#BCP_47_Language_Tag_Conversion">BCP 47 Language Tag Conversion</a>
    [<a href="http://unicode.org/cldr/trac/ticket/11451">#11451</a>]</strong>
    <ul>
      <li>Now handles private-use extensions and grandfathered tags.</li>
      <li>Added more examples.</li>
      <li>Separated into three conversions:
        <ul>
          <li> <a href="http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#Language_Tag_to_Locale_Identifier">BCP 47 Language Tag to Unicode BCP 47 Locale Identifier</a> </li>
          <li> <a href="http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#Unicode_Locale_Identifier_CLDR_to_BCP_47">Unicode Locale Identifier: CLDR to BCP 47</a> </li>
          <li> <a href="http://unicode.org/repos/cldr/trunk/specs/ldml/tr35.html#Unicode_Locale_Identifier_BCP_47_to_CLDR">Unicode Locale Identifier: BCP 47 to CLDR</a> </li>
        </ul>
      </li>
    </ul>
  </li>
  <li><strong>Section 3.4
    <a href="#Field_Definitions">Language Identifier Field Definitions </a>
  </strong>
    <ul>
      <li>Added another macrolanguage example, ku (used for kmr), and a link to the Aliases chart
        [<a href="http://unicode.org/cldr/trac/ticket/11470">#11470</a>]</li>
      <li>Documented special language subtags mis, mul, zxx [<a href="http://unicode.org/cldr/trac/ticket/11451">#11451</a>]</li>
      <li>Added special script code Qaag [<a href="http://unicode.org/cldr/trac/ticket/11408">#11408</a>]</li>
      <li>Documented special region subtags XA and XB [<a href="http://unicode.org/cldr/trac/ticket/11451">#11451</a>]</li>
    </ul>
  </li>
  <li><strong>Section 3.5.3 <a href="#Private_Use">Private Use Codes</a></strong>
    <ul>
      <li>Adjusted table to move Qaag, XA, and XB into <em>defined</em>.
XA and XB were already correct in the identity file (a change in a previous release) but had not been added to that table. [<a href="http://unicode.org/cldr/trac/ticket/11408">#11408</a>]</li>
    </ul>
  </li>
  <li><strong>Section 3.6.4 <a href="#Unicode_Locale_Extension_Data_Files" >U Extension Data Files</a>
  </strong>
    <ul>
      <li>Qualified valueType, since a key's value may be empty (if "true"). [<a href="http://unicode.org/cldr/trac/ticket/11408">#11408</a>]</li>
    </ul>
  </li>
  <li><strong>Section 3.6.5.1 <a href="#Validity">Validity</a></strong>
    <ul>
      <li>Softened the requirement that there be a region code matching the first 2 letters of the subdivision code. That requirement was needlessly strict and introduced a dependency on <em>likely subtags</em> that should not be there. [<a href="http://unicode.org/cldr/trac/ticket/11397">#11397</a>]</li>
    </ul>
  </li>
  <li><strong>Section 4.2.6 <a
    href="#Inheritance_vs_Related">Inheritance vs Related Information</a>
  </strong>
    <ul>
      <li>Added a table to explain the relationship between Inheritance, DefaultContent, LikelySubtags, and LocaleMatching.</li>
    </ul>
  </li>
  <li><strong>Section 5.3.3
    <a href="#Unicode_Sets">Unicode Sets</a>
  </strong>
    <ul>
      <li>Clarified the relation between UnicodeSet and <a
        href="http://www.unicode.org/reports/tr41/#UTS18">UTS #18</a> [<a href="http://unicode.org/cldr/trac/ticket/11232">#11232</a>]</li>
    </ul>
  </li>
</ul>
<p><strong>Part 2: <a href="tr35-general.html#Contents">General</a>
    (display names &amp; transforms, etc.)
  </strong></p>
<ul>
  <li><strong>Section 6 <a href="tr35-general.html#Unit_Elements">Unit Elements</a> </strong>
    <ul>
      <li>Added &lt;displayName&gt; element for &lt;coordinateUnit&gt;.
        [<a href="http://unicode.org/cldr/trac/ticket/9986">#9986</a>]</li>
      <li>Noted that unitPatterns can use explicit count values “0” and “1”.
        [<a href="http://unicode.org/cldr/trac/ticket/10922">#10922</a>]</li>
      <li>Defined the syntax of unit identifiers [<a href="http://unicode.org/cldr/trac/ticket/11271">#11271</a>]</li>
      <li>Added several new units: percent and permille, petabyte, and atmosphere.
        [<a href="http://unicode.org/cldr/trac/ticket/10632">#10632</a>]
        [<a href="http://unicode.org/cldr/trac/ticket/10410">#10410</a>]
        [<a href="http://unicode.org/cldr/trac/ticket/10600">#10600</a>]</li>
    </ul>
  </li>
  <li><strong>Section 10.1.1 <a href="tr35-general.html#Pivots">Pivots</a></strong>
    <ul>
      <li>Described the use of private-use characters in Interindic. [<a href="http://unicode.org/cldr/trac/ticket/10962">#10962</a>]</li>
    </ul>
  </li>
</ul>
<p><strong>Part 3: <a href="tr35-numbers.html#Contents">Numbers</a>
    (number &amp; currency formatting)
  </strong></p>
<ul>
  <li><strong>Section 2.5 <a href="tr35-numbers.html#Miscellaneous_Patterns">Miscellaneous Patterns</a></strong>
    <ul>
      <li>Documented <strong>approximately</strong> and <strong>atMost</strong>. [<a href="http://unicode.org/cldr/trac/ticket/11354">#11354</a>]</li>
    </ul>
  </li>
  <li><strong>Section 3.2 <a
    href="tr35-numbers.html#Special_Pattern_Characters">Special Pattern Characters</a></strong>
    <ul>
      <li>Documented edge cases for negative subpatterns (and whitespace) [<a href="http://unicode.org/cldr/trac/ticket/10703">#10703</a>]</li>
    </ul>
  </li>
  <li><strong>Section 3.4 <a href="tr35-numbers.html#sci">Scientific Notation</a> </strong>
    <ul>
      <li>Specified the special formats used for the integer parts.
        [<a href="http://unicode.org/cldr/trac/ticket/10103">#10103</a>]</li>
    </ul>
  </li>
  <li><strong>Section 5 <a href="tr35-numbers.html#Language_Plural_Rules">Language Plural Rules</a></strong>
    <ul>
      <li>Added a new section <a href="tr35-numbers.html#Explicit_0_1_rules">Explicit 0 and
        1 rules</a> covering the language-independent explicit plural cases “0” and “1”.
        [<a href="http://unicode.org/cldr/trac/ticket/10922">#10922</a>]</li>
    </ul>
  </li>
</ul>

<p><strong>Part 4: <a href="tr35-dates.html#Contents">Dates</a> (date,
    time, time zone formatting)
  </strong></p>
<ul>
  <li><strong>Section 2.6.3 <a href="tr35-dates.html#intervalFormats">Element intervalFormats</a></strong>
    <ul>
      <li>Described how to synthesize intervalFormatItems for skeletons that combine date and time fields.
        [<a href="http://unicode.org/cldr/trac/ticket/10133">#10133</a>] </li>
    </ul>
  </li>
  <li><strong>Section 4.4 <a href="tr35-dates.html#Time_Data">Time Data</a></strong>
    <ul>
      <li>Documented the relation between @allowed and @preferred. [<a href="http://unicode.org/cldr/trac/ticket/9930">#9930</a>]</li>
    </ul>
  </li>
</ul>
<p><strong>Part 5: <a href="tr35-collation.html#Contents">Collation</a>
    (sorting, searching, grouping)
  </strong></p>
<ul>
  <li><em>no changes</em></li>
</ul>
<p><strong>Part 6: <a href="tr35-info.html#Contents">Supplemental</a>
    (supplemental data)
  </strong></p>
<ul>
  <li> <strong>Section 4 <a href="tr35-info.html#Supplemental_Code_Mapping">Supplemental
    Code Mapping</a></strong>
    <ul>
      <li>For the element &lt;territoryCodes&gt;, deprecated the internet attribute.
        [<a href="http://unicode.org/cldr/trac/ticket/11072">#11072</a>]</li>
    </ul>
  </li>

  <li> <strong>Section 5 <a href="tr35-info.html#Telephone_Code_Data">Telephone
    Code Data</a></strong>
    <ul>
      <li>Deprecated; the data has been removed. [<a href="http://unicode.org/cldr/trac/ticket/10383">#10383</a>]</li>
    </ul>
  </li>

  <li> <strong>Section 9.3 <a href="tr35-info.html#Default_Content">Default
    Content</a></strong>
    <ul>
      <li>Added a pointer to <strong>Section 4.2.6 <a
        href="#Inheritance_vs_Related">Inheritance vs Related Information</a> </strong></li>
    </ul>
  </li>
</ul>
<p><strong>Part 7: <a href="tr35-keyboards.html#Contents">Keyboards</a>
    (keyboard mappings)
  </strong> </p>
<ul>
  <li><em>no changes</em></li>
</ul>

  <p>Modifications in previous versions are listed in those respective versions. Click on <strong>Previous Version</strong> in the header until you get to the desired version.</p>

  <hr>
  <p class="copyright">
    Copyright © 2001–2018 Unicode, Inc. All
    Rights Reserved. The Unicode Consortium makes no expressed or implied
    warranty of any kind, and assumes no liability for errors or
    omissions. No liability is assumed for incidental and consequential
    damages in connection with or arising out of the use of the
    information or programs contained or accompanying this technical
    report. The Unicode <a href="http://unicode.org/copyright.html">Terms
    of Use</a> apply.
  </p>
  <p class="copyright">Unicode and the Unicode logo are trademarks
    of Unicode, Inc., and are registered in some jurisdictions.</p>
  </div>

</body>

</html>