1<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" 2"https://www.w3.org/TR/html4/loose.dtd"> 3<html> 4<head> 5 <meta name="generator" content= 6 "HTML Tidy for HTML5 for Apple macOS version 5.6.0"> 7 <meta http-equiv="Content-Type" content= 8 "text/html; charset=utf-8"> 9 <meta http-equiv="Content-Language" content="en-us"> 10 <link rel="stylesheet" href= 11 "../reports.css" type="text/css"> 12 <title>UTS #35: Unicode Locale Data Markup Language</title> 13 <style type="text/css"> 14 <!-- 15 .dtd { 16 font-family: monospace; 17 font-size: 90%; 18 background-color: #CCCCFF; 19 border-style: dotted; 20 border-width: 1px; 21 } 22 23 .xmlExample { 24 font-family: monospace; 25 font-size: 80% 26 } 27 28 .blockedInherited { 29 font-style: italic; 30 font-weight: bold; 31 border-style: dashed; 32 border-width: 1px; 33 background-color: #FF0000 34 } 35 36 .inherited { 37 font-weight: bold; 38 border-style: dashed; 39 border-width: 1px; 40 background-color: #00FF00 41 } 42 43 .element { 44 font-weight: bold; 45 color: red; 46 } 47 48 .attribute { 49 font-weight: bold; 50 color: maroon; 51 } 52 53 .attributeValue { 54 font-weight: bold; 55 color: blue; 56 } 57 58 li, p { 59 margin-top: 0.5em; 60 margin-bottom: 0.5em 61 } 62 63 h2, h3, h4, h5, table { 64 margin-top: 1.5em; 65 margin-bottom: 0.5em; 66 } 67 68 h5 { 69 font-size: medium; 70 font-style: italic 71 } 72 --> 73 </style> 74</head> 75<body> 76 <table class="header" width="100%"> 77 <tr> 78 <td class="icon"><a href="https://unicode.org"><img alt= 79 "[Unicode]" src="../logo60s2.gif" 80 width="34" height="33" style= 81 "vertical-align: middle; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px; border-top-width: 0px;"></a> 82 <a class="bar" href= 83 "https://www.unicode.org/reports/">Technical Reports</a></td> 84 </tr> 85 <tr> 86 <td class="gray"> </td> 87 </tr> 88 </table> 89 <div class="body"> 90 <h2 style="text-align: center">Unicode Technical Standard #35</h2> 91 <h1>Unicode Locale 
Data Markup Language (LDML)</h1> 92 <!-- At least the first row of this header table should be identical across the parts of this UTS. --> 93 <table border="1" cellpadding="2" cellspacing="0" class="wide"> 94 <tr> 95 <td>Version</td> 96 <td>38</td> 97 </tr> 98 <tr> 99 <td>Editors</td> 100 <td>Mark Davis (<a href="mailto:markdavis@google.com">markdavis@google.com</a>) and 101 <a href="tr35.html#Acknowledgments">other CLDR committee members</a></td> 102 </tr> 103 <tr> 104 <td>Date</td> 105 <td>2020-10-23</td> 106 </tr> 107 <tr> 108 <!-- This link must be made live when posting the final version but is disabled during proposed update stage. --> 109 <td>This Version</td> 110 <td> 111 <a href="https://www.unicode.org/reports/tr35/tr35-61/tr35.html"> 112 https://www.unicode.org/reports/tr35/tr35-61/tr35.html</a></td> 113 </tr> 114 <tr> 115 <td>Previous Version</td> 116 <td> 117 <a href="https://www.unicode.org/reports/tr35/tr35-60/tr35.html"> 118 https://www.unicode.org/reports/tr35/tr35-60/tr35.html</a></td> 119 </tr> 120 <tr> 121 <td>Latest Version</td> 122 <td><a href= 123 "https://www.unicode.org/reports/tr35/">https://www.unicode.org/reports/tr35/</a></td> 124 </tr> 125 <tr> 126 <td>Corrigenda</td> 127 <td><a href= 128 "http://unicode.org/cldr/corrigenda.html">http://unicode.org/cldr/corrigenda.html</a></td> 129 </tr> 130 <tr> 131 <td>Latest Proposed Update</td> 132 <td><a href= 133 "https://www.unicode.org/reports/tr35/proposed.html">https://www.unicode.org/reports/tr35/proposed.html</a></td> 134 </tr> 135 <tr> 136 <td>Namespace</td> 137 <td><a href= 138 "https://unicode.org/cldr/">https://unicode.org/cldr/</a></td> 139 </tr> 140 <tr> 141 <td>DTDs</td> 142 <td><a href="https://github.com/unicode-org/cldr/tree/maint/maint-38/common/dtd"> 143 http://unicode.org/cldr/dtd/38/</a></td> 144 </tr> 145 <tr> 146 <td>Revision</td> 147 <td><a href="#Modifications">61</a></td> 148 </tr> 149 </table> 150 <h3><i>Summary</i></h3> 151 <p>This document describes an XML format 
(<i>vocabulary</i>) 152 for the exchange of structured locale data. This format is used 153 in the <a href="https://unicode.org/cldr/">Unicode Common Locale 154 Data Repository</a>.</p> 155 <h3><i>Status</i></h3> 156 157 <!-- NOT YET APPROVED 158 <p> 159 <i class="changed">This is a<b><font color="#ff3333"> 160 draft </font></b>document which may be updated, replaced, or superseded by 161 other documents at any time. Publication does not imply endorsement 162 by the Unicode Consortium. This is not a stable document; it is 163 inappropriate to cite this document as other than a work in 164 progress. 165 </i> 166 </p> 167 END NOT YET APPROVED --> 168 <!-- APPROVED --> 169 <p><i>This document has been reviewed by Unicode members and 170 other interested parties, and has been approved for publication 171 by the Unicode Consortium. This is a stable document and may be 172 used as reference material or cited as a normative reference by 173 other specifications.</i></p> 174 <!-- END APPROVED --> 175 176 <blockquote> 177 <p><i><b>A Unicode Technical Standard (UTS)</b> is an 178 independent specification. Conformance to the Unicode 179 Standard does not imply conformance to any UTS.</i></p> 180 </blockquote> 181 <p><i>Please submit corrigenda and other comments with the CLDR 182 bug reporting form [<a href= 183 "http://cldr.unicode.org/index/bug-reports">Bugs</a>]. Related 184 information that is useful in understanding this document is 185 found in the <a href="#References">References</a>. For the 186 latest version of the Unicode Standard see [<a href= 187 "https://www.unicode.org/versions/latest/">Unicode</a>]. For a 188 list of current Unicode Technical Reports see [<a href= 189 "https://www.unicode.org/reports/">Reports</a>]. For more 190 information about versions of the Unicode Standard, see 191 [<a href= 192 "https://www.unicode.org/versions/">Versions</a>].</i></p><!-- This section of Parts should be identical in all of the parts of this UTS. 
--> 193 <h2><a name="Parts" href="#Parts" id="Parts">Parts</a></h2> 194 <p>The LDML specification is divided into the following 195 parts:</p> 196 <ul class="toc"> 197 <li>Part 1: <a href="tr35.html#Contents">Core</a> (languages, 198 locales, basic structure)</li> 199 <li>Part 2: <a href="tr35-general.html#Contents">General</a> 200 (display names & transforms, etc.)</li> 201 <li>Part 3: <a href="tr35-numbers.html#Contents">Numbers</a> 202 (number & currency formatting)</li> 203 <li>Part 4: <a href="tr35-dates.html#Contents">Dates</a> 204 (date, time, time zone formatting)</li> 205 <li>Part 5: <a href= 206 "tr35-collation.html#Contents">Collation</a> (sorting, 207 searching, grouping)</li> 208 <li>Part 6: <a href= 209 "tr35-info.html#Contents">Supplemental</a> (supplemental 210 data)</li> 211 <li>Part 7: <a href= 212 "tr35-keyboards.html#Contents">Keyboards</a> (keyboard 213 mappings)</li> 214 </ul> 215 <h2><a name="Contents" href="#Contents" id="Contents">Contents 216 of Part 1, Core</a></h2> 217 <!-- START Generated TOC: CheckHtmlFiles --> 218 <ul class="toc"> 219 <li>1 <a href="#Introduction">Introduction</a> 220 <ul class="toc"> 221 <li>1.1 <a href="#Conformance">Conformance</a></li> 222 </ul> 223 </li> 224 <li>2 <a href="#Locale">What is a Locale?</a></li> 225 <li>3 <a href="#Identifiers">Unicode Language and Locale 226 Identifiers</a> 227 <ul class="toc"> 228 <li>3.1 <a href="#Unicode_language_identifier">Unicode 229 Language Identifier</a></li> 230 <li>3.2 <a href="#Unicode_locale_identifier">Unicode 231 Locale Identifier</a> 232 <ul class='toc'> 233 <li><a href="#Canonical_Unicode_Locale_Identifiers">3.2.1 Canonical Unicode Locale Identifiers</a></li> 234 </ul> 235 </li> 236 <li>3.3 <a href="#BCP_47_Conformance">BCP 47 237 Conformance</a> 238 <ul class="toc"> 239 <li>3.3.1 <a href= 240 "#BCP_47_Language_Tag_Conversion">BCP 47 Language Tag 241 Conversion</a></li> 242 </ul> 243 </li> 244 <li>3.4 <a href="#Field_Definitions">Language Identifier 245 Field 
Definitions</a> 246 <ul class="toc"> 247 <li>Table: <a href= 248 "#Language_Locale_Field_Definitions">Language 249 Identifier Field Definitions</a></li> 250 </ul> 251 </li> 252 <li>3.5 <a href="#Special_Codes">Special Codes</a> 253 <ul class="toc"> 254 <li>3.5.1 <a href= 255 "#Unknown_or_Invalid_Identifiers">Unknown or Invalid 256 Identifiers</a></li> 257 <li>3.5.2 <a href="#Numeric_Codes">Numeric 258 Codes</a></li> 259 <li>3.5.3 <a href="#Private_Use_Codes">Private Use 260 Codes</a> 261 <ul class="toc"> 262 <li>Table: <a href="#Private_Use_CLDR">Private 263 Use Codes in CLDR</a></li> 264 </ul> 265 </li> 266 </ul> 267 </li> 268 <li>3.6 <a href= 269 "#Locale_Extension_Key_and_Type_Data">Unicode BCP 47 U 270 Extension</a> 271 <ul class="toc"> 272 <li>3.6.1 <a href="#Key_And_Type_Definitions_">Key 273 And Type Definitions</a> 274 <ul class="toc"> 275 <li>Table: <a href= 276 "#Key_Type_Definitions">Key/Type 277 Definitions</a></li> 278 </ul> 279 </li> 280 <li>3.6.2 <a href= 281 "#Numbering%20System%20Data">Numbering System 282 Data</a></li> 283 <li>3.6.3 <a href="#Time_Zone_Identifiers">Time Zone 284 Identifiers</a></li> 285 <li>3.6.4 <a href= 286 "#Unicode_Locale_Extension_Data_Files">U Extension 287 Data Files</a></li> 288 <li>3.6.5 <a href= 289 "#Unicode_Subdivision_Codes">Subdivision Codes</a> 290 <ul class="toc"> 291 <li>3.6.5.1 <a href="#Validity">Validity</a></li> 292 </ul> 293 </li> 294 </ul> 295 </li> 296 <li>3.7 <a href="#t_Extension">Unicode BCP 47 T 297 Extension</a> 298 <ul class="toc"> 299 <li>3.7.1 <a href="#Transformed_Content_Data_File">T 300 Extension Data Files</a></li> 301 </ul> 302 </li> 303 <li>3.8 <a href="#Compatibility_with_Older_Identifiers"> 304 Compatibility with Older Identifiers</a> 305 <ul class="toc"> 306 <li>3.8.1 <a href="#Old_Locale_Extension_Syntax">Old 307 Locale Extension Syntax</a> 308 <ul class="toc"> 309 <li>Table: <a href= 310 "#Locale_Extension_Mappings">Locale Extension 311 Mappings</a></li> 312 </ul> 313 </li> 314 <li>3.8.2 
<a href="#Legacy_Variants">Legacy 315 Variants</a> 316 <ul class="toc"> 317 <li>Table: <a href= 318 "#Legacy_Variant_Mappings">Legacy Variant 319 Mappings</a></li> 320 </ul> 321 </li> 322 <li>3.8.3 <a href="#Relation_to_OpenI18n">Relation to 323 OpenI18n</a></li> 324 </ul> 325 </li> 326 <li>3.9 <a href= 327 "#Transmitting_Locale_Information">Transmitting Locale 328 Information</a> 329 <ul class="toc"> 330 <li>3.9.1 <a href= 331 "#Message_Formatting_and_Exceptions">Message 332 Formatting and Exceptions</a></li> 333 </ul> 334 </li> 335 <li>3.10 <a href="#Language_and_Locale_IDs">Unicode 336 Language and Locale IDs</a> 337 <ul class="toc"> 338 <li>3.10.1 <a href="#Written_Language">Written 339 Language</a></li> 340 <li>3.10.2 <a href="#Hybrid_Locale">Hybrid Locale 341 Identifiers</a></li> 342 </ul> 343 </li> 344 <li>3.11 <a href="#Validity_Data">Validity Data</a></li> 345 </ul> 346 </li> 347 <li>4 <a href="#Locale_Inheritance">Locale Inheritance and 348 Matching</a> 349 <ul class="toc"> 350 <li>4.1 <a href="#Lookup">Lookup</a> 351 <ul class="toc"> 352 <li>4.1.1 <a href="#Bundle_vs_Item_Lookup">Bundle vs 353 Item Lookup</a> 354 <ul class="toc"> 355 <li>Table: <a href="#Lookup-Differences">Lookup 356 Differences</a></li> 357 </ul> 358 </li> 359 <li>4.1.2 <a href="#Multiple_Inheritance">Lateral 360 Inheritance</a> 361 <ul class="toc"> 362 <li>Table: <a href="#Count_Fallback_normal">Count 363 Fallback: normal</a></li> 364 <li>Table: <a href= 365 "#Count_Fallback_currency">Count Fallback: 366 currency</a></li> 367 </ul> 368 </li> 369 <li>4.1.3 <a href="#Parent_Locales">Parent 370 Locales</a></li> 371 </ul> 372 </li> 373 <li>4.2 <a href="#Inheritance_and_Validity">Inheritance 374 and Validity</a> 375 <ul class="toc"> 376 <li>4.2.1 <a href="#Definitions">Definitions</a></li> 377 <li>4.2.2 <a href="#Resolved_Data_File">Resolved Data 378 File</a></li> 379 <li>4.2.3 <a href="#Valid_Data">Valid Data</a></li> 380 <li>4.2.4 <a href= 381 "#Checking_for_Draft_Status">Checking for 
Draft 382 Status</a></li> 383 <li>4.2.5 <a href= 384 "#Keyword_and_Default_Resolution">Keyword and Default 385 Resolution</a></li> 386 <li>4.2.6 <a href= 387 "#Inheritance_vs_Related">Inheritance vs Related 388 Information</a></li> 389 </ul> 390 </li> 391 <li>4.3 <a href="#Likely_Subtags">Likely Subtags</a></li> 392 <li>4.4 <a href="#LanguageMatching">Language Matching</a> 393 <ul class='toc'> 394 <li>4.4.1 <a href= 395 "#EnhancedLanguageMatching">Enhanced Language 396 Matching</a></li> 397 </ul> 398 </li> 399 </ul> 400 </li> 401 <li>5 <a href="#XML_Format">XML Format</a> 402 <ul class="toc"> 403 <li>5.1 <a href="#Common_Elements">Common Elements</a> 404 <ul class="toc"> 405 <li>5.1.1 <a href="#special">Element special</a> 406 <ul class="toc"> 407 <li>5.1.1.1 <a href= 408 "#Sample_Special_Elements">Sample Special 409 Elements</a></li> 410 </ul> 411 </li> 412 <li>5.1.2 <a href="#Alias_Elements">Element alias</a> 413 <ul class="toc"> 414 <li>Table: <a href= 415 "#Inheritance_with_source_locale_">Inheritance 416 with source="locale"</a></li> 417 </ul> 418 </li> 419 <li>5.1.3 <a href="#Element_displayName">Element 420 displayName</a></li> 421 <li>5.1.4 <a href="#Escaping_Characters">Escaping 422 Characters</a></li> 423 </ul> 424 </li> 425 <li>5.2 <a href="#Common_Attributes">Common 426 Attributes</a> 427 <ul class="toc"> 428 <li>5.2.1 <a href="#Attribute_type">Attribute 429 type</a></li> 430 <li>5.2.2 <a href="#Attribute_draft">Attribute 431 draft</a></li> 432 <li>5.2.3 <a href="#alt_attribute">Attribute 433 alt</a></li> 434 </ul> 435 </li> 436 <li>5.3 <a href="#Common_Structures">Common 437 Structures</a> 438 <ul class="toc"> 439 <li>5.3.1 <a href="#Date_Ranges">Date and Date 440 Ranges</a></li> 441 <li>5.3.2 <a href="#Text_Directionality">Text 442 Directionality</a></li> 443 <li>5.3.3 <a href="#Unicode_Sets">Unicode Sets</a> 444 <ul class="toc"> 445 <li>5.3.3.1 <a href="#Lists_of_Code_Points">Lists 446 of Code Points</a></li> 447 <li>5.3.3.2 <a 
href="#Unicode_Properties">Unicode 448 Properties</a></li> 449 <li>5.3.3.3 <a href="#Boolean_Operations">Boolean 450 Operations</a></li> 451 <li>5.3.3.4 <a href= 452 "#UnicodeSet_Examples">UnicodeSet 453 Examples</a></li> 454 </ul> 455 </li> 456 <li>5.3.4 <a href="#String_Range">String 457 Range</a></li> 458 </ul> 459 </li> 460 <li>5.4 <a href="#Identity_Elements">Identity 461 Elements</a></li> 462 <li>5.5 <a href="#Valid_Attribute_Values">Valid Attribute 463 Values</a></li> 464 <li>5.6 <a href="#Canonical_Form">Canonical Form</a> 465 <ul class="toc"> 466 <li>5.6.1 <a href="#Content">Content</a></li> 467 <li>5.6.2 <a href="#Ordering">Ordering</a></li> 468 <li>5.6.3 <a href="#Comments">Comments</a></li> 469 </ul> 470 </li> 471 <li>5.7 <a href="#DTD_Annotations">DTD 472 Annotations</a> 473 <ul class='toc'> 474 <li>5.7.1 <a href="#match_expressions" >Attribute Value Constraints</a></li> 475 </ul> 476 </li> 477 </ul> 478 </li> 479 <li>6 <a href="#Property_Data">Property Data</a> 480 <ul class="toc"> 481 <li>6.1 <a href="#Script_Metadata">Script 482 Metadata</a></li> 483 <li>6.2 <a href="#Extended_Pictographic">Extended 484 Pictographic</a></li> 485 <li>6.3 <a href="#Labels.txt">Labels.txt</a></li> 486 <li><a href="#Segmentation_Tests">6.4 Segmentation Tests</a></li> 487 </ul> 488 </li> 489 <li>7 <a href="#Format_Parse_Issues">Issues in Formatting and 490 Parsing</a> 491 <ul class="toc"> 492 <li>7.1 <a href="#Lenient_Parsing">Lenient Parsing</a> 493 <ul class="toc"> 494 <li>7.1.1 <a href="#Motivation">Motivation</a></li> 495 <li>7.1.2 <a href="#Loose_Matching">Loose 496 Matching</a></li> 497 </ul> 498 </li> 499 <li>7.2 <a href="#Invalid_Patterns">Handling Invalid 500 Patterns</a></li> 501 </ul> 502 </li> 503 <li>Annex A <a href="#Deprecated_Structure">Deprecated 504 Structure</a> 505 <ul class="toc"> 506 <li>A.1 <a href="#Fallback_Elements">Element 507 fallback</a></li> 508 <li>A.2 <a href="#BCP47_Keyword_Mapping">BCP 47 Keyword 509 Mapping</a></li> 510 <li>A.3 <a 
href="#Choice_Patterns">Choice Patterns</a></li>
<li>A.4 <a href="#Element_default">Element default</a></li>
<li>A.5 <a href="#Deprecated_Common_Attributes">Deprecated Common Attributes</a>
<ul>
<li>A.5.1 <a href="#Attribute_standard">Attribute standard</a></li>
<li>A.5.2 <a href="#Attribute_draft_nonLeaf">Attribute draft in non-leaf elements</a></li>
</ul>
</li>
<li>A.6 <a href="#Element_base">Element base</a></li>
<li>A.7 <a href="#Element_rules">Element rules</a></li>
<li>A.8 <a href="#Deprecated_subelements_of_dates">Deprecated subelements of &lt;dates&gt;</a></li>
<li>A.9 <a href="#Deprecated_subelements_of_calendars">Deprecated subelements of &lt;calendars&gt;</a></li>
<li>A.10 <a href="#Deprecated_subelements_of_timeZoneNames">Deprecated subelements of &lt;timeZoneNames&gt;</a></li>
<li>A.11 <a href="#Deprecated_subelements_of_zone_metazone">Deprecated subelements of &lt;zone&gt; and &lt;metazone&gt;</a></li>
<li>A.12 <a href="#Renamed_attribute_values_for_contextTransformUsage">Renamed attribute values for &lt;contextTransformUsage&gt; element</a></li>
<li>A.13 <a href="#Deprecated_subelements_of_segmentations">Deprecated subelements of &lt;segmentations&gt;</a></li>
<li>A.14 <a href="#Element_cp">Element cp</a></li>
<li>A.15 <a href="#validSubLocales">Attribute validSubLocales</a></li>
<li>A.16 <a href="#postCodeElements">Elements postalCodeData, postCodeRegex</a></li>
<li>A.17 <a href="#telephoneCodeData">Element telephoneCodeData</a></li>
</ul>
</li>
<li>Annex B <a href="#Links_to_Other_Parts">Links to Other Parts</a>
<ul class="toc">
<li>Table: <a href="#Part_2_Links">Part 2 Links: General (display names &amp; transforms, etc.)</a></li>
<li>Table: <a href="#Part_3_Links">Part 3 Links: Numbers (number &amp; currency formatting)</a></li>
<li>Table: <a href="#Part_4_Links">Part 4 Links: Dates
563 (date, time, time zone formatting)</a></li> 564 <li>Table: <a href="#Part_5_Links">Part 5 Links: 565 Collation (sorting, searching, grouping)</a></li> 566 <li>Table: <a href="#Part_6_Links">Part 6 Links: 567 Supplemental (supplemental data)</a></li> 568 <li>Table: <a href="#Part_7_Links">Part 7 Links: 569 Keyboards (keyboard mappings)</a></li> 570 </ul> 571 </li> 572 <li>Annex C. <a href="#LocaleId_Canonicalization" >LocaleId Canonicalization</a></li> 573 <li><a href="#References">References</a></li> 574 <li><a href="#Acknowledgments">Acknowledgments</a></li> 575 <li><a href="#Modifications">Modifications</a></li> 576 </ul><!-- END Generated TOC: CheckHtmlFiles --> 577 <h2><a name="Introduction" href="#Introduction" id= 578 "Introduction">1 Introduction</a></h2> 579 <p>Not long ago, computer systems were like separate worlds, 580 isolated from one another. The internet and related events have 581 changed all that. A single system can be built of many 582 different components, hardware and software, all needing to 583 work together. Many different technologies have been important 584 in bridging the gaps; in the internationalization arena, 585 Unicode has provided a lingua franca for communicating textual 586 data. However, there remain differences in the locale data used 587 by different systems.</p> 588 <p>The best practice for internationalization is to store and 589 communicate language-neutral data, and format that data for the 590 client. This formatting can take place on any of a number of 591 the components in a system; a server might format data based on 592 the user's locale, or it could be that a client machine does 593 the formatting. The same goes for parsing data, and 594 locale-sensitive analysis of data.</p> 595 <p>But there remain significant differences across systems and 596 applications in the locale-sensitive data used for such 597 formatting, parsing, and analysis. 
Many of those differences are simply gratuitous: each variant is within acceptable limits for human readers, yet the results differ. In many other cases there are outright errors. Whatever their source, the differences allow discrepancies to creep into a heterogeneous system. This is especially serious in the case of collation (sort-order), where different collations cause not only ordering differences but also different query results. That is, with a query of customers with names between "Abbot, Cosmo" and "Arnold, James", if different systems have different sort orders, different lists will be returned. (For comparisons across systems formatted as HTML tables, see [<a href="#Comparisons">Comparisons</a>].)</p>
<blockquote>
<p class="note"><b>Note:</b> There are many different equally valid ways in which data can be judged to be "correct" for a particular locale. The goal for the common locale data is to make it as consistent as possible with existing locale data, and acceptable to users in that locale.</p>
</blockquote>
<p>This document specifies an XML format for the communication of locale data: the Unicode Locale Data Markup Language (LDML). This provides a common format for systems to interchange locale data so that they can get the same results in the services provided by internationalization libraries. It also provides a standard format that can allow users to customize the behavior of a system. With it, for example, two implementations can exchange a specification of tailored collation (sorting) rules. Using the same specification, the two implementations will achieve the same results in comparing strings. Unicode LDML can also be used to let a user encapsulate specialized sorting behavior for a specific domain, or create a customized locale for a minority language.
Unicode LDML is also used in the Unicode Common Locale Data Repository (CLDR). CLDR uses an open process for reconciling differences between the locale data used on different systems and for validating the data, to produce a useful, common, consistent base of locale data.</p>
<p>For more information, see the Common Locale Data Repository project page [<a href="#localeProject">LocaleProject</a>].</p>
<p>As LDML is an interchange format, it was designed for ease of maintenance and simplicity of transformation into other formats, rather than for efficiency of run-time lookup and use. Implementations should consider converting LDML data into a more compact format prior to use.</p>
<h3><a name="Conformance" href="#Conformance" id="Conformance">1.1 Conformance</a></h3>
<p>There are many ways to use the Unicode LDML format and the data in CLDR, and the Unicode Consortium does not restrict the ways in which the format or data are used. However, an implementation may also claim conformance to LDML or to CLDR, as follows:</p>
<p><i><b>UAX35-C1.</b></i> An implementation that claims conformance to this specification shall:</p>
<ol>
<li>Identify the sections of the specification that it conforms to.
<ul>
<li>For example, an implementation might claim conformance to all LDML features except for <i>transforms</i> and <i>segments</i>.</li>
</ul>
</li>
<li>Interpret the relevant elements and attributes of LDML documents in accordance with the descriptions in those sections.
<ul>
<li>For example, an implementation that claims conformance to the date format patterns must interpret the characters in such patterns according to the <a href="tr35-dates.html#Date_Field_Symbol_Table">Date Field Symbol Table</a>.</li>
</ul>
</li>
<li>Declare which types of CLDR data it uses.
673 <ul> 674 <li>For example, an implementation might declare that it 675 only uses language names, and those with a <i>draft</i> 676 status of <i>contributed</i> or <i>approved</i>.</li> 677 </ul> 678 </li> 679 </ol> 680 <p><i><b>UAX35-C2.</b></i> An implementation that claims 681 conformance to Unicode locale or language identifiers 682 shall:</p> 683 <ol> 684 <li>Specify whether Unicode locale extensions are 685 allowed</li> 686 <li>Specify the canonical form used for identifiers in terms 687 of casing and field separator characters.</li> 688 </ol> 689 <p>External specifications may also reference particular 690 components of Unicode locale or language identifiers, such 691 as:</p> 692 <blockquote> 693 <p><i>Field X can contain any Unicode region subtag values as 694 given in Unicode Technical Standard #35: Unicode Locale Data 695 Markup Language (LDML), excluding grouping codes.</i></p> 696 </blockquote> 697 <h2><a name="Locale" href="#Locale" id="Locale">2 What is a 698 Locale?</a></h2> 699 <p>Before diving into the XML structure, it is helpful to 700 describe the model behind the structure. People do not have to 701 subscribe to this model to use data in LDML, but they do need 702 to understand it so that the data can be correctly translated 703 into whatever model their implementation uses.</p> 704 <p>The first issue is basic: <i>what is a locale?</i> In this 705 model, a locale is an identifier (id) that refers to a set of 706 user preferences that tend to be shared across significant 707 swaths of the world. Traditionally, the data associated with 708 this id provides support for formatting and parsing of dates, 709 times, numbers, and currencies; for measurement units, for 710 sort-order (collation), plus translated names for time zones, 711 languages, countries, and scripts. 
The data can also include support for text boundaries (character, word, line, and sentence), text transformations (including transliterations), and other services.</p>
<p>Locale data is not cast in stone: the data used on someone's machine may generally reflect the US format, for example, but preferences can typically be set to override particular items, such as setting the date format to the form 2002.03.15, or using metric or Imperial measurement units. In the abstract, a locale is simply one of many sets of preferences that, say, a website may want to remember for a particular user. Depending on the application, it may also want to remember the user's time zone, preferred currency, preferred character set, smoker/non-smoker preference, meal preference (vegetarian, kosher, and so on), music preference, religion, party affiliation, favorite charity, and so on.</p>
<p>Locale data in a system may also change over time: country boundaries change; governments (and currencies) come and go; committees impose new standards; bugs are found and fixed in the source data; and so on. Thus the data needs to be versioned for stability over time.</p>
<p>In general terms, the locale id is a parameter that is supplied to a particular service (date formatting, sorting, spell-checking, and so on). The format in this document does not attempt to represent all the data that could conceivably be used by all possible services. Instead, it collects together data that is in common use in systems and internationalization libraries for basic services. The main difference among locales is in terms of language; there may also be some differences according to different countries or regions. However, the line between <i>locales</i> and <i>languages</i>, as commonly used in the industry, is rather fuzzy.
Note also that the vast 743 majority of the locale data in CLDR is in fact language data; 744 all non-linguistic data is separated out into a separate tree. 745 For more information, see <i><a href= 746 "#Language_and_Locale_IDs">Section 3.10 Language and Locale 747 IDs</a></i>.</p> 748 <p>We will speak of data as being "in locale X". That does not 749 imply that a locale <i>is</i> a collection of data; it is 750 simply shorthand for "the set of data associated with the 751 locale id X". Each individual piece of data is called a 752 <i>resource</i> or <i>field</i>, and a tag indicating the key 753 of the resource is called a <i>resource tag.</i></p> 754 <h2><a name="Identifiers" href="#Identifiers" id= 755 "Identifiers"></a> <a name= 756 "Unicode_Language_and_Locale_Identifiers" href= 757 "#Unicode_Language_and_Locale_Identifiers" id= 758 "Unicode_Language_and_Locale_Identifiers">3 Unicode Language 759 and Locale Identifiers</a></h2> 760 <p>Unicode LDML uses stable identifiers based on [<a href= 761 "#BCP47">BCP47</a>] for distinguishing among languages, 762 locales, regions, currencies, time zones, transforms, and so 763 on. There are many systems for identifiers for these entities. 764 The Unicode LDML identifiers may not match the identifiers used 765 on a particular target system. If so, some process of 766 identifier translation may be required when using LDML 767 data.</p> 768 <p>The BCP 47 extensions (-u- and -t-) are described in 769 <em>Section 3.6 <a href="#u_Extension">Unicode BCP 47 U 770 Extension</a></em> and <em>Section 3.7 <a href= 771 "#BCP47_T_Extension">Unicode BCP 47 T Extension</a></em>.</p> 772 <h3><i><a name="Unicode_language_identifier" href= 773 "#Unicode_language_identifier" id= 774 "Unicode_language_identifier">3.1 Unicode Language 775 Identifier</a></i></h3> 776 <p>A <i>Unicode language identifier</i> has the following 777 structure (provided in EBNF (Perl-based)). 
The following table defines 778 syntactically well-formed identifiers: they are not necessarily 779 valid identifiers. For additional validity criteria, see the 780 links on the right.</p> 781 <table> 782 <tr> 783 <th> </th> 784 <th> 785 <div align="center"> 786 EBNF 787 </div> 788 </th> 789 <th> 790 <div align="center"> 791 Validity / Comments 792 </div> 793 </th> 794 </tr> 795 <tr> 796 <td><code><a href="#unicode_language_id" name= 797 "unicode_language_id" id= 798 "unicode_language_id">unicode_language_id</a></code></td> 799 <td><code>= "root"<br> 800 | (unicode_language_subtag<br> 801 (sep unicode_script_subtag)?<br> 802 | unicode_script_subtag)<br> 803 (sep unicode_region_subtag)?<br> 804 (sep unicode_variant_subtag)* ;</code></td> 805 <td>"root" is treated as a special 806 <code>unicode_language_subtag</code></td> 807 </tr> 808 <tr> 809 <td><code><a href="#unicode_language_subtag" name= 810 "unicode_language_subtag" id= 811 "unicode_language_subtag">unicode_language_subtag</a></code></td> 812 <td><code>= alpha{2,3} | alpha{5,8};</code></td> 813 <td><code><a href= 814 '#unicode_language_subtag_validity'>validity</a><br> 815 <a href= 816 'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/language.xml'> 817 latest-data</a></code></td> 818 </tr> 819 <tr> 820 <td><code><a href="#unicode_script_subtag" name= 821 "unicode_script_subtag" id= 822 "unicode_script_subtag">unicode_script_subtag</a></code></td> 823 <td><code>= alpha{4} ;</code></td> 824 <td><code><a href= 825 '#unicode_script_subtag_validity'>validity</a><br> 826 <a href= 827 'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/script.xml'> 828 latest-data</a></code></td> 829 </tr> 830 <tr> 831 <td><code><a href="#unicode_region_subtag" name= 832 "unicode_region_subtag" id= 833 "unicode_region_subtag">unicode_region_subtag</a></code></td> 834 <td><code>= (alpha{2} | digit{3}) ;</code></td> 835 <td><code><a href= 836 
'#unicode_language_subtag_validity'>validity</a><br> 837 <a href= 838 'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/region.xml'> 839 latest-data</a></code></td> 840 </tr> 841 <tr> 842 <td><code><a href="#unicode_variant_subtag" name= 843 "unicode_variant_subtag" id= 844 "unicode_variant_subtag">unicode_variant_subtag</a></code></td> 845 <td><code>= (alphanum{5,8}<br> 846 | digit alphanum{3}) ;</code></td> 847 <td><code><a href= 848 '#unicode_language_subtag_validity'>validity</a><br> 849 <a href= 850 'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/variant.xml'> 851 latest-data</a></code></td> 852 </tr> 853 <tr> 854 <td><code>sep</code></td> 855 <td><code>= [-_] ;</code></td> 856 </tr> 857 <tr> 858 <td><code>digit</code></td> 859 <td><code>= [0-9] ;</code></td> 860 </tr> 861 <tr> 862 <td><code>alpha</code></td> 863 <td><code>= [A-Z a-z] ;</code></td> 864 </tr> 865 <tr> 866 <td><code>alphanum</code></td> 867 <td><code>= [0-9 A-Z a-z] ;</code></td> 868 </tr> 869 </table> 870 <p>The semantics of the various subtags is explained in 871 <em>Section 3.4 <a href="#Field_Definitions">Language 872 Identifier Field Definitions</a></em> ; there are also direct 873 links from <code><a href= 874 "#unicode_language_subtag">unicode_language_subtag</a></code> , 875 etc. While theoretically the <code><a href= 876 "#unicode_language_subtag">unicode_language_subtag</a></code> 877 may have more than 3 letters through the IANA registration 878 process, in practice that has not occurred. The <code><a href= 879 "#unicode_language_subtag">unicode_language_subtag</a></code> 880 "und" may be omitted when there is a <code><a href= 881 "#unicode_script_subtag">unicode_script_subtag</a></code> ; for 882 that reason <code><a href= 883 "#unicode_language_subtag">unicode_language_subtag</a></code> 884 values with 4 letters are not permitted. 
However, such 885 <code><a href= 886 "#unicode_language_id">unicode_language_id</a></code> values 887 are not intended for general interchange, because they are not 888 valid BCP 47 tags. Instead, they are intended for certain 889 protocols such as the identification of transliterators or font 890 ScriptLangTag values. For more information on language subtags with 4 letters, see <a href= 891 "#Language_Tag_to_Locale_Identifier" >BCP 47 Language Tag to 892 Unicode BCP 47 Locale Identifier</a>.</p> 893 <p>For example, "en-US" (American English), "en_GB" (British 894 English), "es-419" (Latin American Spanish), and "uz-Cyrl" 895 (Uzbek in Cyrillic) are all valid Unicode language 896 identifiers.</p> 897 <h3><i><a name="Unicode_locale_identifier" href= 898 "#Unicode_locale_identifier" id="Unicode_locale_identifier">3.2 899 Unicode Locale Identifier</a></i></h3> 900 <p>A <i>Unicode locale identifier</i> is composed of a Unicode 901 language identifier plus (optional) locale extensions. It has 902 the following structure. The semantics of the U and T 903 extensions are explained in <em>Section 3.6 <a href= 904 "#u_Extension">Unicode BCP 47 U Extension</a></em> and 905 <em>Section 3.7 <a href="#BCP47_T_Extension">Unicode BCP 47 T 906 Extension</a></em>. Other extensions and private use extensions 907 are supported for pass-through. The following table defines 908 syntactically <em>well-formed</em> identifiers: they are not 909 necessarily <em>valid</em> identifiers. For additional validity 910 criteria, see the links on the right. </p> 911 <p>As is often the case, the complete syntactic constraints are not easily captured by ABNF, so there is a further condition: There cannot be more than one extension with the 912 same singleton (-a-, …, -t-, -u-, …). Note that the private use extension (-x-) must 913 come after all other extensions. 
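</p>
    <p>The condition on repeated singletons can be checked with a few
    lines of code. The following is a minimal sketch in Python, given
    for illustration only and not part of this specification. The
    function name <code>singleton_problems</code> is hypothetical, and
    the input is assumed to be otherwise well-formed. Because every
    subtag after the private use singleton belongs to the private use
    extension, the scan simply stops at "x".</p>
    <pre>
```python
import re


def singleton_problems(locale_id):
    """Return a list of repeated-singleton problems (empty if none).

    Assumes the identifier is otherwise well-formed; this is an
    illustrative sketch, not a full validator.
    """
    # "-" and "_" are equivalent separators, and case is not significant.
    subtags = re.split(r"[-_]", locale_id.lower())
    seen = []
    problems = []
    # Skip the first subtag: an identifier never starts with a singleton.
    for subtag in subtags[1:]:
        if len(subtag) != 1:
            continue  # not a singleton
        if subtag == "x":
            break  # everything after -x- is private use
        if subtag in seen:
            problems.append("duplicate singleton: " + subtag)
        seen.append(subtag)
    return problems
```
    </pre>
    <p>Under these assumptions, "en-u-ca-gregory-t-hi" yields no
    problems, while "en-u-nu-thai-u-ca-buddhist" is reported because
    the singleton "u" occurs twice.</p>
    <p>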
</p>
    <table border="0">
      <tr>
        <th> </th>
        <th>
          <div align="center">
            EBNF
          </div>
        </th>
        <th>
          <div align="center">
            Validity
          </div>
        </th>
      </tr>
      <tr>
        <td><code><a href="#unicode_locale_id" name=
        "unicode_locale_id" id=
        "unicode_locale_id">unicode_locale_id</a></code></td>
        <td><code>= unicode_language_id<br>
        extensions*<br>
        pu_extensions? ;</code></td>
      </tr>
      <tr>
        <td><code><a href="#extensions" name="extensions" id=
        "extensions">extensions</a></code></td>
        <td><code>= unicode_locale_extensions<br>
        | transformed_extensions<br>
        | other_extensions ;</code></td>
      </tr>
      <tr>
        <td><code><a href="#unicode_locale_extensions" name=
        "unicode_locale_extensions" id=
        "unicode_locale_extensions">unicode_locale_extensions</a></code></td>
        <td><code>= sep [uU]<br>
        ((sep keyword)+<br>
        |(sep attribute)+ (sep keyword)*) ;</code></td>
      </tr>
      <tr>
        <td><code><a href="#transformed_extensions" name=
        "transformed_extensions" id=
        "transformed_extensions">transformed_extensions</a></code></td>
        <td><code>= sep [tT]<br>
        ((sep tlang (sep tfield)*)<br>
        | (sep tfield)+) ;</code></td>
      </tr>
      <tr>
        <td><code><a href="#pu_extensions" name="pu_extensions" id=
        "pu_extensions">pu_extensions</a></code></td>
        <td><code>= sep [xX]<br>
        (sep alphanum{1,8})+ ;</code></td>
      </tr>
      <tr>
        <td><code><a href="#other_extensions" name=
        "other_extensions" id=
        "other_extensions">other_extensions</a></code></td>
        <td><code>= sep [alphanum-[tTuUxX]]<br>
        (sep alphanum{2,8})+ ;</code></td>
      </tr>
      <tr>
        <td><code>keyword</code><br>
        (Also known as <code>ufield</code>)</td>
        <td><code>= key (sep type)?
;</code></td> 976 </tr> 977 <tr> 978 <td><code>key</code><br> 979 (Also known as <code>ukey</code>)</td> 980 <td><code>= alphanum alpha ;</code><br> 981 (Note that this is narrower than in [<a href="https://www.ietf.org/rfc/rfc6067.txt" title="https://www.ietf.org/rfc/rfc6067.txt">RFC6067</a>], so that it is disjoint with tkey.)</td> 982 <td><code><a href="#Key_Type_Definitions">validity</a><br> 983 <a href= 984 'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/bcp47'>latest-data</a></code></td> 985 </tr> 986 <tr> 987 <td><code>type</code><br> 988 (Also known as <code>uvalue</code>)</td> 989 <td><code>= alphanum{3,8}<br> 990 (sep alphanum{3,8})* ;</code></td> 991 <td><code><a href="#Key_Type_Definitions">validity</a><br> 992 <a href= 993 'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/bcp47'>latest-data</a></code></td> 994 </tr> 995 <tr> 996 <td><code>attribute</code></td> 997 <td><code>= alphanum{3,8} ;</code></td> 998 </tr> 999 <tr> 1000 <td><code><a name="unicode_subdivision_id" href= 1001 "#unicode_subdivision_id" id= 1002 "unicode_subdivision_id">unicode_subdivision_id</a><a name= 1003 "unicode_subdivision_subtag" id= 1004 "unicode_subdivision_subtag"></a><a name= 1005 "subdivision_attribute" id= 1006 "subdivision_attribute"></a></code></td> 1007 <td><code>= <a href= 1008 "#unicode_region_subtag">unicode_region_subtag</a> 1009 unicode_subdivision_suffix ;</code></td> 1010 <td><code><a href= 1011 '#unicode_subdivision_subtag_validity'>validity</a><br> 1012 <a href= 1013 'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/subdivision.xml'> 1014 latest-data</a></code></td> 1015 </tr> 1016 <tr> 1017 <td><code>unicode_subdivision_suffix</code></td> 1018 <td><code>= alphanum{1,4} ;</code></td> 1019 </tr> 1020 <tr> 1021 <td><code><a name="unicode_measure_unit" href= 1022 "#unicode_measure_unit" id= 1023 "unicode_measure_unit">unicode_measure_unit</a></code></td> 1024 <td><code>= alphanum{3,8}<br> 1025 (sep 
alphanum{3,8})* ;</code></td> 1026 <td><code><a href='#Validity_Data'>validity</a><br> 1027 <a href= 1028 'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/validity/unit.xml'>latest-data</a></code></td> 1029 </tr> 1030 <tr> 1031 <td><code>tlang</code></td> 1032 <td><code>= unicode_language_subtag<br> 1033 (sep unicode_script_subtag)?<br> 1034 (sep unicode_region_subtag)?<br> 1035 (sep unicode_variant_subtag)* ;</code></td> 1036 </tr> 1037 <tr> 1038 <td><code>tfield</code></td> 1039 <td><code>= tkey tvalue;</code></td> 1040 <td><code><a href="#BCP47_T_Extension">validity</a><br> 1041 <a href= 1042 'https://github.com/unicode-org/cldr/blob/maint/maint-38/common/bcp47'>latest-data</a></code></td> 1043 </tr> 1044 <tr> 1045 <td><code>tkey</code></td> 1046 <td><code>= alpha digit ;</code></td> 1047 </tr> 1048 <tr> 1049 <td><code>tvalue</code></td> 1050 <td><code>= (sep alphanum{3,8})+ ;</code></td> 1051 </tr> 1052 </table> 1053 <p>For historical reasons, this is called a Unicode locale 1054 identifier. However, it really functions (with few exceptions) 1055 as a <span class="st">language</span> identifier, and accesses 1056 <span class="st">language</span>-based data. Except where it 1057 would be unclear, this document uses the term "locale" data 1058 loosely to encompass both types of data: for more information, 1059 see <i><a href="#Language_and_Locale_IDs">Section 3.10 Language 1060 and Locale IDs</a></i>.</p> 1061 <p>As of the release of this specification, there were no 1062 other_extensions defined. The other_extensions are present in 1063 the syntax to allow implementations to preserve that 1064 information.</p> 1065 <p>As for terminology, the term <i>code</i> may also be used 1066 instead of "subtag", and "territory" instead of "region". The 1067 primary language subtag is also called the <i>base language 1068 code</i>. For example, the base language code for "en-US" 1069 (American English) is "en" (English). 
The <i>type</i> may also 1070 be referred to as a <i>value</i> or <i>key-value</i>.</p> 1071 <p>The identifiers can vary in case and in the separator 1072 characters. The "-" and "_" separators are treated as 1073 equivalent, although "-" is preferred.</p> 1074 <p>All identifier field values are case-insensitive. Although 1075 case distinctions do not carry any special meaning, an 1076 implementation of LDML should use the casing recommendations in 1077 [<a href="#BCP47">BCP47</a>], especially when a Unicode locale 1078 identifier is used for locale data exchange in software 1079 protocols.</p> 1080 <h4><a name="Canonical_Unicode_Locale_Identifiers" href="#Canonical_Unicode_Locale_Identifiers">3.2.1 Canonical Unicode Locale Identifiers</a></h4> 1081 <p>A <code><a href= 1082 "#unicode_locale_id">unicode_locale_id</a></code> has <em>canonical syntax</em> when:</p> 1083 <ul> 1084 <li>It starts with a language subtag (those beginning with a script subtag are only for specialized use)</li> 1085 <li>Casing 1086 <ul> 1087 <li>Any script subtag is in title case (eg, Hant)</li> 1088 <li>Any region subtag is in uppercase (eg, DE)</li> 1089 <li>All other subtags are in lowercase (eg, en, fonipa)</li> 1090 </ul> 1091 </li> 1092 <li>Order 1093 <ul> 1094 <li>Any variants are in alphabetical order (eg, en-fonipa-scouse, 1095 not en-scouse-fonipa)</li> 1096 <li>Any extensions are in alphabetical order by their singleton 1097 (eg, en-t-xxx-u-yyy, not en-u-yyy-t-xxx)</li> 1098 <li>All attributes are sorted in alphabetical order.</li> 1099 <li>All keywords and tfields are sorted by alphabetical order of their keys, within their respective extensions.</li> 1100 <li>Any type or tfield value "true" is removed.</li> 1101 </ul> 1102 </li> 1103 </ul> 1104 <p>For example, the canonical form of 1105 "en-u-foo-bar-nu-thai-ca-buddhist-kk-true" is 1106 "en-u-bar-foo-ca-buddhist-kk-nu-thai". 
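</p>
    <p>The casing and ordering rules listed above can be sketched in
    code. The following Python is a partial illustration under stated
    assumptions, not a normative algorithm: the names
    <code>canonical_syntax</code> and <code>canonical_u</code> are
    hypothetical, the input is assumed to be well-formed and to start
    with a language subtag, and only the U extension is reordered
    internally. The contents of the T extension and of other
    extensions are lowercased but otherwise left as they are.</p>
    <pre>
```python
import re


def canonical_u(subtags):
    """Sort attributes and keywords of a -u- extension; drop type "true"."""
    subtags = list(subtags)
    # Attributes are the leading subtags that are not two-character keys.
    attributes = []
    while subtags and len(subtags[0]) != 2:
        attributes.append(subtags.pop(0))
    # Group the remaining subtags as key followed by its type subtags.
    keywords = {}
    key = None
    for subtag in subtags:
        if len(subtag) == 2:
            key = subtag
            keywords[key] = []
        else:
            keywords[key].append(subtag)
    out = sorted(attributes)
    for key in sorted(keywords):
        out.append(key)
        if keywords[key] != ["true"]:
            out.extend(keywords[key])
    return out


def canonical_syntax(locale_id):
    """Partial sketch of canonical syntax for a well-formed identifier."""
    subtags = re.split(r"[-_]", locale_id.lower())
    # The language identifier is everything before the first singleton.
    end = next((i for i, s in enumerate(subtags) if len(s) == 1), len(subtags))
    tail, rest = subtags[1:end], subtags[end:]
    head = [subtags[0]]
    if tail and len(tail[0]) == 4 and tail[0].isalpha():
        head.append(tail.pop(0).title())  # script subtag: title case
    if tail and ((len(tail[0]) == 2 and tail[0].isalpha())
                 or (len(tail[0]) == 3 and tail[0].isdigit())):
        head.append(tail.pop(0).upper())  # region subtag: uppercase
    head.extend(sorted(tail))  # variants: alphabetical order
    # Group extension subtags by singleton; -x- takes everything after it.
    extensions, private_use, current = {}, [], None
    for i, subtag in enumerate(rest):
        if subtag == "x":
            private_use = rest[i:]
            break
        if len(subtag) == 1:
            current = extensions.setdefault(subtag, [])
        else:
            current.append(subtag)
    if "u" in extensions:
        extensions["u"] = canonical_u(extensions["u"])
    out = head
    for singleton in sorted(extensions):  # extensions: by singleton
        out = out + [singleton] + extensions[singleton]
    return "-".join(out + private_use)
```
    </pre>
    <p>Applied to "en-u-foo-bar-nu-thai-ca-buddhist-kk-true", this
    sketch returns "en-u-bar-foo-ca-buddhist-kk-nu-thai", matching the
    example above.</p>
    <p>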
The attributes "foo" and
    "bar" in this example are provided only for illustration; no
    attribute subtags are defined by the current CLDR
    specification.</p>
    <p><b>Note:</b> The current version of CLDR data uses some
    non-preferred <em>syntax</em> for backward compatibility. This might be
    changed in future CLDR releases.</p>
    <ul>
      <li>It uses uppercase letters for variant subtags, while the
      preferred forms are all lowercase.</li>
      <li>It uses "_" as the separator, while the preferred form of
      the separator is "-".</li>
      <li>It uses "root", while the preferred form is "und".</li>
    </ul>

    <p>A <code><a href=
    "#unicode_locale_id">unicode_locale_id</a></code> is in <em>canonical form</em> when it has canonical syntax and contains no aliased subtags. A <code><a href=
    "#unicode_locale_id">unicode_locale_id</a></code> can be transformed into canonical form according to <a href="#LocaleId_Canonicalization">Annex C. LocaleId Canonicalization</a>.</p>

    <p>A <code><a href=
    "#unicode_locale_id">unicode_locale_id</a></code> is <em>maximal</em> when the <code><a href=
    "#unicode_language_id">unicode_language_id</a></code> and tlang (if any) have been transformed by the Add Likely Subtags operation in <em>Section 4.3 <a href="#Likely_Subtags">Likely Subtags</a></em>, excluding "und".</p>
    <blockquote><em>Example:</em> the maximal form of ja-Kana-t-it is ja-Kana-JP-t-it-Latn-IT</blockquote>
    <p>Two <code><a href=
    "#unicode_locale_id">unicode_locale_ids</a></code> are <em>equivalent</em> when their maximal canonical forms are identical.</p>
    <blockquote>
      <p><em>Example:</em> "IW-HEBR-u-ms-imperial" ~ "he-u-ms-uksystem"</p>
    </blockquote>
    <p>The equivalence relationship may change over time, such as when subtags are deprecated or likely subtag mappings change. For example, if two countries were to merge, then various subtags would become deprecated.
These kinds of changes are generally very infrequent.</p> 1136 1137 <h3><a name="BCP_47_Conformance" href="#BCP_47_Conformance" id= 1138 "BCP_47_Conformance">3.3 BCP 47 Conformance</a></h3> 1139 <p>Unicode language and locale identifiers inherit the design 1140 and the repertoire of subtags from [<a href="#BCP47">BCP47</a>] 1141 Language Tags. There are some extensions and restrictions made 1142 for the use of the Unicode locale identifier in CLDR:</p> 1143 <ul> 1144 <li>It does not allow for the full syntax of [<a href= 1145 "#BCP47">BCP47</a>]: 1146 <ul> 1147 <li>No extlang subtags are allowed (as in the BCP 47 1148 canonical form, see BCP 47 <a href= 1149 "https://tools.ietf.org/html/bcp47#section-4.5">Section 1150 4.5</a> and <a href= 1151 "https://tools.ietf.org/html/bcp47#section-3.1.7" target= 1152 "_blank">Section 3.1.7</a>)</li> 1153 <li>No irregular BCP 47 legacy language tags 1154 (marked as “Type: grandfathered” in BCP 47) are allowed 1155 (these are all deprecated in BCP 47)</li> 1156 <li>A tag must not start with the subtag "x": thus a 1157 <em>privateuse</em> (eg x-abc) can only be after a 1158 language subtag, like "und"</li> 1159 </ul> 1160 </li> 1161 <li>It allows for certain semantic additions and constraints: 1162 <ul> 1163 <li>Certain codes that are private-use in BCP-47 and ISO 1164 are given semantics by LDML</li> 1165 <li>Each macrolanguage has an identified primary 1166 encompassed language, which is treated as an alias for 1167 the macrolanguage, and thus is replaced when 1168 canonicalizing (as allowed by BCP 47, see <a href= 1169 "https://tools.ietf.org/html/bcp47#section-4.1.2">Section 1170 4.1.2</a>)</li> 1171 </ul> 1172 </li> 1173 <li>It allows certain syntax for backwards compatibility (not 1174 BCP 47-compatible): 1175 <ul> 1176 <li>The "_" character for field separator characters, as 1177 well as the "-" used in [<a href="#BCP47">BCP47</a>] 1178 (however, the canonical form is with "-")</li> 1179 <li>The subtag "root" to indicate 
the generic locale used 1180 as the parent of all languages in the CLDR data model 1181 ("und" can be used instead)</li> 1182 <li>The language tag may begin with a script subtag 1183 rather than a language subtag. This is specialized use 1184 only, and not required for CLDR conformance.</li> 1185 </ul> 1186 </li> 1187 </ul> 1188 <p>There are thus two subtypes of Unicode locale 1189 identifiers:</p> 1190 <ul> 1191 <li>the term <em>Unicode CLDR locale identifier</em> applies 1192 where the backwards compatibility syntax is used.</li> 1193 <li>the term <em>Unicode BCP 47 locale identifier</em> 1194 applies otherwise. A <em>Unicode BCP 47 locale 1195 identifier</em> is also a valid BCP 47 language tag.</li> 1196 </ul> 1197 <h4><a name="BCP_47_Language_Tag_Conversion" href= 1198 "#BCP_47_Language_Tag_Conversion" id= 1199 "BCP_47_Language_Tag_Conversion">3.3.1 BCP 47 Language Tag 1200 Conversion</a></h4> 1201 <p>The different identifiers can be converted to one another as 1202 described in this section.</p> 1203 <h5><a name="Language_Tag_to_Locale_Identifier" href= 1204 "#Language_Tag_to_Locale_Identifier" id= 1205 "Language_Tag_to_Locale_Identifier">BCP 47 Language Tag to 1206 Unicode BCP 47 Locale Identifier</a></h5> 1207 <p>A valid [<a href="#BCP47">BCP47</a>] language tag can be 1208 converted to a valid Unicode BCP 47 locale identifier according to <a href="#LocaleId_Canonicalization" >Annex C. LocaleId Canonicalization</a></p> 1209 1210 <p>The result is a Unicode BCP 47 locale identifier, in 1211 canonical form. It is both a BCP 47 language tag and a Unicode 1212 locale identifier. 
Because the process maps from all BCP 47 1213 language tags into a subset of BCP 47 language tags, the format 1214 changes are not reversible, much as a lowercase transformation 1215 of the string “McGowan” is not reversible.</p><br> 1216 <p><em>Examples</em></p> 1217 <table> 1218 <tr> 1219 <th style='width:10em'>BCP 47 language tag</th> 1220 <th style='width:10em'>Unicode BCP 47 locale 1221 identifier</th> 1222 <th>Comments</th> 1223 </tr> 1224 <tr> 1225 <td><code>en-US</code></td> 1226 <td><code>en-US</code></td> 1227 <td>no changes</td> 1228 </tr> 1229 <tr> 1230 <td><code>iw-FX</code></td> 1231 <td><code>he-FR</code></td> 1232 <td>BCP 47 canonicalization [1]</td> 1233 </tr> 1234 <tr> 1235 <td><code>cmn-TW</code></td> 1236 <td><code>zh-TW</code></td> 1237 <td>language alias [2]</td> 1238 </tr> 1239 <tr> 1240 <td><code>zh-cmn-TW</code></td> 1241 <td><code>zh-TW</code></td> 1242 <td>BCP 47 canonicalization [1], then language alias 1243 [2]</td> 1244 </tr> 1245 <tr> 1246 <td><code>sr-CS</code></td> 1247 <td><code>sr-RS</code></td> 1248 <td>territory alias [3]</td> 1249 </tr> 1250 <tr> 1251 <td><code>sh</code></td> 1252 <td><code>sr-Latn</code></td> 1253 <td>multiple replacement subtags [2.1]</td> 1254 </tr> 1255 <tr> 1256 <td><code>sh-Cyrl</code></td> 1257 <td><code>sr-Cyrl</code></td> 1258 <td>no replacement with multiple replacement subtags [2.1 1259 doesn't apply]</td> 1260 </tr> 1261 <tr> 1262 <td><code>hy-SU</code></td> 1263 <td><code>hy-AM</code></td> 1264 <td>multiple territory values [3.2]<br> 1265 <code><territoryAlias type="SU" replacement="RU AM AZ BY 1266 EE GE KZ KG LV LT MD TJ TM UA UZ" …/></code></td> 1267 </tr> 1268 <tr> 1269 <td><code>i-enochian</code></td> 1270 <td><code>und-x-i-enochian</code></td> 1271 <td>prefix any legacy language tags 1272 (marked as “Type: grandfathered” in BCP 47) with "und-x-" [4]</td> 1273 </tr> 1274 <tr> 1275 <td><code>x-abc</code></td> 1276 <td><code>und-x-abc</code></td> 1277 <td>prefix with "und-", so that there is 
always a base 1278 language subtag [5]</td> 1279 </tr> 1280 </table> 1281 <p> </p> 1282 <h5><a name="Unicode_Locale_Identifier_CLDR_to_BCP_47" href= 1283 "#Unicode_Locale_Identifier_CLDR_to_BCP_47" id= 1284 "Unicode_Locale_Identifier_CLDR_to_BCP_47">Unicode Locale 1285 Identifier: CLDR to BCP 47</a></h5> 1286 <p>A Unicode CLDR locale identifier can be converted to a valid 1287 [<a href="#BCP47">BCP47</a>] language tag (which is also a 1288 Unicode BCP 47 locale identifier) by performing the following 1289 transformation.</p> 1290 <ol> 1291 <li>Replace the "_" separators with "-"</li> 1292 <li>Replace the special language identifier "root" with the 1293 BCP 47 primary language tag "und"</li> 1294 <li>Add an initial "und" primary language subtag if the first 1295 subtag is a script.</li> 1296 </ol> 1297 <p><em>Examples:</em></p> 1298 <table> 1299 <tr> 1300 <th style='width:10em'>Unicode CLDR locale identifier</th> 1301 <th style='width:10em'>BCP 47 language tag</th> 1302 <th>Comments</th> 1303 </tr> 1304 <tr> 1305 <td><code>en_US</code></td> 1306 <td><code>en-US</code></td> 1307 <td>change separator [1]</td> 1308 </tr> 1309 <tr> 1310 <td><code>de_DE_u_co_phonebk</code></td> 1311 <td><code>de-DE-u-co-phonebk</code></td> 1312 <td>change separator [1]</td> 1313 </tr> 1314 <tr> 1315 <td><code>root</code></td> 1316 <td><code>und</code></td> 1317 <td>change to "und" [2]</td> 1318 </tr> 1319 <tr> 1320 <td><code>root_u_cu_usd</code></td> 1321 <td><code>und-u-cu-usd</code></td> 1322 <td>change to "und" [1, 2]</td> 1323 </tr> 1324 <tr> 1325 <td><code>Latn_DE</code></td> 1326 <td><code>und-Latn-DE</code></td> 1327 <td>add "und" [1, 3]</td> 1328 </tr> 1329 </table><br> 1330 <h5><a name="Unicode_Locale_Identifier_BCP_47_to_CLDR" href= 1331 "#Unicode_Locale_Identifier_BCP_47_to_CLDR" id= 1332 "Unicode_Locale_Identifier_BCP_47_to_CLDR">Unicode Locale 1333 Identifier: BCP 47 to CLDR</a></h5> 1334 <p>A Unicode BCP 47 locale identifier can be transformed into a 1335 Unicode CLDR 
locale identifier by performing the following 1336 transformation.</p> 1337 <ol> 1338 <li>the separator is changed to "_"</li> 1339 <li>the primary language subtag "und" is replaced with "root" 1340 if no script, region, or variant subtags are present.</li> 1341 </ol> 1342 <p><em>Examples:</em></p> 1343 <table> 1344 <tr> 1345 <th style='width:10em'>BCP 47 language tag</th> 1346 <th style='width:10em'>Unicode CLDR locale identifier</th> 1347 <th>Comments</th> 1348 </tr> 1349 <tr> 1350 <td><code>en-US</code></td> 1351 <td><code>en_US</code></td> 1352 <td>changes separator [1]</td> 1353 </tr> 1354 <tr> 1355 <td><code>und</code></td> 1356 <td><code>root</code></td> 1357 <td>changes to "root", because no script, region, or 1358 variant tag is present [2]</td> 1359 </tr> 1360 <tr> 1361 <td><code>und-US</code></td> 1362 <td><code>und_US</code></td> 1363 <td>no change to "und", because a region subtag is present 1364 [1]</td> 1365 </tr> 1366 <tr> 1367 <td nowrap><code>und-u-cu-USD</code></td> 1368 <td nowrap><code>root_u_cu_usd</code></td> 1369 <td>changes to "root", because no script, region, or 1370 variant tag is present [1, 2]</td> 1371 </tr> 1372 </table> 1373 <h3><a name="Field_Definitions" href="#Field_Definitions" id= 1374 "Field_Definitions">3.4 Language Identifier Field 1375 Definitions</a></h3> 1376 <p>Unicode language and locale identifier field values are 1377 provided in the following table. Note that some private-use BCP 1378 47 field values are given specific meanings in CLDR. While 1379 field values are based on [<a href="#BCP47">BCP47</a>] subtag 1380 values, their validity status in CLDR is specified by means of 1381 machine-readable files in the <a href= 1382 'https://github.com/unicode-org/cldr/releases/tag/latest/common/validity/'>common/validity/</a> 1383 subdirectory, such as language.xml. 
For the format of those 1384 files and more information, see <em><a href= 1385 '#Validity_Data'>Section 3.11 Validity Data</a></em>.</p> 1386 <table> 1387 <caption> 1388 <a name="Language_Locale_Field_Definitions" href= 1389 "#Language_Locale_Field_Definitions" id= 1390 "Language_Locale_Field_Definitions">Language Identifier 1391 Field Definitions</a> 1392 </caption> 1393 <tr> 1394 <th>Field</th> 1395 <th>Valid values</th> 1396 </tr> 1397 <tr> 1398 <td> 1399 <a href="#unicode_language_subtag_validity" name= 1400 "unicode_language_subtag_validity" id= 1401 "unicode_language_subtag_validity">unicode_language_subtag</a> 1402 <p>(also known as a <i>Unicode base language 1403 code)</i></p> 1404 </td> 1405 <td> 1406 Subtags in the language.xml file (see <em>Section 3.11 1407 <a href="#Validity_Data">Validity Data</a></em> ). These 1408 are based on [<a href="#BCP47">BCP47</a>] subtag values 1409 marked as <b>Type: language</b> 1410 <p>ISO 639-3 introduces the notion of "macrolanguages", 1411 where certain ISO 639-1 or ISO 639-2 codes are given 1412 broad semantics, and additional codes are given for the 1413 narrower semantics. For backwards compatibility, Unicode 1414 language identifiers retain use of the narrower semantics 1415 for these codes. 
For example:</p> 1416 <table border="1" cellspacing="0" cellpadding="2" style= 1417 "margin: 0.5em"> 1418 <tr> 1419 <th>For</th> 1420 <th>Use</th> 1421 <th><i>Not</i></th> 1422 </tr> 1423 <tr> 1424 <td>Standard Chinese (Mandarin)</td> 1425 <td><code>zh</code></td> 1426 <td><code>cmn</code></td> 1427 </tr> 1428 <tr> 1429 <td>Standard Arabic</td> 1430 <td><code>ar</code></td> 1431 <td><code>arb</code></td> 1432 </tr> 1433 <tr> 1434 <td>Standard Malay</td> 1435 <td><code>ms</code></td> 1436 <td><code>zsm</code></td> 1437 </tr> 1438 <tr> 1439 <td>Standard Swahili</td> 1440 <td><code>sw</code></td> 1441 <td><code>swh</code></td> 1442 </tr> 1443 <tr> 1444 <td>Standard Uzbek</td> 1445 <td><code>uz</code></td> 1446 <td><code>uzn</code></td> 1447 </tr> 1448 <tr> 1449 <td>Standard Konkani</td> 1450 <td><code>kok</code></td> 1451 <td><code>knn</code></td> 1452 </tr> 1453 <tr> 1454 <td>Northern Kurdish</td> 1455 <td><code>ku</code></td> 1456 <td><code>kmr</code></td> 1457 </tr> 1458 </table> 1459 <p>If a language subtag matches the type attribute of a 1460 languageAlias element, then the replacement value is used 1461 instead. For example, because "swh" occurs in 1462 <tt><languageAlias type="swh" 1463 replacement="sw"/></tt> , "sw" must be used instead of 1464 "swh". 
Thus Unicode language identifiers use "ar-EG" for
          Standard Arabic (Egypt), not "arb-EG"; they use "zh-TW"
          for Mandarin Chinese (Taiwan), not "cmn-TW".</p>
          <p>The private use codes listed as
          <strong>excluded</strong> in <em>Section 3.5.3 <a href=
          "#Private_Use_Codes">Private Use Codes</a></em> will never be
          given specific semantics in Unicode identifiers, and are
          thus safe for use for other purposes by other
          applications.</p>
          <p>CLDR provides data for normalizing language/locale
          codes, including mapping overlong codes like "eng-840" or
          "eng-USA" to the correct code "en-US"; see the
          <strong><a href=
          "https://unicode-org.github.io/cldr-staging/charts/38/supplemental/aliases.html">
          Aliases</a></strong> Chart.</p>
          <p>The following are special language subtags:</p>
          <table class="simple" border="1" cellspacing="0"
          cellpadding="2">
            <tr>
              <td> </td>
              <td><strong>Name</strong></td>
              <td><strong>Comment</strong></td>
            </tr>
            <tr>
              <td><code>mis</code></td>
              <td>Uncoded languages</td>
              <td>The content is in a language that doesn't yet
              have an ISO 639 code.</td>
            </tr>
            <tr>
              <td><code>mul</code></td>
              <td>Multiple languages</td>
              <td>The content contains more than one language or
              text that is simultaneously in multiple languages
              (such as brand names).</td>
            </tr>
            <tr>
              <td><code>zxx</code></td>
              <td>No linguistic content</td>
              <td>The content is not in any particular language
              (such as images, symbols, etc.)</td>
            </tr>
          </table>
        </td>
      </tr>
      <tr>
        <td>
          <a href="#unicode_script_subtag_validity" name=
          "unicode_script_subtag_validity" id=
          "unicode_script_subtag_validity">unicode_script_subtag</a>
          <p>(also known as a <i>Unicode script code</i>)</p>
        </td>
        <td>
          Subtags in the script.xml file (see <em>Section 3.11
          <a
href="#Validity_Data">Validity Data</a></em>). These 1519 are based on [<a href="#BCP47">BCP47</a>] subtag values 1520 marked as <b>Type: script</b> 1521 <p>In most cases the script is not necessary, since the 1522 language is only customarily written in a single script. 1523 Examples of cases where it is used are:</p> 1524 <table border="1" cellspacing="0" cellpadding="2" style= 1525 "margin: 0.5em"> 1526 <tr> 1527 <td><code>az_Arab</code></td> 1528 <td>Azerbaijani in Arabic script</td> 1529 </tr> 1530 <tr> 1531 <td><code>az_Cyrl</code></td> 1532 <td>Azerbaijani in Cyrillic script</td> 1533 </tr> 1534 <tr> 1535 <td><code>az_Latn</code></td> 1536 <td>Azerbaijani in Latin script</td> 1537 </tr> 1538 <tr> 1539 <td><code>zh_Hans</code></td> 1540 <td>Chinese, in simplified script (=zh, zh-Hans, 1541 zh-CN, zh-Hans-CN)</td> 1542 </tr> 1543 <tr> 1544 <td><code>zh_Hant</code></td> 1545 <td>Chinese, in traditional script</td> 1546 </tr> 1547 </table> 1548 <p>Unicode identifiers give specific semantics to certain 1549 Unicode Script values. For more information, see also 1550 [<a href= 1551 "https://www.unicode.org/reports/tr41/#UAX24">UAX24</a>]:</p> 1552 <table cellspacing="0" cellpadding="2" border="1" style= 1553 "margin: 0.5em"> 1554 <tr> 1555 <td><code>Qaag</code></td> 1556 <td>Zawgyi</td> 1557 <td colspan="2">Qaag is a special script code for 1558 identifying the non-standard use of Myanmar 1559 characters for display with the Zawgyi font. 
The 1560 purpose of the code is to enable migration to 1561 standard, interoperable use of Unicode by providing 1562 an identifier for Zawgyi for tagging text, 1563 applications, input methods, font tables, 1564 transformations, and other mechanisms used for 1565 migration.</td> 1566 </tr> 1567 <tr> 1568 <td><code>Qaai</code></td> 1569 <td>Inherited</td> 1570 <td colspan="2"><strong>deprecated</strong>: the 1571 <em>canonicalized</em> form is Zinh</td> 1572 </tr> 1573 <tr> 1574 <td><code>Zinh</code></td> 1575 <td>Inherited</td> 1576 <td colspan="2"> </td> 1577 </tr> 1578 <tr> 1579 <td><code>Zsye</code></td> 1580 <td>Emoji Style</td> 1581 <td colspan="2">Prefer emoji style for characters 1582 that have both text and emoji styles available.</td> 1583 </tr> 1584 <tr> 1585 <td><code>Zsym</code></td> 1586 <td>Text Style</td> 1587 <td colspan="2">Prefer text style for characters that 1588 have both text and emoji styles available.</td> 1589 </tr> 1590 <tr> 1591 <td rowspan="7"><code>Zxxx</code></td> 1592 <td rowspan="7">Unwritten</td> 1593 <td colspan="2">Indicates spoken or otherwise 1594 unwritten content. 
For example:</td> 1595 </tr> 1596 <tr> 1597 <th>Sample(s)</th> 1598 <th>Description</th> 1599 </tr> 1600 <tr> 1601 <td>uz</td> 1602 <td>either written or spoken content</td> 1603 </tr> 1604 <tr> 1605 <td>uz-Latn <em>or</em> uz-Arab</td> 1606 <td>written-only content (particular script)</td> 1607 </tr> 1608 <tr> 1609 <td>uz-Zyyy</td> 1610 <td>written-only content (unspecified script)</td> 1611 </tr> 1612 <tr> 1613 <td>uz-Zxxx</td> 1614 <td>spoken-only content</td> 1615 </tr> 1616 <tr> 1617 <td>uz-Latn, uz-Zxxx</td> 1618 <td>both specific written and spoken content (using a 1619 <em>language list</em>)</td> 1620 </tr> 1621 <tr> 1622 <td><code>Zyyy</code></td> 1623 <td>Common</td> 1624 <td colspan="2"> </td> 1625 </tr> 1626 <tr> 1627 <td><code>Zzzz</code></td> 1628 <td>Unknown</td> 1629 <td colspan="2"> </td> 1630 </tr> 1631 </table> 1632 <p>The private use subtags listed as 1633 <strong>excluded</strong> in <em>Section 3.5.3 <a href= 1634 "#Private_Use_Codes">Private Use Codes</a></em> will never be 1635 given specific semantics in Unicode identifiers, and are 1636 thus safe for use for other purposes by other 1637 applications.</p> 1638 </td> 1639 </tr> 1640 <tr> 1641 <td> 1642 <a href="#unicode_region_subtag_validity" name= 1643 "unicode_region_subtag_validity" id= 1644 "unicode_region_subtag_validity">unicode_region_subtag</a> 1645 <p>(also known as a <i>Unicode region code,</i> or <i>a 1646 Unicode territory code)</i></p> 1647 </td> 1648 <td> 1649 Subtags in the region.xml file (see <em>Section 3.11 1650 <a href="#Validity_Data">Validity Data</a></em>). 
These 1651 are based on [<a href="#BCP47">BCP47</a>] subtag values 1652 marked as <b>Type: region</b> 1653 <p>Unicode identifiers give specific semantics to the 1654 following subtags:</p> 1655 <table border="1" cellspacing="0" cellpadding="2"> 1656 <tr> 1657 <td> </td> 1658 <td><strong>Name</strong></td> 1659 <td><strong>Comment</strong></td> 1660 <td><strong>ISO 3166-1 status</strong></td> 1661 </tr> 1662 <tr> 1663 <td><code>QO</code></td> 1664 <td>Outlying Oceania</td> 1665 <td>countries in Oceania [009] that do not have a 1666 <a href= 1667 "https://unicode-org.github.io/cldr-staging/charts/38/supplemental/territory_containment_un_m_49.html"> 1668 subcontinent</a>.</td> 1669 <td>private use</td> 1670 </tr> 1671 <tr> 1672 <td><code>QU</code></td> 1673 <td>European Union</td> 1674 <td><strong>deprecated</strong>: the 1675 <em>canonicalized</em> form is EU</td> 1676 <td>private use</td> 1677 </tr> 1678 <tr> 1679 <td><code>UK</code></td> 1680 <td>United Kingdom</td> 1681 <td><strong>deprecated</strong>: the 1682 <em>canonicalized</em> form is GB</td> 1683 <td>exceptionally reserved</td> 1684 </tr> 1685 <tr> 1686 <td><code>XA</code></td> 1687 <td>Pseudo-Accents</td> 1688 <td>special code indicating derived testing locale 1689 with English + added accents and lengthened</td> 1690 <td>private use</td> 1691 </tr> 1692 <tr> 1693 <td><code>XB</code></td> 1694 <td>Pseudo-Bidi</td> 1695 <td>special code indicating derived testing locale 1696 with forced RTL English</td> 1697 <td>private use</td> 1698 </tr> 1699 <tr> 1700 <td><code>XK</code></td> 1701 <td>Kosovo</td> 1702 <td>industry practice</td> 1703 <td>private use</td> 1704 </tr> 1705 <tr> 1706 <td><code>ZZ</code></td> 1707 <td>Unknown or Invalid Territory</td> 1708 <td>used in APIs or as replacement for invalid 1709 code</td> 1710 <td>private use</td> 1711 </tr> 1712 </table> 1713 <p>The private use subtags listed as 1714 <strong>excluded</strong> in <em>Section 3.5.3 <a href= 1715 "#Private_Use_Codes">Private Use 
Codes</a></em> will normally 1716 never be given specific semantics in Unicode identifiers, 1717 and are thus safe for use for other purposes by other 1718 applications. However, LDML may follow widespread 1719 industry practice in the use of some of these codes, such 1720 as for XK.</p> 1721 <p>The CLDR provides data for normalizing 1722 territory/region codes, including mapping overlong codes 1723 like "eng-840" or "eng-USA" to the correct code 1724 "en-US".</p> 1725 <p>Special Codes:</p> 1726 <ul> 1727 <li>The territory code 'UK' has a special status in 1728 ISO, and is used for the domain name instead of GB. It 1729 is thus recognized by CLDR as being an alternate 1730 (unnormalized) form of 'GB'.</li> 1731 <li>The territory code '001' (the World) is used to 1732 indicate a standardized form, such as "ar-001" for 1733 Modern Standard Arabic.</li> 1734 </ul> 1735 </td> 1736 </tr> 1737 <tr> 1738 <td> 1739 <a href="#unicode_variant_subtag_validity" name= 1740 "unicode_variant_subtag_validity" id= 1741 "unicode_variant_subtag_validity">unicode_variant_subtag</a> 1742 <p>(also known as a <i>Unicode language variant 1743 code)</i></p> 1744 </td> 1745 <td> 1746 Subtags in the variant.xml file (see <em>Section 3.11 1747 <a href="#Validity_Data">Validity Data</a></em> ). These 1748 are based on [<a href="#BCP47">BCP47</a>] subtag values 1749 marked as <b>Type: variant</b> 1750 <p>CLDR provides data for normalizing variant codes. 
For handling of the "POSIX" variant, see <i>Section
3.8.2, <a href="#Legacy_Variants">Legacy
Variants</a></i>.</p>
</td>
</tr>
</table>
<p><i>Examples:</i></p>
<blockquote>
<pre>en
fr_BE
zh-Hant-HK</pre>
</blockquote>
<p><em>Deprecated</em> codes, such as QU above, are valid but
strongly discouraged.</p>
<p>A locale that has only a language subtag (and optionally a
script subtag) is called a <i>language locale</i>; one with
both a language and a territory subtag is called a <i>territory
locale</i> (or <i>country locale</i>).</p>
<h3><a name="Special_Codes" href="#Special_Codes" id=
"Special_Codes">3.5 Special Codes</a></h3>
<h4><a name="Unknown_or_Invalid_Identifiers" href=
"#Unknown_or_Invalid_Identifiers" id=
"Unknown_or_Invalid_Identifiers">3.5.1 Unknown or Invalid
Identifiers</a></h4>
<p>The following identifiers are used to indicate an unknown or
invalid code in Unicode language and locale identifiers. For
Unicode identifiers, the region code uses a private use ISO
3166 code, and the Time Zone code uses an additional code; the
others are defined by the relevant standards.
When these codes
are used in APIs connected with Unicode identifiers, the
meaning is that either there was no identifier available, or
that at some point an input identifier value was determined to
be invalid or ill-formed.</p>
<table border="1" cellspacing="0" cellpadding="4" style=
"margin-top: 0.5em; margin-bottom: 0.5em" id="table4">
<tr>
<th>Code Type</th>
<th>Value</th>
<th>Description in Referenced Standards</th>
</tr>
<tr>
<td>Language</td>
<td><code>und</code></td>
<td>Undetermined language, also used for “root”</td>
</tr>
<tr>
<td>Script</td>
<td><code>Zzzz</code></td>
<td>Code for uncoded script, Unknown [<a href=
"https://www.unicode.org/reports/tr41/#UAX24">UAX24</a>]</td>
</tr>
<tr>
<td>Region</td>
<td><code>ZZ</code></td>
<td>Unknown or Invalid Territory</td>
</tr>
<tr>
<td>Currency</td>
<td><code>XXX</code></td>
<td>The codes assigned for transactions where no currency
is involved</td>
</tr>
<tr>
<td>Time Zone</td>
<td><code>unk</code></td>
<td>Unknown or Invalid Time Zone</td>
</tr>
<tr>
<td>Subdivision</td>
<td><em>&lt;region&gt;</em>zzzz</td>
<td>Unknown or Invalid Subdivision</td>
</tr>
</table>
<p>When only the script or the region is known, a locale ID
uses "und" as the language subtag portion. Thus the locale
tag "und_Grek" represents the Greek script; "und_US" represents
the US territory.</p>
<h4><a name="Numeric_Codes" href="#Numeric_Codes" id=
"Numeric_Codes">3.5.2 Numeric Codes</a></h4>
<p>For region codes, ISO and the UN establish a mapping to
three-letter codes and numeric codes. However, this does not
extend to the private use codes, which are the codes 900-999
(total: 100), and AAA, QMA-QZZ, XAA-XZZ, and ZZZ (total: 1092).
1834 Unicode identifiers supply a standard mapping to these: for the 1835 numeric codes, it uses the top of the numeric private use 1836 range; for the 3-letter codes it doubles the final letter. 1837 These are the resulting mappings for all of the private use 1838 region codes:</p> 1839 <table border="1" cellspacing="0" cellpadding="4" style= 1840 "margin-top: 0.5em; margin-bottom: 0.5em" id="table19"> 1841 <tr> 1842 <th>Region</th> 1843 <th>UN/ISO Numeric</th> 1844 <th>ISO 3-Letter</th> 1845 </tr> 1846 <tr> 1847 <td><code>AA</code></td> 1848 <td><code>958</code></td> 1849 <td><code>AAA</code></td> 1850 </tr> 1851 <tr> 1852 <td><code>QM..QZ</code></td> 1853 <td><code>959..972</code></td> 1854 <td><code>QMM..QZZ</code></td> 1855 </tr> 1856 <tr> 1857 <td><code>XA..XZ</code></td> 1858 <td><code>973..998</code></td> 1859 <td><code>XAA..XZZ</code></td> 1860 </tr> 1861 <tr> 1862 <td><code>ZZ</code></td> 1863 <td><code>999</code></td> 1864 <td><code>ZZZ</code></td> 1865 </tr> 1866 </table> 1867 <p>For script codes, ISO 15924 supplies a mapping (however, the 1868 numeric codes are not in common use):</p> 1869 <table border="1" cellspacing="0" cellpadding="4" style= 1870 "margin-top: 0.5em; margin-bottom: 0.5em" id="table21"> 1871 <tr> 1872 <th>Script</th> 1873 <th>Numeric</th> 1874 </tr> 1875 <tr> 1876 <td><code>Qaaa..Qabx</code></td> 1877 <td><code>900..949</code></td> 1878 </tr> 1879 </table><br> 1880 <h4>3.5.3 <a name="Private_Use_Codes" href="#Private_Use_Codes" id= 1881 "Private_Use_Codes">Private Use Codes</a></h4> 1882 <p>Private use codes fall into three groups.</p> 1883 <ul> 1884 <li><strong>defined:</strong> those that are given particular 1885 semantics currently in CLDR</li> 1886 <li><strong>reserved:</strong> those that may be given 1887 particular semantics in future versions of CLDR</li> 1888 <li><strong>excluded:</strong> those that will never be given 1889 particular CLDR semantics in the future, and thus can 1890 normally be used by applications without 
worrying about 1891 collisions. However, CLDR may follow widespread industry 1892 practice in the use of some of these codes, such as for XA, 1893 XB, and XK.</li> 1894 </ul> 1895 <table> 1896 <caption> 1897 <a name="Private_Use_CLDR" href="#Private_Use_CLDR" id= 1898 "Private_Use_CLDR">Private Use Codes in CLDR</a> 1899 </caption> 1900 <tr> 1901 <th>category</th> 1902 <th>status</th> 1903 <th>codes</th> 1904 </tr> 1905 <tr> 1906 <td rowspan="3">base language</td> 1907 <td>defined</td> 1908 <td>none</td> 1909 </tr> 1910 <tr> 1911 <td>reserved</td> 1912 <td>qaa..qfy</td> 1913 </tr> 1914 <tr> 1915 <td>excluded</td> 1916 <td>qfz..qtz</td> 1917 </tr> 1918 <tr> 1919 <td rowspan="3">script</td> 1920 <td>defined</td> 1921 <td>Qaai (obsolete), Qaag</td> 1922 </tr> 1923 <tr> 1924 <td>reserved</td> 1925 <td>Qaaa..Qaaf Qaah Qaaj..Qaap</td> 1926 </tr> 1927 <tr> 1928 <td>excluded</td> 1929 <td>Qaaq..Qabx</td> 1930 </tr> 1931 <tr> 1932 <td rowspan="3">region</td> 1933 <td>defined</td> 1934 <td>QO, QU, UK, XA, XB, XK, ZZ</td> 1935 </tr> 1936 <tr> 1937 <td>reserved</td> 1938 <td>AA QM..QN QP..QT QV..QZ</td> 1939 </tr> 1940 <tr> 1941 <td>excluded</td> 1942 <td>XC..XJ, XL..XZ</td> 1943 </tr> 1944 <tr> 1945 <td rowspan="3">timezone</td> 1946 <td>defined</td> 1947 <td>IANA: Etc/Unknown<br> 1948 bcp47: as listed in bcp47/timezone.xml</td> 1949 </tr> 1950 <tr> 1951 <td>reserved</td> 1952 <td>bcp47: all non-5 letter codes not starting with x</td> 1953 </tr> 1954 <tr> 1955 <td>excluded</td> 1956 <td>bcp47: all non-5 letter codes starting with x</td> 1957 </tr> 1958 </table> 1959 <p>See also <em>Section 3.5.1 <a href= 1960 "#Unknown_or_Invalid_Identifiers">Unknown or Invalid 1961 Identifiers</a></em>.</p> 1962 <h3><a name="Locale_Extension_Key_and_Type_Data" id= 1963 "Locale_Extension_Key_and_Type_Data"></a><a name="u_Extension" 1964 href="#u_Extension" id="u_Extension">3.6 Unicode BCP 47 U 1965 Extension</a></h3> 1966 <p>[<a href="#BCP47">BCP47</a>] Language Tags provides a 1967 mechanism 
for extending language tags for use in various
applications by extension subtags. Each extension subtag is
identified by a single alphanumeric character subtag assigned
by IANA.</p>
<p>The Unicode Consortium has registered and is the maintaining
authority for two BCP 47 language tag extensions: the extension
'u' for Unicode locale extension [<a href=
"#RFC6067">RFC6067</a>] and the extension 't' for transformed
content [<a href="#RFC6497">RFC6497</a>]. The Unicode BCP 47
extension data defines the complete list of valid subtags.</p>
<p>These subtags are all in lowercase, which is their canonical
casing; however, subtags are case-insensitive, and casing does
not carry any specific meaning. All subtags within the Unicode
extensions consist of two to eight alphanumeric characters and
meet the rule <code>extension</code> in [<a href=
"#BCP47">BCP47</a>].</p>
<p><strong>The -u- Extension.</strong> The syntax of 'u'
extension subtags is defined by the rule
<code>unicode_locale_extensions</code> in <a href=
"#Unicode_locale_identifier">Section 3.2 Unicode locale
identifier</a>, except that the subtag separator
<code>sep</code> must always be a hyphen '-' when the extension
is used as part of a BCP 47 language tag.</p>
<p>A 'u' extension may contain multiple <code>attribute</code>s
or <code>keyword</code>s as defined in <a href=
"#Unicode_locale_identifier">Section 3.2 Unicode locale
identifier</a>. The canonical syntax is defined in <a href="#Canonical_Unicode_Locale_Identifiers">3.2.1 Canonical Unicode Locale Identifiers</a>.
</p> 1995 <p><em>See also <a href= 1996 "http://cldr.unicode.org/index/bcp47-extension">Unicode 1997 Extensions for BCP 47</a> on the CLDR site.</em></p> 1998 <h4><a href="#Key_And_Type_Definitions_" name= 1999 "Key_And_Type_Definitions_" id= 2000 "Key_And_Type_Definitions_">3.6.1 Key And Type 2001 Definitions</a></h4> 2002 <p>The following chart contains a set of U extension key values 2003 that are currently available, with a description or sampling of 2004 the U extension type values. Each category is associated with 2005 an XML file in the bcp47 directory.</p> 2006 <p>For the complete list of valid keys and types defined for 2007 Unicode locale extensions, see <a href= 2008 "#Unicode_Locale_Extension_Data_Files">Section 3.6.4 U 2009 Extension Data Files</a>. For information on the process for 2010 adding new <i>key</i>/<i>type</i>, see [<a href= 2011 "#localeProject">LocaleProject</a>].</p> 2012 <p>Most type values are represented by a single subtag in the 2013 current version of CLDR. There are exceptions, such as types 2014 used for key "ca" (calendar) and "kr" (collation reordering). 2015 If the type is not included, then the type value "true" is 2016 assumed. Note that the default for key with a possible "true" 2017 value is often "false", but may not always be. Note also that 2018 "true"/"True" is not a valid script code, since <a href= 2019 "https://www.unicode.org/iso15924/codelists.html">the ISO 15924 2020 Registration Authority has exceptionally reserved it</a>, which 2021 means that it will not be assigned for any purpose.</p> 2022 <p>The BCP 47 form for keys and types is the canonical form, 2023 and recommended. 
Other aliases are included for backwards 2024 compatibility.</p> 2025 <table> 2026 <caption> 2027 <a name="Key_Type_Definitions" href="#Key_Type_Definitions" 2028 id="Key_Type_Definitions">Key/Type Definitions</a> 2029 </caption> 2030 <tr> 2031 <th>key<br> 2032 (old key name)</th> 2033 <th>key description</th> 2034 <th>example type<br> 2035 (old type name)</th> 2036 <th>type description</th> 2037 </tr> 2038 <tr> 2039 <td colspan="4"><strong>A <a href= 2040 "#UnicodeCalendarIdentifier" name= 2041 "UnicodeCalendarIdentifier" id= 2042 "UnicodeCalendarIdentifier">Unicode Calendar Identifier</a> 2043 defines a type of calendar. The valid values are those 2044 <em>name</em> attribute values in the <em>type</em> 2045 elements of key name="ca" in bcp47/<a target="_blank" href= 2046 "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/calendar.xml">calendar.xml</a></strong>.</td> 2047 </tr> 2048 <tr> 2049 <td rowspan="10">"ca"<br> 2050 (calendar)</td> 2051 <td rowspan="10">Calendar algorithm<br> 2052 <br> 2053 <i>(For information on the calendar algorithms associated 2054 with the data used with these, see [<a href= 2055 "#Calendars">Calendars</a>].)</i></td> 2056 <td>"buddhist"</td> 2057 <td>Thai Buddhist calendar (same as Gregorian except for 2058 the year)</td> 2059 </tr> 2060 <tr> 2061 <td>"chinese"</td> 2062 <td>Traditional Chinese calendar</td> 2063 </tr> 2064 <tr> 2065 <td colspan="2">…</td> 2066 </tr> 2067 <tr> 2068 <td>"gregory"<br> 2069 (gregorian)</td> 2070 <td>Gregorian calendar</td> 2071 </tr> 2072 <tr> 2073 <td colspan="2">…</td> 2074 </tr> 2075 <tr> 2076 <td>"islamic"</td> 2077 <td>Islamic calendar</td> 2078 </tr> 2079 <tr> 2080 <td>"islamic-civil"</td> 2081 <td>Islamic calendar, tabular (intercalary years 2082 [2,5,7,10,13,16,18,21,24,26,29] - civil epoch)</td> 2083 </tr> 2084 <tr> 2085 <td>"islamic-umalqura"</td> 2086 <td>Islamic calendar, Umm al-Qura</td> 2087 </tr> 2088 <tr> 2089 <td colspan="2">…</td> 2090 </tr> 2091 <tr> 2092 <td 
colspan="2"><b>Note:</b> <i>Some calendar types are 2093 represented by two subtags. In such cases, the first subtag 2094 specifies a generic calendar type and the second subtag 2095 specifies a calendar algorithm variant. The CLDR uses 2096 generic calendar types (single subtag types) for tagging 2097 data when calendar algorithm variations within a generic 2098 calendar type are irrelevant. For example, type "islamic" 2099 is used for specifying Islamic calendar formatting data for 2100 all Islamic calendar types, including "islamic-civil" and 2101 "islamic-umalqura".</i></td> 2102 </tr> 2103 <tr> 2104 <td colspan="4"><strong>A <a href= 2105 "#UnicodeCurrencyFormatIdentifier" name= 2106 "UnicodeCurrencyFormatIdentifier" id= 2107 "UnicodeCurrencyFormatIdentifier">Unicode Currency Format 2108 Identifier</a> defines a style for currency formatting. The 2109 valid values are those <em>name</em> attribute values in 2110 the <em>type</em> elements of key name="cf" in 2111 bcp47/<a target="_blank" href= 2112 "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/currency.xml">currency.xml</a></strong>.</td> 2113 </tr> 2114 <tr> 2115 <td rowspan="2">"cf"</td> 2116 <td rowspan="2">Currency Format style</td> 2117 <td>"standard"</td> 2118 <td>Negative numbers use the minusSign symbol (the 2119 default).</td> 2120 </tr> 2121 <tr> 2122 <td>"account"</td> 2123 <td>Negative numbers use parentheses or equivalent.</td> 2124 </tr> 2125 <tr> 2126 <td colspan="4"><strong>A <a href= 2127 "#UnicodeCollationIdentifier" name= 2128 "UnicodeCollationIdentifier" id= 2129 "UnicodeCollationIdentifier">Unicode Collation 2130 Identifier</a> defines a type of collation (sort order). 
2131 The valid values are those <em>name</em> attribute values 2132 in the <em>type</em> elements of bcp47/<a target="_blank" 2133 href= 2134 "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/collation.xml">collation.xml</a></strong>.</td> 2135 </tr> 2136 <tr> 2137 <td colspan="4"><i>For information on each collation 2138 setting parameter, from <strong>ka</strong> to 2139 <strong>vt</strong>, see <a href= 2140 "tr35-collation.html#Setting_Options">Setting 2141 Options</a></i></td> 2142 </tr> 2143 <tr> 2144 <td rowspan="9">"co"<br> 2145 (collation)</td> 2146 <td rowspan="9">Collation type</td> 2147 <td>"standard"</td> 2148 <td>The default ordering for each language. For root it is 2149 based on the [<a href="#DUCET">DUCET</a>] (Default Unicode 2150 Collation Element Table): see <em><a href= 2151 "tr35-collation.html#Root_Collation">Root 2152 Collation</a></em>. Each other locale is based on that, 2153 except for appropriate modifications to certain characters 2154 for that language.</td> 2155 </tr> 2156 <tr> 2157 <td>"search"</td> 2158 <td>A special collation type dedicated for string search—it 2159 is not used to determine the relative order of two strings, 2160 but only to determine whether they should be considered 2161 equivalent for the specified strength, using the string 2162 search matching rules appropriate for the language. 2163 Compared to the normal collator for the language, this may 2164 add or remove primary equivalences, may make additional 2165 characters ignorable or change secondary equivalences, and 2166 may modify contractions to allow matching within them, 2167 depending on the desired behavior. For example, in Czech, 2168 the distinction between ‘a’ and ‘á’ is secondary for normal 2169 collation, but primary for search; a search for ‘a’ should 2170 never match ‘á’ and vice versa. 
A search collator is 2171 normally used with strength set to PRIMARY or SECONDARY 2172 (should be SECONDARY if using “asymmetric” search as 2173 described in the [<a href= 2174 "https://www.unicode.org/reports/tr41/#UTS10">UCA</a>] 2175 section Asymmetric Search). The search collator in root 2176 supplies matching rules that are appropriate for most 2177 languages (and which are different than the root collation 2178 behavior); language-specific search collators may be 2179 provided to override the matching rules for a given 2180 language as necessary.</td> 2181 </tr> 2182 <tr> 2183 <td colspan="2"> 2184 <p>Other keywords provide additional choices for certain 2185 locales; <i>they only have effect in certain 2186 locales.</i></p> 2187 </td> 2188 </tr> 2189 <tr> 2190 <td colspan="2">…</td> 2191 </tr> 2192 <tr> 2193 <td>"phonetic"</td> 2194 <td>Requests a phonetic variant if available, where text is 2195 sorted based on pronunciation. It may interleave different 2196 scripts, if multiple scripts are in common use.</td> 2197 </tr> 2198 <tr> 2199 <td>"pinyin"</td> 2200 <td>Pinyin ordering for Latin and for CJK characters; that 2201 is, an ordering for CJK characters based on a 2202 character-by-character transliteration into a pinyin. (used 2203 in Chinese)</td> 2204 </tr> 2205 <tr> 2206 <td>"reformed"</td> 2207 <td>Reformed collation (such as in Swedish)</td> 2208 </tr> 2209 <tr> 2210 <td>"searchjl"</td> 2211 <td>Special collation type for a modified string search in 2212 which a pattern consisting of a sequence of Hangul initial 2213 consonants (jamo lead consonants) will match a sequence of 2214 Hangul syllable characters whose initial consonants match 2215 the pattern. The jamo lead consonants can be represented 2216 using conjoining or compatibility jamo. 
This search 2217 collator is best used at SECONDARY strength with an 2218 "asymmetric" search as described in the [<a href= 2219 "https://www.unicode.org/reports/tr41/#UTS10">UCA</a>] 2220 section Asymmetric Search and obtained, for example, using 2221 ICU4C's usearch facility with attribute 2222 USEARCH_ELEMENT_COMPARISON set to value 2223 USEARCH_PATTERN_BASE_WEIGHT_IS_WILDCARD; this ensures that 2224 a full Hangul syllable in the search pattern will only 2225 match the same syllable in the searched text (instead of 2226 matching any syllable with the same initial consonant), 2227 while a Hangul initial consonant in the search pattern will 2228 match any Hangul syllable in the searched text with the 2229 same initial consonant.</td> 2230 </tr> 2231 <tr> 2232 <td colspan="2">…</td> 2233 </tr> 2234 <tr> 2235 <td colspan="4"><strong>A <a href= 2236 "#UnicodeCurrencyIdentifier" name= 2237 "UnicodeCurrencyIdentifier" id= 2238 "UnicodeCurrencyIdentifier">Unicode Currency Identifier</a> 2239 defines a type of currency. The valid values are those 2240 <em>name</em> attribute values in the <em>type</em> 2241 elements of key name="cu" in bcp47/<a target="_blank" href= 2242 "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/currency.xml">currency.xml</a>.</strong></td> 2243 </tr> 2244 <tr> 2245 <td>"cu"<br> 2246 (currency)</td> 2247 <td>Currency type</td> 2248 <td> 2249 <i>ISO 4217 code,</i> 2250 <p><i>plus others in common use</i></p> 2251 </td> 2252 <td> 2253 <p>Codes consisting of 3 ASCII letters that are or have 2254 been valid in ISO 4217, plus certain additional codes 2255 that are or have been in common use. 
The list of 2256 countries and time periods associated with each currency 2257 value is available in <a href= 2258 "tr35-numbers.html#Supplemental_Currency_Data">Supplemental 2259 Currency Data</a>, plus the default number of 2260 decimals.</p> 2261 <p>The XXX code is given a broader interpretation as 2262 <em>Unknown or Invalid Currency</em>.</p> 2263 </td> 2264 </tr> 2265 <tr> 2266 <td colspan="4"><strong>A <a href= 2267 "#UnicodeDictionaryBreakExclusionIdentifier" name= 2268 "UnicodeDictionaryBreakExclusionIdentifier" id= 2269 "UnicodeDictionaryBreakExclusionIdentifier">Unicode Dictionary Break Exclusion Identifier</a> 2270 specifies scripts to be excluded from dictionary-based text break (for words and lines). 2271 The valid values are of one or more items of type SCRIPT_CODE as specified in the 2272 <em>name</em> attribute value in the <em>type</em> 2273 element of key name="dx" in bcp47/<a target="_blank" href= 2274 "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/segmentation.xml">segmentation.xml</a>.</strong></td> 2275 </tr> 2276 <tr> 2277 <td>"dx"</td> 2278 <td>Dictionary break script exclusions</td> 2279 <td> 2280 <i><code><a href="#unicode_script_subtag">unicode_script_subtag</a></code> values</i> 2281 </td> 2282 <td> 2283 <p>One or more items of type SCRIPT_CODE, which are valid 2284 <code><a href="#unicode_script_subtag">unicode_script_subtag</a></code> values.</p> 2285 <p>The code Zyyy (Common) can be specified to exclude all scripts, in which case 2286 it should be the only SCRIPT_CODE value specified.</p> 2287 </td> 2288 </tr> 2289 <tr> 2290 <td colspan="4"><strong>A <a href= 2291 "#UnicodeEmojiPresentationStyleIdentifier" name= 2292 "UnicodeEmojiPresentationStyleIdentifier" id= 2293 "UnicodeEmojiPresentationStyleIdentifier">Unicode Emoji 2294 Presentation Style Identifier</a> specifies a request for 2295 the preferred emoji presentation style. 
This can be used as
part of the value for an HTML lang attribute, for example
<code>&lt;html lang="sr-Latn-u-em-emoji"&gt;</code>. The
valid values are those <em>name</em> attribute values in
the <em>type</em> elements of key name="em" in
bcp47/<a target="_blank" href=
"https://github.com/unicode-org/cldr/tree/latest/common/bcp47/variant.xml">variant.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="3">"em"</td>
<td rowspan="3">Emoji presentation style</td>
<td>"emoji"</td>
<td>Use an emoji presentation for emoji characters if
possible.</td>
</tr>
<tr>
<td>"text"</td>
<td>Use a text presentation for emoji characters if
possible.</td>
</tr>
<tr>
<td>"default"</td>
<td>Use the default presentation for emoji characters as
specified in UTR #51 Section 4, <a href=
"https://www.unicode.org/reports/tr51/#Presentation_Style">Presentation
Style</a>.</td>
</tr>
<tr>
<td colspan="4"><strong>A <a href=
"#UnicodeFirstDayIdentifier" name=
"UnicodeFirstDayIdentifier" id=
"UnicodeFirstDayIdentifier">Unicode First Day
Identifier</a> defines the preferred first day of the week
for calendar display. Specifying "fw" in a locale
identifier overrides the default value specified by
supplemental week data (see Part 4 Dates, section 4.3
<a href="tr35-dates.html#Week_Data">Week Data</a>).
The 2332 valid values are those <em>name</em> attribute values in 2333 the <em>type</em> elements of key name="fw" in 2334 bcp47/<a target="_blank" href= 2335 "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/calendar.xml">calendar.xml</a></strong>.</td> 2336 </tr> 2337 <tr> 2338 <td rowspan="4">"fw"</td> 2339 <td rowspan="4">First day of week</td> 2340 <td>"sun"</td> 2341 <td>Sunday</td> 2342 </tr> 2343 <tr> 2344 <td>"mon"</td> 2345 <td>Monday</td> 2346 </tr> 2347 <tr> 2348 <td colspan="2">…</td> 2349 </tr> 2350 <tr> 2351 <td>"sat"</td> 2352 <td>Saturday</td> 2353 </tr> 2354 <tr> 2355 <td colspan="4"><strong>A <a href= 2356 "#UnicodeHourCycleIdentifier" name= 2357 "UnicodeHourCycleIdentifier" id= 2358 "UnicodeHourCycleIdentifier">Unicode Hour Cycle 2359 Identifier</a> defines the preferred time cycle. Specifying 2360 "hc" in a locale identifier overrides the default value 2361 specified by supplemental time data (see Part 4 Dates, 2362 section 4.4 <a href="tr35-dates.html#Time_Data">Time 2363 Data</a>). 
The valid values are those <em>name</em>
attribute values in the <em>type</em> elements of key
name="hc" in bcp47/<a target="_blank" href=
"https://github.com/unicode-org/cldr/tree/latest/common/bcp47/calendar.xml">calendar.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="4">"hc"</td>
<td rowspan="4">Hour cycle</td>
<td>"h12"</td>
<td>Hour system using 1–12; corresponds to 'h' in
patterns</td>
</tr>
<tr>
<td>"h23"</td>
<td>Hour system using 0–23; corresponds to 'H' in
patterns</td>
</tr>
<tr>
<td>"h11"</td>
<td>Hour system using 0–11; corresponds to 'K' in
patterns</td>
</tr>
<tr>
<td>"h24"</td>
<td>Hour system using 1–24; corresponds to 'k' in
patterns</td>
</tr>
<tr>
<td colspan="4"><strong>A <a href=
"#UnicodeLineBreakStyleIdentifier" name=
"UnicodeLineBreakStyleIdentifier" id=
"UnicodeLineBreakStyleIdentifier">Unicode Line Break Style
Identifier</a> defines a preferred line break style
corresponding to the CSS level 3 <a href=
"https://drafts.csswg.org/css-text/#line-break-property">line-break
option</a>. Specifying "lb" in a locale identifier
overrides the locale’s default style (which may correspond
to "normal" or "strict"). The valid values are those
<em>name</em> attribute values in the <em>type</em>
elements of key name="lb" in bcp47/<a target="_blank" href=
"https://github.com/unicode-org/cldr/tree/latest/common/bcp47/segmentation.xml">segmentation.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="3">"lb"</td>
<td rowspan="3">Line break style</td>
<td>"strict"</td>
<td>CSS level 3 line-break=strict, e.g. treat CJ as NS</td>
</tr>
<tr>
<td>"normal"</td>
<td>CSS level 3 line-break=normal, e.g.
treat CJ as ID,
break before hyphens for ja, zh</td>
</tr>
<tr>
<td>"loose"</td>
<td>CSS level 3 line-break=loose</td>
</tr>
<tr>
<td colspan="4"><strong>A <a href=
"#UnicodeLineBreakWordIdentifier" name=
"UnicodeLineBreakWordIdentifier" id=
"UnicodeLineBreakWordIdentifier">Unicode Line Break Word
Identifier</a> defines preferred line break word handling
behavior corresponding to the CSS level 3 <a href=
"https://drafts.csswg.org/css-text/#word-break-property">word-break
option</a>. The valid values are those <em>name</em>
attribute values in the <em>type</em> elements of key
name="lw" in bcp47/<a target="_blank" href=
"https://github.com/unicode-org/cldr/tree/latest/common/bcp47/segmentation.xml">segmentation.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="3">"lw"</td>
<td rowspan="3">Line break word handling</td>
<td>"normal"</td>
<td>CSS level 3 word-break=normal, normal script/language
behavior for midword breaks</td>
</tr>
<tr>
<td>"breakall"</td>
<td>CSS level 3 word-break=break-all, allow midword breaks
unless forbidden by lb setting</td>
</tr>
<tr>
<td>"keepall"</td>
<td>CSS level 3 word-break=keep-all, prohibit midword
breaks except for dictionary breaks</td>
</tr>
<tr>
<td colspan="4"><strong>A <a href=
"#UnicodeMeasurementSystemIdentifier" name=
"UnicodeMeasurementSystemIdentifier" id=
"UnicodeMeasurementSystemIdentifier">Unicode Measurement
System Identifier</a> defines a preferred measurement
system. Specifying "ms" in a locale identifier overrides
the default value specified by supplemental measurement
system data (see Part 2 General, section 5 <a href=
"tr35-general.html#Measurement_System_Data">Measurement
System Data</a>).
The valid values are those <em>name</em> 2461 attribute values in the <em>type</em> elements of key 2462 name="ms" in bcp47/<a target="_blank" href= 2463 "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/measure.xml">measure.xml</a></strong>.</td> 2464 </tr> 2465 <tr> 2466 <td rowspan="3">"ms"</td> 2467 <td rowspan="3">Measurement system</td> 2468 <td>"metric"</td> 2469 <td>Metric System</td> 2470 </tr> 2471 <tr> 2472 <td>"ussystem"</td> 2473 <td>US System of measurement: feet, pints, etc.; pints are 2474 16oz</td> 2475 </tr> 2476 <tr> 2477 <td>"uksystem"</td> 2478 <td>UK System of measurement: feet, pints, etc.; pints are 2479 20oz</td> 2480 </tr> 2481 <tr> 2482 <td colspan="4"><strong>A <a href= 2483 "#UnicodeNumberSystemIdentifier" name= 2484 "UnicodeNumberSystemIdentifier" id= 2485 "UnicodeNumberSystemIdentifier">Unicode Number System 2486 Identifier</a> defines a type of number system. The valid 2487 values are those <em>name</em> attribute values in the 2488 <em>type</em> elements of bcp47/<a target="_blank" href= 2489 "https://github.com/unicode-org/cldr/tree/latest/common/bcp47/number.xml">number.xml</a>.</strong></td> 2490 </tr> 2491 <tr> 2492 <td rowspan="7">"nu"<br> 2493 (numbers)</td> 2494 <td rowspan="7">Numbering system</td> 2495 <td><i>Unicode script subtag</i></td> 2496 <td> 2497 <p>Four-letter types indicating the primary numbering 2498 system for the corresponding script represented in 2499 Unicode. Unless otherwise specified, it is a decimal 2500 numbering system using digits [:GeneralCategory=Nd:]. For 2501 example, "latn" refers to the ASCII / Western digits 0-9, 2502 while "taml" is an algorithmic (non-decimal) numbering 2503 system. 
(The code "tamldec" indicates the "modern
Tamil decimal digits".)<br></p>
<p class="note">For more information, see <a href=
"tr35-numbers.html#Numbering_Systems">Numbering
Systems</a>.</p>
</td>
</tr>
<tr>
<td>"arabext"</td>
<td>Extended Arabic-Indic digits ("arab" means the base
Arabic-Indic digits)</td>
</tr>
<tr>
<td>"armnlow"</td>
<td>Armenian lowercase numerals</td>
</tr>
<tr>
<td colspan="2">…</td>
</tr>
<tr>
<td>"roman"</td>
<td>Roman numerals</td>
</tr>
<tr>
<td>"romanlow"</td>
<td>Roman lowercase numerals</td>
</tr>
<tr>
<td>"tamldec"</td>
<td>Modern Tamil decimal digits</td>
</tr>
<tr>
<td colspan="4"><strong>A <a href="#RegionOverride" name=
"RegionOverride" id="RegionOverride">Region Override</a>
specifies an alternate region to use for obtaining certain
region-specific default values (those specified by the
<a href="tr35-info.html#rgScope">&lt;rgScope&gt;</a>
element), instead of using the region specified by the
<a href="#unicode_region_subtag">unicode_region_subtag</a>
in the Unicode Language Identifier (or inferred from the
<a href=
"#unicode_language_subtag">unicode_language_subtag</a>).</strong></td>
</tr>
<tr>
<td rowspan="2">"rg"</td>
<td rowspan="2">Region Override</td>
<td>"uszzzz"<br>
<br></td>
<td rowspan="2">The value is a <a
href="#unicode_subdivision_id">unicode_subdivision_id</a>
of type “unknown” or “regular”; this consists of a
<a href=
"#unicode_region_subtag">unicode_region_subtag</a> for a
regular region (not a macroregion), suffixed
either by “zzzz” (case is not
significant) to designate the region
as a whole, or by a unicode_subdivision_suffix to provide
more specificity.
For example, “en-GB-u-rg-uszzzz” 2561 represents a locale for British English but with 2562 region-specific defaults set to US for items such as 2563 default currency, default calendar and week data, default 2564 time cycle, and default measurement system and unit 2565 preferences.</td> 2566 </tr> 2567 <tr> 2568 <td>…</td> 2569 </tr> 2570 <tr> 2571 <td colspan="4"><strong>A <a name= 2572 "unicode_subdivision_subtag_validity" id= 2573 "unicode_subdivision_subtag_validity"></a><a href= 2574 "#UnicodeSubdivisionIdentifier" name= 2575 "UnicodeSubdivisionIdentifier" id= 2576 "UnicodeSubdivisionIdentifier">Unicode Subdivision 2577 Identifier</a> defines a regional subdivision used for 2578 locales. The valid values are based on the 2579 <em>subdivisionContainment</em> element as described in 2580 <em>Section <a href="#Unicode_Subdivision_Codes">3.6.5 2581 Subdivision Codes</a></em>.</strong></td> 2582 </tr> 2583 <tr> 2584 <td rowspan="2">"sd"</td> 2585 <td rowspan="2">Regional Subdivision</td> 2586 <td>"gbsct"<br> 2587 <br></td> 2588 <td rowspan="2">A <a href= 2589 "#unicode_subdivision_id">unicode_subdivision_id</a>, which 2590 is a <a href= 2591 "#unicode_region_subtag">unicode_region_subtag</a> 2592 concatenated with a unicode_subdivision_suffix.<br> 2593 For example, <em>gbsct</em> is “gb”+“sct” (where sct 2594 represents the subdivision code for Scotland). Thus 2595 “en-GB-u-sd-gbsct” represents the language variant “English 2596 as used in Scotland”. And both “en-u-sd-usca” and 2597 “en-US-u-sd-usca” represent “English as used in 2598 California”. 
See <strong><em><a href="#Unicode_Subdivision_Codes">3.6.5 Subdivision Codes</a></em></strong>.</td>
</tr>
<tr>
<td>…</td>
</tr>
<tr>
<td colspan="4"><strong>A <a href="#UnicodeSentenceBreakSuppressionsIdentifier" name="UnicodeSentenceBreakSuppressionsIdentifier" id="UnicodeSentenceBreakSuppressionsIdentifier">Unicode
Sentence Break Suppressions Identifier</a> defines a set of
data to be used for suppressing certain sentence breaks
that would otherwise be found by UAX #29 rules. The valid
values are those <em>name</em> attribute values in the
<em>type</em> elements of key name="ss" in bcp47/<a target="_blank" href="https://github.com/unicode-org/cldr/tree/latest/common/bcp47/segmentation.xml">segmentation.xml</a></strong>.</td>
</tr>
<tr>
<td rowspan="2">"ss"</td>
<td rowspan="2">Sentence break suppressions</td>
<td>"none"</td>
<td>Don’t use sentence break suppressions data (the default).</td>
</tr>
<tr>
<td>"standard"</td>
<td>Use sentence break suppressions data of type "standard"</td>
</tr>
<tr>
<td colspan="4"><strong>A <a href="#UnicodeTimezoneIdentifier" name="UnicodeTimezoneIdentifier" id="UnicodeTimezoneIdentifier">Unicode Timezone Identifier</a>
defines a timezone.
The valid values are those name
attribute values in the <em>type</em> elements of
bcp47/<a target="_blank" href="https://github.com/unicode-org/cldr/tree/latest/common/bcp47/timezone.xml">timezone.xml</a>.</strong></td>
</tr>
<tr>
<td>"tz"<br>
(timezone)</td>
<td>Time zone</td>
<td><i>Unicode short time zone IDs</i></td>
<td>
<p>Short identifiers defined in terms of a TZ time zone
database [<a href="#Olson">Olson</a>] identifier in the
file common/bcp47/timezone.xml, plus a few extra values.</p>
<p>For more information, see <a href="#Time_Zone_Identifiers">Section 3.6.3 Time Zone Identifiers</a>.</p>
<p>CLDR provides data for normalizing timezone codes.</p>
</td>
</tr>
<tr>
<td colspan="4"><strong>A <a href="#UnicodeVariantIdentifier" name="UnicodeVariantIdentifier" id="UnicodeVariantIdentifier">Unicode Variant
Identifier</a> defines a special variant used for locales.
The valid values are those name attribute values in the
<em>type</em> elements of bcp47/<a target="_blank" href="https://github.com/unicode-org/cldr/tree/latest/common/bcp47/variant.xml">variant.xml</a>.</strong></td>
</tr>
<tr>
<td>"va"</td>
<td>Common variant type</td>
<td>"posix"</td>
<td>POSIX style locale variant. For handling of the
"POSIX" variant, see <i>Section 3.8.2, <a href="#Legacy_Variants">Legacy Variants</a></i>.</td>
</tr>
</table>
<p>For more information on the allowed keys and types, see the
specific elements below, and <a href="#Unicode_Locale_Extension_Data_Files">Section 3.6.4 U Extension Data Files</a>.</p>
<p>Additional keys or types might be added in future versions.
2679 Implementations of LDML should be robust to handle any 2680 syntactically valid key or type values.</p> 2681 <h4><a href="#Numbering%20System%20Data" name= 2682 "Numbering System Data">3.6.2 Numbering System Data</a></h4> 2683 <p>LDML supports multiple numbering systems. The identifiers 2684 for those numbering systems are defined in the file 2685 <strong>bcp47/number.xml</strong>. For example, for the 'trunk' 2686 version of the data see <a href= 2687 "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/number.xml"> 2688 bcp47/number.xml</a>.<br></p> 2689 <p>Details about those numbering systems are defined in 2690 <strong>supplemental/numberingSystems.xml</strong>. For 2691 example, for the 'trunk' version of the data see <a href= 2692 "https://github.com/unicode-org/cldr/releases/tag/latest/common/supplemental/numberingSystems.xml"> 2693 supplemental/numberingSystems.xml</a>.<br></p> 2694 <p>LDML makes certain stability guarantees on this 2695 data: <br></p> 2696 <ol> 2697 <li>Like other BCP 47 identifiers, once a numeric identifier 2698 is added to <strong>bcp47/number.xml</strong> or 2699 <strong>numberingSystems.xml</strong>, it will never be 2700 removed from either of those files.</li> 2701 <li>If an identifier has type="numeric" in 2702 numberingSystems.xml, then 2703 <ol> 2704 <li>It is a decimal, positional numbering system with an 2705 attribute digits=X, where X is a string with the 10 2706 digits in order used by the numbering system.</li> 2707 <li>The values of the type and digits will never 2708 change.</li> 2709 </ol> 2710 </li> 2711 </ol> 2712 <h4><a href="#Time_Zone_Identifiers" name= 2713 "Time_Zone_Identifiers" id="Time_Zone_Identifiers">3.6.3 Time 2714 Zone Identifiers</a></h4> 2715 <p>LDML inherits time zone IDs from the tz database [<a href= 2716 "#Olson">Olson</a>]. 
Because these IDs from the tz database do
not satisfy the BCP 47 language subtag syntax requirements,
CLDR defines short identifiers for use in the Unicode
locale extension. The short identifiers are defined in the file
<strong>common/bcp47/timezone.xml</strong>.</p>
<p>The short identifiers use UN/LOCODE [<a href="#LOCODE">LOCODE</a>] codes (with the space character removed) where
possible. For example, the short identifier for
"America/Los_Angeles" is "uslax" (the LOCODE for Los Angeles,
US is "US LAX"). Identifiers of length not equal to 5 are used
where there is no corresponding UN/LOCODE, such as "usnavajo"
for "America/Shiprock", or "utcw01" for "Etc/GMT+1", so that
they do not overlap with future UN/LOCODE codes.</p>
<p>Although the first two letters of a short identifier may
match an ISO 3166 two-letter country code, a user should not
assume that the time zone belongs to the country. The first two
letters in an identifier of length not equal to 5 have no
meaning. Also, the identifiers are stabilized, meaning that
they will not change no matter what changes happen in the base
standard. So if Hawaii leaves the US and joins Canada as a new
province, the short time zone identifier "ushnl" would not
change in CLDR even if the UN/LOCODE changes to "cahnl" or
something else.</p>
<p>There is a special code "unk" for an Unknown or Invalid time
zone. This can be expressed in the tz database style ID
"Etc/Unknown", although it is not defined in the tz
database.</p>
<p><b>Stability of Time Zone Identifiers</b></p>
<p>Although the short time zone identifiers are guaranteed to
be stable, the preferred IDs in the tz database (such as those found
in the <strong>zone.tab</strong> file) might change from time to
time.
For example, "Asia/Calcutta" was replaced with
"Asia/Kolkata" and moved to the <strong>backward</strong> file in
the tz database. Because CLDR contains locale data using a time zone ID
from the tz database as the key, stability of the IDs is
critical.</p>
<p>To maintain the stability of "long" IDs (for those inherited
from the tz database), a special rule is applied to the
<i>alias</i> attribute in the <type> element for "tz":
the first "long" ID is the CLDR canonical "long" time zone
ID.</p>
<p>For example:</p>
<blockquote>
<type name="inccu" alias="Asia/Calcutta Asia/Kolkata"
description="Kolkata, India"/>
</blockquote>
<p>The <type> element above defines the short time zone ID
"inccu" (for use in the Unicode locale extension), the
corresponding <em>CLDR canonical "long" ID</em>
"Asia/Calcutta", and an alias "Asia/Kolkata".</p>
<h4><a href="#Unicode_Locale_Extension_Data_Files" name="Unicode_Locale_Extension_Data_Files" id="Unicode_Locale_Extension_Data_Files">3.6.4 U Extension Data Files</a></h4>
<p>The 'u' extension data is stored in multiple XML files
located under the common/bcp47 directory in CLDR. Each file
contains the locale extension key/type values and their
backward compatibility mappings appropriate for a particular
domain.
<a href= 2775 "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/collation.xml"> 2776 common/bcp47/collation.xml</a> contains key/type values for 2777 collation, including optional collation parameters and valid 2778 type values for each key.</p> 2779 <p>The 't' extension data is stored in <a href= 2780 "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml"> 2781 common/bcp47/transform.xml</a>.</p> 2782 <p class="dtd"><!ELEMENT keyword ( key* )></p> 2783 <p class="dtd"><!ELEMENT key ( type* )><br> 2784 <!ATTLIST key extension NMTOKEN #IMPLIED><br> 2785 <!ATTLIST key name NMTOKEN #REQUIRED><br> 2786 <!ATTLIST key description CDATA #IMPLIED><br> 2787 <!ATTLIST key deprecated ( true | false ) "false"><br> 2788 <!ATTLIST key preferred NMTOKEN #IMPLIED><br> 2789 <!ATTLIST key alias NMTOKEN #IMPLIED><br> 2790 <!ATTLIST key valueType (single | multiple | incremental | 2791 any) #IMPLIED ><br> 2792 <!ATTLIST key since CDATA #IMPLIED></p> 2793 <p class="dtd"><!ELEMENT type EMPTY><br> 2794 <!ATTLIST type name NMTOKEN #REQUIRED><br> 2795 <!ATTLIST type description CDATA #IMPLIED><br> 2796 <!ATTLIST type deprecated ( true | false ) "false"><br> 2797 <!ATTLIST type preferred NMTOKEN #IMPLIED><br> 2798 <!ATTLIST type alias CDATA #IMPLIED><br> 2799 <!ATTLIST type since CDATA #IMPLIED></p> 2800 <p class="dtd"><!ELEMENT attribute EMPTY><br> 2801 <!ATTLIST attribute name NMTOKEN #REQUIRED><br> 2802 <!ATTLIST attribute description CDATA #IMPLIED><br> 2803 <!ATTLIST attribute deprecated ( true | false ) 2804 "false"><br> 2805 <!ATTLIST attribute preferred NMTOKEN #IMPLIED><br> 2806 <!ATTLIST attribute since CDATA #IMPLIED></p> 2807 <p>The extension attribute in <key> element specifies the 2808 BCP 47 language tag extension type. The default value of the 2809 extension attribute is "u" (Unicode locale extension). 
The 2810 <type> element is only applicable to the enclosing 2811 <key>.</p> 2812 <p>In the Unicode locale extension 'u' and 't' data files, the 2813 common attributes for the <key>, <type> and 2814 <attribute> elements are as follows:</p> 2815 <dl> 2816 <dt><b>name</b></dt> 2817 <dd> 2818 <p>The key or type name used by Unicode locale extension 2819 with <a href="#Unicode_locale_identifier">'u' extension 2820 syntax</a> or the 't' extensions syntax. When <i>alias</i> 2821 below is absent, this name can be also used with the old 2822 style <a href="#Old_Locale_Extension_Syntax">"@key=type" 2823 syntax</a>.</p> 2824 <p>Most type names are <strong>literal type names</strong>, 2825 which match exactly the same value. All of these have at 2826 least one lowercase letter, such as "buddhist". There are a 2827 small number of <strong>indirect type names</strong>, such 2828 as "RG_KEY_VALUE". These have no lowercase letters. The 2829 interpretation of each one is listed below.</p> 2830 <h5><a name="CODEPOINTS" href="#CODEPOINTS" id= 2831 "CODEPOINTS">CODEPOINTS</a></h5> 2832 <p>The type name <strong>"CODEPOINTS"</strong> is reserved 2833 for a variable representing Unicode code point(s). The 2834 syntax is:</p> 2835 <table border="0"> 2836 <tr> 2837 <th> </th> 2838 <th> 2839 <div align="center"> 2840 EBNF 2841 </div> 2842 </th> 2843 </tr> 2844 <tr> 2845 <td> 2846 <pre>codepoints</pre> 2847 </td> 2848 <td> 2849 <pre>= codepoint (sep codepoint)?</pre> 2850 </td> 2851 </tr> 2852 <tr> 2853 <td> 2854 <pre>codepoint</pre> 2855 </td> 2856 <td> 2857 <pre>= [0-9 A-F a-f]{4,6}</pre> 2858 </td> 2859 </tr> 2860 </table> 2861 <p>In addition, no codepoint may exceed 10FFFF. For 2862 example, "00A0", "300b", "10D40C" and "00C1-00E1" are 2863 valid, but "A0", "U060C" and "110000" are not.</p> 2864 <p>In the current version of CLDR, the type "CODEPOINTS" is 2865 only used for the deprecated locale extension key "vt" 2866 (variableTop). 
The subtags forming the type for "vt"
represent an arbitrary string of characters. There is no
formal limit on the number of characters, although
practically anything above 1 will be rare, and anything
longer than 4 might be useless. Repetition is allowed; for
example, 0061-0061 ("aa") is a valid type value for "vt",
since the sequence may be a collating element. Order is
vital: 0061-0062 ("ab") is different than 0062-0061 ("ba").
Note that for variableTop any character sequence must be a
contraction which yields exactly one primary weight.</p>
<p>For example,</p>
<blockquote>
<p><strong>en-u-vt-00A4</strong> : this indicates
English, with any characters sorting at or below "¤" (at
a primary level) considered Variable.</p>
</blockquote>
<p>By default in UCA, variable characters are ignored in
sorting at a primary, secondary, and tertiary level. But in
CLDR, they are not ignorable by default. For more
information, see <a href="tr35-collation.html#Setting_Options">Collation: Section 3.3 <em>Setting Options</em></a>.</p>
<h5><a name="REORDER_CODE" href="#REORDER_CODE" id="REORDER_CODE">REORDER_CODE</a></h5>
<p>The type name <strong>"REORDER_CODE"</strong> is
reserved for reordering block names (e.g. "latn", "digit"
and "others") defined in the <i><a href="tr35-collation.html#Root_Collation">Root Collation</a></i>. The type "REORDER_CODE" is used for
locale extension key "kr" (colReorder). The value of type
for "kr" is represented by one or more reordering block
names such as "latn-digit".
For more information, see 2898 <a href="tr35-collation.html#Script_Reordering">Collation: 2899 Section 3.12 <em>Collation Reordering</em></a> .</p> 2900 <h5><a name="RG_KEY_VALUE" href="#RG_KEY_VALUE" id= 2901 "RG_KEY_VALUE">RG_KEY_VALUE</a></h5> 2902 <p>The type name <strong>"RG_KEY_VALUE"</strong> is 2903 reserved for region codes in the format required by the 2904 "rg" key; this is a subdivision 2905 code with idStatus='unknown' or 'regular' from the 2906 idValidity data in common/validity/subdivision.xml.</p> 2907 <h5><a name="SCRIPT_CODE" href="#SCRIPT_CODE" id= 2908 "SCRIPT_CODE">SCRIPT_CODE</a></h5> 2909 <p>The type name <strong>"SCRIPT_CODE"</strong> is 2910 reserved for <code><a href="#unicode_script_subtag">unicode_script_subtag</a></code> 2911 values (e.g. "thai", "laoo"). 2912 The type "SCRIPT_CODE" is used for locale extension key "dx". 2913 The value of type for "dx" is represented by one or more SCRIPT_CODEs, 2914 such as "thai-laoo".</p> 2915 <h5><a name="SUBDIVISION_CODE" href="#SUBDIVISION_CODE" id= 2916 "SUBDIVISION_CODE">SUBDIVISION_CODE</a></h5> 2917 <p>The type name <strong>"SUBDIVISION_CODE"</strong> is 2918 reserved for subdivision codes in the format required by 2919 the "sd" key; this is a subdivision code from the 2920 idValidity data in common/validity/subdivision.xml, 2921 excluding those with idStatus='unknown'. Codes with 2922 idStatus='deprecated' should not be generated, and those 2923 with idStatus='private_use' are only to be used with prior 2924 agreement.</p> 2925 <h5><a name="PRIVATE_USE" href="#PRIVATE_USE" id= 2926 "PRIVATE_USE">PRIVATE_USE</a></h5> 2927 <p>The type name <strong>"PRIVATE_USE"</strong> is reserved 2928 for private use types. A valid type value is composed of 2929 one or more subtags separated by hyphens and each subtag 2930 consists of three to eight ASCII alphanumeric characters. 
2931 In the current version of CLDR, 2932 <strong>"PRIVATE_USE"</strong> is only used for transform 2933 extension "x0".</p> 2934 </dd> 2935 <dt><b>valueType</b></dt> 2936 <dd> 2937 <p>The valueType attribute indicates how many subtags are 2938 valid for a given key:</p> 2939 <table class='simple' width="100%" border="1"> 2940 <tbody> 2941 <tr> 2942 <th>single</th> 2943 <td>Either exactly one type value, or no type value 2944 (but only if the value of "true" would be valid). 2945 This is the default if no valueType attribute is 2946 present.</td> 2947 </tr> 2948 <tr> 2949 <th>incremental</th> 2950 <td>Multiple type values are allowed, but only if a 2951 prefix is also present, and the sequence is 2952 explicitly listed. Each successive type value 2953 indicates a refinement of its prefix. For 2954 example:<br> 2955 <key name="ca" description="Calendar algorithm 2956 key" <strong>valueType="incremental"</strong>><br> 2957 <type name="islamic" 2958 description="Islamic calendar"/><br> 2959 <type name="islamic-umalqura" 2960 description="Islamic calendar, Umm al-Qura"/><br> 2961 Thus <em>ca-islamic-umalqura</em> is valid. However, 2962 <em>ca-gregory-japanese</em> is not valid, because 2963 "gregory-japanese" is not listed as a type.</td> 2964 </tr> 2965 <tr> 2966 <th>multiple</th> 2967 <td>Multiple type values are allowed, but each may 2968 only occur once. For example:<br> 2969 <key name="kr" description="Collation reorder 2970 codes" <strong>valueType="multiple"</strong>><br> 2971 <type name="REORDER_CODE" …/></td> 2972 </tr> 2973 <tr> 2974 <th>any</th> 2975 <td>Any number of type values are allowed, with none 2976 of the above restrictions. For example:<br> 2977 <key extension="t" name="x0" description="Private 2978 use transform type key." 
2979 <strong>valueType="any"</strong>><br> 2980 <type name="PRIVATE_USE" …/></td> 2981 </tr> 2982 </tbody> 2983 </table> 2984 </dd> 2985 <dt><b>description</b></dt> 2986 <dd> 2987 <p>The description of the key, type or attribute element. 2988 There is also some informative text about certain keys and 2989 types in the Section 3.5 <a href= 2990 "#Key_And_Type_Definitions_">Key And Type 2991 Definitions</a>.</p> 2992 </dd> 2993 <dt><b>deprecated</b></dt> 2994 <dd> 2995 <p>The deprecation status of the key, type or attribute 2996 element. The value "true" indicates the element is 2997 deprecated and no longer used in the version of CLDR. The 2998 default value is "false".</p> 2999 </dd> 3000 <dt><b>preferred</b></dt> 3001 <dd> 3002 <p>The preferred value of the deprecated key, type or 3003 attribute element. When a key, type or attribute element is 3004 deprecated, this attribute is used for specifying a new 3005 canonical form if available.</p> 3006 </dd> 3007 <dt><b>alias</b> (Not applicable to <attribute>)</dt> 3008 <dd> 3009 <p>The BCP 47 form is the canonical form, and recommended. 3010 Other aliases are included only for backwards 3011 compatibility.</p> 3012 </dd> 3013 <dd><em>Example:</em></dd> 3014 <dd> 3015 <p><type name="phonebk" 3016 <strong>alias="phonebook"</strong> description="Phonebook 3017 style ordering (such as in German)"/><br></p>The 3018 preferred term, and the only one to be used in BCP 47, is 3019 the name: in this example, "phonebk".<br> 3020 </dd> 3021 <dd> 3022 <p>The alias is a key or type name used by Unicode locale 3023 extensions with the old <a href= 3024 "#Old_Locale_Extension_Syntax">"@key=type" syntax</a>. The 3025 attribute value for type may contain multiple names 3026 delimited by ASCII space characters. Of those aliases, the 3027 first name is the preferred value.</p> 3028 </dd> 3029 <dt><b>since</b></dt> 3030 <dd>The version of CLDR in which this key or type was 3031 introduced. 
Absence of this attribute value implies the key 3032 or type was available in CLDR 1.7.2.</dd> 3033 </dl> 3034 <p><em>Note: There are no values defined for the locale 3035 extension attribute in the current CLDR release.</em></p> 3036 <p>For example,</p> 3037 <pre> 3038<key name="co" alias="collation" description="Collation type key"> 3039 <type name="pinyin" description="Pinyin ordering for Latin and for CJK characters (used in Chinese)"/> 3040</key> 3041 3042<key name="ka" alias="colAlternate" description="Collation parameter key for alternate handling"> 3043 <type name="noignore" alias="non-ignorable" description="Variable collation elements are not reset to ignorable"/> 3044 <type name="shifted" description="Variable collation elements are reset to zero at levels one through three"/> 3045</key> 3046 3047<key name="tz" alias="timezone"> 3048 ... 3049 <type name="aumel" alias="Australia/Melbourne Australia/Victoria" description="Melbourne, Australia"/> 3050 <type name="aumqi" alias="Antarctica/Macquarie" description="Macquarie Island Station, Macquarie Island" since="1.8.1"/> 3051 ... 
3052</key> 3053 </pre>The data above indicates: 3054 <ul> 3055 <li>type "pinyin" is valid for key "co", thus "u-co-pinyin" 3056 is a valid Unicode locale extension.</li> 3057 <li>type "pinyin" is not valid for key "ka", thus 3058 "u-ka-pinyin" is not a valid Unicode locale extension.</li> 3059 <li>type "pinyin" has no <i>alias</i>, so 3060 "zh@collation=pinyin" is a valid Unicode locale identifier 3061 according to the old syntax.</li> 3062 <li>type "noignore" has an alias attribute, so 3063 "en@colAlternate=noignore" is not a valid Unicode locale 3064 identifier according to the old syntax.</li> 3065 <li>type "aumel" is valid for key "tz", supported by CLDR 3066 1.7.2 (default value) or later versions.</li> 3067 <li>type "aumqi" is valid for key "tz", supported by CLDR 3068 1.8.1 or later versions.</li> 3069 </ul> 3070 <p>It is strongly recommended that all API methods accept all 3071 possible aliases for keywords and types, but generate the 3072 canonical form. For example, "ar-u-ca-islamicc" would be 3073 equivalent to "ar-u-ca-islamic-civil" on input, but the latter 3074 should be output. The one exception is where an alias would 3075 only be well-formed with the old syntax, such as "gregorian" 3076 (for "gregory").</p> 3077 <h4><a href="#Unicode_Subdivision_Codes" name= 3078 "Unicode_Subdivision_Codes" id= 3079 "Unicode_Subdivision_Codes">3.6.5 Subdivision Codes</a></h4> 3080 <p>The subdivision codes designate a subdivision of a country 3081 or region. They are called various names, such as a 3082 <em>state</em> in the United States, or a <em>province</em> in 3083 Canada. The codes in CLDR are based on ISO 3166-2 subdivision 3084 codes. The ISO codes have a region code followed by a hyphen, 3085 then a suffix consisting of 1..3 ASCII letters or digits.</p> 3086 <p>The CLDR codes are designed to work in a <a href= 3087 '#unicode_locale_id'>unicode_locale_id</a> (BCP47), and are 3088 thus all lowercase, with no hyphen. 
For example, the following
are valid, and mean “English as used in California, USA”.</p>
<ul>
<li>en-u-sd-<strong>usca</strong></li>
<li>en-US-u-sd-<strong>usca</strong></li>
</ul>
<p>CLDR has additional subdivision codes. These may start with
a 3-digit region code or use a suffix of 4 ASCII letters or
digits, so they will not collide with the ISO codes.
Subdivision codes for unknown values are the region code plus
"zzzz", such as "uszzzz" for an unknown subdivision of the US.
Other codes may be added for stability.</p>
<p>Like BCP 47, CLDR requires stable codes, which are not
guaranteed for ISO 3166-2 (nor have the ISO 3166-2 codes been
stable in the past). If an ISO 3166-2 code is removed, it
remains valid (though marked as deprecated) in CLDR. If an ISO
3166-2 code is reused (for the same region), then CLDR will
define a new equivalent code using one of these 4-character
suffixes.</p>
<h5><a name="Validity" href="#Validity" id="Validity">3.6.5.1 Validity</a></h5>
<p>A <a href="#unicode_subdivision_id">unicode_subdivision_id</a> is only
valid when it is present in the subdivision.xml file as
described in <em>Section 3.11 <a href="#Validity_Data">Validity Data</a></em>.
The data is in a compressed form, and thus needs 3114 to be expanded before such a test is made.</p> 3115 <p><em>Examples:<br></em></p> 3116 <ul> 3117 <li><strong>usca</strong> is valid — there is an 3118 <strong>id</strong> 3119 element<code><id type="subdivision"…>… usca 3120 …</id></code></li> 3121 <li><strong>ussct</strong> is invalid — there is no 3122 <strong>id</strong> element 3123 <code><id type="subdivision"…>… ussct 3124 …</id></code></li> 3125 </ul> 3126 <p>If a <a href='#unicode_locale_id'>unicode_locale_id</a> 3127 contains both a <a href= 3128 "#unicode_region_subtag">unicode_region_subtag</a> and a 3129 <a href="#unicode_subdivision_id">unicode_subdivision_id</a>, 3130 it is only valid if the <a href= 3131 "#unicode_subdivision_id">unicode_subdivision_id</a> starts 3132 with the <a href= 3133 "#unicode_region_subtag">unicode_region_subtag</a> 3134 (case-insensitively).<br></p> 3135 <p>It is recommended that a <a href= 3136 '#unicode_locale_id'>unicode_locale_id</a> contain a <a href= 3137 "#unicode_region_subtag">unicode_region_subtag</a> if it 3138 contains a <a href= 3139 "#unicode_subdivision_id">unicode_subdivision_id</a> and the 3140 region would not be added by adding likely subtags. That 3141 produces better behavior if the <a href= 3142 "#unicode_subdivision_id">unicode_subdivision_id</a> is ignored 3143 by an implementation or if the language tag is truncated.</p> 3144 <p>Examples:<br></p> 3145 <ul> 3146 <li>en-<strong>US</strong>-u-sd-<strong>us</strong>ca is 3147 valid — the region "US" matches the first part of "usca"</li> 3148 <li>en-u-sd-<strong>us</strong>ca is valid — it still works 3149 after adding likely subtags.</li> 3150 <li>en-<strong>CA</strong>-u-sd-<strong>gb</strong>sct is 3151 invalid — the region "CA" does not match the first part of 3152 "gbsct". 
An implementation should disregard the subdivision 3153 id (or return an error).</li> 3154 <li>en-u-sd-<strong>gb</strong>sct is valid but not 3155 recommended — an implementation that ignores the <a href= 3156 "#unicode_subdivision_id">unicode_subdivision_id</a> can get 3157 the wrong fallback behavior, or could add likely subtags and 3158 get the invalid 3159 en<strong>-Latn-US</strong>-u-sd-<strong>gb</strong>sct</li> 3160 </ul> 3161 <p>In version 28.0, the subdivisions in the validity files used 3162 the ISO format, uppercase with a hyphen separating two 3163 components, instead of the BCP 47 format.</p> 3164 <h3><a name="t_Extension" id="t_Extension"></a><a name= 3165 "BCP47_T_Extension" href="#BCP47_T_Extension" id= 3166 "BCP47_T_Extension">3.7 Unicode BCP 47 T Extension</a></h3> 3167 <p>The Unicode Consortium has registered and is the maintaining 3168 authority for two BCP 47 language tag extensions: the extension 3169 'u' for Unicode locale extension [<a href= 3170 "#RFC6067">RFC6067</a>] and extension 't' for transformed 3171 content [<a href="#RFC6497">RFC6497</a>]. The Unicode BCP 47 3172 extension data defines the complete list of valid subtags. 3173 While the title of the RFC is “Transformed Content”, the 3174 abstract makes it clear that the scope is broader than the term 3175 "transformed" might indicate to a casual 3176 reader: “including content that has been transliterated, 3177 transcribed, or translated, or <em>in some other way 3178 influenced by the source. It also provides for additional 3179 information used for identification.</em>”</p> 3180 <p><strong>The -t- Extension.</strong> The syntax of 't' 3181 extension subtags is defined by the rule 3182 <code>unicode_locale_extensions</code> in <a href= 3183 "#Unicode_locale_identifier"><em>Section 3.2 Unicode locale 3184 identifier</em></a>, except the separator of subtags 3185 <code>sep</code> must be always hyphen '-' when the extension 3186 is used as a part of BCP 47 language tag. 
For information about 3187 the registration process, meaning, and usage of the 't' 3188 extension, see [<a href="#RFC6497">RFC6497</a>].</p> 3189 <p>These subtags are all in lowercase (that is the canonical 3190 casing for these subtags), however, subtags are 3191 case-insensitive and casing does not carry any specific 3192 meaning. All subtags within the Unicode extensions are 3193 alphanumeric characters in length of two to eight that meet the 3194 rule <code>extension</code> in the [<a href= 3195 "#BCP47">BCP47</a>].</p> 3196 <p>The following keys are defined for the -t- extension:</p> 3197 <table class='simple'> 3198 <tbody> 3199 <tr> 3200 <th>Keys</th> 3201 <th>Description</th> 3202 <th>Values in latest release</th> 3203 </tr> 3204 <tr> 3205 <td>m0</td> 3206 <td><strong>Transform extension mechanism:</strong> to 3207 reference an authority or rules for a type of 3208 transformation</td> 3209 <td><a href= 3210 "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform.xml"> 3211 transform.xml</a></td> 3212 </tr> 3213 <tr> 3214 <td nowrap>s0, d0</td> 3215 <td><strong>Transform source/destination:</strong> for 3216 non-languages/scripts, such as fullwidth-halfwidth 3217 conversion.</td> 3218 <td><a href= 3219 "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform-destination.xml"> 3220 transform-destination.xml</a></td> 3221 </tr> 3222 <tr> 3223 <td>i0</td> 3224 <td><strong>Input Method Engine transform:</strong> Used 3225 to indicate an input method transformation, such as one 3226 used by a client-side input method. 
The first subfield in 3227 a sequence would typically be a 'platform' or vendor 3228 designation.</td> 3229 <td><a href= 3230 "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform_ime.xml"> 3231 transform_ime.xml</a></td> 3232 </tr> 3233 <tr> 3234 <td>k0</td> 3235 <td><strong>Keyboard transform:</strong> Used to indicate 3236 a keyboard transformation, such as one used by a 3237 client-side virtual keyboard. The first subfield in a 3238 sequence would typically be a 'platform' designation, 3239 representing the platform that the keyboard is intended 3240 for. The keyboard might or might not correspond to a 3241 keyboard mapping shipped by the vendor for the platform. 3242 One or more subsequent fields may occur, but are only 3243 added where needed to distinguish from others.</td> 3244 <td><a href= 3245 "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform_keyboard.xml"> 3246 transform_keyboard.xml</a></td> 3247 </tr> 3248 <tr> 3249 <td>t0</td> 3250 <td><strong>Machine Translation:</strong> Used to 3251 indicate content that has been machine translated, or a 3252 request for a particular type of machine translation of 3253 content. The first subfield in a sequence would typically 3254 be a 'platform' or vendor designation.</td> 3255 <td><a href= 3256 "https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform_mt.xml"> 3257 transform_mt.xml</a></td> 3258 </tr> 3259 <tr> 3260 <td nowrap>h0</td> 3261 <td><strong>Hybrid Locale Identifiers:</strong> h0 with 3262 the value 'hybrid' indicates that the -t- value is a 3263 language that is mixed into the main language tag to form 3264 a hybrid. 
For more information and examples, see
<em>Section 3.10.2 <a href="#Hybrid_Locale">Hybrid Locale Identifiers</a>.</em></td>
<td><a href="https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform_hybrid.xml">
transform_hybrid.xml</a></td>
</tr>
<tr>
<td>x0</td>
<td><strong>Private use transform</strong></td>
<td><a href="https://github.com/unicode-org/cldr/releases/tag/latest/common/bcp47/transform_private_use.xml">
transform_private_use.xml</a></td>
</tr>
</tbody>
</table>
<h4><a href="#Transformed_Content_Data_File" name="Transformed_Content_Data_File" id="Transformed_Content_Data_File">3.7.1 T Extension Data Files</a></h4>
<p>The overall structure of the data files is similar to
that of the U Extension, with the following exceptions.</p>
<p>In the transformed content 't' data file, the name attribute
in a <key> element defines a valid field separator
subtag. The name attribute in an enclosed <type> element
defines a valid field subtag for the field separator subtag.
For example:</p>
<pre>
<key extension="t" name="m0"
     description="Transform extension mechanism">
    <type name="ungegn"
          description="United Nations Group of Experts on Geographical Names"
          since="21"/>
</key>
</pre>The data above indicates:
<ul>
<li>"m0" is a valid field separator for the transformed
content extension 't'.</li>
<li>field subtag "ungegn" is valid for field separator "m0".</li>
<li>field subtag "ungegn" was introduced in CLDR 21.</li>
</ul>
<p>The attributes are:</p>
<dl>
<dt><b>name</b></dt>
<dd>The name of the mechanism, limited to 3-8 characters (or
sequences of them).
  Any indirect type names are listed in
  3.6.4 <a href="#Unicode_Locale_Extension_Data_Files">U
  Extension Data Files</a>.</dd>
  <dt><b>description</b></dt>
  <dd>A description of the name, with all and only that
  information necessary to distinguish one name from
  others with which it might be confused. Descriptions
  are not intended to provide general background
  information.</dd>
  <dt><b>since</b></dt>
  <dd>Indicates the first version of CLDR where the name
  appears. (Required for new items.)</dd>
  <dt> </dt>
  <dt><b>alias</b></dt>
  <dd>Alternative name, not limited in number of characters.
  Aliases are intended for compatibility, not to provide all
  possible alternate names or designations.
  <em>(Optional)</em></dd>
</dl>
<p>For information about the registration process, meaning, and
usage of the 't' extension, see [<a href=
"#RFC6497">RFC6497</a>].</p>
<h3><a name="Compatibility_with_Older_Identifiers" href=
"#Compatibility_with_Older_Identifiers" id=
"Compatibility_with_Older_Identifiers">3.8 Compatibility with
Older Identifiers</a></h3>
<p>LDML versions before 1.7.2 used a slightly different syntax for
variant subtags and locale extensions. Implementations of LDML
may provide backward-compatible identifier support as described
in the following sections.</p>
<h4><a name="Old_Locale_Extension_Syntax" href=
"#Old_Locale_Extension_Syntax" id=
"Old_Locale_Extension_Syntax">3.8.1 Old Locale Extension
Syntax</a></h4>
<p>The LDML 1.7 and older specifications used a different syntax for
representing Unicode locale extensions.
The previous definition
of Unicode locale extensions had the following structure:</p>
<table border="0">
  <tr>
    <th> </th>
    <th>
      <div align="center">
        EBNF
      </div>
    </th>
  </tr>
  <tr>
    <td>old_unicode_locale_extensions</td>
    <td>
      <pre>= "@" old_key "=" old_type
  (";" old_key "=" old_type)*</pre>
    </td>
  </tr>
</table>
<p>The new specification mandates keys to be two alphanumeric
characters and types to be three to eight alphanumeric
characters. As a result, new codes were assigned to all
existing keys and some types. For example, a new key "co"
replaced the previous key "collation", and a new type "phonebk"
replaced the previous type "phonebook". However, the existing
collation type "big5han" already satisfied the new requirement,
so no new type code was assigned to that type. All new keys and
types introduced after LDML 1.7 satisfy the new requirement, so
they do not have aliases dedicated to the old syntax, except for
time zone types. The conversion between old types and new types
can be done regardless of key, with one known exception (old
type "traditional" is mapped to new type "trad" for collation
and "traditio" for numbering system), and this relationship
will be maintained in future versions unless otherwise
noted.</p>
<p>The new specification introduced a new field
<code>attribute</code> in addition to key/type pairs in the
Unicode locale extension. When it is necessary to map a new
Unicode locale identifier with an <code>attribute</code> field to
a well-formed old locale identifier, a special key name
<i>attribute</i> with the value of the entire
<code>attribute</code> subtags in the new identifier is used.
For example, a new identifier
<code>ja-u-xxx-yyy-ca-japanese</code> is mapped to an old
identifier <code>ja@attribute=xxx-yyy;calendar=japanese</code>.</p>
<p>The chart below shows some example mappings between the new
syntax and the old syntax.</p>
<table>
  <caption>
    <a name="Locale_Extension_Mappings" href=
    "#Locale_Extension_Mappings" id=
    "Locale_Extension_Mappings">Locale Extension Mappings</a>
  </caption>
  <tr>
    <th>Old (LDML 1.7 or older)</th>
    <th>New</th>
  </tr>
  <tr>
    <td>de_DE@collation=phonebook</td>
    <td>de_DE_u_co_phonebk</td>
  </tr>
  <tr>
    <td>zh_Hant_TW@collation=big5han</td>
    <td>zh_Hant_TW_u_co_big5han</td>
  </tr>
  <tr>
    <td>th_TH@calendar=gregorian;numbers=thai</td>
    <td>th_TH_u_ca_gregory_nu_thai</td>
  </tr>
  <tr>
    <td>en_US_POSIX@timezone=America/Los_Angeles</td>
    <td>en_US_u_tz_uslax_va_posix</td>
  </tr>
</table>
<p>Where the old API is supplied the BCP 47 language code, or
vice versa, the recommendation is to:</p>
<ol>
  <li>Have all methods that take the old syntax also take the
  new syntax, interpreted correctly. For example,
  "zh-TW-u-co-pinyin" and "zh_TW@collation=pinyin" would both
  be interpreted as meaning the same thing.</li>
  <li>Have all methods (both for old and new syntax) accept all
  possible aliases for keywords and types. For example,
  "ar-u-ca-islamicc" would be equivalent to
  "ar-u-ca-islamic-civil".
    <ul>
      <li>The one exception is where an alias would only be
      well-formed with the old syntax, such as "gregorian" (for
      "gregory").</li>
    </ul>
  </li>
  <li>Where an API cannot successfully accept the alternate
  syntax, throw an exception (or otherwise indicate an error)
  so that people can detect that they are using the wrong
  method (or wrong input).</li>
  <li>Provide a method that tests a purported locale ID string
  to determine its status:
    <ol>
      <li><strong>well-formed</strong> - syntactically
      correct</li>
      <li><strong>valid</strong> - well-formed and only uses
      registered language subtags, extensions, keywords,
      types...</li>
      <li><strong>canonical</strong> - valid and contains no
      deprecated codes or structure.</li>
    </ol>
  </li>
</ol>
<h4><a name="Legacy_Variants" href="#Legacy_Variants" id=
"Legacy_Variants">3.8.2 Legacy Variants</a></h4>
<p>Older LDML specifications allowed codes other than registered
[<a href="#BCP47">BCP47</a>] variant subtags to be used in Unicode
language and locale identifiers for representing variations of
locale data. Unicode locale identifiers that include such variant
codes can be converted to the new [<a href="#BCP47">BCP47</a>]
compatible identifiers by following the descriptions below:</p>
<table>
  <caption>
    <a name="Legacy_Variant_Mappings" href=
    "#Legacy_Variant_Mappings" id=
    "Legacy_Variant_Mappings">Legacy Variant Mappings</a>
  </caption>
  <tr>
    <th>Variant Code</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>AALAND</td>
    <td>Åland, variant of "sv" Swedish used in Finland. Use
    "sv_AX" to indicate this.</td>
  </tr>
  <tr>
    <td>BOKMAL</td>
    <td>Bokmål, variant of "no" Norwegian. Use primary language
    subtag "nb" to indicate this.</td>
  </tr>
  <tr>
    <td>NYNORSK</td>
    <td>Nynorsk, variant of "no" Norwegian.
    Use primary
    language subtag "nn" to indicate this.</td>
  </tr>
  <tr>
    <td>POSIX</td>
    <td>POSIX variation of locale data. Use Unicode locale
    extension "-u-va-posix" to indicate this.</td>
  </tr>
  <tr>
    <td>POLYTONI</td>
    <td>Polytonic, variant of "el" Greek. Use [<a href=
    "#BCP47">BCP47</a>] variant subtag "polyton" to indicate
    this.</td>
  </tr>
  <tr>
    <td>SAAHO</td>
    <td>The Saaho variant of Afar. Use primary language subtag
    "ssy" to indicate this.</td>
  </tr>
</table>
<p>When converting to old syntax, the Unicode locale extension
"-u-va-posix" should be converted to the "POSIX" variant,
<i>not</i> to old extension syntax like "@va=posix". This is an
exception: the other mappings above should not be reversed.</p>
<p>Examples:</p>
<ul>
  <li>en_US_POSIX ↔ en-US-u-va-posix</li>
  <li>en_US_POSIX@colNumeric=yes ↔ en-US-u-kn-va-posix</li>
  <li>en-US-POSIX-u-kn-true → en-US-u-kn-va-posix</li>
  <li>en-US-POSIX-u-kn-va-posix → en-US-u-kn-va-posix</li>
</ul>
<h4><a name="Relation_to_OpenI18n" href="#Relation_to_OpenI18n"
id="Relation_to_OpenI18n">3.8.3 Relation to OpenI18n</a></h4>
<p>The locale id format generally follows the description in
the <i>OpenI18N Locale Naming Guideline</i> [<a href=
"#NamingGuideline">NamingGuideline</a>], with some
enhancements. The main differences from those guidelines
are that the locale id:</p>
<ol type="a">
  <li style="margin-top: 0.5em; margin-bottom: 0.5em">does not
  include a charset (since the data in LDML format always
  provides a representation of all Unicode characters.
  The
  repository is stored in UTF-8, although that can be
  transcoded to other encodings as well),</li>
  <li style="margin-top: 0.5em; margin-bottom: 0.5em">adds the
  ability to have a variant, as in Java,</li>
  <li style="margin-top: 0.5em; margin-bottom: 0.5em">adds the
  ability to discriminate the written language by script (or
  script variant),</li>
  <li style="margin-top: 0.5em; margin-bottom: 0.5em">is a
  superset of [<a href="#BCP47">BCP47</a>] codes.</li>
</ol>
<h3><a name="Transmitting_Locale_Information" href=
"#Transmitting_Locale_Information" id=
"Transmitting_Locale_Information">3.9 Transmitting Locale
Information</a></h3>
<p>In a world of on-demand software components, with arbitrary
connections between those components, it is important to get a
sense of where localization should be done, and how to transmit
enough information so that it can be done at that appropriate
place. End-users need to get messages localized to their
languages, messages that not only contain a translation of
text, but also contain variables such as date, time, number
formats, and currencies formatted according to the users'
conventions. The strategy for doing this so-called <i>JIT
localization</i> is made up of two parts:</p>
<ol>
  <li>Store and transmit <i>neutral-format</i> data wherever
  possible.
    <ul>
      <li>Neutral-format data is data that is kept in a
      standard format, no matter what the local user's
      environment is.
      Neutral-format data is also (loosely) called
      <i>binary data</i>, even though it actually could be
      represented in many different ways, including a textual
      representation such as in XML.</li>
      <li>Such data should use accepted standards where
      possible, such as for currency codes.</li>
      <li>Textual data should also be in a uniform character
      set (Unicode/10646) to avoid possible data corruption
      problems when converting between encodings.</li>
    </ul>
  </li>
  <li>Localize that data as "<i>close</i>" to the end-user as
  possible.</li>
</ol>
<p>There are a number of advantages to this strategy. The
longer the data is kept in a neutral format, the more flexible
the entire system is. On a practical level, if transmitted data
is neutral-format, then it is much easier to manipulate the
data, debug the processing of the data, and maintain the
software connections between components.</p>
<p>Once data has been localized into a given language, it can
be quite difficult to programmatically convert that data into
another format, if required. This is especially true if the
data contains a mixture of translated text and formatted
variables. Once information has been localized into, say,
Romanian, it is much more difficult to localize that data into,
say, French. Parsing is more difficult than formatting, and may
run up against different ambiguities in interpreting text that
has been localized, even if the original translated message
text is available (which it may not be).</p>
<p>Moreover, the closer we are to the end-user, the more we know
about that user's preferred formats. If we format dates, for
example, at the user's machine, then the formatting can easily
take into account any customizations that the user has specified.
If the
formatting is done elsewhere, either we have to transmit
whatever user customizations are in play, or we only transmit
the user's locale code, which may only approximate the desired
format. Thus the closer the localization is to the end user,
the less we need to ship all of the user's preferences around
to all the places that localization could possibly need to be
done.</p>
<p>Even though localization should be done as close to the
end-user as possible, there will be cases where different
components need to be aware of whatever settings are
appropriate for doing the localization. Thus information such
as a locale code or time zone needs to be communicated between
different components.</p>
<h4><a name="Message_Formatting_and_Exceptions" href=
"#Message_Formatting_and_Exceptions" id=
"Message_Formatting_and_Exceptions">3.9.1 Message Formatting
and Exceptions</a></h4>
<p>Windows (<a href=
"https://msdn.microsoft.com/en-us/library/ms679351.aspx">FormatMessage</a>,
<a href=
"https://msdn.microsoft.com/en-us/library/aa331875.aspx">String.Format</a>),
Java (<a href=
"https://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html">MessageFormat</a>)
and ICU (<a href=
"http://www.icu-project.org/apiref/icu4c/classMessageFormat.html">MessageFormat</a>,
<a href=
"http://www.icu-project.org/apiref/icu4c/umsg_8h.html">umsg</a>)
all provide methods of formatting variables (dates, times, etc.)
and inserting them at arbitrary positions in a string. This
avoids the manual string concatenation that causes severe
problems for localization. The question is where to do this.
It is especially important since the original code site that
originates a particular message may be far down in the bowels
of a component, and the message may be passed up to the top of
the component with an exception.
So we will take that case as representative of
this class of issues.</p>
<p>There are circumstances where the message can be
communicated with a language-neutral code, such as a numeric
error code or mnemonic string key, that is understood outside
of the component. If there are arguments that need to accompany
that message, such as a number of files or a datetime, those
need to accompany the code so that when the
localization is finally done at some point, the full information
can be presented to the end-user. This is the best case for
localization.</p>
<p>More often, the exact messages that could originate from
within the component are not known outside of the component
itself; or at least they may not be known by the component that
is finally displaying text to the user. In such a case, the
information as to the user's locale needs to be communicated in
some way to the component that is doing the localization. That
locale information does not necessarily need to be communicated
deep within the component; ideally, any exceptions should
bundle up some language-neutral message ID, plus the arguments
needed to format the message (for example, datetime), but not
do the localization at the throw site. This approach has the
advantages noted above for JIT localization.</p>
<p>In addition, exceptions are often caught at a higher level;
they do not end up being displayed to any end-user at all. By
avoiding the localization at the throw site, we also avoid the
cost of doing formatting when that formatting is not really
necessary.
In fact, in many running programs most of the exceptions that
are thrown at a low level never end up being presented to an
end-user, so this can have considerable performance
benefits.</p>
<h3><a name="Language_and_Locale_IDs" href=
"#Language_and_Locale_IDs" id="Language_and_Locale_IDs">3.10
Unicode Language and Locale IDs</a></h3>
<p>People have very slippery notions of what distinguishes a
language code from a locale code. The problem is that both
are somewhat nebulous concepts.</p>
<p>In practice, many people use [<a href="#BCP47">BCP47</a>]
codes to mean locale codes instead of strictly language codes.
It is easy to see why this came about; because [<a href=
"#BCP47">BCP47</a>] includes an explicit region (territory)
code, for most people it was sufficient for use as a locale
code as well. For example, when typical web software receives
a [<a href="#BCP47">BCP47</a>] code, it will use it as a
locale code. Other typical software will do the same: in
practice, language codes and locale codes are treated
interchangeably. Some people recommend distinguishing on the
basis of "-" versus "_" (for example, <i>zh-TW</i> for language
code, <i>zh_TW</i> for locale code), but in practice that does
not work because of the free variation out in the world in the
use of these separators. Notice that Windows, for example, uses
"-" as a separator in its locale codes. So pragmatically one is
forced to treat "-" and "_" as equivalent when interpreting
either one on input.</p>
<p>Another reason for the conflation of these codes is that
<i>very</i> little data in most systems is distinguished by
region alone; currency codes and measurement systems being some
of the few. Sometimes date or number formats are mentioned as
regional, but that really does not make much sense.
If people
see the sentence "You will have to adjust the value to
१,२३४.५६७ from ૭૧,૨૩૪.૫૬" (using Indic digits), they would say
that sentence is simply not English. Number format is far more
closely associated with language than it is with region. The
same is true for date formats: people would never expect to see
intermixed a date in the format "2003年4月1日" (using Kanji) in
text purporting to be purely English. There are regional
differences in date and number format, differences which can
be important, but those are different in kind from other
language differences between regions.</p>
<p>As far as we are concerned, <i>as a completely practical
matter</i>, two languages are different if they require
substantially different localized resources. Distinctions
according to spoken form are important in some contexts, but
the written form is by far the most important issue
for data interchange. Unfortunately, this is not the principle
used in [<a href="#ISO639">ISO639</a>], which has the fairly
unproductive notion (for data interchange) that only spoken
language matters (it is also not completely consistent about
this, however).</p>
<p>[<a href="#BCP47">BCP47</a>] <i><b>can</b></i> express a
difference if the use of written languages happens to
correspond to region boundaries expressed as [<a href=
"#ISO3166">ISO3166</a>] region codes, and has recently added
codes that allow it to express some important cases that are
not distinguished by [<a href="#ISO3166">ISO3166</a>] codes.
These written languages include simplified and traditional
Chinese (both used in Hong Kong S.A.R.); Serbian in Latin
script; Azerbaijani in Arabic script, and so on.</p>
<p>Notice also that <i>currency codes</i> are different from
<i>currency localizations</i>.
The currency localizations
should largely be in the language-based resource bundles, not
in the territory-based resource bundles. Thus, the resource
bundle <i>en</i> contains the localized mappings in English for
a range of different currency codes: USD → US$, RUR → Rub, AUD
→ $A and so on. Of course, some currency symbols are used for
more than one currency, and in such cases specializations
appear in the territory-based bundles. Continuing the example,
<i>en_US</i> would have USD → $, while <i>en_AU</i> would have
AUD → $. (In protocols, the currency codes should always
accompany any currency amounts; otherwise the data is
ambiguous, and software is forced to use the user's territory
to guess at the currency. For some informal discussion of this,
see <a href=
"http://source.icu-project.org/repos/icu/icuhtml/trunk/design/jit_localization.html">JIT Localization</a>.)</p>
<h4><a name="Written_Language" href="#Written_Language" id=
"Written_Language">3.10.1 Written Language</a></h4>
<p>Criteria for what makes a written language should be purely
pragmatic; <i>what would copy-editors say?</i> If one gave them
text like the following, they would respond that it is far from
acceptable English for publication, and ask for it to be
redone:</p>
<ol>
  <li type="A">"Theatre Center News: The date of the last
  version of this document was 2003年3月20日. A copy can be
  obtained for $50,0 or 1.234,57 грн.
  We would like to
  acknowledge contributions by the following authors (in
  alphabetical order): Alaa Ghoneim, Behdad Esfahbod, Ahmed
  Talaat, Eric Mader, Asmus Freytag, Avery Bishop, and Doug
  Felt."</li>
</ol>
<p>So one would change it to either B or C below, depending on
which orthographic variant of English was the target for the
publication:</p>
<ol type="A" start="2">
  <li>"Theater Center News: The date of the last version of
  this document was 3/20/2003. A copy can be obtained for
  $50.00 or 1,234.57 Ukrainian Hryvni. We would like to
  acknowledge contributions by the following authors (in
  alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus
  Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric
  Mader."</li>
  <li>"Theatre Centre News: The date of the last version of
  this document was 20/3/2003. A copy can be obtained for
  $50.00 or 1,234.57 Ukrainian Hryvni. We would like to
  acknowledge contributions by the following authors (in
  alphabetical order): Alaa Ghoneim, Ahmed Talaat, Asmus
  Freytag, Avery Bishop, Behdad Esfahbod, Doug Felt, Eric
  Mader."</li>
</ol>
<p>Clearly there are many acceptable variations on this text.
For example, copy editors might still quibble with the use of
first versus last name sorting in the list, but clearly the
first list was <i>not</i> in acceptable English alphabetical
order. And in quoting a name, like "Theatre Centre News", one
may leave it in the source orthography even if it differs from
the publication target orthography. And so on. However, just as
clearly, there are limits on what is acceptable English, and
"2003年3月20日", for example, is <i>not</i>.</p>
<p>Note that the language of locale data may differ from the
language of localized software or web sites, when the latter
are not localized into the user's preferred language.
In such
cases, the kind of incongruous juxtapositions described above
may well appear, but this situation is usually preferable to
forcing unfamiliar date or number formats on the user as
well.</p>
<h4><a name="Hybrid_Locale" href="#Hybrid_Locale" id=
"Hybrid_Locale">3.10.2 Hybrid Locale Identifiers</a></h4>
<p>Hybrid locales have intermixed content from two (or more)
languages, often with one language's grammatical structure
applied to words in another. These are commonly referred to
with portmanteau words such as <em>Franglais, <a href=
"https://en.oxforddictionaries.com/definition/spanglish">Spanglish</a></em>
or <em>Denglish</em>. Hybrid locales do <em>not</em>
refer to text simply containing two languages: a book of
parallel text containing English and French, such as the
following, is not Franglais:</p>
<table style='margin-left:2em; margin-right:2em'>
  <tbody>
    <tr>
      <td width='50%' style='font-family:serif'>On the 24th of
      May, 1863, my uncle, Professor Liedenbrock, rushed into
      his little house, No. 19 Königstrasse, one of the oldest
      streets in the oldest portion of the city of
      Hamburg…</td>
      <td style='font-family:serif'>Le 24 mai 1863, un
      dimanche, mon oncle, le professeur Lidenbrock, revint
      précipitamment vers sa petite maison située au numéro 19
      de Königstrasse, l’une des plus anciennes rues du vieux
      quartier de Hambourg…</td>
    </tr>
  </tbody>
</table>
<p>While text in a document can be tagged as partly in one
language and partly in another, that is not the same as having a
hybrid locale. There is a difference between having a Spanglish
document, and a Spanish document that has some passages quoted
in English.
Fine-grained tagging doesn't handle grammatical
combinations like Denglisch “<a href=
"https://www.duden.de/rechtschreibung/downloaden">gedownloadet</a>”,
which is neither English nor German, or similarly the Franglais
“<a href=
'https://www.le-dictionnaire.com/definition.php?mot=downloader'>downloadé</a>”.
More importantly, it doesn’t work for the very common use case
for a <a href="#unicode_locale_id">unicode_locale_id</a>:
<i>locale selection</i>.</p>
<p>Locales are used to communicate requests for localized content
and internationalization services. When people
pick a language from a menu, internally they are picking a
locale (en-GB, es-419, etc.). To allow an application to
support Spanglish or Hinglish locale selection, <a href=
"#unicode_locale_id">unicode_locale_id</a>s can represent
hybrid locales using the T extension key-value 'h0-hybrid'.
(For more information on the T extension, see <em>Section 3.7
<a href="#t_Extension">Unicode BCP 47 T
Extension</a>.</em>)</p>
<p>Examples:</p>
<table class='simple'>
  <tbody>
    <tr>
      <td>hi-t-<u>en-h0-hybrid</u></td>
      <td>Hinglish</td>
      <td>Hindi-English hybrid locale</td>
    </tr>
    <tr>
      <td>ta-t-<u>en-h0-hybrid</u></td>
      <td>Tanglish</td>
      <td>Tamil-English hybrid locale</td>
    </tr>
    <tr>
      <td>bn-t-<u>en-h0-hybrid</u></td>
      <td>Banglish</td>
      <td>Bangla-English hybrid locale</td>
    </tr>
    <tr>
      <td colspan="3">…</td>
    </tr>
    <tr>
      <td>en-t-<u>hi-h0-hybrid</u></td>
      <td>Hinglish</td>
      <td>English-Hindi hybrid locale</td>
    </tr>
    <tr>
      <td>en-t-<u>zh-h0-hybrid</u></td>
      <td>Chinglish</td>
      <td>English-Chinese hybrid locale</td>
    </tr>
    <tr>
      <td colspan="3">…</td>
    </tr>
  </tbody>
</table>
<blockquote>
  <p><em>Note: The <a href=
  "#unicode_language_id">unicode_language_id</a>
  should be the
  language used as the ‘scaffold’: that is, the fallback locale for
  internationalization services, typically the one used for more of
  the core vocabulary/structure in the content. Thus Hinglish
  should be represented as hi-t-en-h0-hybrid where Hindi is the
  scaffold, and as en-t-hi-h0-hybrid where English is.</em></p>
</blockquote>
<p>The value of -t- is a full <em><a href=
"#unicode_language_id">unicode_language_id</a></em>, and can
contain subtags for script or region where it is important to
include them, as in the following. It may be useful to include
the script for emphasis, even where it is the default script for
the language, if it is not the same as the script of the main
language tag.</p>
<table class='simple'>
  <tbody>
    <tr>
      <td>ru-t-<u>en-latn-gb-h0-hybrid</u></td>
      <td>Runglish</td>
      <td>Russian with an admixture of British English in Latin
      script</td>
    </tr>
    <tr>
      <td>ru-t-<u>en-cyrl-gb-h0-hybrid</u></td>
      <td>Runglish</td>
      <td>Russian with an admixture of British English in
      Cyrillic script</td>
    </tr>
  </tbody>
</table>
<p>Should there ever be a strong need for hybrids of more than
two languages or for other purposes such as hybrid languages as
the source of translated content, additional structure could be
added.</p>
<h3><a name="Validity_Data" href="#Validity_Data" id=
"Validity_Data">3.11 Validity Data</a></h3>
<p class='dtd'>&lt;!ELEMENT idValidity (id*) &gt;<br>
&lt;!ELEMENT id ( #PCDATA ) &gt;<br>
&lt;!ATTLIST id type NMTOKEN #REQUIRED &gt;<br>
&lt;!ATTLIST id idStatus NMTOKEN #REQUIRED &gt;</p>
<p>The directory <a href=
'https://github.com/unicode-org/cldr/releases/tag/latest/common/validity/'>common/validity</a>
contains machine-readable data for validating the language,
region, script, and variant subtags, as well as currencies,
subdivisions, and measure units.
Each file contains a number of
subtags with the following <strong>idStatus</strong>
values:</p>
<ul>
  <li><strong>regular</strong>: the standard codes used for
  the specific type of subtag</li>
  <li><strong>special</strong>: certain exceptional language
  codes like 'mul' <em>(languages only)</em></li>
  <li><strong>unknown</strong>: the code used to indicate the
  "unknown", "undetermined" or "invalid" values. For more
  information, see <em>Section 3.5.1 <a href=
  "#Unknown_or_Invalid_Identifiers">Unknown or Invalid
  Identifiers</a></em>.</li>
  <li>
    <strong>macroregion</strong>: the standard codes that are
    macroregions <em>(for regions only).</em>
    <ul>
      <li>Note that some two-letter region codes are
      macroregions, and (in the future) some three-digit codes
      may be regular codes.</li>
      <li>For details as to which regions are contained within
      which macroregions, see the
      <strong>&lt;containment&gt;</strong> element of the
      supplemental data.</li>
    </ul>
  </li>
  <li><strong>deprecated</strong>: codes that should not be
  used. The <strong>&lt;alias&gt;</strong> element in the
  supplementalMeta file contains more information about these
  codes, and which codes should be used instead.</li>
  <li><strong>private_use</strong>: codes that, for CLDR, are
  considered private use. Note that some private-use
  codes in a source standard such as BCP47 have defined CLDR
  semantics, and are considered regular codes.
  For more information, see <em>Section 3.5.3 <a href=
  "#Private_Use_Codes">Private Use Codes</a>.</em></li>
  <li><strong>reserved</strong>: codes that are private use in a
  source standard, but are reserved for future use as regular
  codes by CLDR.</li>
</ul>
<p>The list of subtags for each idStatus uses a compact format:
a space-delimited list of StringRanges, as defined in
<em>Section <a href="#String_Range">5.3.4 String
Range</a>.</em> The separator for each StringRange is a
"~".</p>
<p>Each measure unit is a sequence of subtags, such as
“angle-arc-minute”. The first subtag provides a general
“category” of the unit.</p>
<p>In version 28.0, the subdivisions in the validity files used
the ISO format, uppercase with a hyphen separating two
components, instead of the BCP 47 format.</p>
<h2><a name="Locale_Inheritance" href="#Locale_Inheritance" id=
"Locale_Inheritance">4 Locale Inheritance and Matching</a></h2>
<p>The XML format relies on an inheritance model, whereby the
resources are collected into <i>bundles</i>, and the bundles
organized into a tree. Data for the many Spanish locales does
not need to be duplicated across all of the countries having
Spanish as a national language. Instead, common data is
collected in the Spanish language locale, and territory locales
only need to supply differences. The parent of all of the
language locales is a generic locale known as <i>root</i>.
Wherever possible, the resources in the root are language- and
territory-neutral. For example, the collation (sorting) order
in the root is based on the [<a href="#DUCET">DUCET</a>]
(see <em><a href="tr35-collation.html#Root_Collation">Root
Collation</a></em>).
Since English language collation has the 3975 same ordering as the root locale, the 'en' locale data does not 3976 need to supply any collation data, nor do the 'en_US', 'en_GB' 3977 or any of the various other locales that use English.</p> 3978 <p>Given a particular locale id "en_US_someVariant", the search 3979 chain for a particular resource is the following.</p> 3980 <blockquote> 3981 <pre>en_US_someVariant 3982en_US 3983en 3984root</pre> 3985 </blockquote> 3986 <p><em>The inheritance is often not simple truncation, as will 3987 be seen later in this section.</em></p> 3988 <p>If a type and key are supplied in the locale id, then 3989 logically the chain from that id to the root is searched for a 3990 resource tag with a given type, all the way up to root. If no 3991 resource is found with that tag and type, then the chain is 3992 searched again without the type.</p> 3993 <p>Thus the data for any given locale will only contain 3994 resources that are different from the parent locale. For 3995 example, most territory locales will inherit the bulk of their 3996 data from the language locale: "en" will contain the bulk of 3997 the data; "en_IE" will only contain a few items like currency. 3998 All data that is inherited from a parent is presumed to be 3999 valid, just as valid as if it were physically present in the 4000 file. This provides for much smaller resource bundles, and much 4001 simpler (and less error-prone) maintenance. At the script or 4002 region level, the "primary" child locale will be empty, since 4003 its parent will contain all of the appropriate resources for 4004 it. 
For more information, see <i>CLDR Information: Section 9.3 4005 <a href="tr35-info.html#Default_Content">Default 4006 Content</a>.</i></p> 4007 <p>Certain data items depend only on the region specified in a 4008 locale id (by a <a href= 4009 "#unicode_region_subtag_validity">unicode_region_subtag</a> or 4010 an “rg” <a href="#RegionOverride">Region Override</a> key), 4011 and are obtained from supplemental data rather than through 4012 locale resources. For example:</p> 4013 <ul> 4014 <li>The currency for the specified region (see <a href= 4015 "tr35-numbers.html#Supplemental_Currency_Data">Supplemental 4016 Currency Data</a>)</li> 4017 <li>The measurement system for the specified region (see 4018 <a href= 4019 "tr35-general.html#Measurement_System_Data">Measurement 4020 System Data</a>)</li> 4021 <li>The week conventions for the specified region (see 4022 <a href="tr35-dates.html#Week_Data">Week Data</a>)</li> 4023 </ul> 4024 <p>(For more information on the specific items handled this 4025 way, see <a href= 4026 "tr35-info.html#Territory_Based_Preferences">Territory-Based 4027 Preferences</a>.) These items will be correct for the specified 4028 region regardless of whether a locale bundle actually exists 4029 with the same combination of language and region as in the 4030 locale id. For example, suppose data is requested for the 4031 locale id "fr_US" and there is no bundle for that combination. 4032 Data obtained via locale inheritance, such as currency patterns 4033 and currency symbols, will be obtained from the parent locale 4034 "fr". However, currency amounts would be formatted by default 4035 using US dollars, just displayed in the manner governed by the 4036 locale "fr". 
When a locale id does not specify a region, the 4037 region-specific items such as those above are obtained from the 4038 likely region for the locale (obtained via <a href= 4039 "#Likely_Subtags">Likely Subtags</a>).</p> 4040 <p>For the relationship between Inheritance, DefaultContent, 4041 LikelySubtags, and LocaleMatching, see Section 4.2.6 <a href= 4042 "tr35.html#Inheritance_vs_Related">Inheritance vs Related 4043 Information</a>.</p> 4044 <h3><a href="#Lookup" name="Lookup" id="Lookup">4.1 4045 Lookup</a></h3> 4046 <p>If a language has more than one script in customary modern 4047 use, then the CLDR file structure in common/main uses the 4048 following model:</p> 4049 <blockquote> 4050 <p>lang<br> 4051 lang_script<br> 4052 lang_script_region<br> 4053 lang_region <i>(aliases to lang_script_region)</i></p> 4054 </blockquote> 4055 <h4><a href="#Bundle_vs_Item_Lookup" name= 4056 "Bundle_vs_Item_Lookup" id="Bundle_vs_Item_Lookup">4.1.1 Bundle 4057 vs Item Lookup</a></h4> 4058 <p>There are actually two different kinds of inheritance 4059 fallback: <em>resource bundle lookup</em> and 4060 <em>resource item lookup</em>. For the former, a 4061 process is looking to find the first, best resource bundle it 4062 can; for the latter, it is fallback within bundles on 4063 individual items, like the translated name for the region "CN" 4064 in Breton.</p> 4065 <p>These are closely related, but distinct, processes. They are 4066 illustrated in the table <a href="#Lookup-Differences">Lookup 4067 Differences</a>, where "key" stands for zero or more key/type 4068 pairs. Logically speaking, when looking up an item for a given 4069 locale, you first do a resource bundle lookup to find the best 4070 bundle for the locale, then you do an inherited item lookup 4071 starting with that resource bundle.</p> 4072 <p>The table <a href="#Lookup-Differences">Lookup 4073 Differences</a> uses the naïve resource bundle lookup for 4074 illustration. 
More sophisticated systems will get far better 4075 results for resource bundle lookup if they use the algorithm 4076 described in <em>Section 4.4 <a href= 4077 "#LanguageMatching">Language Matching</a></em>. That algorithm 4078 takes into account both the user’s desired locale(s) and the 4079 application’s supported locales, in order to get the best 4080 match.</p> 4081 <p>If the naïve resource bundle lookup is used, the desired 4082 locale needs to be canonicalized using 4.3 <a href= 4083 "#Likely_Subtags">Likely Subtags</a> and the supplemental alias 4084 information, so that locales that CLDR considers identical are 4085 treated as such. Thus eng-Latn-GB should be mapped to en-GB, 4086 and cmn-TW mapped to zh-Hant-TW.</p> 4087 <p>For the purposes of CLDR, everything with the <ldml> 4088 dtd is treated logically as if it is one resource bundle, even 4089 if the implementation separates data into separate physical 4090 resource bundles. For example, suppose that there is a main XML 4091 file for Nama (naq), but there are no <unit> elements for 4092 it because the units are all inherited from root. If the 4093 <unit> elements are separated into a separate data tree 4094 for modularity in the implementation, the Nama <unit> 4095 resource bundle would be empty. 
However, for purposes of 4096 resource bundle lookup, the lookup still stops 4097 at naq.xml.</p> 4098 <div id="iqaw2" style="margin-top: 0px; margin-bottom: 0px;"> 4099 <table class='simple' id="a1bn" border="1" cellpadding="3" 4100 cellspacing="0"> 4101 <caption> 4102 <a href="#Lookup-Differences" name="Lookup-Differences" 4103 id="Lookup-Differences">Lookup Differences</a> 4104 </caption> 4105 <tbody id="iqaw3"> 4106 <tr id="x40y0"> 4107 <th id="x40y1" style="vertical-align: top;" nowrap> 4108 Lookup Type</th> 4109 <th id="x40y3" style="vertical-align: top;" nowrap> 4110 Example</th> 4111 <th id="x40y5" style="vertical-align: top;"> 4112 Comments</th> 4113 </tr> 4114 <tr id="iqaw4"> 4115 <td id="iqaw5" style="vertical-align: top;" nowrap> 4116 <p id="rkc40"><strong>Resource bundle</strong> 4117 lookup</p> 4118 </td> 4119 <td id="iqaw7" style="vertical-align: top;" nowrap> 4120 <p>se-FI →</p> 4121 <p>se →</p> 4122 <p><em>default-locale* →</em></p> 4123 <p>root</p> 4124 </td> 4125 <td id="rkc41" style="vertical-align: top;"> 4126 <p>* The default-locale may have its own inheritance 4127 chain; for example, it may be "en-GB → en". 4128 In that case, the chain is expanded by inserting that 4129 chain, resulting in:</p> 4130 <blockquote> 4131 <p>se-FI →</p> 4132 <p>se →</p> 4133 <p>fi →</p> 4134 <p><em>en-GB →</em></p> 4135 <p><em>en →</em></p> 4136 <p>root</p> 4137 </blockquote> 4138 </td> 4139 </tr> 4140 <tr id="iqaw9"> 4141 <td id="iqaw10" style="vertical-align: top;" nowrap> 4142 <p><strong>Inherited item</strong> lookup</p> 4143 </td> 4144 <td id="iqaw12" style="vertical-align: top;" nowrap> 4145 <p>se-FI+key →</p> 4146 <p>se+key →</p> 4147 <p><em>root_alias*+key </em></p> 4148 <p>→ root+key</p> 4149 </td> 4150 <td id="rkc43" style="vertical-align: top;"> 4151 <p>* If there is a root_alias to another key or 4152 locale, then insert that entire chain. 
For example, 4153 suppose that months for another calendar system have 4154 a root alias to Gregorian months. In that case, the 4155 root alias would change the key, and retry from se-FI 4156 downward. This can happen multiple times.</p> 4157 <blockquote> 4158 <p>se-FI+key →</p> 4159 <p>se+key →</p> 4160 <p>root_alias*+key →</p> 4161 <p><em>se-FI+key2 →</em></p> 4162 <p><em>se+key2 →</em></p> 4163 <p>root_alias*+key2 →</p> 4164 <p>root+key2</p> 4165 </blockquote> 4166 </td> 4167 </tr> 4168 </tbody> 4169 </table> 4170 </div> 4171 <p>Both the resource bundle inheritance and the inherited item 4172 inheritance use the parentLocale data, where available, instead 4173 of simple truncation.</p> 4174 <p>The fallback is a bit different for these two cases; 4175 internal aliases and keys are not involved in the bundle 4176 lookup, and the default locale is not involved in the item 4177 lookup. If the default-locale were used in the resource-item 4178 lookup, then strange results would occur. For example, suppose 4179 that the default locale is Swedish, and there is a Nama locale 4180 but no specific inherited item for collation. If the 4181 default-locale were used in resource-item lookup, it would 4182 produce odd and unexpected results for Nama sorting.</p> 4183 <p>The default locale is not even always used in resource 4184 bundle inheritance. 
For the following services, the fallback is 4185 always directly to the root locale rather than through the default 4186 locale.</p> 4187 <ul> 4188 <li>collation</li> 4189 <li>break iteration</li> 4190 <li>case mapping</li> 4191 <li>transliteration 4192 <ul> 4193 <li>The lookup for transliteration is yet more 4194 complicated because of the interplay of source and target 4195 locales: see <em>Part 2 General, Section 4196 10.1 <a href= 4197 "https://www.unicode.org/reports/tr35/tr35-general.html#Inheritance">Inheritance.</a></em></li> 4198 </ul> 4199 </li> 4200 </ul> 4201 <p>Thus if there is no Akan locale, for example, asking for a 4202 collation for Akan should produce the root collation, <em>not 4203 the Swedish collation.</em></p> 4204 <p>The inherited item lookup must remain stable, because the 4205 resources are built with a certain fallback in mind; changing 4206 the core fallback order can render the bundle structure 4207 incoherent.</p> 4208 <p>Resource bundle lookup, on the other hand, is more flexible; 4209 changes in the view of the "best" match between the input 4210 request and the output bundle are better tolerated when they represent 4211 overall improvements for users. For more information, see 4212 <i><a href="#Fallback_Elements">A.1 Element 4213 fallback</a></i>.</p> 4214 <p>Where the LDML inheritance relationship does not match a 4215 target system, such as POSIX, the data logically should be 4216 fully resolved in converting to a format for use by that 4217 system, by adding <i>all</i> inherited data to each locale data 4218 set.</p> 4219 <p>For a more complete description of how inheritance applies 4220 to data, and the use of keywords, see <i><a href= 4221 "#Inheritance_and_Validity">Section 4.2 Inheritance</a></i>. 4222 </p> 4223 <p>The locale data does not contain general character 4224 properties that are derived from the <i>Unicode Character 4225 Database</i> [<a href= 4226 "https://unicode.org/reports/tr41/#UAX44">UAX44</a>]. 
That data 4227 being common across locales, it is not duplicated in the 4228 bundles. Constructing a POSIX locale from the CLDR data 4229 requires use of UCD data. In addition, POSIX locales may also 4230 specify the character encoding, which requires the data to be 4231 transformed into that target encoding.</p> 4232 <p><b>Warning:</b> If a locale has a different script than its 4233 parent (for example, sr_Latn), then special attention must be 4234 paid to make sure that all inheritance is covered. For example, 4235 auxiliary exemplar characters may need to be empty ("[]") to 4236 block inheritance.</p> 4237 <p><strong>Empty Override:</strong> There is one special value 4238 reserved in LDML to indicate that a child locale is to have no 4239 value for a path, even if the parent locale has a value for 4240 that path. That value is "∅∅∅". For example, if there is no 4241 phrase for "two days ago" in a language, that can be indicated 4242 with:</p> 4243 <pre><field type="day"> 4244 <relative type="-2">∅∅∅</relative> 4245</pre> 4246 <h4><a name="Multiple_Inheritance" id= 4247 "Multiple_Inheritance"></a><a name="Lateral_Inheritance" href= 4248 "#Lateral_Inheritance" id="Lateral_Inheritance">4.1.2 Lateral 4249 Inheritance</a></h4> 4250 <p>In the following instances, resources may inherit from 4251 within the same locale, <em>before inheriting from the parent</em>. 
</p> 4252 4253 <table border="1" cellpadding="3" cellspacing= 4254 "0" class='simple' > 4255 <tbody> 4256 <tr> 4257 <th nowrap style="vertical-align: top;">Element</th> 4258 <th nowrap style="vertical-align: top;">Source</th> 4259 <th nowrap style="vertical-align: top;">Context</th> 4260 </tr> 4261 <tr> 4262 <td style="vertical-align: top;">currency/pattern</td> 4263 <td style="vertical-align: top;">currencyFormat</td> 4264 <td style="vertical-align: top;">numberSystem = defaultNumberingSystem, unless otherwise specified*<br> 4265 currencyFormatLength type=none, unless otherwise specified<br> 4266 currencyFormat type="standard", unless otherwise specified</td> 4267 </tr> 4268 <tr> 4269 <td style="vertical-align: top;">currency/decimal</td> 4270 <td style="vertical-align: top;">symbols/decimal</td> 4271 <td style="vertical-align: top;">numberSystem = defaultNumberingSystem, unless otherwise specified</td> 4272 </tr> 4273 <tr> 4274 <td style="vertical-align: top;">currency/group</td> 4275 <td style="vertical-align: top;">symbols/group</td> 4276 <td style="vertical-align: top;">numberSystem = defaultNumberingSystem, unless otherwise specified</td> 4277 </tr> 4278 </tbody> 4279 </table> 4280 <p>* The "unless otherwise specified" clause is for when an API or other context indicates a different choice, such as <span style="vertical-align: top;">currencyFormat type="accounting"</span>. </p> 4281 <p>For example, with /currency [@type="CVE"], the decimal symbol for almost all locales is the value from symbols/decimal, but for pt_CV it is explicitly <decimal>$</decimal>.</p> 4282 <p> </p> 4283 <p>The following attributes use lateral inheritance for all elements with the DTD root = ldml, except where otherwise noted. 
The process is applied recursively.</p> 4284 <table border="1" cellpadding="3" cellspacing= 4285 "0" class='simple' > 4286 <tbody> 4287 <tr> 4288 <th nowrap style="vertical-align: top;">Attribute</th> 4289 <th nowrap style="vertical-align: top;">Fallback</th> 4290 <th nowrap style="vertical-align: top;">Exception Elements</th> 4291 </tr> 4292 <tr> 4293 <td style="vertical-align: top;">case</td> 4294 <td style="vertical-align: top;">"nominative" → ∅</td> 4295 <td style="vertical-align: top;">caseMinimalPairs</td> 4296 </tr> 4297 <tr> 4298 <td style="vertical-align: top;">gender</td> 4299 <td style="vertical-align: top;">default_gender(locale) → ∅</td> 4300 <td style="vertical-align: top;">genderMinimalPairs</td> 4301 </tr> 4302 <tr> 4303 <td style="vertical-align: top;">count</td> 4304 <td style="vertical-align: top;">plural_rules(locale, x) → "other" → ∅</td> 4305 <td style="vertical-align: top;">minDays, pluralMinimalPairs</td> 4306 </tr> 4307 <tr> 4308 <td style="vertical-align: top;">ordinal</td> 4309 <td style="vertical-align: top;">plural_rules(locale, x) → "other" → ∅</td> 4310 <td style="vertical-align: top;">ordinalMinimalPairs</td> 4311 </tr> 4312 </tbody> 4313 </table> 4314 <p>The gender fallback is to neuter if the locale has a neuter gender, otherwise masculine. This may be extended in the future if necessary. See also <a href="tr35-general.html#Grammatical_Features">Part 2, Section 15, Grammatical Features</a>.</p> 4315 4316 <p>For example, if there is no value for a path, and that path has a 4317 [@count="x"] attribute and value, then:</p> 4318 <ol> 4319 <li>If "x" is numeric, the path falls back to the path with [@count=«the plural rules category for x for that locale»], within the same locale. 
4320 <ol> 4321 <li>For example, [@count="0"] for English falls back to [@count="other"], while for French it falls back to [@count="one"].</li> 4322 </ol> 4323 </li> 4324 <li>If "x" is anything but "other", it falls back to 4325 a path [@count="other"], within the same locale.</li> 4326 <li>If "x" is "other", 4327 it falls back to the path 4328 that is completely missing the count item, within the same locale.</li> 4329 <li>If there is no value for that path in the same locale, the same 4330 process is used for the original path in the parent locale.</li> 4331 </ol> 4332 4333 <p>A path may have multiple attributes with lateral inheritance. In such a case, all of the combinations are tried, in the order supplied above. For example (this is the very worst case):</p> 4334 <p>/compoundUnitPattern1[@count="few"][@gender="feminine"][@case="accusative"]<span style="vertical-align: top;"> →</span></p> 4335 <p>/compoundUnitPattern1[@count="few"][@gender="feminine"][@case="nominative"]<span style="vertical-align: top;"> →</span></p> 4336 <p>/compoundUnitPattern1[@count="few"][@gender="feminine"]<span style="vertical-align: top;"> →</span></p> 4337 <p>/compoundUnitPattern1[@count="few"][@gender="neuter"][@case="accusative"]<span style="vertical-align: top;"> →</span></p> 4338 <p>/compoundUnitPattern1[@count="few"][@gender="neuter"][@case="nominative"]<span style="vertical-align: top;"> →</span></p> 4339 <p>/compoundUnitPattern1[@count="few"][@gender="neuter"]<span style="vertical-align: top;"> →</span></p> 4340 <p>/compoundUnitPattern1[@count="few"][@case="accusative"]<span style="vertical-align: top;"> →</span></p> 4341 <p>/compoundUnitPattern1[@count="few"][@case="nominative"]<span style="vertical-align: top;"> →</span></p> 4342 <p>/compoundUnitPattern1[@count="few"]<span style="vertical-align: top;"> →</span></p> 4343 <p> </p> 4344 <p>/compoundUnitPattern1[@count="other"][@gender="feminine"][@case="accusative"]<span style="vertical-align: top;"> →</span></p> 4345 <p>/compoundUnitPattern1[@count="other"][@gender="feminine"][@case="nominative"]<span style="vertical-align: top;"> →</span></p> 4346 <p>/compoundUnitPattern1[@count="other"][@gender="feminine"]<span style="vertical-align: top;"> →</span></p> 4347 <p>/compoundUnitPattern1[@count="other"][@gender="neuter"][@case="accusative"]<span style="vertical-align: top;"> →</span></p> 4348 <p>/compoundUnitPattern1[@count="other"][@gender="neuter"][@case="nominative"]<span style="vertical-align: top;"> →</span></p> 4349 <p>/compoundUnitPattern1[@count="other"][@gender="neuter"]<span style="vertical-align: top;"> →</span></p> 4350 <p>/compoundUnitPattern1[@count="other"][@case="accusative"]<span style="vertical-align: top;"> →</span></p> 4351 <p>/compoundUnitPattern1[@count="other"][@case="nominative"]<span style="vertical-align: top;"> →</span></p> 4352 <p>/compoundUnitPattern1[@count="other"]<span style="vertical-align: top;"> →</span></p> 4353 <p> </p> 4354 <p>/compoundUnitPattern1[@gender="feminine"][@case="accusative"]<span style="vertical-align: top;"> →</span></p> 4355 <p>/compoundUnitPattern1[@gender="feminine"][@case="nominative"]<span style="vertical-align: top;"> →</span></p> 4356 <p>/compoundUnitPattern1[@gender="feminine"]<span style="vertical-align: top;"> →</span></p> 4357 <p>/compoundUnitPattern1[@gender="neuter"][@case="accusative"]<span style="vertical-align: top;"> →</span></p> 4358 <p>/compoundUnitPattern1[@gender="neuter"][@case="nominative"]<span style="vertical-align: top;"> →</span></p> 4359 <p>/compoundUnitPattern1[@gender="neuter"]<span style="vertical-align: top;"> →</span></p> 4360 <p>/compoundUnitPattern1[@case="accusative"]<span style="vertical-align: top;"> →</span></p> 4361 <p>/compoundUnitPattern1[@case="nominative"]<span style="vertical-align: top;"> →</span></p> 4362 <p>/compoundUnitPattern1</p> 4363 4364 <p> </p> 4365 <p><em>Examples:</em></p> 4366 <table class='simple' border="1" cellpadding="3" cellspacing= 4367 
"0" id="a1bn3"> 4368 <caption> 4369 <a name="Count_Fallback_normal" href= 4370 "#Count_Fallback_normal" id="Count_Fallback_normal">Count 4371 Fallback: normal</a> 4372 </caption> 4373 <tbody> 4374 <tr> 4375 <th nowrap style="vertical-align: top;">Locale</th> 4376 <th nowrap style="vertical-align: top;">Path</th> 4377 </tr> 4378 <tr> 4379 <td nowrap style="vertical-align: top;">fr-CA</td> 4380 <td nowrap id="iqaw" style="vertical-align: top;"> 4381 <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="x"]</strong></code></td> 4382 </tr> 4383 <tr> 4384 <td nowrap style="vertical-align: top;">fr-CA</td> 4385 <td nowrap id="iqaw16" style="vertical-align: top;"> 4386 <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="other"]</strong></code></td> 4387 </tr> 4388 <tr> 4389 <td nowrap style="vertical-align: top;">fr</td> 4390 <td nowrap id="iqaw19" style="vertical-align: top;"> 4391 <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="x"]</strong></code></td> 4392 </tr> 4393 <tr> 4394 <td nowrap style="vertical-align: top;">fr</td> 4395 <td nowrap id="iqaw18" style="vertical-align: top;"> 4396 <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="other"]</strong></code></td> 4397 </tr> 4398 <tr> 4399 <td nowrap style="vertical-align: top;">root</td> 4400 <td nowrap id="iqaw21" style="vertical-align: top;"> 4401 <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="x"]</strong></code></td> 4402 </tr> 4403 <tr> 4404 <td nowrap style="vertical-align: top;">root</td> 4405 <td nowrap id="iqaw20" style="vertical-align: top;"> 4406 <code>//ldml/units/unitLength[@type="<strong>narrow</strong>"]/unit[@type="mass-gram"]/unitPattern<strong>[@count="other"]</strong></code></td> 
4407 </tr> 4408 </tbody> 4409 </table> 4410 <p>Note that there may be an alias in root that changes the 4411 path and starts again from the requested locale, such as:</p> 4412 <p><code><unitLength type="<strong>narrow</strong>"><br> 4413 <alias source="locale" 4414 path="../unitLength[@type='<strong>short</strong>']"/><br> 4415 </unitLength></code></p> 4416 <table class='simple' border="1" cellpadding="3" cellspacing= 4417 "0" id="a1bn2"> 4418 <caption> 4419 <a name="Count_Fallback_currency" href= 4420 "#Count_Fallback_currency" id= 4421 "Count_Fallback_currency">Count Fallback: currency</a> 4422 </caption> 4423 <tbody> 4424 <tr> 4425 <th nowrap style="vertical-align: top;">Locale</th> 4426 <th nowrap style="vertical-align: top;">Path</th> 4427 </tr> 4428 <tr> 4429 <td nowrap style="vertical-align: top;">fr-CA</td> 4430 <td nowrap id="iqaw11" style="vertical-align: top;"> 4431 <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="x"]</strong></code></td> 4432 </tr> 4433 <tr> 4434 <td nowrap style="vertical-align: top;">fr-CA</td> 4435 <td nowrap id="iqaw6" style="vertical-align: top;"> 4436 <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="other"]</strong></code></td> 4437 </tr> 4438 <tr> 4439 <td nowrap style="vertical-align: top;">fr-CA</td> 4440 <td nowrap id="iqaw8" style="vertical-align: top;"> 4441 <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName</code></td> 4442 </tr> 4443 <tr> 4444 <td nowrap style="vertical-align: top;">fr</td> 4445 <td nowrap id="iqaw15" style="vertical-align: top;"> 4446 <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="x"]</strong></code></td> 4447 </tr> 4448 <tr> 4449 <td nowrap style="vertical-align: top;">fr</td> 4450 <td nowrap id="iqaw14" style="vertical-align: top;"> 4451 <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="other"]</strong></code></td> 4452 </tr> 4453 <tr> 4454 <td nowrap 
style="vertical-align: top;">fr</td> 4455 <td nowrap id="iqaw13" style="vertical-align: top;"> 4456 <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName</code></td> 4457 </tr> 4458 <tr> 4459 <td nowrap style="vertical-align: top;">root</td> 4460 <td nowrap id="iqaw25" style="vertical-align: top;"> 4461 <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="x"]</strong></code></td> 4462 </tr> 4463 <tr> 4464 <td nowrap style="vertical-align: top;">root</td> 4465 <td nowrap id="iqaw24" style="vertical-align: top;"> 4466 <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName<strong>[@count="other"]</strong></code></td> 4467 </tr> 4468 <tr> 4469 <td nowrap style="vertical-align: top;">root</td> 4470 <td nowrap id="iqaw23" style="vertical-align: top;"> 4471 <code>//ldml/numbers/currencies/currency[@type="CAD"]/displayName</code></td> 4472 </tr> 4473 </tbody> 4474 </table><br> 4475 <h4><a name="Parent_Locales" href="#Parent_Locales" id= 4476 "Parent_Locales">4.1.3 Parent Locales</a></h4> 4477 <p class="dtd"><!ELEMENT parentLocales ( parentLocale* ) 4478 ><br> 4479 <!ELEMENT parentLocale EMPTY ><br> 4480 <!ATTLIST parentLocale parent NMTOKEN #REQUIRED ><br> 4481 <!ATTLIST parentLocale locales NMTOKENS #REQUIRED ></p> 4482 <p>In some cases, the normal truncation inheritance does not 4483 function well. This happens when:</p> 4484 <ol> 4485 <li>The child locale is of a different script. In this case, 4486 mixing elements from the parent into the child data results 4487 in a mishmash.</li> 4488 <li>A large number of child locales behave similarly, and 4489 differently from the truncation parent.</li> 4490 </ol> 4491 <p>The <span class="element">parentLocale</span> element is 4492 used to override the normal inheritance when accessing CLDR 4493 data.</p> 4494 <p>For case 1, the children are script locales, and the parent 4495 is "root". 
For example:</p> 4496 <pre> 4497 <parentLocale parent="root" locales="az_Cyrl ha_Arab … zh_Hant"/></pre> 4498 <p>For case 2, the children and parent share the same primary 4499 language, but the region is changed. For example:</p> 4500 <pre> 4501 <parentLocale parent="es_419" locales="es_AR es_BO … es_UY es_VE"/></pre> 4502 <p>Collation data, however, is an exception. Since collation 4503 rules do not truly inherit data from the parent, the 4504 parentLocale element is not necessary and not used for 4505 collation. Thus, for a locale like zh_Hant in the example 4506 above, the parentLocale element would dictate the parent as 4507 "root" when referring to main locale data, but for collation 4508 data, the parent locale would still be "zh", even though the 4509 parentLocale element is present for that locale.</p> 4510 <p>Since parentLocale information is not localizable on a per 4511 locale basis, the parentLocale information is contained in 4512 CLDR’s <a href="tr35-info.html">supplemental data.</a></p> 4513 <p>When a <span class="element">parentLocale</span> element is 4514 used to override normal inheritance, the following invariants 4515 must always be true:</p> 4516 <ol> 4517 <li>If X is the parentLocale of Y, then either X is the root 4518 locale, or X has the same base language code as Y. For 4519 example, the parent of "en" cannot be "fr", and the parent of 4520 "en_YY" cannot be "fr" or "fr_XX".</li> 4521 <li>If X is the parentLocale of Y, Y must not be a base 4522 language locale. For example, the parent of "en" cannot be 4523 "en_XX".</li> 4524 <li>There can never be cycles, such as: X parent of Y ... 
4525 parent of X.</li> 4526 </ol> 4527 <h3><a name="Inheritance_and_Validity" href= 4528 "#Inheritance_and_Validity" id="Inheritance_and_Validity">4.2 4529 Inheritance and Validity</a></h3> 4530 <p>The following describes in more detail how to determine the 4531 exact inheritance of elements, and the validity of a given 4532 element in LDML.</p> 4533 <h4><a name="Definitions" href="#Definitions" id= 4534 "Definitions">4.2.1 Definitions</a></h4> 4535 <p><i>Blocking</i> elements are those whose subelements do not 4536 inherit from parent locales. For example, a <collation> 4537 element is a blocking element: everything in a 4538 <collation> element is treated as a single lump of data, 4539 as far as inheritance is concerned. For more information, see 4540 <a href="#Valid_Attribute_Values">Section 5.5 Valid Attribute 4541 Values</a>.</p> 4542 <p>Attributes that serve to distinguish multiple elements at 4543 the same level are called <i>distinguishing</i> attributes. For 4544 example, the <i>type</i> attribute distinguishes different 4545 elements in lists of translations, such as:</p> 4546 <pre><language type="aa">Afar</language> 4547<language type="ab">Abkhazian</language></pre> 4548 <p>Distinguishing attributes affect inheritance; two elements 4549 with different distinguishing attributes are treated as 4550 different for purposes of inheritance. For more information, 4551 see <a href="#Valid_Attribute_Values">Section 5.5 Valid 4552 Attribute Values</a>. Other attributes are called 4553 nondistinguishing (or informational) attributes. These carry 4554 separate information, and do not affect inheritance.</p> 4555 <p>For any element in an XML file, <i>an element chain</i> is a 4556 resolved [<a href="#XPath">XPath</a>] leading from the root to 4557 an element, with attributes on each element in alphabetical 4558 order. 
So in, say, <a href= 4559 "https://github.com/unicode-org/cldr/blob/master/common/main/el.xml">https://github.com/unicode-org/cldr/blob/master/common/main/el.xml</a> 4560 we may have:</p> 4561 <pre><ldml> 4562 <identity> 4563 <version number="1.1" /> 4564 <language type="el" /> 4565 </identity> 4566 <localeDisplayNames> 4567 <languages> 4568 <language type="ar">Αραβικά</language> 4569...</pre> 4570 <p>Which gives the following element chains (among others):</p> 4571 <ul> 4572 <li>//ldml/identity/version[@number="1.1"]</li> 4573 <li> 4574 //ldml/localeDisplayNames/languages/language[@type="ar"]</li> 4575 </ul> 4576 <p>An element chain A is an <i>extension</i> of an element 4577 chain B if B is equivalent to an initial portion of A. For 4578 example, #2 below is an extension of #1. (Equivalent, depending 4579 on the tree, may not be "identical to". See below for an 4580 example.)</p> 4581 <ol> 4582 <li>//ldml/localeDisplayNames</li> 4583 <li> 4584 //ldml/localeDisplayNames/languages/language[@type="ar"]</li> 4585 </ol> 4586 <p>An LDML file can be thought of as an ordered list of 4587 <i>element pairs</i>: <element chain, data>, where the 4588 element chains are all the chains for the end-nodes. (This 4589 works because of restrictions on the structure of LDML, 4590 including that it does not allow mixed content.) The ordering 4591 is the ordering that the element chains are found in the file, 4592 and thus determined by the DTD.</p> 4593 <p>For example, some of those pairs would be the following. 
4594 Notice that the first has the null string as element 4595 contents.</p> 4596 <ul> 4597 <li><b><</b>//ldml/identity/version[@number="1.1"]<b>,</b> 4598 ""<b>></b></li> 4599 <li> 4600 <b><</b>//ldml/localeDisplayNames/languages/language[@type="ar"]<b>,</b> 4601 "Αραβικά"<b>></b></li> 4602 </ul> 4603 <blockquote> 4604 <p><b>Note:</b> There are two exceptions to this:</p> 4605 <ol> 4606 <li>Blocking nodes and their contents are treated as a 4607 single end node.</li> 4608 <li>In terms of computing inheritance, the element pair 4609 consists of the element chain plus all distinguishing 4610 attributes; the value consists of the value (if any) plus 4611 any nondistinguishing attributes.</li> 4612 </ol> 4613 <blockquote> 4614 <p>Thus instead of the element pair being (a) below, it is 4615 (b):</p> 4616 <ol type="a"> 4617 <li> 4618 <b><</b>//ldml/dates/calendars/calendar[@type='gregorian']/week/weekendStart[@day='sun'][@time='00:00']<b>,</b><br> 4619 4620 <b>""></b></li> 4621 <li> 4622 <b><</b>//ldml/dates/calendars/calendar[@type='gregorian']/week/weekendStart<b>,</b><br> 4623 4624 [@day='sun'][@time='00:00']<b>></b></li> 4625 </ol> 4626 </blockquote> 4627 </blockquote> 4628 <p>Two LDML element chains are <i>equivalent</i> when they 4629 would be identical if all attributes and their values were 4630 removed — except for distinguishing attributes. Thus the 4631 following are equivalent:</p> 4632 <ul> 4633 <li> 4634 <code>//ldml/localeDisplayNames/languages/language[@type="ar"]</code></li> 4635 <li> 4636 <code>//ldml/localeDisplayNames/languages/language[@type="ar"][@draft="unconfirmed"]</code></li> 4637 </ul> 4638 <p>For any locale ID, a <i>locale chain</i> is an ordered list 4639 starting with the root and leading down to the ID. 
  For example:</p>
  <blockquote>
    <p><root, de, de_DE, de_DE_xxx></p>
  </blockquote>
  <h4><a name="Resolved_Data_File" href="#Resolved_Data_File" id="Resolved_Data_File">4.2.2 Resolved Data File</a></h4>
  <p>To produce a fully resolved locale data file from CLDR for a
  locale ID L, you start with L and successively add unique
  items from the parent locales until you reach root. More
  formally, this can be expressed as the following procedure.</p>
  <ol>
    <li>Let Result be initially L.</li>
    <li>For each Li in the locale chain for L, starting at L and
    going up to root:
      <ol>
        <li>Let Temp be a copy of the pairs in the LDML file for
        Li.</li>
        <li>Replace each alias in Temp by the resolved list of
        pairs it points to.
          <ol>
            <li>The resolved list of pairs is obtained by
            recursively applying this procedure.</li>
            <li>That alias now blocks any inheritance from the
            parent. (See <i><a href="#Common_Elements">Section
            5.1 Common Elements</a></i> for an example.)</li>
          </ol>
        </li>
        <li>For each element pair P in Temp:
          <ol>
            <li>If P does not contain a blocking element, and
            Result does not have an element pair Q with an
            equivalent element chain, add P to Result.</li>
          </ol>
        </li>
      </ol>
    </li>
  </ol>
  <p><b>Notes:</b></p>
  <ul>
    <li>When adding an element pair to a result, it has to go in
    the right order for it to be valid according to the DTD.</li>
    <li>The identity element and its children are unaffected by
    resolution.</li>
    <li>The LDML data must be constructed so as to avoid
    circularity in step 2.2.</li>
  </ul>
  <h4><a name="Valid_Data" href="#Valid_Data" id="Valid_Data">4.2.3 Valid Data</a></h4>
  <p>The attribute <i>draft="x"</i> in LDML means that the data
  has not been approved by the subcommittee.
  (For more information, see <a href="http://cldr.unicode.org/index/process">Process</a>.) However,
  some data that is not explicitly marked as <i>draft</i> may be
  implicitly <i>draft</i>, because it inherits that status either
  from a parent or from an enclosing element.</p>
  <p><b>Example 2.</b> Suppose that new locale data is added for
  af (Afrikaans). To indicate that all of the data is
  <i>unconfirmed</i>, the attribute can be added to the top
  level.</p>
  <p><code><ldml version="1.1" draft="unconfirmed"><br>
  <identity><br>
  <version number="1.1" /><br>
  <language type="af" /><br>
  </identity><br>
  <characters>...</characters><br>
  <localeDisplayNames>...</localeDisplayNames><br>
  </ldml></code></p>
  <p>Any data can be added to that file, and all of it will have
  the status draft=<i>unconfirmed</i>. Once an item is vetted
  (<i>whether it is inherited or explicitly in the file</i>), its
  status can be changed to <i>approved</i>. This can be done by
  leaving draft="unconfirmed" on the enclosing element and
  marking the child with draft="approved", for example:</p>
  <p><code><ldml version="1.1" draft="unconfirmed"><br>
  <identity><br>
  <version number="1.1" /><br>
  <language type="af" /><br>
  </identity><br>
  <characters draft="approved">...</characters><br>
  <localeDisplayNames>...</localeDisplayNames><br>
  <dates/><br>
  <numbers/><br>
  <collations/><br>
  </ldml></code></p>
  <p>However, normally the draft attributes should be
  canonicalized, which means they are pushed down to leaf nodes
  as described in <i><a href="#Canonical_Form">Section 5.6
  Canonical Form</a></i>.
  If an LDML file has draft
  attributes that are not on leaf nodes, the file should be
  interpreted as if it were the canonicalized version of that
  file.</p>
  <p>More formally, here is how to determine whether data for an
  element chain E is implicitly or explicitly draft, given a
  locale L. Items 1, 2, and 4 below are simply formalizations of
  what is in LDML already. Item 3 adds the new element.</p>
  <h4><a name="Checking_for_Draft_Status" href="#Checking_for_Draft_Status" id="Checking_for_Draft_Status">4.2.4 Checking for Draft
  Status</a></h4>
  <ol>
    <li>
      <b>Parent Locale Inheritance</b>
      <ol>
        <li>Walk through the locale chain until you find a locale
        ID L' with a data file D. (L' may equal L.)</li>
        <li>Produce the fully resolved data file D' for D.</li>
        <li>In D', find the first element pair whose element
        chain E' is either equivalent to or an extension of
        E.</li>
        <li>If there is no such E', return <i>true</i>.</li>
        <li>If E' is not equivalent to E, truncate E' to the
        length of E.</li>
      </ol>
    </li>
    <li>
      <b>Enclosing Element Inheritance</b>
      <ol>
        <li>Walk through the elements in E', from back to front.
          <ol>
            <li>If you ever encounter draft=<i>x</i>, return
            <i>x</i>.</li>
          </ol>
        </li>
        <li>If L' = L, return <i>false</i>.</li>
      </ol>
    </li>
    <li>
      <b>Missing File Inheritance</b>
      <ol>
        <li>Otherwise, walk again through the elements in E',
        from back to front.
          <ol>
            <li>If you encounter a validSubLocales attribute
            (deprecated):
              <ol>
                <li>If L is in the attribute value, return
                <i>false</i></li>
                <li>Otherwise return <i>true</i></li>
              </ol>
            </li>
          </ol>
        </li>
      </ol>
    </li>
    <li>
      <b>Otherwise</b>
      <ol>
        <li>Return <i>true</i></li>
      </ol>
    </li>
  </ol>
  <p>The validSubLocales in the most specific (farthest from root
  file) locale file "wins" through the full resolution step (data
  from more specific files replacing data from less specific
  ones).</p>
  <h4><a name="Keyword_and_Default_Resolution" href="#Keyword_and_Default_Resolution" id="Keyword_and_Default_Resolution">4.2.5 Keyword and Default
  Resolution</a></h4>
  <p>When accessing data based on keywords, the following process
  is used. Consider the following example:</p>
  <ul>
    <li>The locale 'de' has collation types A, B, C, and no
    <default> element</li>
    <li>The locale 'de_CH' has <default type='B'></li>
  </ul>
  <p>Here are the searches for various combinations.</p>
  <table class='simple' border="1" cellpadding="0" cellspacing="0">
    <tr>
      <td><strong>User Input</strong></td>
      <td><strong>Lookup in Locale</strong></td>
      <td><strong>For</strong></td>
      <td><strong>Comment</strong></td>
    </tr>
    <tr>
      <td rowspan="3">de_CH<br>
      <em>no keyword</em></td>
      <td>de_CH</td>
      <td>default collation type</td>
      <td>finds "B"</td>
    </tr>
    <tr>
      <td>de_CH</td>
      <td>collation type=B</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>de</td>
      <td>collation type=B</td>
      <td><em>found</em></td>
    </tr>
    <tr>
      <td rowspan="4">de<br>
      <em>no keyword</em></td>
      <td>de</td>
      <td>default collation type</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>root</td>
      <td>default collation type</td>
      <td>finds "standard"</td>
    </tr>
    <tr>
      <td>de</td>
      <td>collation type=standard</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>root</td>
      <td>collation type=standard</td>
      <td><i>found</i></td>
    </tr>
    <tr>
      <td>de_u_co_A</td>
      <td>de</td>
      <td>collation type=A</td>
      <td><i>found</i></td>
    </tr>
    <tr>
      <td rowspan="2">de_u_co_standard</td>
      <td>de</td>
      <td>collation type=standard</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>root</td>
      <td>collation type=standard</td>
      <td><i>found</i></td>
    </tr>
    <tr>
      <td rowspan="6">de_u_co_foobar</td>
      <td>de</td>
      <td>collation type=foobar</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>root</td>
      <td>collation type=foobar</td>
      <td>not found, starts looking for default</td>
    </tr>
    <tr>
      <td>de</td>
      <td>default collation type</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>root</td>
      <td>default collation type</td>
      <td>finds "standard"</td>
    </tr>
    <tr>
      <td>de</td>
      <td>collation type=standard</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>root</td>
      <td>collation type=standard</td>
      <td><i>found</i></td>
    </tr>
  </table>
  <p>Examples of "search" collator lookup; 'de' has a
  language-specific version, but 'en' does not:</p>
  <table class='simple' border="1" cellpadding="0" cellspacing="0">
    <tr>
      <td><strong>User Input</strong></td>
      <td><strong>Lookup in Locale</strong></td>
      <td><strong>For</strong></td>
      <td><strong>Comment</strong></td>
    </tr>
    <tr>
      <td rowspan="2">de_CH_u_co_search</td>
      <td>de_CH</td>
      <td>collation type=search</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>de</td>
      <td>collation type=search</td>
      <td><i>found</i></td>
    </tr>
    <tr>
      <td rowspan="3">en_US_u_co_search</td>
      <td>en_US</td>
      <td>collation type=search</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>en</td>
      <td>collation type=search</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>root</td>
      <td>collation type=search</td>
      <td><i>found</i></td>
    </tr>
  </table>
  <p>Examples of lookup for Chinese collation types. Note:</p>
  <ul>
    <li>All of the Chinese-specific collation types are provided
    in the 'zh' locale</li>
    <li>For 'zh' the <default> element specifies "pinyin";
    for 'zh_Hant' the <default> element specifies "stroke".
    However any of the available Chinese collation types can be
    explicitly requested for any Chinese locale.</li>
  </ul>
  <table class='simple' border="1" cellpadding="0" cellspacing="0">
    <tr>
      <td><strong>User Input</strong></td>
      <td><strong>Lookup in Locale</strong></td>
      <td><strong>For</strong></td>
      <td><strong>Comment</strong></td>
    </tr>
    <tr>
      <td rowspan="3">zh_Hant<br>
      <em>no keyword</em></td>
      <td>zh_Hant</td>
      <td>default collation type</td>
      <td>finds "stroke"</td>
    </tr>
    <tr>
      <td>zh_Hant</td>
      <td>collation type=stroke</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>zh</td>
      <td>collation type=stroke</td>
      <td><i>found</i></td>
    </tr>
    <tr>
      <td rowspan="3">zh_Hant_HK_u_co_pinyin</td>
      <td>zh_Hant_HK</td>
      <td>collation type=pinyin</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>zh_Hant</td>
      <td>collation type=pinyin</td>
      <td>not found</td>
    </tr>
    <tr>
      <td>zh</td>
      <td>collation type=pinyin</td>
      <td><i>found</i></td>
    </tr>
    <tr>
      <td rowspan="2">zh<br>
      <em>no keyword</em></td>
      <td>zh</td>
      <td>default collation type</td>
      <td>finds "pinyin"</td>
    </tr>
    <tr>
      <td>zh</td>
      <td>collation type=pinyin</td>
      <td><i>found</i></td>
    </tr>
  </table>
  <blockquote>
    <p><b>Note:</b> It is an invariant that the default in root
    for a given element must always be a value that exists in
    root. So you cannot have the following in root:</p>
  </blockquote>
  <p><code><someElements><br>
  <default type='a'/><br>
  <someElement type='b'>...</someElement><br>
  <someElement type='c'>...</someElement><br>
  <b> <!-- no 'a' --></b><br>
  </someElements></code></p>
  <p>For identifiers, such as language codes, script codes,
  region codes, variant codes, types, keywords, currency symbols
  or currency display names, the default value is the identifier
  itself whenever no value is found in root. Thus if there
  is no display name for the region code 'QA' in root, then the
  display name is simply 'QA'.</p>
  <h4><a name="Inheritance_vs_Related" href="#Inheritance_vs_Related" id="Inheritance_vs_Related">4.2.6
  Inheritance vs Related Information</a></h4>
  <p>There are related types of data and processing that are easy
  to confuse:</p>
  <table class='simple'>
    <tr>
      <td rowspan="4">
        <p><strong>Inheritance</strong></p>
      </td>
      <td colspan="2">Part of the internal mechanism used by CLDR
      to organize and manage locale data. This is used to share
      common resources, ease maintenance, and provide the
      best fallback behavior in the absence of data. <em>Should
      not be used for locale matching or likely
      subtags.</em></td>
    </tr>
    <tr>
      <td><em>Example:</em></td>
      <td>parent(en_AU) ⇒ en_001<br>
      parent(en_001) ⇒ en<br>
      parent(en) ⇒ root</td>
    </tr>
    <tr>
      <td><em>Data:</em></td>
      <td>supplementalData.xml <parentLocale></td>
    </tr>
    <tr>
      <td><em>Spec:</em></td>
      <td><strong>Section <a href="#Inheritance_and_Validity">4.2
      Inheritance and Validity</a></strong></td>
    </tr>
    <tr>
      <td rowspan="4"><strong>DefaultContent</strong></td>
      <td colspan="2">Part of the internal mechanism used by CLDR
      to manage locale data. A particular sublocale is designated
      the defaultContent for a parent, so that the parent
      exhibits consistent behavior. <em>Should not be used for
      locale matching or likely subtags.</em></td>
    </tr>
    <tr>
      <td><em>Example:</em></td>
      <td>addLikelySubtags(sr-ME) ⇒ sr-Latn-ME,
      minimize(de-Latn-DE) ⇒ de</td>
    </tr>
    <tr>
      <td><em>Data:</em></td>
      <td>supplementalMetadata.xml <defaultContent></td>
    </tr>
    <tr>
      <td><em>Spec:</em></td>
      <td><strong>Part 6: Section 9.3 <a href="tr35-info.html#Default_Content">Default
      Content</a></strong></td>
    </tr>
    <tr>
      <td rowspan="4"><strong>LikelySubtags</strong></td>
      <td colspan="2">Provides the most likely full subtags (script
      and region) in the absence of other information.
      A core component of LocaleMatching.</td>
    </tr>
    <tr>
      <td><em>Example:</em></td>
      <td>addLikelySubtags(zh) ⇒ zh-Hans-CN<br>
      addLikelySubtags(zh-TW) ⇒ zh-Hant-TW<br>
      minimize(zh-Hant, favorRegion) ⇒ zh-TW</td>
    </tr>
    <tr>
      <td><em>Data:</em></td>
      <td>likelySubtags.xml <likelySubtags></td>
    </tr>
    <tr>
      <td><em>Spec:</em></td>
      <td><strong>Section <a href="#Likely_Subtags">4.3 Likely
      Subtags</a></strong></td>
    </tr>
    <tr>
      <td rowspan="4"><strong>LocaleMatching</strong></td>
      <td colspan="2">Provides the best match for the user’s
      language(s) among an application’s supported
      languages.</td>
    </tr>
    <tr>
      <td><em>Example:</em></td>
      <td>bestLocale(userLangs=<en, fr>,
      appLangs=<fr-CA, ru>) ⇒ fr-CA</td>
    </tr>
    <tr>
      <td><em>Data:</em></td>
      <td>languageInfo.xml <languageMatching></td>
    </tr>
    <tr>
      <td><em>Spec:</em></td>
      <td><strong>Section <a href="#LanguageMatching">4.4
      Language Matching</a></strong></td>
    </tr>
  </table>
  <h3><a name="Likely_Subtags" href="#Likely_Subtags" id="Likely_Subtags">4.3 Likely Subtags</a></h3>
  <p class="dtd"><!ELEMENT likelySubtag EMPTY ><br>
  <!ATTLIST likelySubtag from NMTOKEN #REQUIRED><br>
  <!ATTLIST likelySubtag to NMTOKEN #REQUIRED></p>
  <p>There are a number of situations where it is useful to be
  able to find the most likely language, script, or region. For
  example, given the language "zh" and the region "TW", what is
  the most likely script? Given the script "Thai", what is the
  most likely language or region? Given the region "TW", what is
  the most likely language and script?</p>
  <p>Conversely, given a locale, it is useful to find out which
  fields (language, script, or region) may be superfluous, in the
  sense that they contain the likely tags.
  For example, "en_Latn"
  can be simplified down to "en" since "Latn" is the likely
  script for "en"; "ja_Jpan_JP" can be simplified down to
  "ja".</p>
  <p>The <i>likelySubtag</i> supplemental data provides default
  information for computing these values. This data is based on
  the default content data, the population data, and the
  suppress-script data in [<a href="#BCP47">BCP47</a>]. It is
  heuristically derived, and may change over time.</p>
  <p>For the relationship between Inheritance, DefaultContent,
  LikelySubtags, and LocaleMatching, see <strong><em>Section
  4.2.6 <a href="tr35.html#Inheritance_vs_Related">Inheritance vs
  Related Information</a></em></strong>.</p>
  <p>To look up data in the table, see if a locale matches one of
  the <b>from</b> attribute values. If so, fetch the
  corresponding <b>to</b> attribute value. For example, the
  Chinese data looks like the following:</p>
  <blockquote>
    <p class="example"><likelySubtag from="zh"
    to="zh_Hans_CN"/><br>
    <likelySubtag from="zh_HK" to="zh_Hant_HK"/><br>
    <likelySubtag from="zh_Hani" to="zh_Hani_CN"/><br>
    <likelySubtag from="zh_Hant" to="zh_Hant_TW"/><br>
    <likelySubtag from="zh_MO" to="zh_Hant_MO"/><br>
    <likelySubtag from="zh_TW" to="zh_Hant_TW"/></p>
  </blockquote>
  <p>So looking up "zh_TW" returns "zh_Hant_TW", while looking up
  "zh" returns "zh_Hans_CN".</p>
  <p>In more detail, the data is designed to be used in the
  following operations.</p>
  <p>Note that as of CLDR v24, any field present in the 'from'
  field is also present in the 'to' field, so an input field
  will not change in the "Add Likely Subtags" operation. The data
  and operations can also be used with language tags using [<a href="#BCP47">BCP47</a>] syntax, with the appropriate changes.
  In addition, certain common 'denormalized' language subtags such
  as 'iw' (for 'he') may occur in both the 'from' and 'to'
  fields. This allows implementations that use those
  denormalized subtags to use the data with only minor changes to
  the operations.</p>
  <p>An implementation may choose to exclude language tags with the language subtag "und" from the following operation. In such a case, only the canonicalization is done. An implementation can declare that it is doing the exclusion, or can take a parameter that controls whether or not to do it.</p>
  <p> </p>
  <p><i><b>Add Likely Subtags:</b></i> <em>Given a source locale
  X, return a locale Y where the empty subtags have been
  filled in by the most likely subtags.</em> This is written as X
  ⇒ Y ("X maximizes to Y").</p>
  <p>A subtag is called <em>empty</em> if it is a missing script
  or region subtag, or it is a base language subtag with the
  value "und". In the description below, a subscript on a subtag
  <em>x</em> indicates which tag it is from:
  <em>x<sub>s</sub></em> is in the source,
  <em>x<sub>m</sub></em> is in a match, and <em>x<sub>r</sub></em>
  is in the final result.</p>
  <p>This operation is performed in the following way.</p>
  <ol>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <strong>Canonicalize.</strong>
      <ol>
        <li>Make sure the input locale is in canonical form: it uses
        the right separator and has the right casing.</li>
        <li style="margin-top: 0.5em; margin-bottom: 0.5em">
        Replace any deprecated subtags with their canonical
        values using the <alias> data in supplemental
        metadata. Use the first value in the replacement list, if
        it exists. Language tag replacements may have multiple
        parts, such as "sh" ➞ "sr_Latn" or "mo" ➞ "ro_MD". In such
        a case, the original script and/or region is retained if
        there is one.
        Thus "sh_Arab_AQ" ➞ "sr_Arab_AQ", not
        "sr_Latn_AQ".</li>
        <li>If the tag is a legacy language tag
        (marked as “Type: grandfathered” in BCP 47; see <variable
        id="$grandfathered" type="choice"> in the supplemental
        data), then return it.</li>
        <li>Remove the script code 'Zzzz' and the region code
        'ZZ' if they occur.</li>
        <li>Get the components of the cleaned-up source tag
        (<em>language<sub>s</sub></em>, <em>script<sub>s</sub></em>, and
        <em>region<sub>s</sub></em>), plus any variants and
        extensions.</li>
      </ol>
    </li>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <strong>Lookup.</strong> Look up each of the following in
      order, and stop on the first match:
      <ol>
        <li style="margin-top: 0.5em; margin-bottom: 0.5em">
        <em>language<sub>s</sub>_script<sub>s</sub>_region<sub>s</sub></em></li>
        <li style="margin-top: 0.5em; margin-bottom: 0.5em">
        <em>language<sub>s</sub>_region<sub>s</sub></em></li>
        <li style="margin-top: 0.5em; margin-bottom: 0.5em">
        <em>language<sub>s</sub>_script<sub>s</sub></em></li>
        <li style="margin-top: 0.5em; margin-bottom: 0.5em">
        <em>language<sub>s</sub></em></li>
        <li>und<em>_script<sub>s</sub></em></li>
      </ol>
    </li>
    <li>
      <strong>Return</strong>
      <ol>
        <li>If there is no match, either return
          <ol>
            <li>an error value, or</li>
            <li>the match for "und" (in APIs where a valid
            language tag is required).</li>
          </ol>
        </li>
        <li>Otherwise there is a match = <span style="margin-top: 0.5em; margin-bottom: 0.5em"><em>language<sub>m</sub>_script<sub>m</sub>_region<sub>m</sub></em></span></li>
        <li>Let x<sub>r</sub> = x<sub>s</sub> if x<sub>s</sub> is
        not empty, and x<sub>m</sub> otherwise.</li>
        <li><span style="margin-top: 0.5em; margin-bottom: 0.5em">Return the
        language tag composed of <em>language<sub>r</sub> _
        script<sub>r</sub> _
        region<sub>r</sub></em> + variants +
        extensions</span>.</li>
      </ol>
    </li>
  </ol>
  <p>The lookup can be optimized. For example, if any of the tags
  in Step 2 are the same as previous ones in that list, they do
  not need to be tested.</p>
  <p><i>Example 1:</i></p>
  <ul>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <p>Input is ZH-ZZZZ-SG.</p>
    </li>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <p>Normalize to zh_SG.</p>
    </li>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <p>Look up zh_SG in the table. No match.</p>
    </li>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <p>Look up zh, and get the match (zh_Hans_CN). Substitute
      SG, and return zh_Hans_SG.</p>
    </li>
  </ul>
  <p>To find the most likely language for a country, or language
  for a script, use "und" as the language subtag. For example,
  looking up "und_TW" returns zh_Hant_TW.</p>
  <p>A goal of the algorithm is that if X ⇒ Y, and X' results
  from replacing an empty subtag in X by the corresponding
  subtag in Y, then X' ⇒ Y. For example, if und_AF ⇒ fa_Arab_AF,
  then:</p>
  <ul>
    <li>fa_Arab_AF ⇒ fa_Arab_AF</li>
    <li>und_Arab_AF ⇒ fa_Arab_AF</li>
    <li>fa_AF ⇒ fa_Arab_AF</li>
  </ul>
  <p>There are a small number of exceptions to this goal in the
  current data, where X ∈ {und_Bopo, und_Brai, und_Cakm,
  und_Limb, und_Shaw}.</p>
  <p><i><b>Remove Likely Subtags:</b> Given a
  locale, remove any fields that Add Likely Subtags would
  add.</i></p>
  <p>The reverse operation removes fields that would be added by
  the first operation.</p>
  <ol>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">First get
    max = AddLikelySubtags(inputLocale).
    If an error is signaled,
    return it.</li>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">Remove
    the variants from max.</li>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">Get the
    components of max (<em>language<sub>max</sub></em>,
    <em>script<sub>max</sub></em>, <em>region<sub>max</sub></em>).</li>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">Then for
    <i>trial</i> in {<em>language<sub>max</sub></em>,
    <em>language<sub>max</sub>_region<sub>max</sub></em>,
    <em>language<sub>max</sub>_script<sub>max</sub></em>}
      <ul>
        <li style="margin-top: 0.5em; margin-bottom: 0.5em">If
        AddLikelySubtags(<i>trial</i>) = max, then return
        <i>trial</i> + variants.</li>
      </ul>
    </li>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">If you do
    not get a match, return max + variants.</li>
  </ol>
  <p>Example:</p>
  <ul>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <p>Input is zh_Hant. Maximize to get zh_Hant_TW.</p>
    </li>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <p>zh ⇒ zh_Hans_CN. No match, so continue.</p>
    </li>
    <li style="margin-top: 0.5em; margin-bottom: 0.5em">
      <p>zh_TW ⇒ zh_Hant_TW. Matches, so return zh_TW.</p>
    </li>
  </ul>
  <p>A variant of this favors the script over the region, thus
  using {language, language_script, language_region} in the
  above.
  If that variant is used, then the result in this example
  would be zh_Hant instead of zh_TW.</p>
  <h3><a name="LanguageMatching" href="#LanguageMatching" id="LanguageMatching">4.4 Language Matching</a></h3>
  <p class="dtd"><!ELEMENT languageMatching ( languageMatches*
  ) ><br>
  <!ELEMENT languageMatches ( paradigmLocales*,
  matchVariable*, languageMatch* ) ><br>
  <!ATTLIST languageMatches type NMTOKEN #REQUIRED ></p>
  <p class="dtd"><!ELEMENT languageMatch EMPTY ><br>
  <!ATTLIST languageMatch desired CDATA #REQUIRED ><br>
  <!ATTLIST languageMatch supported CDATA #REQUIRED ><br>
  <!ATTLIST languageMatch percent NMTOKEN #REQUIRED ><br>
  <!ATTLIST languageMatch distance NMTOKEN #IMPLIED ><br>
  <!ATTLIST languageMatch oneway ( true | false ) #IMPLIED
  ></p>
  <p class="dtd"><!ELEMENT paradigmLocales EMPTY ><br>
  <!ATTLIST paradigmLocales locales NMTOKENS #REQUIRED
  ></p>
  <p>Implementers are often faced with the issue of how to match
  the user's requested languages with their product's supported
  languages. For example, suppose that a product supports {ja-JP,
  de, zh-TW}. If the user understands written American English,
  German, French, Swiss German, and Italian, then
  <strong>de</strong> would be the best match; if the user
  understands only Chinese (zh), then zh-TW would be the best
  match.</p>
  <p>The standard truncation-fallback algorithm does not work
  well when faced with the complexities of natural language. The
  language matching data is designed to fill that gap.
  Stated in those terms, language matching can have the effect of
  a more complex fallback, such as:</p>
  <p>sr-Cyrl-RS<br>
  sr-Cyrl<br>
  sr-Latn-RS<br>
  sr-Latn<br>
  sr<br>
  hr-Latn<br>
  hr</p>
  <p>Language matching is used to find the best supported locale
  ID given a requested list of languages. The requested list
  could come from different sources, such as the user's
  list of preferred languages in the OS Settings, or a
  browser Accept-Language list. For example, if my native tongue
  is English, I can understand Swiss German and German, my French
  is rusty but usable, and my Italian is basic, then ideally an
  implementation would allow me to select {gsw, de, fr} as my
  preferred list of languages, skipping Italian because my
  comprehension is not good enough for arbitrary content.</p>
  <p>Language matching can also be used to get fallback data
  elements. In many cases, there may not be full data for a
  particular locale. For example, for a Breton speaker, the best
  fallback if data is unavailable might be French. That is,
  suppose we have found a Breton bundle, but it does not contain
  a translation for the key "CN" (for the country China). It is
  best to return "chine", rather than falling back to the value
  in a default language such as Russian and getting "Кітай".
  The language matching data can be used to get the closest
  fallback locales (of those supported) to a given language.</p>
  <p>For the relationship between Inheritance, DefaultContent,
  LikelySubtags, and LocaleMatching, see <strong><em>Section
  4.2.6 <a href="tr35.html#Inheritance_vs_Related">Inheritance vs
  Related Information</a></em></strong>.</p>
  <p>When such fallback is used for inherited item lookup, the
  normal order of inheritance applies, except that before using
  any data from <strong>root</strong>, the data for the fallback
  locales is used if available.
  Language matching does not interact with the fallback of
  resources <em>within the locale-parent chain</em>. For
  example, suppose that we are looking for the value for a
  particular path <strong>P</strong> in <strong>nb-NO</strong>.
  In the absence of aliases, normally the following lookup is
  used.</p>
  <blockquote>
    <p><strong>nb-NO</strong> → <strong>nb</strong> →
    <strong>root</strong></p>
  </blockquote>
  <p>That is, we first look in <strong>nb-NO</strong>. If there
  is no value for <strong>P</strong> there, then we look in
  <strong>nb</strong>. If there is no value for
  <strong>P</strong> there, we return the value for
  <strong>P</strong> in root (or a code value, if there is
  nothing there). Remember that if there is an alias element
  along this path, then the lookup may restart with a different
  path in <strong>nb-NO</strong> (or another locale).</p>
  <p>However, suppose that <strong>nb-NO</strong> has the
  fallback values <strong>[nn da sv en]</strong>, derived from
  language matching. In that case, an implementation <em>may</em>
  progressively look up each of the listed locales, with the
  appropriate substitutions, returning the first value that is
  not found in <strong>root</strong>.
  This roughly follows the pseudocode below:</p>
  <ul>
    <li>value = lookup(P, nb-NO); if (locationFound != root)
    return value;</li>
    <li>value = lookup(P, nn-NO); if (locationFound != root)
    return value;</li>
    <li>value = lookup(P, da-NO); if (locationFound != root)
    return value;</li>
    <li>value = lookup(P, sv-NO); if (locationFound != root)
    return value;</li>
    <li>value = lookup(P, en-NO); return value;</li>
  </ul>
  <p>The locales in the fallback list are not used recursively.
  For example, for the lookup of a path in nb-NO, if
  <strong>fr</strong> were a fallback value for
  <strong>da</strong>, it would not matter for the above process.
  Only the original language matters.</p>
  <p>The language matching data is intended to be used according
  to the following algorithm. This is a logical description, and
  can be optimized for production in many ways. In this
  algorithm, the languageMatching data is interpreted as an
  ordered list.</p>
  <p>The distance between a given pair of subtags can be larger or smaller than the typical distance. For example, the distance between en and en-GB can be greater than that between en-GB and en-IE. In some cases, language and/or script differences can be as small as the typical region difference (for example, sr-Latn vs. sr-Cyrl).</p>
  <p>The distances resulting from the table are not linear, but are rather chosen to produce expected results. So a distance of 10 is not necessarily twice as "bad" as a distance of 5. Implementations may want to have a mode where script distances swamp language distances.
The tables are built such that this can be accomplished by multiplying the language distance by 0.25.</p>
    <p>The language matching algorithm takes a list of a user’s
    desired languages, and a list of the application’s supported
    languages.</p>
    <ul>
      <li>Set the best weighted distance BWD to ∞</li>
      <li>Set the best desired language BD to null</li>
      <li>Set the best supported language BS to null</li>
      <li>For each desired language D
        <ul>
          <li>Compute a demotion value F, based on the position in
          the list.
            <ul>
              <li>This demotion value is up to the implementation,
              but is typically a positive value that increases
              according to how far D is from the start of the
              desired language list.</li>
            </ul>
          </li>
          <li>For each supported language S
            <ul>
              <li>Find the matching distance MD as described
              below.</li>
              <li>Compute the weighted distance WD as F + MD</li>
              <li>If WD < BWD
                <ul>
                  <li>BWD = WD</li>
                  <li>BD = D</li>
                  <li>BS = S</li>
                </ul>
              </li>
            </ul>
          </li>
        </ul>
      </li>
      <li>If the BWD is less than a threshold, return <BD, BS>
        <ul>
          <li>The threshold is implementation-defined, typically
          set to greater than a default region difference, and less
          than a default script difference.</li>
        </ul>
      </li>
      <li>Otherwise BD = the default supported language (like
      English); return <BD, null></li>
    </ul>
    <p>To find the matching distance MD between any two languages,
    perform the following steps.</p>
    <ol>
      <li>Maximize each language using Section 4.3 <a href=
      "#Likely_Subtags">Likely Subtags</a>.
        <ul>
          <li>und is a special case: see below.</li>
        </ul>
      </li>
      <li>Set the match-distance MD to 0</li>
      <li>For each subtag in {language, script, region}
        <ol>
          <li>If the respective subtags in each language tag are
          identical, remove the subtag from each (logically) and
          continue.</li>
          <li>Traverse the languageMatching data until a match is
          found.
            <ul>
              <li>* matches any field.</li>
              <li>If the oneway flag is false, then the match is
              symmetric; otherwise only match one direction.</li>
              <li>For region matching, use the mechanisms in <strong>Section 4.4.1 <a href=
              "#EnhancedLanguageMatching">Enhanced Language
              Matching</a></strong>.</li>
            </ul>
          </li>
          <li>Add the <strong>distance</strong> attribute value to MD.
            <ul>
              <li>This used to be a <strong>percent</strong> attribute value, which was 100 - the distance attribute value.</li>
            </ul>
          </li>
          <li>Remove the subtag from each (logically)</li>
        </ol>
      </li>
      <li>Return MD</li>
    </ol>
    <p>It is typically useful to set the demotion value F between
    successive elements of the desired languages list to be
    slightly greater than the default region difference. That
    avoids the following problem:<br></p>
    <p><em>Supported languages:</em> "de, fr, ja"<br></p>
    <p><em>User's desired languages:</em> "de-AT, fr"</p>
    <p>This user would expect to get "de", not "fr". In practice,
    when a user selects a list of preferred languages, they don't
    include all the regional variants ahead of their second base
    language. Yet while the user's list of desired languages really
    doesn't tell us the priority ranking among their languages,
    normally the fall-off between the user's languages is
    substantially greater than that between regional variants.
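</p>
    <p>The outer loop and the role of the demotion value F can be
    sketched in Python as follows. This is only an illustration
    under stated assumptions: the <code>MAXIMIZED</code> table is a
    hand-written stand-in for Likely Subtags, the per-field
    distances are simply 100 minus the wildcard
    <strong>percent</strong> values quoted in this section, the
    languageMatching table itself is not consulted, and all
    function and parameter names are hypothetical.</p>
<pre>
```python
import math

# Hypothetical stand-in for Likely Subtags maximization; a real
# implementation would use the CLDR likely subtags data instead.
MAXIMIZED = {
    "de": ("de", "Latn", "DE"),
    "de-AT": ("de", "Latn", "AT"),
    "fr": ("fr", "Latn", "FR"),
    "ja": ("ja", "Jpan", "JP"),
}

# Assumed default distances: 100 - percent for the wildcard rules
# quoted in this section (percent 1, 20 and 96).
DEFAULT_DISTANCE = {"language": 99, "script": 80, "region": 4}


def matching_distance(desired, supported):
    """Simplified MD: add the default distance of each differing field."""
    md = 0
    fields = ("language", "script", "region")
    for field, a, b in zip(fields, MAXIMIZED[desired], MAXIMIZED[supported]):
        if a != b:
            md += DEFAULT_DISTANCE[field]
    return md


def best_match(desired_list, supported_list, demotion_step, threshold, default):
    """Return (BD, BS), or (default, None) if nothing is close enough."""
    bwd, bd, bs = math.inf, None, None
    for position, desired in enumerate(desired_list):
        f = position * demotion_step  # demotion value F grows with position
        for supported in supported_list:
            wd = f + matching_distance(desired, supported)
            if wd < bwd:
                bwd, bd, bs = wd, desired, supported
    if bwd < threshold:
        return bd, bs
    return default, None
```
</pre>
    <p>With these toy values, <code>best_match(["de-AT", "fr"],
    ["de", "fr", "ja"], 0, 40, "en")</code> selects "fr", whereas a
    demotion step of 5 (just above the assumed region distance of
    4) selects "de".</p>
    <p>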
But unless F is greater than
    the distance between de-AT and de-DE, then the user’s
    second-choice language would be returned.</p>
    <p>The base language subtag "und" is a special case. Suppose we
    have the following situation:</p>
    <ul>
      <li>desired languages: {und, it}</li>
      <li>supported languages: {en, it}</li>
      <li>resulting language: en<br></li>
    </ul>
    <p>Part of this is because 'und' has a special function in BCP
    47; it stands in for 'no supplied base language'. To prevent
    this from happening, if the desired base language is und, the
    language matcher should not apply likely subtags to
    it.</p>
    <p>Examples:</p>
    <p>For example, suppose that nn-DE and nb-FR are being
    compared. They are first maximized to nn-Latn-DE and
    nb-Latn-FR, respectively. The list is searched. The first match
    is with "*-*-*", for a match of 96%. The languages are
    truncated to nn-Latn and nb-Latn, then to nn and nb. The first
    match is also for a value of 96%, so the result is 92%.</p>
    <p>Note that language matching is orthogonal to how closely
    two languages are related linguistically. For example, Breton
    is more closely related to Welsh than to French, but French is
    the better match (because it is more likely that a Breton
    reader will understand French than Welsh). This also
    illustrates that the matches are often asymmetric: it is not
    likely that a French reader will understand Breton.</p>
    <p>The "*" acts as a wild card, as shown in the following
    example:</p>
    <p class="example"><languageMatch desired="es-*-ES"
    supported="es-*-ES" percent="100"/><br>
    <!-- Latin American Spanishes are closer to each other.
5572 Approximate by having es-ES be further from everything 5573 else.--></p> 5574 <p> </p> 5575 <p class="example"><languageMatch desired="es-*-ES" 5576 supported="es-*-*" percent="93"/></p> 5577 <p class="example"><br> 5578 <languageMatch desired="*" supported="*" 5579 percent="1"/><br> 5580 <!-- [Default value - must be at end!] Normally there is no 5581 comprehension of different languages.--></p> 5582 <p class="example"><br> 5583 <languageMatch desired="*-*" supported="*-*" 5584 percent="20"/><br> 5585 <!-- [Default value - must be at end!] Normally there is 5586 little comprehension of different scripts.--></p> 5587 <p class="example"><br> 5588 <languageMatch desired="*-*-*" supported="*-*-*" 5589 percent="96"/><br> 5590 <!-- [Default value - must be at end!] Normally there are 5591 small differences across regions.--></p> 5592 <p>When the language+region is not matched, and there is 5593 otherwise no reason to pick among the supported regions for 5594 that language, then some measure of geographic "closeness" can 5595 be used. The results may be more understandable by users. 5596 Looking for en-SK, for example, should fall back to something 5597 within Europe (eg en-GB) in preference to something far away 5598 and unrelated (eg en-SG). Such a closeness metric does not need 5599 to be exact; a small amount of data can be used to give an 5600 approximate distance between any two regions. However, any such 5601 data must be used carefully; although Hong Kong is closer to 5602 India than to the UK, it is unlikely that en-IN would be a 5603 better match to en-HK than en-GB would.</p> 5604 <h4><a name="EnhancedLanguageMatching" href= 5605 "#EnhancedLanguageMatching" id="EnhancedLanguageMatching">4.4.1 5606 Enhanced Language Matching</a></h4> 5607 <p>The enhanced format for language matching adds structure to 5608 enable better matching of languages. It is distinguished by 5609 having a suffix "_new" on the type, as in the example below. 
    The extended structure allows matching to take into account
    broad similarities that would give better results. For example,
    for English the regions that are or inherit from US
    (AS|GU|MH|MP|PR|UM|VI|US) form a “cluster”. The regions in that
    cluster should be closer to each other than to any other
    region. And a region outside the cluster should be closer to
    another region outside that cluster than to one inside. We get
    this issue with the “world languages” like English, Spanish,
    Portuguese, Arabic, etc.</p>
    <p><em>Example:</em></p>
    <pre>
<languageMatches type="written_new"><br>    <paradigmLocales locales="en en-GB es es-419 pt-BR pt-PT"/><br>    <matchVariable id="$enUS" value="AS+GU+MH+MP+PR+UM+US+VI"/><br>    <matchVariable id="$cnsar" value="HK+MO"/><br>    <matchVariable id="$americas" value="019"/><br>    <matchVariable id="$maghreb" value="MA+DZ+TN+LY+MR+EH"/><br>    <languageMatch desired="no" supported="nb" distance="1"/><!-- no ⇒ nb --><br>…
    <languageMatch desired="ar_*_$maghreb" supported="ar_*_$maghreb" distance="4"/>
    <!-- ar; *; $maghreb ⇒ ar; *; $maghreb -->
    <languageMatch desired="ar_*_$!maghreb" supported="ar_*_$!maghreb" distance="4"/>
    <!-- ar; *; $!maghreb ⇒ ar; *; $!maghreb --><br>…</pre>
    <p>The <strong>matchVariable</strong> allows a rule to
    match multiple regions, as illustrated by
    <strong>$maghreb</strong>. The syntax is simple: it allows
    + for <em>union</em> and - for <em>set difference</em>, with no
    precedence. So A+B-A+D is interpreted as (((A+B)-A)+D), not as
    (A+B)-(A+D). The variable <strong>id</strong> has a value of
    the form [$][a-zA-Z0-9]+. If $X is defined, then $!X
    automatically means all those regions that are not in $X.</p>
    <p dir="ltr">When the set is interpreted, then macroregions
    are (logically) transformed into a list of their contents, so
    “053+GB” → “AU+GB+NF+NZ”.
This is done recursively, so 009 →
    “053+054+057+061+QO” → “AU+NF+NZ+FJ+NC+PG+SB+VU...”. Note that
    we use 019 for all of the Americas in the variables above,
    because en-US should be in the same cluster as es-419 and its
    contents.</p>
    <p>In the rules, the percent value (100..0) is replaced by a
    <strong>distance</strong> value, which is its complement
    (distance = 100 - percent, so 0..100).</p>
    <p dir="ltr">These new variables and rules divide up the world
    into clusters, where items in the same cluster (for specific
    languages) get the normal regional difference, and items in
    different clusters get different weights.</p><br>
    <p dir="ltr">Each cluster can have one or more associated
    <strong>paradigmLocales</strong>. These are locales that are
    preferred within a cluster. So when matching desired=[en-SA]
    against [en-GU en en-IN en-GB], the value en-GB is returned.
    Both en-GU and en are in a different cluster. While en-IN and
    en-GB are in the same cluster, and at the same distance from
    en-SA, preference is given to en-GB because it is in the
    paradigm locales. It would be possible to express this in
    rules, but using this mechanism handles these very common cases
    without bulking up the tables.<br></p>
    <p dir="ltr">The <strong>paradigmLocales</strong> also allow
    matching to macroregions. For example, desired=[es-419] should
    match to {es-MX} more closely than to {es}, and vice versa:
    {es-MX} should match more closely to {es-419} than to {es}. But
    es-MX should match more closely to es-419 than to any of the
    other es-419 sublocales.
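</p>
    <p dir="ltr">The preference for paradigm locales in the en-SA
    example above amounts to a tie-break among equally distant
    candidates. A minimal Python sketch, assuming hypothetical
    distance values (only their ordering matters) and a
    hypothetical helper name:</p>
<pre>
```python
# Paradigm locales taken from the sample languageMatches data in this section.
PARADIGM_LOCALES = {"en", "en-GB", "es", "es-419", "pt-BR", "pt-PT"}


def pick_supported(candidates):
    """Pick the closest (locale, distance) pair; paradigm locales win ties."""
    def sort_key(candidate):
        locale, distance = candidate
        # False sorts before True, so a paradigm locale is preferred
        # among candidates that share the same distance.
        return (distance, locale not in PARADIGM_LOCALES)
    return min(candidates, key=sort_key)[0]


# Hypothetical distances from en-SA: en-GU and en sit in another cluster,
# while en-IN and en-GB are equally close.
CANDIDATES = [("en-GU", 5), ("en", 5), ("en-IN", 4), ("en-GB", 4)]
print(pick_supported(CANDIDATES))
```
</pre>
    <p dir="ltr">Under these assumed distances the sketch selects
    en-GB, matching the outcome described for en-SA.</p>
    <p dir="ltr">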
In general, in the absence of other 5664 distance data, there is a ‘paradigm’ in each cluster that the 5665 others should match more closely to: en(-US), en-GB, es(-ES), 5666 es-419, ru(-RU)...</p> 5667 <h2><a name="XML_Format" href="#XML_Format" id="XML_Format">5 5668 XML Format</a></h2> 5669 <p>There are two kinds of data that can be expressed in LDML: 5670 language-dependent data and supplementary data. In either case, 5671 data can be split across multiple files, which can be in 5672 multiple directory trees.</p> 5673 <p>For example, the language-dependent data for Japanese in 5674 CLDR is present in the following files:</p> 5675 <ul> 5676 <li>common/collation/ja.xml</li> 5677 <li>common/main/ja.xml</li> 5678 <li>common/rbnf/ja.xml</li> 5679 <li>common/segmentations/ja.xml</li> 5680 </ul> 5681 <p>Data for cased languages such as French are in files 5682 like:</p> 5683 <ul> 5684 <li>common/casing/fr.xml</li> 5685 </ul> 5686 <p>The status of the data is the same, whether or not data is 5687 split. That is, for the purpose of validation and lookup, all 5688 of the data for the above ja.xml files is treated as if it was 5689 in a single file. These files have the <ldml> root 5690 element and use ldml.dtd. The file name must match the identity 5691 element. For example, the <ldml> file pa_Arab_PK.xml must 5692 contain the following elements:</p> 5693 <pre> 5694 <strong><ldml></strong><br> <identity><br> …<br> <strong><language type="pa"/><br> <script type="Arab"/><br> <territory type="PK"/></strong><br> </identity> 5695…</pre> 5696 <p>Supplemental data can have different root elements, 5697 currently: ldmlBCP47, supplementalData, keyboard, and platform. 5698 Keyboard and platform files are considered distinct. The 5699 ldmlBCP47 files and supplementalData files that have the same 5700 root are all logically part of the same file; they are simply 5701 split into separate files for convenience. 
Implementations may 5702 split the files in different ways, also for their convenience. 5703 The files in /properties are also supplemental data files, but 5704 are structured like UCD properties.</p> 5705 <p>For example, supplemental data relating to Japan or the 5706 Japanese writing are in:</p> 5707 <ul> 5708 <li>common/supplemental/ (in many files, such as 5709 supplementalData.xml)</li> 5710 <li>common/transforms/Hiragana-Katakana.xml</li> 5711 <li>common/transforms/Hiragana-Latin.xml</li> 5712 <li>common/properties/scriptMetadata.txt</li> 5713 <li>common/bcp47/calendar.xml</li> 5714 <li>uca/allkeys_CLDR.txt (sorting)</li> 5715 <li>/keyboards/chromeos/ja-t-k0-chromeos.xml</li> 5716 <li>...</li> 5717 </ul> 5718 <p>Like the <ldml> files, the keyboard file names must 5719 match internal data: in particular, the locale attribute on the 5720 keyboard element must have a value that corresponds to the file 5721 name, such as <keyboard locale="af-t-k0-android"> for the 5722 file af-t-k0-android.xml.</p> 5723 <p>The following sections describe the structure of the XML 5724 format for language-dependent data. The more precise syntax is 5725 in the ldml.dtd file<i>; however, the DTD does not describe all 5726 the constraints on the structure.</i></p> 5727 <p>To start with, the root element is <ldml>, with the 5728 following DTD entry:</p> 5729 <p class='dtd'><!ELEMENT ldml 5730 (identity,(alias|(fallback*,localeDisplayNames?,layout?,contextTransforms?,characters?,<br> 5731 5732 delimiters?,measurement?,dates?,numbers?,units?,listPatterns?,collations?,posix?,<br> 5733 5734 segmentations?,rbnf?,annotations?,metadata?,references?,special*)))></p> 5735 <p>The XML structure is stable over releases. Elements and 5736 attributes may be deprecated: they are retained in the DTD but 5737 their usage is strongly discouraged. In most cases, an 5738 alternate structure is provided for expressing the information. 
5739 There is only one exception: newer DTDs cannot be used with 5740 version 1.1 files, without some modification.</p> 5741 <p>In general, all translatable text in this format is in 5742 element contents, while attributes are reserved for types and 5743 non-translated information (such as numbers or dates). The 5744 reason that attributes are not used for translatable text is 5745 that spaces are not preserved, and we cannot predict where 5746 spaces may be significant in translated material.</p> 5747 <p>There are two kinds of elements in LDML: <i>rule</i> 5748 elements and <i>structure</i> elements. For structure elements, 5749 there are restrictions to allow for effective inheritance and 5750 processing:</p> 5751 <ol> 5752 <li>There is no "mixed" content: if an element has textual 5753 content, then it cannot contain any elements.</li> 5754 <li>The [<a href="#XPath">XPath</a>] leading to the content 5755 is unique; no two different pieces of textual content have 5756 the same [<a href="#XPath">XPath</a>].</li> 5757 </ol> 5758 <p>Rule elements do not have this restriction, but also do not 5759 inherit, except as an entire block. The rule elements are 5760 listed in serialElements in the supplemental metadata. See also 5761 <i><a href="#Inheritance_and_Validity">Section 4.2 Inheritance 5762 and Validity</a></i>. For more technical details, see <a href= 5763 "http://cldr.unicode.org/development/updating-dtds">Updating-DTDs</a>.</p> 5764 <p>Note that the data in examples given below is purely 5765 illustrative, and does not match any particular language. For a 5766 more detailed example of this format, see [<a href= 5767 "#LDML">Example</a>]. There is also a DTD for this format, but 5768 <i>remember that the DTD alone is not sufficient to understand 5769 the semantics, the constraints, nor the 5770 interrelationships between the different elements and 5771 attributes</i>. 
You may wish to have copies of each of these to
    hand as you proceed through the rest of this document.</p>
    <p>In particular, all elements allow for draft versions to
    coexist in the file at the same time. Thus most elements are
    marked in the DTD as allowing multiple instances. However,
    unless an element is listed as a serialElement, or has a
    distinguishing attribute, it can only occur once as a
    subelement of a given element. Thus, for example, the following
    is illegal even though allowed by the DTD:</p>
    <p><languages><br>
    <language type="aa">...</language><br>
    <language type="aa">...</language></p>
    <p>There must be only one instance of these per parent, unless
    there are other distinguishing attributes (such as an alt
    attribute).</p>
    <p>In general, LDML data should be in NFC format. However,
    certain elements may need to contain characters that are not in
    NFC, including exemplars, transforms, segmentations, and
    p/s/t/i/pc/sc/tc/ic rules in collation. These elements must not
    be normalized (either to NFC or NFD), or their meaning may be
    changed. Thus LDML documents must not be normalized as a whole.
    To prevent problems with normalization, no element value can
    start with a combining slash (U+0338 COMBINING LONG SOLIDUS
    OVERLAY).</p>
    <p>Lists, such as <span class=
    "attribute">singleCountries</span>, are space-delimited.
That 5797 means that they are separated by one or more XML whitespace 5798 characters,</p> 5799 <ul> 5800 <li>singleCountries</li> 5801 <li>preferenceOrdering</li> 5802 <li>references</li> 5803 </ul> 5804 <h3><a name="Common_Elements" href="#Common_Elements" id= 5805 "Common_Elements">5.1 Common Elements</a></h3> 5806 <p>At any level in any element, two special elements are 5807 allowed.</p> 5808 <h4><a name="special" href="#special" id="special">5.1.1 5809 Element special</a></h4> 5810 <p>This element is designed to allow for arbitrary additional 5811 annotation and data that is product-specific. It has one 5812 required attribute <span class="attribute">xmlns</span>, which 5813 specifies the XML <a href= 5814 "https://www.w3.org/TR/REC-xml-names/">namespace</a> of the 5815 special data. For example, the following used the version 1.0 5816 POSIX special element.</p> 5817 <pre><!DOCTYPE ldml SYSTEM "<span style= 5818 "color: blue">https://unicode.org/cldr/dtd/1.0/ldml.dtd</span>" [ 5819 <!ENTITY % posix SYSTEM "<span style= 5820"color: blue">https://unicode.org/cldr/dtd/1.0/ldmlPOSIX.dtd</span>"> 5821<span style="color: blue">%posix;</span> 5822]> 5823<ldml> 5824... 
<special xmlns:posix="<span style="color: blue">https://www.opengroup.org/regproducts/xu.htm</span>">
    <span style="color: green"><!-- old abbreviations for pre-GUI days --></span>
    <posix:messages>
        <posix:yesstr><span style="color: blue">Yes</span></posix:yesstr>
        <posix:nostr><span style="color: blue">No</span></posix:nostr>
        <posix:yesexpr><span style="color: blue">^[Yy].*</span></posix:yesexpr>
        <posix:noexpr><span style="color: blue">^[Nn].*</span></posix:noexpr>
    </posix:messages>
</special>
</ldml>
</pre>
    <h5><a name="Sample_Special_Elements" href=
    "#Sample_Special_Elements" id="Sample_Special_Elements">5.1.1.1
    Sample Special Elements</a></h5>
    <p>The elements in this section are <i><b>not</b></i> part of
    the Locale Data Markup Language 1.0 specification. Instead,
    they are special elements used for application-specific data to
    be stored in the Common Locale Data Repository. They may change
    or be removed in future versions of this document, and are
    present here more as examples of how to extend the format.
    (Some of these items may move into a future version of the
    Locale Data Markup Language specification.)</p>
    <ul>
      <li><a href=
      "https://unicode.org/cldr/dtd/1.1/ldmlICU.dtd">https://unicode.org/cldr/dtd/1.1/ldmlICU.dtd</a></li>
      <li><a href=
      "https://unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd">https://unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd</a></li>
    </ul>
    <p>The above examples are old versions: consult the
    documentation for the specific application to see which should
    be used.</p>
    <p>These DTDs use namespaces and the special element.
To 5863 include one or more, use the following pattern to import the 5864 special DTDs that are used in the file:</p> 5865 <pre><?xml version="<span style= 5866 "color: blue">1.0</span>" encoding="<span style= 5867 "color: blue">UTF-8</span>" ?> 5868<!DOCTYPE ldml SYSTEM "<span style= 5869"color: blue">https://unicode.org/cldr/dtd/1.1/ldml.dtd</span>" [ 5870 <!ENTITY % <span style= 5871"color: blue">icu</span> SYSTEM "<span style= 5872"color: blue">https://unicode.org/cldr/dtd/1.1/ldmlICU.dtd</span>"> 5873 <!ENTITY % <span style= 5874"color: blue">openOffice</span> SYSTEM "<span style= 5875"color: blue">https://unicode.org/cldr/dtd/1.1/ldmlOpenOffice.dtd</span>"> 5876<span style="color: blue">%icu; 5877%openOffice; 5878</span>]></pre> 5879 <p>Thus to include just the ICU DTD, one uses:</p> 5880 <pre><?xml version="<span style= 5881 "color: blue">1.0</span>" encoding="<span style= 5882 "color: blue">UTF-8</span>" ?> 5883<!DOCTYPE ldml SYSTEM "<span style= 5884"color: blue">https://unicode.org/cldr/dtd/1.1/ldml.dtd</span>" [ 5885 <!ENTITY % icu SYSTEM "<span style= 5886"color: blue">https://unicode.org/cldr/dtd/1.1/ldmlICU.dtd</span>"> 5887<span style="color: blue">%icu; 5888</span>]></pre> 5889 <blockquote> 5890 <p><b>Note:</b> A previous version of this document contained 5891 a special element for <a href= 5892 "http://www.open-std.org/jtc1/sc22/wg20/docs/n897-14652w25.pdf"> 5893 ISO TR 14652</a> compatibility data. That element has been 5894 withdrawn, pending further investigation, since 14652 is a 5895 Type 1 TR: "when the required support cannot be obtained for 5896 the publication of an International Standard, despite 5897 repeated effort". See the ballot comments on <a href= 5898 "http://www.open-std.org/jtc1/sc22/wg20/docs/n948-J1N6769-14652.pdf"> 5899 14652 Comments</a> for details on the 14652 defects. 
For 5900 example, most of these patterns make little provision for 5901 substantial changes in format when elements are empty, so are 5902 not particularly useful in practice. Compare, for example, 5903 the mail-merge capabilities of production software such as 5904 Microsoft Word or OpenOffice.</p> 5905 <p><b>Note:</b> While the CLDR specification guarantees 5906 backwards compatibility, the definition of specials is up to 5907 other organizations. Any assurance of backwards compatibility 5908 is up to those organizations.</p> 5909 </blockquote> 5910 <p>A number of the elements above can have extra information 5911 for <a name="OpenOffice" href="#OpenOffice" id= 5912 "OpenOffice">openoffice.org</a>, such as the following 5913 example:</p> 5914 <pre> <special xmlns:openOffice="<span style= 5915 "color: blue">https://www.openoffice.org</span>"> 5916 <openOffice:search> 5917 <openOffice:searchOptions> 5918 <openOffice:transliterationModules><span style="color: blue">IGNORE_CASE</span></openOffice:transliterationModules> 5919 </openOffice:searchOptions> 5920 </openOffice:search> 5921 </special> 5922</pre> 5923 <h4><a name="Alias_Elements" href="#Alias_Elements" id= 5924 "Alias_Elements">5.1.2 Element alias</a></h4> 5925 <p class="dtd"><!ELEMENT alias (special*) ><br> 5926 <!ATTLIST alias source NMTOKEN #REQUIRED ><br> 5927 <!ATTLIST alias path CDATA #IMPLIED></p> 5928 <p>The contents of any element in root can be replaced by an 5929 alias, which points to the path where the data can be 5930 found.</p> 5931 <p>Aliases will only ever appear in root with the form 5932 //ldml/.../alias[@source="locale"][@path="..."].</p> 5933 <p>Consider the following example in root:</p> 5934 <pre> 5935 <calendar type="gregorian"><br> <months><br> <default choice="format"/><br> <monthContext type="format"><br> <default choice="wide"/><br> <monthWidth type="abbreviated"><br> <strong><alias source="locale" path="../monthWidth[@type='wide']"/></strong><br> </monthWidth></pre> 5936 <p>If the 
locale "de_DE" is being accessed for a month name for
    format/abbreviated, then a resource bundle at "de_DE" will be
    searched for a resource element at that path. If not found
    there, then the resource bundle at "de" will be searched, and
    so on. When the alias is found in root, then the search is
    restarted, but searching for the format/<strong>wide</strong>
    element instead of format/abbreviated.</p>
    <p>If the <b>path</b> attribute is present, then its value is
    an [<a href="#XPath">XPath</a>] that points to a different node
    in the tree. For example:</p>
    <pre>
<alias source="locale" path="../monthWidth[@type='wide']"/></pre>
    <p>The default value if the path is not present is the same
    position in the tree. All of the attributes in the [<a href=
    "#XPath">XPath</a>] must be <i>distinguishing</i> attributes. For
    more details, see <a href="#Inheritance_and_Validity">Section
    4.2 Inheritance and Validity</a>.</p>
    <p>There is a special value for the source attribute, the
    constant <b>source="locale"</b>. This special value is
    equivalent to the locale being resolved.
For example, consider 5956 the following example, where locale data for 'de' is being 5957 resolved:</p> 5958 <div align="center"> 5959 <center> 5960 <table border="1" cellpadding="0" cellspacing="1"> 5961 <caption> 5962 <a name="Inheritance_with_source_locale_" href= 5963 "#Inheritance_with_source_locale_" id= 5964 "Inheritance_with_source_locale_">Inheritance with 5965 source="locale"</a> 5966 </caption> 5967 <tr> 5968 <th>Root</th> 5969 <th>de</th> 5970 <th bgcolor="#C0C0C0">Resolved</th> 5971 </tr> 5972 <tr> 5973 <td><code><x><br> 5974 <a>1</a><br> 5975 <b>2</b><br> 5976 <c>3</c><br> 5977 <br> 5978 </x></code></td> 5979 <td><code><x><br> 5980 <a>11</a><br> 5981 <b>12</b><br> 5982 <br> 5983 <d>14</d><br> 5984 </x></code></td> 5985 <td bgcolor="#C0C0C0"><code><x><br> 5986 <a>11</a><br> 5987 <b>12</b><br> 5988 <span style= 5989 "background-color: #FFFF00"><span class= 5990 "inherited"><span style= 5991 "font-weight: 400;"><c>3</c></span></span></span><br> 5992 5993 <d>14</d><br> 5994 </x></code></td> 5995 </tr> 5996 <tr> 5997 <td><code><y><br> 5998 <alias source="locale" path="../x"><br> 5999 </y></code></td> 6000 <td><code><y><br> 6001 <br> 6002 <b>22</b><br> 6003 <br> 6004 <br> 6005 <e>25</e><br> 6006 </y></code></td> 6007 <td bgcolor="#C0C0C0"><code><y><br> 6008 <span style= 6009 "background-color: #FFFF00"><span class= 6010 "inherited"><span style= 6011 "font-weight: 400;"><a>11</a></span></span></span><br> 6012 6013 <b>22</b><br> 6014 <span style= 6015 "background-color: #FFFF00"><span class= 6016 "inherited"><span style= 6017 "font-weight: 400;"><c>3</c></span></span></span><br> 6018 6019 <span style= 6020 "background-color: #FFFF00"><span class= 6021 "inherited"><span style= 6022 "font-weight: 400;"><d>14</d></span></span></span><br> 6023 6024 <e>25</e><br> 6025 </y></code></td> 6026 </tr> 6027 </table> 6028 </center> 6029 </div> 6030 <p>The first row shows the inheritance within the <x> 6031 element, whereby <c> is inherited from root. 
The second 6032 shows the inheritance within the <y> element, whereby 6033 <a>, <c>, and <d> are inherited also from 6034 root, but from an alias there. The alias in root is logically 6035 replaced not by the elements in root itself, but by elements in 6036 the 'target' locale.</p> 6037 <p>For more details on data resolution, see <a href= 6038 "#Inheritance_and_Validity">Section 4.2 Inheritance and 6039 Validity</a>.</p> 6040 <p>Aliases must be resolved recursively. An alias may point to 6041 another path that results in another alias being found, and so 6042 on. For example, looking up Thai buddhist abbreviated months 6043 for the locale <strong>xx-YY</strong> may result in the 6044 following chain of aliases being followed:</p> 6045 <blockquote> 6046 <p> 6047 ../../calendar[@type="buddhist"]/months/monthContext[@type="format"]/monthWidth[@type="abbreviated"]</p> 6048 <p>xx-YY → xx → root // finds alias that changes path to:</p> 6049 <p> 6050 ../../calendar[@type="gregorian"]/months/monthContext[@type="format"]/monthWidth[@type="abbreviated"]</p> 6051 <p>xx-YY → xx → root // finds alias that changes path to:</p> 6052 <p> 6053 ../../calendar[@type="gregorian"]/months/monthContext[@type="format"]/monthWidth[@type="wide"]</p> 6054 <p>xx-YY → xx // finds value here</p> 6055 </blockquote> 6056 <p>It is an error to have a circular chain of aliases. That is, 6057 a collection of LDML XML documents must not have situations 6058 where a sequence of alias lookups (including inheritance and 6059 lateral inheritance) can be followed indefinitely without 6060 terminating.</p> 6061 <h4><a name="Element_displayName" href="#Element_displayName" 6062 id="Element_displayName">5.1.3 Element displayName</a></h4> 6063 <p>Many elements can have a display name. This is a translated 6064 name that can be presented to users when discussing the 6065 particular service. 
For example, a number format, used to
    format numbers using the conventions of that locale, can have
    a translated name for presentation in GUIs.</p>
    <pre>
  <numberFormat>
    <displayName><span style="color: blue">Prozentformat</span></displayName>
    ...
  </numberFormat></pre>
    <p>Where present, the display names must be unique; that is,
    two distinct codes would not get the same display name.
    (There is one exception to this: in time zones, where parsing
    results would give the same GMT offset, the standard and
    daylight display names can be the same across different time
    zone IDs.) Any translations should follow customary practice
    for the locale in question. For more information, see [<a href=
    "#DataFormats">Data Formats</a>].</p>
    <h4><a name="Escaping_Characters" href="#Escaping_Characters"
    id="Escaping_Characters">5.1.4 Escaping Characters</a></h4>
    <p>Unfortunately, XML does not have the capability to contain
    all Unicode code points. Due to this, in certain instances
    extra syntax is required to represent those code points that
    cannot be otherwise represented in element content. The
    escaping syntax is only defined on a few types of elements,
    such as in collation or exemplar sets, and uses the appropriate
    syntax for that type.</p>
    <p>The element <cp>, which was formerly used for this
    purpose, has been deprecated.</p>
    <h3><a name="Common_Attributes" href="#Common_Attributes" id=
    "Common_Attributes">5.2 Common Attributes</a></h3>
    <h4><a name="Attribute_type" href="#Attribute_type" id=
    "Attribute_type">5.2.1 Attribute type</a></h4>
    <p>The attribute <i>type</i> is also used to indicate an
    alternate resource that can be selected with a matching
    type=option in the locale id modifiers, or be referenced by a
    default element. For example:</p>
    <pre><ldml>
  ...
6102 <currencies> 6103 <currency><span style= 6104"color: blue">...</span></currency> 6105 <currency type="<span style= 6106"color: blue">preEuro</span>"><span style= 6107"color: blue">...</span></currency> 6108 </currencies> 6109</ldml></pre> 6110 <h4><a name="Attribute_draft" href="#Attribute_draft" id= 6111 "Attribute_draft">5.2.2 Attribute draft</a></h4> 6112 <p>If this attribute is present, it indicates the status of all 6113 the data in this element and any subelements (unless they have 6114 a contrary <i>draft</i> value), as per the following:</p> 6115 <ul> 6116 <li style="margin-top: 0.5em; margin-bottom: 0.5em"> 6117 <i>approved:</i> fully approved by the technical committee 6118 (equals the CLDR 1.3 value of <i>false</i>, or an absent 6119 <i>draft</i> attribute). This does not mean that the data is 6120 guaranteed to be error-free—this is the best judgment of the 6121 committee.</li> 6122 <li style="margin-top: 0.5em; margin-bottom: 0.5em"> 6123 <i>contributed</i>: partially approved by the technical 6124 committee.</li> 6125 <li style="margin-top: 0.5em; margin-bottom: 0.5em"> 6126 <i>provisional</i>: partially confirmed. Implementations may 6127 choose to accept the provisional data, especially if there is 6128 no translated alternative.</li> 6129 <li style="margin-top: 0.5em; margin-bottom: 0.5em"> 6130 <i>unconfirmed</i>: no confirmation available.</li> 6131 </ul> 6132 <p>For more information on precisely how these values are 6133 computed for any given release, see 6134 <a href= 6135 "http://cldr.unicode.org/index/process#TOC-Data--Submission-and-Vetting"> 6136 Data Submission and Vetting Process</a> on the CLDR 6137 website.</p> 6138 <p>The draft attribute should only occur on "leaf" elements, 6139 and is deprecated elsewhere. 
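</p>
    <p>As an illustration of how an implementation might apply
    these values, the following Python sketch keeps only leaf
    values at or above a chosen draft level. The sample data
    (including the value "Ottubru"), the function name and the
    threshold parameter are hypothetical, and inheritance of draft
    status from parent elements is deliberately ignored.</p>
<pre>
```python
import xml.etree.ElementTree as ET

# Draft levels from most to least vetted; an absent draft attribute
# is treated as "approved".
DRAFT_ORDER = ["approved", "contributed", "provisional", "unconfirmed"]

# Hypothetical sample data, for illustration only.
SAMPLE = """
<months>
  <month type="9">Settembru</month>
  <month type="9" draft="unconfirmed" alt="proposed">Settembro</month>
  <month type="10" draft="provisional">Ottubru</month>
</months>
"""


def accepted(xml_text, minimum="contributed"):
    """Return (type, text) for month leaves whose draft level is at least `minimum`."""
    limit = DRAFT_ORDER.index(minimum)
    root = ET.fromstring(xml_text)
    result = []
    for leaf in root.iter("month"):
        status = leaf.get("draft", "approved")
        if DRAFT_ORDER.index(status) <= limit:
            result.append((leaf.get("type"), leaf.text))
    return result
```
</pre>
    <p>With the default threshold only the approved value for
    month 9 is kept; passing <code>minimum="provisional"</code>
    also accepts the provisional value for month 10.</p>
    <p>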
For a more formal description of
  how elements are inherited, and what their draft status is, see
  <i><a href="#Inheritance_and_Validity">Section 4.2 Inheritance
  and Validity</a></i>.</p>
  <h4><a name="alt_attribute" href="#alt_attribute" id=
  "alt_attribute">5.2.3 Attribute alt</a></h4>
  <p>This attribute labels an alternative value for an element.
  The value is a <i>descriptor</i> that indicates what kind of
  alternative it is, and takes one of the following forms:</p>
  <ul>
  <li><i>variantname</i>, meaning that the value is a variant of
  the normal value, and may be used in its place in certain
  circumstances. If a variant value is absent for a particular
  locale, the normal value is used. The variant mechanism
  should only be used when such a fallback is acceptable.</li>
  <li><span style="color: blue">proposed</span>, optionally
  followed by a number, indicating that the value is a proposed
  replacement for an existing value.</li>
  <li><i>variantname</i><span style=
  "color: blue">-proposed</span>, optionally followed by a
  number, indicating that the value is a proposed replacement
  variant value.</li>
  </ul>
  <p>"<span style="color: blue">proposed</span>" should only be
  present if the draft status is not "approved". It indicates
  that the data is proposed replacement data that has been added
  provisionally until the differences between it and the other
  data can be vetted. For example, suppose that the translation
  for September for some language is "Settembru", and a bug
  report is filed saying that it should be "Settembro". The new data
  can be entered, but marked as <i>alt="proposed"</i> until it
  is vetted.</p>
  <pre>...
6172<month type="9">Settembru</month> 6173<month type="9" draft="unconfirmed" alt="proposed">Settembro</month> 6174<month type="10">...</pre> 6175 <p>Now assume another bug report comes in, saying that the 6176 correct form is actually "Settembre". Another alternative can 6177 be added:</p> 6178 <pre>... 6179<month type="9" draft="unconfirmed" alt="proposed2">Settembre</month> 6180...</pre> 6181 <p>The values for <i>variantname</i> at this time include 6182 "<span style="color: blue">variant</span>", "<span style= 6183 "color: blue">list</span>", "<span style= 6184 "color: blue">email</span>", "<span style= 6185 "color: blue">www</span>", "<span class= 6186 "attributeValue">short</span>", and "<span style= 6187 "color: blue">secondary</span>".</p> 6188 <p>For a more complete description of how draft applies to 6189 data, see <i><a href="#Inheritance_and_Validity">Section 4.2 6190 Inheritance and Validity</a></i>.</p> 6191 <p class="element2">Attribute <a name="references_attribute" 6192 href="#references_attribute" id= 6193 "references_attribute">references</a></p> 6194 <p>The value of this attribute is a token representing a 6195 reference for the information in the element, including 6196 standards that it may conform to. <references>. (In older 6197 versions of CLDR, the value of the attribute was freeform text. 6198 That format is deprecated.)</p> 6199 <p><i>Example:</i></p> 6200 <p class="example"><territory type="UM" 6201 references="R222">USAs yttre öar</territory></p> 6202 <p>The reference element may be inherited. Thus, for example, 6203 R222 may be used in sv_SE.xml even though it is not defined 6204 there, if it is defined in sv.xml.</p> 6205 <p><... allow="verbatim" ...> (deprecated)</p> 6206 <p>This attribute was originally intended for use in marking 6207 display names whose capitalization differed from what was 6208 indicated by the now-deprecated <inText> element 6209 (perhaps, for example, because the names included a proper 6210 noun). 
It was never supported in the dtd and is not needed for
  use with the new <contextTransforms> element.</p>
  <h3><a name="Common_Structures" href="#Common_Structures" id=
  "Common_Structures">5.3 Common Structures</a></h3>
  <h4><a name="Date_Ranges" href="#Date_Ranges" id=
  "Date_Ranges">5.3.1 Date and Date Ranges</a></h4>
  <p>When attributes specify date ranges, this is usually done with the
  attributes <i>from</i> and <i>to</i>. The <i>from</i> attribute
  specifies the starting point, and the <i>to</i> attribute
  specifies the end point. The deprecated <i>time</i> attribute
  was formerly used to specify time with the deprecated
  weekEndStart and weekEndEnd elements, which were themselves
  inherently <i>from</i> or <i>to</i>.</p>
  <p>The data format is a restricted ISO 8601 format, restricted
  to the fields <i>year, month, day, hour, minute,</i> and
  <i>second</i> in that order, with "-" used as a separator
  between date fields, a space used as the separator between the
  date and the time fields, and ":" used as a separator between
  the time fields. If the minute or minute and second are absent,
  they are interpreted as zero. If the hour is also missing, then
  it is interpreted based on whether the attribute is <i>from</i>
  or <i>to</i>.</p>
  <ul>
  <li>
  <p class="note"><i>from</i> defaults to "00:00:00"
  (midnight at the start of the day).</p>
  </li>
  <li>
  <p class="note"><i>to</i> defaults to "24:00:00" (midnight
  at the end of the day).</p>
  </li>
  </ul>
  <p class="note">That is, Friday at 24:00:00 is the same time as
  Saturday at 00:00:00.
Thus when the hour is missing, the
  <i>from</i> and <i>to</i> values are interpreted inclusively: the range
  includes all of the day mentioned.</p>
  <p class="note">For example, the following are equivalent:</p>
  <table style="margin-top: 0.5em; margin-bottom: 0.5em" id=
  "table25">
  <tr>
  <td><usesMetazone from="1991-10-27" to="2006-04-02"
  .../></td>
  </tr>
  <tr>
  <td><usesMetazone from="1991-10-27 00:00:00"
  to="2006-04-02 24:00:00" .../></td>
  </tr>
  <tr>
  <td><usesMetazone from="1991-10-<font color="#FF0000"><b>26 24</b></font>:00:00"
  to="2006-04-<font color="#FF0000"><b>03 00</b></font>:00:00" .../></td>
  </tr>
  </table>
  <p>If the <i>from</i> attribute is missing, the range is assumed to start
  as far back in time as there is data for; if the <i>to</i>
  attribute is missing, the range extends from the start point onwards, with no
  known end point.</p>
  <p>The dates and times are specified in local time, unless
  otherwise noted. (In particular, the metazone values are in UTC
  (also known as GMT).)</p>
  <h4><a name="Text_Directionality" href="#Text_Directionality"
  id="Text_Directionality">5.3.2 Text Directionality</a></h4>
  <p>The content of certain elements, such as date or number
  formats, may consist of several sub-elements with an inherent
  order (for example, the year, month, and day for dates). In
  some cases, the order of these sub-elements may be changed
  depending on the bidirectional context in which the element is
  embedded.</p>
  <p>For example, short date formats in languages such as Arabic
  may contain neutral or weak characters at the beginning or end
  of the element content.
In such a case, the overall order of 6282 the sub-elements may change depending on the surrounding 6283 text.</p> 6284 <p>Element content whose display may be affected in this way 6285 should include an explicit direction mark, such as U+200E 6286 LEFT-TO-RIGHT MARK or U+200F RIGHT-TO-LEFT MARK, at the 6287 beginning or end of the element content, or both.</p> 6288 <h4><a name="Unicode_Sets" href="#Unicode_Sets" id= 6289 "Unicode_Sets">5.3.3 Unicode Sets</a></h4> 6290 <p>Some attribute values or element contents use 6291 <em>UnicodeSet</em> notation. A UnicodeSet represents a finite 6292 set of Unicode code points and strings, and is defined by lists 6293 of code points and strings, Unicode property sets, and set 6294 operators, all bounded by square brackets. In this context, a 6295 code point means a string consisting of exactly one code 6296 point.</p> 6297 <p>A UnicodeSet implements the semantics in <i>UTS #18: Unicode 6298 Regular Expressions</i> [<a href= 6299 "https://www.unicode.org/reports/tr41/#UTS18">UTS18</a>] Levels 6300 1 & 2 that are relevant to determining sets of characters. 6301 Note however that it may deviate from the syntax provided in 6302 [<a href= 6303 "https://www.unicode.org/reports/tr41/#UTS18">UTS18</a>], which 6304 is illustrative rather than a requirement. There is one 6305 exception to the supported semantics, Section <a href= 6306 "https://unicode.org/reports/tr18/#RL2.6">RL2.6</a> 6307 <em>Wildcards in Property Values</em>. That feature can be 6308 supported in clients such as ICU by implementing a “hook” as is 6309 done in the <a href= 6310 "https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7Bname%3D%2FAPPLE%2F%7D"> 6311 online UnicodeSet utilities</a>.</p> 6312 <p>A UnicodeSet may be cited in specifications outside of the 6313 domain of LDML. 
In such a case, the specification may specify a 6314 subset of the syntax provided here.</p> 6315 <p>The following provides EBNF syntax for a UnicodeSet:</p> 6316 <div align='center'> 6317 <table class='simple'> 6318 <tr> 6319 <th>Symbol</th> 6320 <th>Expression</th> 6321 <th>Examples</th> 6322 </tr> 6323 <tr> 6324 <th>root</th> 6325 <td><code>= prop<br> 6326 | '[-]'<br> 6327 | '[' [\-\^]? s seq+ ']'</code></td> 6328 <td>\p{x=y},<br> 6329 [abc]</td> 6330 </tr> 6331 <tr> 6332 <th>seq</th> 6333 <td><code>= root (s [\&\-] s root)* s<br> 6334 | range s</code></td> 6335 <td>[abc]-[cde], a<br></td> 6336 </tr> 6337 <tr> 6338 <th>range</th> 6339 <td><code>= char ('-' char)?<br> 6340 | '{' (s char)+ s '}'</code></td> 6341 <td>a, a-c, {abc}</td> 6342 </tr> 6343 <tr> 6344 <th>prop</th> 6345 <td><code>= '\' [pP] '{' propName ([≠=] s value1+)? 6346 '}'<br> 6347 | '[:' '^'? propName ([≠=] s value2+)? ':]'</code></td> 6348 <td>\p{x=y}, [:x=y:]<br></td> 6349 </tr> 6350 <tr> 6351 <th>propName</th> 6352 <td><code>= s [A-Za-z0-9] [A-Za-z0-9_\x20]* s</code></td> 6353 <td>General_Category,<br> 6354 General Category</td> 6355 </tr> 6356 <tr> 6357 <th>value1</th> 6358 <td><code>= [^\}]<br> 6359 | '\' quoted</code></td> 6360 <td>Lm,<br> 6361 \n,<br> 6362 \}</td> 6363 </tr> 6364 <tr> 6365 <th>value2</th> 6366 <td><code>= [^:]<br> 6367 | '\' quoted</code></td> 6368 <td>Lm,<br> 6369 \n,<br> 6370 \:</td> 6371 </tr> 6372 <tr> 6373 <th>char</th> 6374 <td><code>= [^\& \- \[ \[ \] \\ \} \{ [:Pat_WS:]]<br> 6375 | '\' quoted</code></td> 6376 <td>a, b, c, \n</td> 6377 </tr> 6378 <tr> 6379 <th>quoted</th> 6380 <td><code>= 'u' (hex{4} | bracketedHex)<br> 6381 | 'x' (hex{2} | bracketedHex)<br> 6382 | 'U00' ('0' hex{5} | '10' hex{4})<br> 6383 | 'N{' propName '}'<br> 6384 | [[\u0000-\U00010FFFF]-[uxUN]]</code></td> 6385 <td><em><strong>error</strong> if lengths not exact</em></td> 6386 </tr> 6387 <tr> 6388 <th>charName</th> 6389 <td><code>= s [A-Za-z0-9] [-A-Za-z0-9_\x20]* s</code></td> 6390 <td>TIBETAN 
LETTER -A</td> 6391 </tr> 6392 <tr> 6393 <th>bracketedHex</th> 6394 <td><code>= '{' s hexCodePoint (s hexCodePoint)* s 6395 '}'</code></td> 6396 <td>{61 2019 62}</td> 6397 </tr> 6398 <tr> 6399 <th>hexCodePoint</th> 6400 <td><code>= hex{1,5} | '10' hex{4}</code></td> 6401 <td> </td> 6402 </tr> 6403 <tr> 6404 <th>hex</th> 6405 <td><code>= [0-9A-Fa-f]</code></td> 6406 <td> </td> 6407 </tr> 6408 <tr> 6409 <th>s</th> 6410 <td><code>= [:Pattern_White_Space:]*</code></td> 6411 <td>optional whitespace</td> 6412 </tr> 6413 </table> 6414 </div> 6415 <p>Some constraints on UnicodeSet syntax are not captured by 6416 this EBNF. Notably, property names and values are restricted to 6417 those supported by the implementation, and have additional constraints imposed by 6418 [<a href="https://unicode.org/reports/tr41/#UAX44">UAX44</a>]. In addition, quoted 6419 values that resolve to more than one code point are disallowed in ranges of the form 6420 <code>char '-' char</code>.</p> 6421 <p>The syntax characters are listed in the table below:</p> 6422 <table> 6423 <tbody> 6424 <tr> 6425 <th>Char</th> 6426 <th>Hex</th> 6427 <th>Name</th> 6428 <th>Usage</th> 6429 </tr> 6430 <tr> 6431 <td>$</td> 6432 <td>U+0024</td> 6433 <td>DOLLAR SIGN</td> 6434 <td>Equivalent of \uFFFF (This is for implementations 6435 that return \uFFFF when accessing before the first or 6436 after the last character)</td> 6437 </tr> 6438 <tr> 6439 <td>&</td> 6440 <td>U+0026</td> 6441 <td>AMPERSAND</td> 6442 <td>Intersecting UnicodeSets</td> 6443 </tr> 6444 <tr> 6445 <td>-</td> 6446 <td>U+002D</td> 6447 <td>HYPHEN-MINUS</td> 6448 <td>Ranges of characters; also set difference.</td> 6449 </tr> 6450 <tr> 6451 <td>:</td> 6452 <td>U+003A</td> 6453 <td>COLON</td> 6454 <td>POSIX-style property syntax</td> 6455 </tr> 6456 <tr> 6457 <td>[</td> 6458 <td>U+005B</td> 6459 <td>LEFT SQUARE BRACKET</td> 6460 <td>Grouping; POSIX property syntax</td> 6461 </tr> 6462 <tr> 6463 <td>]</td> 6464 <td>U+005D</td> 6465 <td>RIGHT SQUARE 
BRACKET</td>
  <td>Grouping; POSIX property syntax</td>
  </tr>
  <tr>
  <td>\</td>
  <td>U+005C</td>
  <td>REVERSE SOLIDUS</td>
  <td>Escaping</td>
  </tr>
  <tr>
  <td>^</td>
  <td>U+005E</td>
  <td>CIRCUMFLEX ACCENT</td>
  <td>POSIX negation syntax</td>
  </tr>
  <tr>
  <td>{</td>
  <td>U+007B</td>
  <td>LEFT CURLY BRACKET</td>
  <td>Strings in set; Perl property syntax</td>
  </tr>
  <tr>
  <td>}</td>
  <td>U+007D</td>
  <td>RIGHT CURLY BRACKET</td>
  <td>Strings in set; Perl property syntax</td>
  </tr>
  <tr>
  <td> </td>
  <td>U+0020 U+0009..U+000D U+0085<br>
  U+200E U+200F<br>
  U+2028 U+2029</td>
  <td>ASCII whitespace,<br>
  LRM, RLM,<br>
  LINE/PARAGRAPH SEPARATOR</td>
  <td>Ignored except when escaped</td>
  </tr>
  </tbody>
  </table><br>
  <h5><a href="#Lists_of_Code_Points" name="Lists_of_Code_Points"
  id="Lists_of_Code_Points">5.3.3.1 Lists of Code Points</a></h5>
  <p>Lists are a sequence of strings that may include ranges,
  which are indicated by a '-' between two code points, as in
  "a-z". The sequence <em>start-end</em> specifies the range of
  all code points from the start to end, inclusive, in Unicode
  order. For example, <b>[a c d-f m]</b> is equivalent to <b>[a c
  d e f m]</b>. Whitespace can be freely used for clarity, as
  <b>[a c d-f m]</b> means the same as <b>[acd-fm]</b>.</p>
  <p>A string with multiple code points is represented in a list
  by being surrounded by curly braces, such as in <strong>[a-z
  {ch}]</strong>. It can be used with the range notation, as
  described in <em>Section <a href="#String_Range">5.3.4 String
  Range</a></em>. There is an additional restriction on string
  ranges in a UnicodeSet: the number of code points in the first
  string of the range must be identical to the number in the
  second.
Thus [{ab}-{c}] and [{ab}-c] are invalid.</p> 6521 <p>In UnicodeSets, there are two ways to quote syntax code 6522 points:</p> 6523 <p><a name="Backslash_Escapes" id= 6524 "Backslash_Escapes"></a>Outside of single quotes, certain 6525 backslashed code point sequences can be used to quote code 6526 points:</p> 6527 <table class='simple'> 6528 <tr> 6529 <td>\x{h...h}<br> 6530 \u{h...h}</td> 6531 <td>list of 1-6 hex digits ([0-9A-Fa-f]), separated by 6532 spaces</td> 6533 </tr> 6534 <tr> 6535 <td>\xhh</td> 6536 <td>2 hex digits</td> 6537 </tr> 6538 <tr> 6539 <td>\uhhhh</td> 6540 <td>Exactly 4 hex digits</td> 6541 </tr> 6542 <tr> 6543 <td>\Uhhhhhhhh</td> 6544 <td>Exactly 8 hex digits</td> 6545 </tr> 6546 <tr> 6547 <td>\a</td> 6548 <td>U+0007 (BEL / ALERT)</td> 6549 </tr> 6550 <tr> 6551 <td>\b</td> 6552 <td>U+0008 (BACKSPACE)</td> 6553 </tr> 6554 <tr> 6555 <td>\t</td> 6556 <td>U+0009 (TAB / CHARACTER TABULATION)</td> 6557 </tr> 6558 <tr> 6559 <td>\n</td> 6560 <td>U+000A (LINE FEED)</td> 6561 </tr> 6562 <tr> 6563 <td>\v</td> 6564 <td>U+000B (LINE TABULATION)</td> 6565 </tr> 6566 <tr> 6567 <td>\f</td> 6568 <td>U+000C (FORM FEED)</td> 6569 </tr> 6570 <tr> 6571 <td>\r</td> 6572 <td>U+000D (CARRIAGE RETURN)</td> 6573 </tr> 6574 <tr> 6575 <td>\\</td> 6576 <td>U+005C (BACKSLASH / REVERSE SOLIDUS)</td> 6577 </tr> 6578 <tr> 6579 <td>\N{name}</td> 6580 <td>The Unicode code point named "name".</td> 6581 </tr> 6582 <tr> 6583 <td>\p{…},\P{…}</td> 6584 <td>Unicode property (see below)</td> 6585 </tr> 6586 </table><br> 6587 <p>Anything else following a backslash is mapped to itself, 6588 except the property syntax described below, or in an 6589 environment where it is defined to have some special 6590 meaning.</p> 6591 <p>Any code point formed as the result of a backslash escape 6592 loses any special meaning and is treated as a literal. In 6593 particular, note that \x, \u and \U escapes create literal code 6594 points. 
(In contrast, Java treats Unicode escapes as just a way
  to represent arbitrary code points in an ASCII source file, and
  any resulting code points are <i><b>not</b></i> tagged as
  literals.)</p>
  <p>Unicode property sets are defined as described
  in <i>UTS #18: Unicode Regular Expressions</i> [<a href=
  "https://www.unicode.org/reports/tr41/#UTS18">UTS18</a>], Level
  1 and RL2.5, including the syntax where given. For an example
  of a concrete implementation of this, see [<a href=
  "#ICUUnicodeSet">ICUUnicodeSet</a>].</p>
  <h5><a href="#Unicode_Properties" name="Unicode_Properties" id=
  "Unicode_Properties">5.3.3.2 Unicode Properties</a></h5>
  <p>Briefly, Unicode property sets are specified by any Unicode
  property and a value of that property, such as
  <b>[:General_Category=Letter:]</b> for Unicode letters or
  <b>\p{uppercase}</b> for the set of upper case letters in
  Unicode. The property names are defined by the
  PropertyAliases.txt file and the property values by the
  PropertyValueAliases.txt file. For more information, see
  [<a href="https://unicode.org/reports/tr41/#UAX44">UAX44</a>].
  The syntax for specifying the property sets is an extension of
  either POSIX or Perl syntax, by the addition of
  "=<value>". For example, you can match letters by using
  the POSIX-style syntax:</p>
  <p><b>[:General_Category=Letter:]</b></p>
  <p>or by using the Perl-style syntax</p>
  <p><b>\p{General_Category=Letter}</b>.</p>
  <p>Property names and values are case-insensitive, and
  whitespace, "-", and "_" are ignored. The property name can be
  omitted for the <strong>General_Category</strong> and
  <strong>Script</strong> properties, but is required for other
  properties. If the property value is omitted, it is assumed to
  represent a boolean property with the value "true".
Thus 6627 <b>[:Letter:]</b> is equivalent to 6628 <b>[:General_Category=Letter:]</b>, and <b>[:Wh-ite-s 6629 pa_ce:]</b> is equivalent to <b>[:Whitespace=true:]</b>.</p> 6630 <p>The table below shows the two kinds of syntax: POSIX and 6631 Perl style. Also, the table shows the "Negative" version, which 6632 is a property that excludes all code points of a given kind. 6633 For example, <b>[:^Letter:]</b> matches all code points that 6634 are not <b>[:Letter:]</b>.</p> 6635 <table> 6636 <tr> 6637 <th> </th> 6638 <th>Positive</th> 6639 <th>Negative</th> 6640 </tr> 6641 <tr> 6642 <td>POSIX-style Syntax</td> 6643 <td>[:type=value:]</td> 6644 <td>[:^type=value:]</td> 6645 </tr> 6646 <tr> 6647 <td>Perl-style Syntax</td> 6648 <td>\p{type=value}</td> 6649 <td>\P{type=value}</td> 6650 </tr> 6651 </table> 6652 <h5><a href="#Boolean_Operations" name="Boolean_Operations" id= 6653 "Boolean_Operations">5.3.3.3 Boolean Operations</a></h5> 6654 <p>The low-level lists or properties then can be freely 6655 combined with the normal set operations (union, inverse, 6656 difference, and intersection):</p> 6657 <ul> 6658 <li>To union two sets, simply concatenate them. For example, 6659 <b>[[:letter:] [:number:]]</b></li> 6660 <li>To intersect two sets, use the '&' operator. For 6661 example, <b>[[:letter:] & [a-z]]</b></li> 6662 <li>To take the set-difference of two sets, use the '-' 6663 operator. For example, <b>[[:letter:] - [a-z]]</b></li> 6664 <li>To invert a set, place a '^' immediately after the 6665 opening '['. For example, <b>[^a-z]</b>. In any other 6666 location, the '^' does not have a special meaning. The 6667 inversion [^X] is equivalent to [[\x{0}-\x{10FFFF}]-[X]]. 6668 Thus multi-code point strings are discarded.</li> 6669 <li>Symmetric difference (~) is not supported.</li> 6670 </ul> 6671 <p>The binary operators '&', '-', and the implicit union 6672 have equal precedence and bind left-to-right. 
Thus 6673 <b>[[:letter:]-[a-z]-[\u0100-\u01FF]]</b> is equal to 6674 <b>[[[:letter:]-[a-z]]-[\u0100-\u01FF]]</b>. Another example is 6675 the set <b>[[ace][bdf] - [abc][def]]</b>, which is not the 6676 empty set, but instead equal to <b>[[[[ace] [bdf]] - [abc]] 6677 [def]]</b>, which equals <b>[[[abcdef] - [abc]] [def]]</b>, 6678 which equals <b>[[def] [def]]</b>, which equals 6679 <b>[def]</b>.</p> 6680 <p><strong>One caution:</strong> the '&' and '-' operators 6681 operate between sets. That is, they must be immediately 6682 preceded and immediately followed by a set. For example, the 6683 pattern <b>[[:Lu:]-A]</b> is illegal, since it is interpreted 6684 as the set <b>[:Lu:]</b> followed by the incomplete range 6685 <b>-A</b>. To specify the set of upper case letters except for 6686 'A', enclose the 'A' in brackets: <b>[[:Lu:]-[A]]</b>.</p> 6687 <h5><a href="#UnicodeSet_Examples" name="UnicodeSet_Examples" 6688 id="UnicodeSet_Examples">5.3.3.4 UnicodeSet Examples</a></h5> 6689 <p>The following table summarizes the syntax that can be 6690 used.</p> 6691 <table style="margin-top: 0.5em; margin-bottom: 0.5em" id= 6692 "table18"> 6693 <tr> 6694 <th>Example</th> 6695 <th>Description</th> 6696 </tr> 6697 <tr> 6698 <td nowrap>[a]</td> 6699 <td>The set containing 'a' alone</td> 6700 </tr> 6701 <tr> 6702 <td nowrap>[a-z]</td> 6703 <td>The set containing 'a' through 'z' and all letters in 6704 between, in Unicode order.<br> 6705 Thus it is the same as [\u0061-\u007A].</td> 6706 </tr> 6707 <tr> 6708 <td nowrap>[^a-z]</td> 6709 <td>The set containing all code points but 'a' through 6710 'z'.<br> 6711 Thus it is the same as [\u0000-\u0060 6712 \u007B-\x{10FFFF}].</td> 6713 </tr> 6714 <tr> 6715 <td nowrap>[[pat1][pat2]]</td> 6716 <td>The union of sets specified by pat1 and pat2</td> 6717 </tr> 6718 <tr> 6719 <td nowrap>[[pat1]&[pat2]]</td> 6720 <td>The intersection of sets specified by pat1 and 6721 pat2</td> 6722 </tr> 6723 <tr> 6724 <td nowrap>[[pat1]-[pat2]]</td> 6725 
<td>The asymmetric difference of sets specified by pat1 and 6726 pat2</td> 6727 </tr> 6728 <tr> 6729 <td nowrap>[a {ab} {ac}]</td> 6730 <td>The code point 'a' and the multi-code point strings 6731 "ab" and "ac"</td> 6732 </tr> 6733 <tr> 6734 <td nowrap>[x\u{61 2019 62}y]</td> 6735 <td>Equivalent to [x\u0061\u2019\u0062y] (= [xa’by])</td> 6736 </tr> 6737 <tr> 6738 <td nowrap>[{ax}-{bz}]</td> 6739 <td>The set containing [{ax} {ay} {az} {bx} {by} {bz}], 6740 using the range syntax to get all the strings from {ax} to 6741 {bz} as described in <em>Section <a href= 6742 "#String_Range">5.3.4 String Range</a></em>.</td> 6743 </tr> 6744 <tr> 6745 <td nowrap>[:Lu:]</td> 6746 <td>The set of code points with a given property value, as 6747 defined by PropertyValueAliases.txt. In this case, these 6748 are the Unicode upper case letters. The long form for this 6749 is <b>[:General_Category=Uppercase_Letter:]</b>.</td> 6750 </tr> 6751 <tr> 6752 <td nowrap>[:L:]</td> 6753 <td>The set of code points belonging to all Unicode 6754 categories starting with 'L', that is, 6755 <b>[[:Lu:][:Ll:][:Lt:][:Lm:][:Lo:]]</b>. The long form for 6756 this is <b>[:General_Category=Letter:]</b>.</td> 6757 </tr> 6758 </table><br> 6759 <h4><a name="String_Range" href="#String_Range" id= 6760 "String_Range">5.3.4 String Range</a></h4> 6761 <p>A String Range is a compact format for specifying a list of 6762 strings.</p> 6763 <p><strong>Syntax:<br></strong></p> 6764 <blockquote> 6765 <p>X <em>sep</em> Y<br></p> 6766 </blockquote> 6767 <p>The separator and the format of strings X, Y may vary 6768 depending on the domain. 
For example,</p> 6769 <ul> 6770 <li>for the validity files the separator is ~,</li> 6771 <li>for UnicodeSet the separator is -, and any 6772 multi-codepoint string is enclosed in {…}.</li> 6773 </ul> 6774 <p><strong>Validity: <br></strong></p> 6775 <blockquote> 6776 <p>A string range X <em>sep</em> Y is valid iff len(X) ≥ 6777 len(Y) > 0, where len(X) is the length of X in code 6778 points.</p> 6779 <p><em>There may be additional, domain-specific requirements 6780 for validity of the expansion of the string range.</em></p> 6781 </blockquote> 6782 <p><strong>Interpretation:<br></strong></p> 6783 <ol> 6784 <li>Break X into P and S, where len(S) = len(Y) 6785 <ul> 6786 <li>Note that P will be an empty string if the lengths of 6787 X and Y are equal.</li> 6788 </ul> 6789 </li> 6790 <li>Form the combinations of all 6791 P+(s₀..y₀)+(s₁..y₁)+...(sₙ..yₙ) 6792 <ul> 6793 <li>s₀ is the first code point in S, etc.</li> 6794 </ul> 6795 </li> 6796 </ol> 6797 <p><strong>Examples:</strong></p> 6798 <table> 6799 <tbody> 6800 <tr> 6801 <td>ab-ad</td> 6802 <td>→</td> 6803 <td>ab ac ad</td> 6804 </tr> 6805 <tr> 6806 <td>ab-d</td> 6807 <td>→</td> 6808 <td>ab ac ad</td> 6809 </tr> 6810 <tr> 6811 <td>ab-cd</td> 6812 <td>→</td> 6813 <td>ab ac ad bb bc bd cb cc cd</td> 6814 </tr> 6815 <tr> 6816 <td>-</td> 6817 <td>→</td> 6818 <td> </td> 6819 </tr> 6820 <tr> 6821 <td>-</td> 6822 <td>→</td> 6823 <td> </td> 6824 </tr> 6825 </tbody> 6826 </table><br> 6827 <h3><a name="Identity_Elements" href="#Identity_Elements" id= 6828 "Identity_Elements">5.4 Identity Elements</a></h3> 6829 <p class="dtd"><!ELEMENT identity (alias | (version, 6830 generation?, language, script?, territory?, variant?, special*) 6831 ) ></p> 6832 <p>The identity element contains information identifying the 6833 target locale for this data, and general information about the 6834 version of this data.</p> 6835 <p class="element2"><version number="<u>$</u>Revision: 1.227 6836 <u>$</u>"></p> 6837 <p>The version element provides, 
in an attribute, the version 6838 of this file. The contents of the element can contain 6839 textual notes about the changes between this version and the 6840 last. For example:</p> 6841 <blockquote> 6842 <pre><version number="<span style= 6843 "color: blue">1.1</span>"><span style= 6844 "color: blue">Various notes and changes in version 1.1</span></version></pre> 6845 <p>This is not to be confused with the version attribute on 6846 the ldml element, which tracks the dtd version.</p> 6847 </blockquote> 6848 <p class="element2"><generation date="<u>$</u>Date: 6849 2007/07/17 23:41:16 <u>$</u>" /></p> 6850 <p>The generation element is now deprecated. It was used to 6851 contain the last modified date for the data. This could be in 6852 two formats: ISO 8601 format, or CVS format (illustrated by the 6853 example above).</p> 6854 <p class="element2"><language type="<span style= 6855 "color: blue">en</span>"/></p> 6856 <p>The language code is the primary part of the specification 6857 of the locale id, with values as described above.</p> 6858 <p class="element2"><script type="<span style= 6859 "color: blue">Latn</span>" /></p> 6860 <p>The script code may be used in the identification of written 6861 languages, with values described above.</p> 6862 <p class="element2"><territory type="<span style= 6863 "color: blue">US</span>"/></p> 6864 <p>The territory code is a common part of the specification of 6865 the locale id, with values as described above.</p> 6866 <p class="element2"><variant type="<span class= 6867 "attributeValue">NYNORSK</span>"/></p> 6868 <p>The variant code is the tertiary part of the specification 6869 of the locale id, with values as described above.</p> 6870 <p>When combined according to the rules described in 6871 <i><a href="#Unicode_Language_and_Locale_Identifiers">Section 6872 3, Unicode Language and Locale Identifiers</a></i>, the 6873 language element, along with any of the optional script, 6874 territory, and variant elements, must identify a 
known, stable 6875 locale identifier. Otherwise, it is an error.</p> 6876 <h3><a name="Valid_Attribute_Values" href= 6877 "#Valid_Attribute_Values" id="Valid_Attribute_Values">5.5 Valid 6878 Attribute Values</a></h3> 6879 <p>The <a href="#DTD_Annotations">DTD Annotations</a> in Section 5.7 are used to determine whether elements, attributes, or attribute values are valid (or deprecated).</p> 6880 6881 <h3><a name="Canonical_Form" href="#Canonical_Form" id= 6882 "Canonical_Form">5.6 Canonical Form</a></h3> 6883 <p>The following are restrictions on the format of LDML files 6884 to allow for easier parsing and comparison of files.</p> 6885 <p>Peer elements have consistent order. That is, if the DTD or 6886 this specification requires the following order in an element 6887 <strong>foo</strong>:</p> 6888 <pre><foo> 6889 <pattern> 6890 <somethingElse> 6891</foo></pre> 6892 <p>It can never require the reverse order in a different 6893 element <strong>bar</strong>.</p> 6894 <pre><bar> 6895 <somethingElse> 6896 <pattern> 6897</bar></pre> 6898 <p>Note that there was one case that had to be corrected in 6899 order to make this true. For that reason, pattern occurs twice 6900 under currency:</p> 6901 <pre class="dtd"> 6902 <!ELEMENT currency (alias | (pattern*, displayName?, symbol?, pattern*, 6903decimal?, group?, special*)) ></pre> 6904 <p><a href="https://www.w3.org/TR/REC-xml/">XML</a> files can 6905 have a wide variation in textual form, while representing 6906 precisely the same data. By putting the LDML files in the 6907 repository into a canonical form, this allows us to use the 6908 simple diff tools used widely (and in CVS) to detect 6909 differences when vetting changes, without those tools being 6910 confused. 
This is not a requirement on other uses of LDML; it is
  simply a way to manage repository data more easily.</p>
  <h4><a name="Content" href="#Content" id="Content">5.6.1
  Content</a></h4>
  <ol>
  <li>All start elements are on their own line, indented by
  <i>depth</i> tabs.</li>
  <li>All end elements (except for leaf nodes) are on their own
  line, indented by <i>depth</i> tabs.</li>
  <li>Any leaf node with empty content is in the form
  <foo/>.</li>
  <li>There are no blank lines except within comments or
  content.</li>
  <li>Spaces are used within a start element. There are no
  extra spaces within elements.
  <ul>
  <li><code><version number="1.2"/></code>, not
  <code><version number = "1.2" /></code></li>
  <li><code></identity></code>, not
  <code></identity ></code></li>
  </ul>
  </li>
  <li>All attribute values use double quote ("), not single
  (').</li>
  <li>There are no CDATA sections, and no escapes except those
  absolutely required.
  <ul>
  <li>no &apos; since it is not necessary</li>
  <li>no '&#x61;', it would be just 'a'</li>
  </ul>
  </li>
  <li>All attributes with defaulted values are suppressed.</li>
  <li>The draft and alt="proposed.*" attributes are only on
  leaf elements.</li>
  <li>The tzids are canonicalized in the following way:
  <ol>
  <li type="a">All tzids as of CLDR 1.1 (2004.06.08) in
  zone.tab are canonical.</li>
  <li>After that point, the first time a tzid is
  introduced, that is the canonical form.</li>
  </ol>
  <p>That is, new IDs are added, but existing ones keep the
  original form. The <i>TZ</i> timezone database keeps a set
  of equivalences in the "backward" file. These are used to
  map other tzids to the canonical form. For example, when
For example, when 6955 <code>America/Argentina/Catamarca</code> was introduced as 6956 the new name for the previous 6957 <code>America/Catamarca</code> , a link was added in the 6958 backward file.</p> 6959 <p><code>Link America/Argentina/Catamarca 6960 America/Catamarca</code></p> 6961 </li> 6962 </ol> 6963 <p><i>Example:</i></p> 6964 <pre><ldml draft="unconfirmed" > 6965 <identity> 6966 <version number="1.2"/> 6967 <language type="en"/> 6968 <territory type="AS"/> 6969 </identity> 6970 <numbers> 6971 <currencyFormats> 6972 <currencyFormatLength> 6973 <currencyFormat> 6974 <pattern>¤#,##0.00;(¤#,##0.00)</pattern> 6975 </currencyFormat> 6976 </currencyFormatLength> 6977 </currencyFormats> 6978 </numbers> 6979</ldml></pre> 6980 <h4><a name="Ordering" href="#Ordering" id="Ordering">5.6.2 6981 Ordering</a></h4> 6982 <p>An element is ordered first by the element name, and then if 6983 the element names are identical, by the sorted set of 6984 attribute-value pairs. For the latter, compare the first pair 6985 in each (in sorted order by attribute pair). If not identical, 6986 go to the second pair, and so on.</p> 6987 <p>Elements and attributes are ordered according to their order 6988 in the respective DTDs. Attribute value comparison is a bit 6989 more complicated, and may depend on the attribute and type. 6990 This is currently done with specific ordering tables.</p> 6991 <p>Any future additions to the DTD must be structured so as to 6992 allow compatibility with this ordering. See also <a href= 6993 "#Valid_Attribute_Values">Section 5.5 Valid Attribute 6994 Values.</a></p> 6995 <h4><a name="Comments" href="#Comments" id="Comments">5.6.3 6996 Comments</a></h4> 6997 <ol> 6998 <li>Comments are of the form <!-- <i>stuff</i> 6999 -->.</li> 7000 <li>They are logically attached to a node. There are 4 kinds: 7001 <ol> 7002 <li>Inline always appear after a leaf node, on the same 7003 line at the end. 
Each is a single line.</li>
<li>Preblock comments always precede the attachment node,
and are indented on the same level.</li>
<li>Postblock comments always follow the attachment node,
and are indented on the same level.</li>
<li>A final comment appears after </ldml>.</li>
</ol>
</li>
<li>Multiline comments (except the final comment) have each
line after the first indented to one deeper level.</li>
</ol>
<p><b>Examples:</b></p>
<pre><eraAbbr>
	<era type="0">BC</era> <!-- might add alternate BDE in the future -->
...
<timeZoneNames>
	<!-- Note: zones that do not use daylight time need further work -->
	<zone type="America/Los_Angeles">
	...
	<!-- Note: the following is known to be sparse,
		and needs to be improved in the future -->
	<zone type="Asia/Jerusalem"></pre>
<h3><a name="DTD_Annotations" href="#DTD_Annotations" id=
"DTD_Annotations">5.7 DTD Annotations</a></h3>
<p>The information in a standard DTD is insufficient for use in
CLDR. To make up for that, DTD annotations are added. These are
of the form<br>
<!--@...--><br>
and are included below the !ELEMENT or !ATTLIST line that they
apply to. The current annotations are:</p>
<table>
<tr>
<th>Type</th>
<th>Description</th>
</tr>
<tr>
<td><!--@VALUE--></td>
<td>The attribute is not distinguishing, and is treated
like an element value</td>
</tr>
<tr>
<td><!--@METADATA--></td>
<td>The attribute is a “comment” on the data, like the
draft status.
It is not typically used in
implementations.</td>
</tr>
<tr>
<td><!--@ORDERED--></td>
<td>The element's children are ordered, and do not
inherit.</td>
</tr>
<tr>
<td><!--@DEPRECATED--></td>
<td>The element or attribute is deprecated, and should not
be used.</td>
</tr>
<tr>
<td><!--@DEPRECATED: attribute-value1,
attribute-value2--></td>
<td>The attribute values are deprecated, and should not be
used. Spaces between tokens are not significant.</td>
</tr>
<tr>
<td><!--@MATCH:{attribute value constraint}--></td>
<td>Requires the attribute value to match the constraint.</td>
</tr>
</table>
<p>There is additional information in the
attributeValueValidity.xml file that is used internally for
testing. For example, the following line indicates that the
type attribute of the 'currency' element in the ldml DTD must
have values from the bcp47 'cu' type.</p>
<p class='example'><attributeValues dtds='ldml'
elements='currency'
attributes='type'>$_bcp47_cu</attributeValues></p>
<p>The element values may be literals, regular expressions, or
variables (some of which are set programmatically according to
other CLDR data, such as the one above). However, the information at
this point does not cover all attribute values, is used only
for testing, and should not be used in implementations since
the structure may change without notice.</p>
<h4>5.7.1 <a href="#match_expressions" name="match_expressions">Attribute Value Constraints</a></h4>
<p>The following are constraints on the attribute values.
Note: in future versions, the format may change, and/or the constraints may be tightened.</p>
<table class='simple'>
<tbody>
<tr>
<th>Constraint</th>
<th colspan="2">Comments</th>
</tr>
<tr>
<td>any</td>
<td colspan="2">any string value</td>
</tr>
<tr>
<td>any/TODO</td>
<td colspan="2">placeholder for future constraints</td>
</tr>
<tr>
<td>bcp47/anykey</td>
<td colspan="2">any bcp47 key or tkey</td>
</tr>
<tr>
<td>bcp47/anyvalue</td>
<td colspan="2">any bcp47 value (type) or tvalue</td>
</tr>
<tr>
<td>literal/{literal values}</td>
<td colspan="2">comma-separated list of literal values</td>
</tr>
<tr>
<td>regex/{regex expression}</td>
<td colspan="2">valid regular expression</td>
</tr>
<tr>
<td>bcp47/{key or tkey}</td>
<td colspan="2">matches possible values for that key or tkey</td>
</tr>
<tr>
<td>metazone</td>
<td colspan="2">valid metazone</td>
</tr>
<tr>
<td>range/{start_number}~{end_number}</td>
<td colspan="2">number between start and end (inclusive)</td>
</tr>
<tr>
<td>time/{time or date or date-time pattern}</td>
<td colspan="2">e.g. HH:mm</td>
</tr>
<tr>
<td>unicodeset/{unicodeset pattern}</td>
<td colspan="2">valid unicodeset</td>
</tr>
<tr>
<td rowspan="4">validity/{field}</td>
<td colspan="2">currency, language, locale, region, script, subdivision, short-unit, unit, variant</td>
</tr>
<tr>
<td colspan="2">The field can be qualified by particular enums, such as:</td>
</tr>
<tr>
<td>validity/unit/regular deprecated</td>
<td>matches only <em>deprecated</em> and <em>regular</em></td>
</tr>
<tr>
<td>validity/unit/!deprecated</td>
<td>matches all but <em>deprecated</em></td>
</tr>
<tr>
<td>version</td>
<td colspan="2">1 to 4 digit field version, such as
35.3.9</td> 7154 </tr> 7155 <tr> 7156 <td>set/{match}</td> 7157 <td colspan="2">set of elements that match {match}</td> 7158 </tr> 7159 <tr> 7160 <td>or/{match1}XX{match2}…</td> 7161 <td colspan="2">matches at least one of {match1}, etc</td> 7162 </tr> 7163 </tbody> 7164 </table><br> 7165 <h2><a name="Property_Data" href="#Property_Data" id= 7166 "Property_Data">6 Property Data</a></h2> 7167 <p>Some data in CLDR does not use an XML format, but rather a 7168 semicolon-delimited format derived from that of the Unicode 7169 Character Database. That is because the data is more likely to 7170 be parsed by implementations that already parse UCD data. Those 7171 files are present in the common/properties directory.</p> 7172 <p>Each file has a header that explains the format and usage of 7173 the data.</p> 7174 <h3><a name="Script_Metadata" href="#Script_Metadata" id= 7175 "Script_Metadata">6.1 Script Metadata</a></h3> 7176 <p><code>scriptMetadata.txt</code></p> 7177 <p>This file provides general information about scripts that 7178 may be useful to implementations processing text. The 7179 information is the best currently available, and may change 7180 between versions of CLDR. The format is similar to Unicode 7181 Character Database property file, and is documented in the 7182 header of the data file.</p> 7183 <h3><a name="Extended_Pictographic" href= 7184 "#Extended_Pictographic" id="Extended_Pictographic">6.2 7185 Extended Pictographic</a></h3> 7186 <p><code>ExtendedPictographic.txt</code></p> 7187 <p>This file was used to define the ExtendedPictographic data 7188 used for “future-proofing” emoji behavior, especially in 7189 segmentation. 
As of Emoji version 11.0, the set of 7190 Extended_Pictographic is incorporated into the emoji data files 7191 found at <a href= 7192 "https://unicode.org/Public/emoji/">unicode.org/Public/emoji/</a>.</p> 7193 <h3><a name="Labels.txt" href="#Labels.txt" id="Labels.txt">6.3 7194 Labels.txt</a></h3> 7195 <p><code>labels.txt</code></p> 7196 <p>This file provides general information about associations of 7197 labels to characters that may be useful to implementations of 7198 character-picking applications. The information is the best 7199 currently available, and may change between versions of CLDR. 7200 The format is similar to Unicode Character Database property 7201 file, and is documented in the header of the data file.</p> 7202 <p>Initially, the contents are focused on emoji, but may be 7203 expanded in the future to other types of characters. Note that 7204 a character may have multiple labels.</p> 7205 <h3><a name="Segmentation_Tests" href="#Segmentation_Tests">6.4 7206 Segmentation Tests</a></h3> 7207 <p>CLDR provides a tailoring to the <a href="https://unicode.org/reports/tr29/">Grapheme Cluster Break (gcb)</a> algorithm to avoid splitting Indic aksaras. The corresponding test files for that are located in common/properties/segments/, along with a readme.txt that provides more details. There are also specific test files for the supported Indic scripts in the unittest directory.</p> 7208 <h2><a name="Format_Parse_Issues" href="#Format_Parse_Issues" 7209 id="Format_Parse_Issues">7 Issues in Formatting and 7210 Parsing</a></h2> 7211 <h3><a name="Lenient_Parsing" href="#Lenient_Parsing" id= 7212 "Lenient_Parsing">7.1 Lenient Parsing</a></h3> 7213 <h4><a name="Motivation" href="#Motivation" id= 7214 "Motivation">7.1.1 Motivation</a></h4> 7215 <p>User input is frequently messy. Attempting to parse it by 7216 matching it exactly against a pattern is likely to be 7217 unsuccessful, even when the meaning of the input is clear to a 7218 human being. 
For example, for a date pattern of "MM/dd/yy", the 7219 input "June 1, 2006" will fail.</p> 7220 <p>The goal of lenient parsing is to accept user input whenever 7221 it is possible to decipher what the user intended. Doing so 7222 requires using patterns as data to guide the parsing process, 7223 rather than an exact template that must be matched. This 7224 informative section suggests some heuristics that may be useful 7225 for lenient parsing of dates, times, and numbers.</p> 7226 <h4><a name="Loose_Matching" href="#Loose_Matching" id= 7227 "Loose_Matching">7.1.2 Loose Matching</a></h4> 7228 <p>Loose matching ignores attributes of the strings being 7229 compared that are not important to matching. It involves the 7230 following steps:</p> 7231 <ul> 7232 <li>Remove "." from currency symbols and other fields used 7233 for matching, and also from the input string unless: 7234 <ul> 7235 <li>"." is in the decimal set, and</li> 7236 <li>its position in the input string is immediately 7237 before a decimal digit</li> 7238 </ul> 7239 </li> 7240 <li>Ignore all format characters: in particular, ignore any 7241 RLM, LRM or ALM used to control BIDI formatting.</li> 7242 <li>Ignore all characters in [:Zs:] unless they occur between 7243 letters. (In the heuristics below, even those between letters 7244 are ignored except to delimit fields)</li> 7245 <li>Map all characters in [:Dash:] to U+002D 7246 HYPHEN-MINUS</li> 7247 <li>Use the data in the <character-fallback> element to 7248 map equivalent characters (for example, curly to straight 7249 apostrophes). Other apostrophe-like characters should also be 7250 treated as equivalent, especially if the character actually 7251 used in a format may be unavailable on some keyboards. 
For 7252 example: 7253 <ul> 7254 <li>U+02BB MODIFIER LETTER TURNED COMMA (ʻ) might be 7255 typed instead as U+2018 LEFT SINGLE QUOTATION MARK 7256 (‘).</li> 7257 <li>U+02BC MODIFIER LETTER APOSTROPHE (ʼ) might be typed 7258 instead as U+2019 RIGHT SINGLE QUOTATION MARK (’), U+0027 7259 APOSTROPHE, etc.</li> 7260 <li>U+05F3 HEBREW PUNCTUATION GERESH (׳) might be typed 7261 instead as U+0027 APOSTROPHE.</li> 7262 </ul> 7263 </li> 7264 <li>Apply mappings particular to the domain (i.e., for dates 7265 or for numbers, discussed in more detail below)</li> 7266 <li>Apply case folding (possibly including language-specific 7267 mappings such as Turkish i)</li> 7268 <li>Normalize to NFKC; thus <i>no-break space</i> will map to 7269 <i>space</i>; half-width <i>katakana</i> will map to 7270 full-width.</li> 7271 </ul> 7272 <p>Loose matching involves (logically) applying the above 7273 transform to both the input text and to each of the field 7274 elements used in matching, before applying the specific 7275 heuristics below. For example, if the input number text is " - 7276 NA f. 1,000.00", then it is mapped to "-naf1,000.00" before 7277 processing. The currency signs are also transformed, so "NA f." 7278 is converted to "naf" for purposes of matching. As with other 7279 Unicode algorithms, this is a logical statement of the process; 7280 actual implementations can optimize, such as by applying the 7281 transform incrementally during matching.</p> 7282 <h3><a name="Invalid_Patterns" href="#Invalid_Patterns" id= 7283 "Invalid_Patterns">7.2 Handling Invalid Patterns</a></h3> 7284 <p>Processes sometimes encounter invalid number or date 7285 patterns, such as a number pattern with “¤¤¤¤¤” (valid pattern 7286 character but invalid length in current CLDR), a date pattern 7287 with “nn” (invalid pattern character in current CLDR), or a 7288 date pattern with “MMMMMM” (invalid length in current CLDR). 
7289 The recommended behavior for handling such an invalid pattern 7290 field is:</p> 7291 <ul> 7292 <li>For a field using a currently-invalid length for a valid 7293 pattern character: 7294 <ul> 7295 <li>In <strong>formatting,</strong> emit U+FFFD 7296 REPLACEMENT CHARACTER for the invalid field.</li> 7297 <li>In <strong>parsing,</strong> the field may be parsed 7298 as if it had a valid length.</li> 7299 </ul> 7300 </li> 7301 <li>For a pattern that contains a currently-invalid pattern 7302 character (applies only to date patterns, for which A-Za-z 7303 are reserved as pattern characters but not all defined as 7304 valid): 7305 <ul> 7306 <li>Produce an error (set an error code or throw an 7307 exception) when an attempt is made to create a formatter 7308 with such a pattern or to apply such a pattern to an 7309 existing formatter.</li> 7310 </ul> 7311 </li> 7312 </ul> 7313 <h2><a name="Deprecated_Structure" href="#Deprecated_Structure" 7314 id="Deprecated_Structure">Annex A Deprecated Structure</a></h2> 7315 <p>The <a href="#DTD_Annotations">DTD Annotations</a> in Section 5.7 are used to determine whether elements, attributes, or attribute values are deprecated.</p> 7316 <p>While valid LDML, they are strongly 7317 discouraged, and no longer used in CLDR.</p> 7318 <p>The remainder of this section describes selected cases of 7319 deprecated structure that were present in previous versions of 7320 CLDR.</p> 7321 <h3><a name="Fallback_Elements" href="#Fallback_Elements" id= 7322 "Fallback_Elements">A.1 Element fallback</a></h3> 7323 <p class="dtd"><!ELEMENT fallback (#PCDATA) ></p> 7324 <p>The fallback element is deprecated. 
Implementations should
instead use the information in <em><a href=
"#LanguageMatching">Section 4.4 Language Matching</a></em> for
language fallback.</p>
<h3><a name="BCP47_Keyword_Mapping" href=
"#BCP47_Keyword_Mapping" id="BCP47_Keyword_Mapping">A.2 BCP 47
Keyword Mapping</a></h3>
<p><b>Note:</b> <i>This structure is deprecated and replaced
with <a href="#Unicode_Locale_Extension_Data_Files">Section
3.6.4 U Extension Data Files</a>.</i></p>
<p class="dtd"><!ELEMENT bcp47KeywordMappings ( mapKeys?,
mapTypes* ) ><br>
<!ELEMENT mapKeys ( keyMap* ) ><br>
<!ELEMENT keyMap EMPTY ><br>
<!ATTLIST keyMap type NMTOKEN #REQUIRED ><br>
<!ATTLIST keyMap bcp47 NMTOKEN #REQUIRED ><br>
<!ELEMENT mapTypes ( typeMap* ) ><br>
<!ATTLIST mapTypes type NMTOKEN #REQUIRED ><br>
<!ELEMENT typeMap EMPTY ><br>
<!ATTLIST typeMap type CDATA #REQUIRED ><br>
<!ATTLIST typeMap bcp47 NMTOKEN #REQUIRED ><br></p>
<p>This section defines mappings between old Unicode locale
identifier key/type values and their BCP 47 'u' extension
subtag representations. The 'u' extension syntax described in
<a href="#u_Extension">Section 3.6 Unicode BCP 47 U
Extension</a> restricts a key to two ASCII alphanumerics and a
type to three to eight ASCII alphanumerics. A key or a type
that does not meet that syntax requirement is converted
according to the mapping data defined by the mapKeys or
mapTypes elements. For example, a keyword "collation=phonebook"
is converted to BCP 47 'u' extension subtags "co-phonebk" by
the mapping data below:</p>
<pre>  <mapKeys>
    ...
    <keyMap type="collation" bcp47="co"/>
    ...
  </mapKeys>
  <mapTypes type="collation">
    ...
    <typeMap type="phonebook" bcp47="phonebk"/>
    ...
  </mapTypes>
</pre>
<h3><a name="Choice_Patterns" href="#Choice_Patterns" id=
"Choice_Patterns">A.3 Choice Patterns</a></h3>
<p><b>Note:</b> <i>This structure is deprecated and replaced
with count attributes.</i></p>
<p>A choice pattern is a string that chooses among a number of
strings, based on numeric value. It has the following form:</p>
<p><choice_pattern> = <choice> ( '|' <choice>
)*<br>
<choice> =
<number><relation><string><br>
<number> = ('+' | '-')? ('∞' | [0-9]+ ('.'
[0-9]+)?)<br>
<relation> = '<' | '≤'</p>
<p>The interpretation of a choice pattern is that given a
number N, the pattern is scanned from right to left, for each
choice evaluating <number> <relation> N. The first
choice that matches results in the corresponding string. If no
match is found, then the first string is used. For example:</p>
<table border="1" cellpadding="0" cellspacing="0">
<tr>
<td width="33%">Pattern</td>
<td width="33%">N</td>
<td width="34%">Result</td>
</tr>
<tr>
<td width="33%" rowspan="4">0≤Rf|1≤Ru|1<Re</td>
<td width="33%">-∞, -3, -1,
-0.000001</td>
<td width="34%">Rf (defaulted to first string)</td>
</tr>
<tr>
<td width="33%">0, 0.01, 0.9999</td>
<td width="34%">Rf</td>
</tr>
<tr>
<td width="33%">1</td>
<td width="34%">Ru</td>
</tr>
<tr>
<td width="33%">1.00001, 5, 99, ∞</td>
<td width="34%">Re</td>
</tr>
</table>
<p>Quoting is done using ' characters, as in date or number
formats.</p>
<h3><a name="Element_default" href="#Element_default" id=
"Element_default">A.4 Element default</a></h3>
<p><b>Note:</b> <i>This structure is deprecated.</i> Use the
replacement structure instead; for example:</p>
<ul>
<li>For <collations>, now use the
<defaultCollation> element.</li>
<li>For <calendars>, the default calendar type for a
locale is now specified by <i><a href=
"tr35-dates.html#Calendar_Preference_Data">Calendar
Preference Data</a></i>.</li>
</ul>
<p>In some cases, a number of elements are present. The default
element can be used to indicate which of them is the default,
in the absence of other information. The value of the choice
attribute is to match the value of the type attribute for the
selected item.</p>
<pre><timeFormats>
  <default choice="<span style="color: red">medium</span>" />
  <timeFormatLength type="<span style="color: blue">full</span>">
    <timeFormat type="<span style="color: blue">standard</span>">
      <pattern type="<span style="color: blue">standard</span>"><span style="color: blue">h:mm:ss a z</span></pattern>
    </timeFormat>
  </timeFormatLength>
  <timeFormatLength type="<span style="color: blue">long</span>">
    <timeFormat type="<span style="color: blue">standard</span>">
      <pattern type="<span style="color: blue">standard</span>"><span style="color: blue">h:mm:ss a z</span></pattern>
    </timeFormat>
  </timeFormatLength>
  <timeFormatLength type="<span style="color: red">medium</span>">
    <timeFormat type="<span style="color: blue">standard</span>">
      <pattern type="<span style="color: blue">standard</span>"><span style="color: blue">h:mm:ss a</span></pattern>
    </timeFormat>
  </timeFormatLength>
...</pre>
<p>Like all other elements, the <default> element is
inherited. Thus, it can also refer to inherited resources.
For
example, suppose that the above resources are present in fr,
and that in fr_BE we have the following:</p>
<pre><timeFormats>
  <default choice="<span style="color: red">long</span>"/>
</timeFormats></pre>
<p>In that case, the default time format for fr_BE would be the
inherited "long" resource from fr. Now suppose that we had in
fr_CA:</p>
<pre>  <timeFormatLength type="<span style="color: red">medium</span>">
    <timeFormat type="<span style="color: blue">standard</span>">
      <pattern type="<span style="color: blue">standard</span>"><span style="color: blue">...</span></pattern>
    </timeFormat>
  </timeFormatLength>
</pre>
<p>In this case, the <default> is inherited from fr, and
has the value "medium". It thus refers to this new "medium"
pattern in this resource bundle.</p>
<h3><a name="Deprecated_Common_Attributes" href=
"#Deprecated_Common_Attributes" id=
"Deprecated_Common_Attributes">A.5 Deprecated Common
Attributes</a></h3>
<h4><a name="Attribute_standard" href="#Attribute_standard" id=
"Attribute_standard">A.5.1 Attribute standard</a></h4>
<p class="element2"><b>Note:</b> This attribute is deprecated.
Instead, use a reference element with the attribute
standard="true".</p>
<p>The value of this attribute is a list of strings
representing standards: international, national, organization,
or vendor standards. The presence of this attribute indicates
that the data in this element is compliant with the indicated
standards. Where possible, for uniqueness, the string should be
a URL that represents that standard. The strings are separated
by commas; leading or trailing spaces on each string are not
significant.
Examples:</p>
<p><code><collation standard="<span style="color: blue">MSA 200:2002</span>"><br>
...<br>
<dateFormatStyle
standard="https://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=26780&amp;ICS1=1&amp;ICS2=140&amp;ICS3=30"></code></p>
<h4><a name="Attribute_draft_nonLeaf" href=
"#Attribute_draft_nonLeaf" id="Attribute_draft_nonLeaf">A.5.2
Attribute draft in non-leaf elements</a></h4>
<p>The draft attribute is deprecated except in leaf elements
(elements that do not have any subelements).</p>
<h3><a name="Element_base" href="#Element_base" id=
"Element_base">A.6 Element base</a></h3>
<p><b>Note:</b> <i>This element is deprecated.</i> Use the
collation <import> element instead.</p>
<p>The optional base element <code><base><span style=
"color: blue">...</span></base></code> contains an
alias element that points to another data source that defines a
<i>base</i> collation. If present, it indicates that the
settings and rules in the collation are modifications applied
on <i>top of the</i> respective elements in the base collation.
That is, any successive settings, where present, override what
is in the base as described in <a href=
"tr35-collation.html#Setting_Options">Setting Options</a>. Any
successive rules are concatenated to the end of the rules in
the base.
The result of multiple rules applying to the same
characters is covered in <a href=
"tr35-collation.html#Orderings">Orderings</a>.</p>
<h3><a name="Element_rules" href="#Element_rules" id=
"Element_rules">A.7 Element rules</a></h3>
<p><b>Note:</b> <i>The XML collation syntax is deprecated; this
includes the <rules> element and its subelements, except
that the <import> element has been moved up to be a
subelement of <collation>.</i> Use the basic collation
syntax with the <a href="tr35-collation.html#Rules"><cr>
element</a> instead.</p>
<p class="dtd"><!ELEMENT rules (alias | ( ( reset | import
), ( reset | import | p | pc | s | sc | t | tc | i | ic | x)*
)) ></p>
<h3><a name="Deprecated_subelements_of_dates" href=
"#Deprecated_subelements_of_dates" id=
"Deprecated_subelements_of_dates">A.8 Deprecated subelements of
<dates></a></h3>
<ul>
<li><localizedPatternChars></li>
<li><dateRangePattern>, replaced by
<intervalFormats>.</li>
</ul>
<h3><a name="Deprecated_subelements_of_calendars" href=
"#Deprecated_subelements_of_calendars" id=
"Deprecated_subelements_of_calendars">A.9 Deprecated
subelements of <calendars></a></h3>
<ul>
<li><monthNames> and <monthAbbr>; month name
forms are specified in the <months> element. The older
monthNames and monthAbbr are equivalent to using the months
element with the context type="<span style=
"color: blue">format</span>" and the width type="<span style=
"color: blue">wide</span>" (for ...Names) and
type="<span style="color: blue">narrow</span>" (for ...Abbr),
respectively.</li>
<li><dayNames> and <dayAbbr>; weekday name forms
are specified in the <days> element.
The older
dayNames and dayAbbr are equivalent to using the days element
with the context type="<span style=
"color: blue">format</span>" and the width type="<span style=
"color: blue">wide</span>" (for ...Names) and
type="<span style="color: blue">narrow</span>" (for ...Abbr),
respectively.</li>
<li><a name="week" href="#week" id="week"><week></a> is
deprecated in the main LDML files, because the data is more
appropriately organized as connected to territories, not to
linguistic data. Use the supplemental <weekData>
element instead.</li>
<li><am> and <pm>; these are now included as part
of the <dayPeriods> element.</li>
<li><fields> is deprecated as a subelement of
<calendars>; instead, a <fields> element should be
located just under a <dates> element. See <a href=
"tr35-dates.html#Calendar_Fields">Calendar Fields</a>.</li>
</ul>
<h3><a name="Deprecated_subelements_of_timeZoneNames" href=
"#Deprecated_subelements_of_timeZoneNames" id=
"Deprecated_subelements_of_timeZoneNames">A.10 Deprecated
subelements of <timeZoneNames></a></h3>
<ul>
<li><hoursFormat>, e.g. "{0}/{1}" for "-0800/-0700"</li>
<li><a name="fallbackRegionFormat" href=
"#fallbackRegionFormat" id=
"fallbackRegionFormat"><fallbackRegionFormat></a>
(deprecated), e.g.
"{0} Time ({1})" for "United States 7591 Time (New York)"</li> 7592 <li><abbreviationFallback></li> 7593 <li><preferenceOrdering>, a preference ordering among 7594 modern zones; use metazones instead.</li> 7595 <li><singleCountries>, use <a href= 7596 "tr35-dates.html#Primary_Zones">Primary Zones</a></li> 7597 </ul> 7598 <h3><a name="Deprecated_subelements_of_zone_metazone" href= 7599 "#Deprecated_subelements_of_zone_metazone" id= 7600 "Deprecated_subelements_of_zone_metazone">A.11 Deprecated 7601 subelements of <zone> and <metazone></a></h3> 7602 <ul> 7603 <li><commonlyUsed>, formerly used to indicate whether a 7604 zone was commonly used in the locale.</li> 7605 </ul> 7606 <h3><a name= 7607 "Renamed_attribute_values_for_contextTransformUsage" href= 7608 "#Renamed_attribute_values_for_contextTransformUsage" id= 7609 "Renamed_attribute_values_for_contextTransformUsage">A.12 7610 Renamed attribute values for <contextTransformUsage> 7611 element</a></h3> 7612 <p>The <contextTransformUsage> element was introduced in 7613 CLDR 21. The values for its <em>type</em> attribute are 7614 documented in <a href= 7615 "tr35-general.html#contextTransformUsage_type_attribute_values"> 7616 <contextTransformUsage> type attribute values</a>. 
In
CLDR 25, some of these values were renamed from their previous
values for improved clarity:</p>
<ul>
<li>"type" was renamed to "keyValue"</li>
<li>"displayName" was renamed to "currencyName"</li>
<li>"displayName-count" was renamed to
"currencyName-count"</li>
<li>"tense" was renamed to "relative"</li>
</ul>
<h3><a name="Deprecated_subelements_of_segmentations" href=
"#Deprecated_subelements_of_segmentations" id=
"Deprecated_subelements_of_segmentations">A.13 Deprecated
subelements of <segmentations></a></h3>
<ul>
<li><exceptions> and <exception> were deprecated
and replaced with <suppressions> and
<suppression>.</li>
</ul>
<h3><a name="Element_cp" href="#Element_cp" id=
"Element_cp">A.14 Element cp</a></h3>
<p>The cp element was used to escape characters that cannot be
represented in XML, even with NCRs. These escapes were only
allowed in certain elements, according to the DTD.</p>
<p>However, this mechanism is very clumsy, and was replaced by
specialized syntax.</p>
<table>
<tr>
<th>Code Point</th>
<th>XML Example</th>
</tr>
<tr>
<td><code>U+0000</code></td>
<td><code><cp hex="0"></code></td>
</tr>
</table>
<p> </p>
<h3><a name="validSubLocales" href="#validSubLocales" id=
"validSubLocales">A.15 Attribute validSubLocales</a></h3>
<p>The attribute <i>validSubLocales</i> allowed sublocales in a
given tree to be treated as though a file for them were present
when there was not one. It only had an effect for locales that
inherit from the current file where a file is missing.</p>
<p><b>Example 1.</b> Suppose that in a particular LDML tree,
there are no region locales for German; for example, there is a
de.xml file, but no de_AT.xml, de_CH.xml, or
de_DE.xml file. Then no elements are valid for any of those region
locales.
If we want to mark one of those files as having valid 7664 elements, then we introduce an empty file, such as the 7665 following.</p> 7666 <p><code><ldml version="1.1"><br> 7667 <identity><br> 7668 <version number="1.1" /><br> 7669 <language type="de" /><br> 7670 <territory type="AT" /><br> 7671 </identity><br> 7672 </ldml></code></p> 7673 <p>With the <i>validSubLocales</i> attribute, instead of adding 7674 the empty files for de_AT.xml, de_CH.xml, and de_DE.xml, in the 7675 de file we could add to the parent locale a list of the child 7676 locales that should behave as if files were present.</p> 7677 <p><code><ldml version="1.1" validSubLocales="de_AT de_CH 7678 de_DE"><br> 7679 <identity><br> 7680 <version number="1.1" /><br> 7681 <language type="de" /><br> 7682 </identity><br> 7683 ...<br> 7684 </ldml></code></p> 7685 <p>Now that the <i>validSubLocales</i> attribute has been 7686 deprecated, it is recommended to simply add empty files to 7687 specify which sublocales are valid. This convention is used 7688 throughout the CLDR.</p> 7689 <h3><a name="postCodeElements" href="#postCodeElements" id= 7690 "postCodeElements">A.16 Elements postalCodeData, 7691 postCodeRegex</a></h3> 7692 <p>The postal code validation data has been deprecated. 
Please 7693 see other services that are kept up to date, such as:</p> 7694 <ul> 7695 <li><a href= 7696 "https://i18napis.appspot.com/address/data/US">https://i18napis.appspot.com/address/data/US</a></li> 7697 <li><a href= 7698 "https://i18napis.appspot.com/address/data/CH">https://i18napis.appspot.com/address/data/CH</a></li> 7699 <li>...</li> 7700 </ul> 7701 <p>See <a href="tr35-info.html#Postal_Code_Validation">Postal 7702 Code Validation</a></p> 7703 <h3><a name="telephoneCodeData" href="#telephoneCodeData" id= 7704 "telephoneCodeData">A.17 Element telephoneCodeData</a></h3> 7705 <p>The element <telephoneCodeData> and its subelements 7706 have been deprecated and the data removed.</p> 7707 <hr> 7708 <h2><a name="Links_to_Other_Parts" href="#Links_to_Other_Parts" 7709 id="Links_to_Other_Parts">Annex B Links to Other Parts</a></h2> 7710 <p>The LDML specification is split into several <a href= 7711 "#Parts">parts</a> by topic, with one HTML document per part. 7712 The following tables provide redirects for links to specific 7713 topics. Please update your links and bookmarks.</p> 7714 <p>Part 1 Links: Core (this document): No redirects needed.</p> 7715 <table cellspacing="0" cellpadding="2" border="1" width="100%"> 7716 <caption> 7717 <a href="#Part_2_Links" name="Part_2_Links" id= 7718 "Part_2_Links">Part 2 Links</a>: <a href= 7719 "tr35-general.html">General</a> (display names & 7720 transforms, etc.) 
7721 </caption> 7722 <tr> 7723 <th>Old section</th> 7724 <th>Section in new part</th> 7725 </tr> 7726 <tr> 7727 <td>5.4 <a name="Display_Name_Elements" href= 7728 "#Display_Name_Elements" id="Display_Name_Elements">Display 7729 Name Elements</a></td> 7730 <td>1 <a href= 7731 "tr35-general.html#Display_Name_Elements">Display Name 7732 Elements</a></td> 7733 </tr> 7734 <tr> 7735 <td>5.5 <a name="Layout_Elements" href="#Layout_Elements" 7736 id="Layout_Elements">Layout Elements</a></td> 7737 <td>2 <a href="tr35-general.html#Layout_Elements">Layout 7738 Elements</a></td> 7739 </tr> 7740 <tr> 7741 <td>5.6 <a name="Character_Elements" href= 7742 "#Character_Elements" id="Character_Elements">Character 7743 Elements</a></td> 7744 <td>3 <a href= 7745 "tr35-general.html#Character_Elements">Character 7746 Elements</a></td> 7747 </tr> 7748 <tr> 7749 <td>5.6.1 <a name="ExemplarSyntax" href="#ExemplarSyntax" 7750 id="ExemplarSyntax">Exemplar Syntax</a></td> 7751 <td>3.1 <a href="tr35-general.html#ExemplarSyntax">Exemplar 7752 Syntax</a></td> 7753 </tr> 7754 <tr> 7755 <td>5.6.2 Restrictions</td> 7756 <td>3.1 <a href="tr35-general.html#ExemplarSyntax">Exemplar 7757 Syntax</a></td> 7758 </tr> 7759 <tr> 7760 <td>5.6.3 Mapping</td> 7761 <td>3.2 <a href= 7762 "tr35-general.html#Character_Mapping">Mapping</a></td> 7763 </tr> 7764 <tr> 7765 <td>5.6.4 <a name="IndexLabels" href="#IndexLabels" id= 7766 "IndexLabels">Index Labels</a></td> 7767 <td>3.3 <a href="tr35-general.html#IndexLabels">Index 7768 Labels</a></td> 7769 </tr> 7770 <tr> 7771 <td>5.6.5 Ellipsis</td> 7772 <td>3.4 <a href= 7773 "tr35-general.html#Ellipsis">Ellipsis</a></td> 7774 </tr> 7775 <tr> 7776 <td>5.6.6 More Information</td> 7777 <td>3.5 <a href= 7778 "tr35-general.html#Character_More_Info">More 7779 Information</a></td> 7780 </tr> 7781 <tr> 7782 <td>5.7 <a name="Delimiter_Elements" href= 7783 "#Delimiter_Elements" id="Delimiter_Elements">Delimiter 7784 Elements</a></td> 7785 <td>4 <a href= 7786 
"tr35-general.html#Delimiter_Elements">Delimiter 7787 Elements</a></td> 7788 </tr> 7789 <tr> 7790 <td>C.6 <a name="Measurement_System_Data" href= 7791 "#Measurement_System_Data" id= 7792 "Measurement_System_Data">Measurement System Data</a></td> 7793 <td>5 <a href= 7794 "tr35-general.html#Measurement_System_Data">Measurement 7795 System Data</a></td> 7796 </tr> 7797 <tr> 7798 <td>5.8 <a name="Measurement_Elements" href= 7799 "#Measurement_Elements" id= 7800 "Measurement_Elements">Measurement Elements 7801 (deprecated)</a></td> 7802 <td>5.1 <a href= 7803 "tr35-general.html#Measurement_Elements">Measurement 7804 Elements (deprecated)</a></td> 7805 </tr> 7806 <tr> 7807 <td>5.11 <a name="Unit_Elements" href="#Unit_Elements" id= 7808 "Unit_Elements">Unit Elements</a></td> 7809 <td>6 <a href="tr35-general.html#Unit_Elements">Unit 7810 Elements</a></td> 7811 </tr> 7812 <tr> 7813 <td>5.12 <a name="POSIX_Elements" href="#POSIX_Elements" 7814 id="POSIX_Elements">POSIX Elements</a></td> 7815 <td>7 <a href="tr35-general.html#POSIX_Elements">POSIX 7816 Elements</a></td> 7817 </tr> 7818 <tr> 7819 <td>5.13 <a name="Reference_Elements" href= 7820 "#Reference_Elements" id="Reference_Elements">Reference 7821 Element</a></td> 7822 <td>8 <a href= 7823 "tr35-general.html#Reference_Elements">Reference 7824 Element</a></td> 7825 </tr> 7826 <tr> 7827 <td>5.15 <a name="Segmentations" href="#Segmentations" id= 7828 "Segmentations">Segmentations</a></td> 7829 <td>9 <a href= 7830 "tr35-general.html#Segmentations">Segmentations</a></td> 7831 </tr> 7832 <tr> 7833 <td>5.15.1 <a name="Segmentation_Inheritance" href= 7834 "#Segmentation_Inheritance" id= 7835 "Segmentation_Inheritance">Segmentation 7836 Inheritance</a></td> 7837 <td>9.1 <a href= 7838 "tr35-general.html#Segmentation_Inheritance">Segmentation 7839 Inheritance</a></td> 7840 </tr> 7841 <tr> 7842 <td>5.16 <a name="Transforms" href="#Transforms" id= 7843 "Transforms">Transforms</a></td> 7844 <td>10 <a href= 7845 
"tr35-general.html#Transforms">Transforms</a></td> 7846 </tr> 7847 <tr> 7848 <td>N <a name="Transform_Rules" href="#Transform_Rules" id= 7849 "Transform_Rules">Transform Rules</a></td> 7850 <td>10.3 <a href= 7851 "tr35-general.html#Transform_Rules_Syntax">Transform Rules 7852 Syntax</a></td> 7853 </tr> 7854 <tr> 7855 <td>5.18 <a name="ListPatterns" href="#ListPatterns" id= 7856 "ListPatterns">List Patterns</a></td> 7857 <td>11 <a href="tr35-general.html#ListPatterns">List 7858 Patterns</a></td> 7859 </tr> 7860 <tr> 7861 <td>C.20 <a name="List_Gender" href="#List_Gender" id= 7862 "List_Gender">Gender of Lists</a></td> 7863 <td>11.1 <a href="tr35-general.html#List_Gender">Gender of 7864 Lists</a></td> 7865 </tr> 7866 <tr> 7867 <td>5.19 <a name="Context_Transform_Elements" href= 7868 "#Context_Transform_Elements" id= 7869 "Context_Transform_Elements">ContextTransform 7870 Elements</a></td> 7871 <td>12 <a href= 7872 "tr35-general.html#Context_Transform_Elements">ContextTransform 7873 Elements</a></td> 7874 </tr> 7875 <tr> 7876 <td></td> 7877 <td><a href="tr35-general.html#"></a></td> 7878 </tr> 7879 </table> 7880 <table cellspacing="0" cellpadding="2" border="1" width="100%"> 7881 <caption> 7882 <a href="#Part_3_Links" name="Part_3_Links" id= 7883 "Part_3_Links">Part 3 Links</a>: <a href= 7884 "tr35-numbers.html">Numbers</a> (number & currency 7885 formatting) 7886 </caption> 7887 <tr> 7888 <th>Old section</th> 7889 <th>Section in new part</th> 7890 </tr> 7891 <tr> 7892 <td>C.13 <a name="Numbering_Systems" href= 7893 "#Numbering_Systems" id="Numbering_Systems">Numbering 7894 Systems</a></td> 7895 <td>1 <a href= 7896 "tr35-numbers.html#Numbering_Systems">Numbering 7897 Systems</a></td> 7898 </tr> 7899 <tr> 7900 <td>5.10 <a name="Number_Elements" href="#Number_Elements" 7901 id="Number_Elements">Number Elements</a></td> 7902 <td>2 <a href="tr35-numbers.html#Number_Elements">Number 7903 Elements</a></td> 7904 </tr> 7905 <tr> 7906 <td>5.10.1 <a name="Number_Symbols" 
href="#Number_Symbols" 7907 id="Number_Symbols">Number Symbols</a></td> 7908 <td>2.3 <a href="tr35-numbers.html#Number_Symbols">Number 7909 Symbols</a></td> 7910 </tr> 7911 <tr> 7912 <td>G <a name="Number_Format_Patterns" href= 7913 "#Number_Format_Patterns" id= 7914 "Number_Format_Patterns">Number Format Patterns</a></td> 7915 <td>3 <a href= 7916 "tr35-numbers.html#Number_Format_Patterns">Number Format 7917 Patterns</a></td> 7918 </tr> 7919 <tr> 7920 <td>5.10.2 <a name="Currencies" href="#Currencies" id= 7921 "Currencies">Currencies</a></td> 7922 <td>4 <a href= 7923 "tr35-numbers.html#Currencies">Currencies</a></td> 7924 </tr> 7925 <tr> 7926 <td>C.1 <a name="Supplemental_Currency_Data" href= 7927 "#Supplemental_Currency_Data" id= 7928 "Supplemental_Currency_Data">Supplemental Currency 7929 Data</a></td> 7930 <td>4.1 <a href= 7931 "tr35-numbers.html#Supplemental_Currency_Data">Supplemental 7932 Currency Data</a></td> 7933 </tr> 7934 <tr> 7935 <td>C.11 <a name="Language_Plural_Rules" href= 7936 "#Language_Plural_Rules" id= 7937 "Language_Plural_Rules">Language Plural Rules</a></td> 7938 <td>5 <a href= 7939 "tr35-numbers.html#Language_Plural_Rules">Language Plural 7940 Rules</a></td> 7941 </tr> 7942 <tr> 7943 <td>5.17 <a name="Rule-Based_Number_Formatting" href= 7944 "#Rule-Based_Number_Formatting" id= 7945 "Rule-Based_Number_Formatting">Rule-Based Number 7946 Formatting</a></td> 7947 <td>6 <a href= 7948 "tr35-numbers.html#Rule-Based_Number_Formatting">Rule-Based 7949 Number Formatting</a></td> 7950 </tr> 7951 </table> 7952 <table cellspacing="0" cellpadding="2" border="1" width="100%"> 7953 <caption> 7954 <a href="#Part_4_Links" name="Part_4_Links" id= 7955 "Part_4_Links">Part 4 Links</a>: <a href= 7956 "tr35-dates.html">Dates</a> (date, time, time zone 7957 formatting) 7958 </caption> 7959 <tr> 7960 <th>Old section</th> 7961 <th>Section in new part</th> 7962 </tr> 7963 <tr> 7964 <td><a name="Date_Elements" href="#Date_Elements" id= 7965 "Date_Elements">5.9 Date 
Elements</a></td> 7966 <td>1 <a href= 7967 "tr35-dates.html#Overview_Dates_Element_Supplemental">Overview: 7968 Dates Element, Supplemental Date and Calendar 7969 Information</a></td> 7970 </tr> 7971 <tr> 7972 <td><a name="Calendar_Elements" href="#Calendar_Elements" 7973 id="Calendar_Elements">5.9.1 Calendar Elements</a></td> 7974 <td>2 <a href="tr35-dates.html#Calendar_Elements">Calendar 7975 Elements</a></td> 7976 </tr> 7977 <tr> 7978 <td><a name="months_days_quarters_eras" href= 7979 "#months_days_quarters_eras" id= 7980 "months_days_quarters_eras">Elements months, days, 7981 quarters, eras</a></td> 7982 <td>2.1 <a href= 7983 "tr35-dates.html#months_days_quarters_eras">Elements 7984 months, days, quarters, eras</a></td> 7985 </tr> 7986 <tr> 7987 <td><a name="monthPatterns_cyclicNameSets" href= 7988 "#monthPatterns_cyclicNameSets" id= 7989 "monthPatterns_cyclicNameSets">Elements monthPatterns, 7990 cyclicNameSets</a></td> 7991 <td>2.2 <a href= 7992 "tr35-dates.html#monthPatterns_cyclicNameSets">Elements 7993 monthPatterns, cyclicNameSets</a></td> 7994 </tr> 7995 <tr> 7996 <td><a name="dayPeriods" href="#dayPeriods" id= 7997 "dayPeriods">Element dayPeriods</a></td> 7998 <td>2.3 <a href="tr35-dates.html#dayPeriods">Element 7999 dayPeriods</a></td> 8000 </tr> 8001 <tr> 8002 <td><a name="dateFormats" href="#dateFormats" id= 8003 "dateFormats">Element dateFormats</a></td> 8004 <td>2.4 <a href="tr35-dates.html#dateFormats">Element 8005 dateFormats</a></td> 8006 </tr> 8007 <tr> 8008 <td><a name="timeFormats" href="#timeFormats" id= 8009 "timeFormats">Element timeFormats</a></td> 8010 <td>2.5 <a href="tr35-dates.html#timeFormats">Element 8011 timeFormats</a></td> 8012 </tr> 8013 <tr> 8014 <td><a name="dateTimeFormats" href="#dateTimeFormats" id= 8015 "dateTimeFormats">Element dateTimeFormats</a></td> 8016 <td>2.6 <a href="tr35-dates.html#dateTimeFormats">Element 8017 dateTimeFormats</a></td> 8018 </tr> 8019 <tr> 8020 <td><a name="Calendar_Fields" href="#Calendar_Fields" 
id= 8021 "Calendar_Fields">5.9.2 Calendar Fields</a></td> 8022 <td>3 <a href="tr35-dates.html#Calendar_Fields">Calendar 8023 Fields</a></td> 8024 </tr> 8025 <tr> 8026 <td>5.9.3 <a name="Timezone_Names" href="#Timezone_Names" 8027 id="Timezone_Names">Time Zone Names</a></td> 8028 <td>5 <a href="tr35-dates.html#Time_Zone_Names">Time Zone 8029 Names</a></td> 8030 </tr> 8031 <tr> 8032 <td><a name="Supplemental_Calendar_Data" href= 8033 "#Supplemental_Calendar_Data" id= 8034 "Supplemental_Calendar_Data">C.5 Supplemental Calendar 8035 Data</a></td> 8036 <td>4 <a href= 8037 "tr35-dates.html#Supplemental_Calendar_Data">Supplemental 8038 Calendar Data</a></td> 8039 </tr> 8040 <tr> 8041 <td><a name="Supplemental_Timezone_Data" href= 8042 "#Supplemental_Timezone_Data" id= 8043 "Supplemental_Timezone_Data">C.7 Supplemental Time Zone 8044 Data</a></td> 8045 <td>6 <a href= 8046 "tr35-dates.html#Supplemental_Time_Zone_Data">Supplemental 8047 Time Zone Data</a></td> 8048 </tr> 8049 <tr> 8050 <td><a name="Calendar_Preference_Data" href= 8051 "#Calendar_Preference_Data" id= 8052 "Calendar_Preference_Data">C.15 Calendar Preference 8053 Data</a></td> 8054 <td>4.2 <a href= 8055 "tr35-dates.html#Calendar_Preference_Data">Calendar 8056 Preference Data</a></td> 8057 </tr> 8058 <tr> 8059 <td><a name="DayPeriodRules" href="#DayPeriodRules" id= 8060 "DayPeriodRules">C.17 DayPeriod Rules</a></td> 8061 <td>4.5 <a href="tr35-dates.html#Day_Period_Rules">Day 8062 Period Rules</a></td> 8063 </tr> 8064 <tr> 8065 <td><a name="Date_Format_Patterns" href= 8066 "#Date_Format_Patterns" id="Date_Format_Patterns">Appendix 8067 F: Date Format Patterns</a></td> 8068 <td>8 <a href="tr35-dates.html#Date_Format_Patterns">Date 8069 Format Patterns</a></td> 8070 </tr> 8071 <tr> 8072 <td><a name="Date_Field_Symbol_Table" href= 8073 "#Date_Field_Symbol_Table" id= 8074 "Date_Field_Symbol_Table">Date Field Symbol Table</a></td> 8075 <td><a href="tr35-dates.html#Date_Field_Symbol_Table">Date 8076 Field Symbol 
Table</a></td> 8077 </tr> 8078 <tr> 8079 <td><a name="Localized_Pattern_Characters" href= 8080 "#Localized_Pattern_Characters" id= 8081 "Localized_Pattern_Characters">F.1 Localized Pattern 8082 Characters (deprecated)</a></td> 8083 <td>8.1 <a href= 8084 "tr35-dates.html#Localized_Pattern_Characters">Localized 8085 Pattern Characters (deprecated)</a></td> 8086 </tr> 8087 <tr> 8088 <td><a name="Time_Zone_Fallback" href="#Time_Zone_Fallback" 8089 id="Time_Zone_Fallback">Appendix J: Time Zone Display 8090 Names</a></td> 8091 <td>7 <a href="tr35-dates.html#Using_Time_Zone_Names">Using 8092 Time Zone Names</a></td> 8093 </tr> 8094 <tr> 8095 <td><a name="fallbackFormat" href="#fallbackFormat" id= 8096 "fallbackFormat"><b>fallbackFormat</b>:</a></td> 8097 <td><a href= 8098 "tr35-dates.html#fallbackFormat"><b>fallbackFormat</b>:</a></td> 8099 </tr> 8100 <tr> 8101 <td>O.4 Parsing Dates and Times</td> 8102 <td>9 <a href="tr35-dates.html#Parsing_Dates_Times">Parsing 8103 Dates and Times</a></td> 8104 </tr> 8105 </table> 8106 <table cellspacing="0" cellpadding="2" border="1" width="100%"> 8107 <caption> 8108 <a href="#Part_5_Links" name="Part_5_Links" id= 8109 "Part_5_Links">Part 5 Links</a>: <a href= 8110 "tr35-collation.html">Collation</a> (sorting, searching, 8111 grouping) 8112 </caption> 8113 <tr> 8114 <th>Old section</th> 8115 <th>Section in new part</th> 8116 </tr> 8117 <tr> 8118 <td>5.14 <a name="Collation_Elements" href= 8119 "#Collation_Elements" id="Collation_Elements">Collation 8120 Elements</a></td> 8121 <td>3 <a href= 8122 "tr35-collation.html#Collation_Tailorings">Collation 8123 Tailorings</a></td> 8124 </tr> 8125 <tr> 8126 <td>5.14.1 <a name="Collation_Version" href= 8127 "#Collation_Version" id= 8128 "Collation_Version">Version</a></td> 8129 <td>3.1 <a href= 8130 "tr35-collation.html#Collation_Version">Version</a></td> 8131 </tr> 8132 <tr> 8133 <td>5.14.2 <a name="Collation_Element" href= 8134 "#Collation_Element" id="Collation_Element">Collation 8135 
Element</a></td> 8136 <td>3.2 <a href= 8137 "tr35-collation.html#Collation_Element">Collation 8138 Element</a></td> 8139 </tr> 8140 <tr> 8141 <td>5.14.3 <a name="Setting_Options" href= 8142 "#Setting_Options" id="Setting_Options">Setting 8143 Options</a></td> 8144 <td>3.3 <a href= 8145 "tr35-collation.html#Setting_Options">Setting 8146 Options</a></td> 8147 </tr> 8148 <tr> 8149 <td>Table <a name="Collation_Settings" href= 8150 "#Collation_Settings" id="Collation_Settings">Collation 8151 Settings</a></td> 8152 <td>Table <a href= 8153 "tr35-collation.html#Collation_Settings">Collation 8154 Settings</a></td> 8155 </tr> 8156 <tr> 8157 <td>5.14.4 <a name="Rules" href="#Rules" id= 8158 "Rules">Collation Rule Syntax</a></td> 8159 <td>3.4 <a href="tr35-collation.html#Rules">Collation Rule 8160 Syntax</a></td> 8161 </tr> 8162 <tr> 8163 <td>5.14.5 <a name="Orderings" href="#Orderings" id= 8164 "Orderings">Orderings</a></td> 8165 <td>3.5 <a href= 8166 "tr35-collation.html#Orderings">Orderings</a></td> 8167 </tr> 8168 <tr> 8169 <td>5.14.6 <a name="Contractions" href="#Contractions" id= 8170 "Contractions">Contractions</a></td> 8171 <td>3.6 <a href= 8172 "tr35-collation.html#Contractions">Contractions</a></td> 8173 </tr> 8174 <tr> 8175 <td>5.14.7 <a name="Expansions" href="#Expansions" id= 8176 "Expansions">Expansions</a></td> 8177 <td>3.7 <a href= 8178 "tr35-collation.html#Expansions">Expansions</a></td> 8179 </tr> 8180 <tr> 8181 <td>5.14.8 <a name="Context_Before" href="#Context_Before" 8182 id="Context_Before">Context Before</a></td> 8183 <td>3.8 <a href= 8184 "tr35-collation.html#Context_Before">Context 8185 Before</a></td> 8186 </tr> 8187 <tr> 8188 <td>5.14.9 <a name="Placing_Characters_Before_Others" href= 8189 "#Placing_Characters_Before_Others" id= 8190 "Placing_Characters_Before_Others">Placing Characters 8191 Before Others</a></td> 8192 <td>3.9 <a href= 8193 "tr35-collation.html#Placing_Characters_Before_Others">Placing 8194 Characters Before Others</a></td> 8195 
</tr> 8196 <tr> 8197 <td>5.14.10 <a name="Logical_Reset_Positions" href= 8198 "#Logical_Reset_Positions" id= 8199 "Logical_Reset_Positions">Logical Reset Positions</a></td> 8200 <td>3.10 <a href= 8201 "tr35-collation.html#Logical_Reset_Positions">Logical Reset 8202 Positions</a></td> 8203 </tr> 8204 <tr> 8205 <td>5.14.11 <a name="Special_Purpose_Commands" href= 8206 "#Special_Purpose_Commands" id= 8207 "Special_Purpose_Commands">Special-Purpose 8208 Commands</a></td> 8209 <td>3.11 <a href= 8210 "tr35-collation.html#Special_Purpose_Commands">Special-Purpose 8211 Commands</a></td> 8212 </tr> 8213 <tr> 8214 <td>5.14.12 <a name="Script_Reordering" href= 8215 "#Script_Reordering" id="Script_Reordering">Collation 8216 Reordering</a></td> 8217 <td>3.12 <a href= 8218 "tr35-collation.html#Script_Reordering">Collation 8219 Reordering</a></td> 8220 </tr> 8221 <tr> 8222 <td>5.14.13 <a name="Case_Parameters" href= 8223 "#Case_Parameters" id="Case_Parameters">Case 8224 Parameters</a></td> 8225 <td>3.13 <a href="tr35-collation.html#Case_Parameters">Case 8226 Parameters</a></td> 8227 </tr> 8228 <tr> 8229 <td>Definition: <a name="UncasedExceptions" href= 8230 "#UncasedExceptions" id= 8231 "UncasedExceptions">UncasedExceptions</a></td> 8232 <td>removed: see 3.13 <a href= 8233 "tr35-collation.html#Case_Parameters">Case 8234 Parameters</a></td> 8235 </tr> 8236 <tr> 8237 <td>Definition: <a name="LowerExceptions" href= 8238 "#LowerExceptions" id= 8239 "LowerExceptions">LowerExceptions</a></td> 8240 <td>removed: see 3.13 <a href= 8241 "tr35-collation.html#Case_Parameters">Case 8242 Parameters</a></td> 8243 </tr> 8244 <tr> 8245 <td>Definition: <a name="UpperExceptions" href= 8246 "#UpperExceptions" id= 8247 "UpperExceptions">UpperExceptions</a></td> 8248 <td>removed: see 3.13 <a href= 8249 "tr35-collation.html#Case_Parameters">Case 8250 Parameters</a></td> 8251 </tr> 8252 <tr> 8253 <td>5.14.14 <a name="Visibility" href="#Visibility" id= 8254 "Visibility">Visibility</a></td> 8255 <td>3.14 
<a href= 8256 "tr35-collation.html#Visibility">Visibility</a></td> 8257 </tr> 8258 </table> 8259 <table cellspacing="0" cellpadding="2" border="1" width="100%"> 8260 <caption> 8261 <a href="#Part_6_Links" name="Part_6_Links" id= 8262 "Part_6_Links">Part 6 Links</a>: <a href= 8263 "tr35-info.html">Supplemental</a> (supplemental data) 8264 </caption> 8265 <tr> 8266 <th>Old section</th> 8267 <th>Section in new part</th> 8268 </tr> 8269 <tr> 8270 <td>C <a name="Supplemental_Data" href="#Supplemental_Data" 8271 id="Supplemental_Data">Supplemental Data</a></td> 8272 <td>Introduction <a href= 8273 "tr35-info.html#Supplemental_Data">Supplemental 8274 Data</a></td> 8275 </tr> 8276 <tr> 8277 <td>C.2 <a name="Supplemental_Territory_Containment" href= 8278 "#Supplemental_Territory_Containment" id= 8279 "Supplemental_Territory_Containment">Supplemental Territory 8280 Containment</a></td> 8281 <td>1.1 <a href= 8282 "tr35-info.html#Supplemental_Territory_Containment">Supplemental 8283 Territory Containment</a></td> 8284 </tr> 8285 <tr> 8286 <td>C.4 <a name="Supplemental_Territory_Information" href= 8287 "#Supplemental_Territory_Information" id= 8288 "Supplemental_Territory_Information">Supplemental Territory 8289 Information</a></td> 8290 <td>1.2 <a href= 8291 "tr35-info.html#Supplemental_Territory_Information">Supplemental 8292 Territory Information</a></td> 8293 </tr> 8294 <tr> 8295 <td>C.3 <a name="Supplemental_Language_Data" href= 8296 "#Supplemental_Language_Data" id= 8297 "Supplemental_Language_Data">Supplemental Language 8298 Data</a></td> 8299 <td>2 <a href= 8300 "tr35-info.html#Supplemental_Language_Data">Supplemental 8301 Language Data</a></td> 8302 </tr> 8303 <tr> 8304 <td>C.9 <a name="Supplemental_Code_Mapping" href= 8305 "#Supplemental_Code_Mapping" id= 8306 "Supplemental_Code_Mapping">Supplemental Code 8307 Mapping</a></td> 8308 <td>4 <a href= 8309 "tr35-info.html#Supplemental_Code_Mapping">Supplemental 8310 Code Mapping</a></td> 8311 </tr> 8312 <tr> 8313 <td>C.12 
<a name="Telephone_Code_Data" href= 8314 "#Telephone_Code_Data" id="Telephone_Code_Data">Telephone 8315 Code Data</a></td> 8316 <td>5 <a href= 8317 "tr35-info.html#Telephone_Code_Data">Telephone Code 8318 Data</a></td> 8319 </tr> 8320 <tr> 8321 <td>C.14 <a name="Postal_Code_Validation" href= 8322 "#Postal_Code_Validation" id= 8323 "Postal_Code_Validation">Postal Code Validation</a></td> 8324 <td>6 <a href= 8325 "tr35-info.html#Postal_Code_Validation">Postal Code 8326 Validation</a></td> 8327 </tr> 8328 <tr> 8329 <td>C.8 <a name="Supplemental_Character_Fallback_Data" 8330 href="#Supplemental_Character_Fallback_Data" id= 8331 "Supplemental_Character_Fallback_Data">Supplemental 8332 Character Fallback Data</a></td> 8333 <td>7 <a href= 8334 "tr35-info.html#Supplemental_Character_Fallback_Data">Supplemental 8335 Character Fallback Data</a></td> 8336 </tr> 8337 <tr> 8338 <td>M <a name="Coverage_Levels" href="#Coverage_Levels" id= 8339 "Coverage_Levels">Coverage Levels</a></td> 8340 <td>8 <a href="tr35-info.html#Coverage_Levels">Coverage 8341 Levels</a></td> 8342 </tr> 8343 <tr> 8344 <td>5.20 <a name="Metadata_Elements" href= 8345 "tr35-info.html#Metadata_Elements" id= 8346 "Metadata_Elements">Metadata Elements</a></td> 8347 <td>10 <a href="tr35-info.html#Metadata_Elements">Locale 8348 Metadata Element</a></td> 8349 </tr> 8350 <tr> 8351 <td>P <a name="Appendix_Supplemental_Metadata" href= 8352 "tr35-info.html#Appendix_Supplemental_Metadata" id= 8353 "Appendix_Supplemental_Metadata">Supplemental 8354 Metadata</a><br> 8355 P.1 <a name="Supplemental_Alias_Information" href= 8356 "tr35-info.html#Supplemental_Alias_Information" id= 8357 "Supplemental_Alias_Information">Supplemental Alias 8358 Information</a><br> 8359 P.2 <a name="Supplemental_Deprecated_Information" href= 8360 "tr35-info.html#Supplemental_Deprecated_Information" id= 8361 "Supplemental_Deprecated_Information">Supplemental 8362 Deprecated Information</a><br> 8363 P.3 <a name="Default_Content" href= 8364 
"tr35-info.html#Default_Content" id= 8365 "Default_Content">Default Content</a></td> 8366 <td>9 <a href= 8367 "tr35-info.html#Appendix_Supplemental_Metadata">Supplemental 8368 Metadata</a><br> 8369 9.1 <a href= 8370 "tr35-info.html#Supplemental_Alias_Information">Supplemental 8371 Alias Information</a><br> 8372 9.2 <a href= 8373 "tr35-info.html#Supplemental_Deprecated_Information">Supplemental 8374 Deprecated Information</a><br> 8375 9.3 <a href="tr35-info.html#Default_Content">Default 8376 Content</a></td> 8377 </tr> 8378 </table> 8379 <table cellspacing="0" cellpadding="2" border="1" width="100%"> 8380 <caption> 8381 <a href="#Part_7_Links" name="Part_7_Links" id= 8382 "Part_7_Links">Part 7 Links</a>: <a href= 8383 "tr35-keyboards.html">Keyboards</a> (keyboard mappings) 8384 </caption> 8385 <tr> 8386 <th>Old section</th> 8387 <th>Section in new part</th> 8388 </tr> 8389 <tr> 8390 <td>S <a name="Keyboards" href="#Keyboards" id= 8391 "Keyboards">Keyboards</a></td> 8392 <td>1 <a href= 8393 "tr35-keyboards.html#Keyboards">Keyboards</a></td> 8394 </tr> 8395 <tr> 8396 <td>S <a name="Goals_and_Nongoals" href= 8397 "#Goals_and_Nongoals" id="Goals_and_Nongoals">Goals and 8398 Nongoals</a></td> 8399 <td><a href="tr35-keyboards.html#Goals_and_Nongoals">Goals 8400 and Nongoals</a></td> 8401 </tr> 8402 <tr> 8403 <td>S <a name="File_and_Dir_Structure" href= 8404 "#File_and_Dir_Structure" id="File_and_Dir_Structure">File 8405 and Directory Structure</a></td> 8406 <td><a href= 8407 "tr35-keyboards.html#File_and_Dir_Structure">File and 8408 Directory Structure</a></td> 8409 </tr> 8410 <tr> 8411 <td>S <a name="Element_Heirarchy_Layout_File" href= 8412 "#Element_Heirarchy_Layout_File" id= 8413 "Element_Heirarchy_Layout_File">Element Hierarchy - Layout 8414 File</a></td> 8415 <td><a href= 8416 "tr35-keyboards.html#Element_Heirarchy_Layout_File">Element 8417 Hierarchy - Layout File</a></td> 8418 </tr> 8419 <tr> 8420 <td>S <a name="Element_Heirarchy_Platform_File" href= 8421 
"#Element_Heirarchy_Platform_File" id= 8422 "Element_Heirarchy_Platform_File">Element Hierarchy - 8423 Platform File</a></td> 8424 <td><a href= 8425 "tr35-keyboards.html#Element_Heirarchy_Platform_File">Element 8426 Hierarchy - Platform File</a></td> 8427 </tr> 8428 <tr> 8429 <td>S <a name="Invariants" href="#Invariants" id= 8430 "Invariants">Invariants</a></td> 8431 <td><a href= 8432 "tr35-keyboards.html#Invariants">Invariants</a></td> 8433 </tr> 8434 <tr> 8435 <td>S <a name="Data_Sources" href="#Data_Sources" id= 8436 "Data_Sources">Data Sources</a></td> 8437 <td><a href="tr35-keyboards.html#Data_Sources">Data 8438 Sources</a></td> 8439 </tr> 8440 <tr> 8441 <td>S <a name="Keyboard_IDs" href="#Keyboard_IDs" id= 8442 "Keyboard_IDs">Keyboard IDs</a></td> 8443 <td><a href="tr35-keyboards.html#Keyboard_IDs">Keyboard 8444 IDs</a></td> 8445 </tr> 8446 <tr> 8447 <td>S <a name="Platform_Behaviors_in_Edge_Cases" href= 8448 "#Platform_Behaviors_in_Edge_Cases" id= 8449 "Platform_Behaviors_in_Edge_Cases">Platform Behaviors in 8450 Edge Cases</a></td> 8451 <td><a href= 8452 "tr35-keyboards.html#Platform_Behaviors_in_Edge_Cases">Platform 8453 Behaviors in Edge Cases</a></td> 8454 </tr> 8455 <tr> 8456 <td>S <a name="Element_Keyboard" href="#Element_Keyboard" 8457 id="Element_Keyboard">Element: keyboard</a></td> 8458 <td><a href="tr35-keyboards.html#Element_Keyboard">Element: 8459 keyboard</a></td> 8460 </tr> 8461 <tr> 8462 <td>S <a name="Element_version" href="#Element_version" id= 8463 "Element_version">Element: version</a></td> 8464 <td><a href="tr35-keyboards.html#Element_version">Element: 8465 version</a></td> 8466 </tr> 8467 <tr> 8468 <td>S <a name="Element_generation" href= 8469 "#Element_generation" id="Element_generation">Element: 8470 generation</a></td> 8471 <td><a href= 8472 "tr35-keyboards.html#Element_generation">Element: 8473 generation</a></td> 8474 </tr> 8475 <tr> 8476 <td>S <a name="Element_names" href="#Element_names" id= 8477 "Element_names">Element: 
names</a></td> 8478 <td><a href="tr35-keyboards.html#Element_names">Element: 8479 names</a></td> 8480 </tr> 8481 <tr> 8482 <td>S <a name="Element_name" href="#Element_name" id= 8483 "Element_name">Element: name</a></td> 8484 <td><a href="tr35-keyboards.html#Element_name">Element: 8485 name</a></td> 8486 </tr> 8487 <tr> 8488 <td>S <a name="Element_settings" href="#Element_settings" 8489 id="Element_settings">Element: settings</a></td> 8490 <td><a href="tr35-keyboards.html#Element_settings">Element: 8491 settings</a></td> 8492 </tr> 8493 <tr> 8494 <td>S <a name="Element_keyMap" href="#Element_keyMap" id= 8495 "Element_keyMap">Element: keyMap</a></td> 8496 <td><a href="tr35-keyboards.html#Element_keyMap">Element: 8497 keyMap</a></td> 8498 </tr> 8499 <tr> 8500 <td>S <a name="Element_map" href="#Element_map" id= 8501 "Element_map">Element: map</a></td> 8502 <td><a href="tr35-keyboards.html#Element_map">Element: 8503 map</a></td> 8504 </tr> 8505 <tr> 8506 <td>S <a name="Element_transforms" href= 8507 "#Element_transforms" id="Element_transforms">Element: 8508 transforms</a></td> 8509 <td><a href= 8510 "tr35-keyboards.html#Element_transforms">Element: 8511 transforms</a></td> 8512 </tr> 8513 <tr> 8514 <td>S <a name="Element_transform" href="#Element_transform" 8515 id="Element_transform">Element: transform</a></td> 8516 <td><a href= 8517 "tr35-keyboards.html#Element_transform">Element: 8518 transform</a></td> 8519 </tr> 8520 <tr> 8521 <td>S <a name="Element_platform" href="#Element_platform" 8522 id="Element_platform">Element: platform</a></td> 8523 <td><a href="tr35-keyboards.html#Element_platform">Element: 8524 platform</a></td> 8525 </tr> 8526 <tr> 8527 <td>S <a name="Element_hardwareMap" href= 8528 "#Element_hardwareMap" id="Element_hardwareMap">Element: 8529 hardwareMap</a></td> 8530 <td><a href= 8531 "tr35-keyboards.html#Element_hardwareMap">Element: 8532 hardwareMap</a></td> 8533 </tr> 8534 <tr> 8535 <td>S <a name="Principles_for_Keyboard_Ids" href= 8536 
"#Principles_for_Keyboard_Ids" id= 8537 "Principles_for_Keyboard_Ids">Principles for Keyboard 8538 Ids</a></td> 8539 <td><a href= 8540 "tr35-keyboards.html#Principles_for_Keyboard_Ids">Principles 8541 for Keyboard Ids</a></td> 8542 </tr> 8543 </table> 8544 <hr> 8545 8546 <h2><a href="#LocaleId_Canonicalization" name="LocaleId_Canonicalization">Annex C. LocaleId Canonicalization</a></h2> 8547 <p> </p> 8548 <p>The languageAlias, scriptAlias, territoryAlias, and variantAlias elements are used as rules to transform an input <em>source localeId</em>. The first step is to transform the <em>languageId</em> portion of the localeId. <br> 8549 </p> 8550 <blockquote>Note: in the following discussion, the separator '-' is used. That is also used in examples of XML alias data, even though for compatibility reasons that alias data actually uses '_' as a separator. The processing can also be applied to syntax while maintaining the separator '_', <em>mutatis mutandis</em>. CLDR also uses “territory” and “region” interchangeably.</blockquote> 8551 <h3 >Definitions</h3> 8552 <h4 >1. 
Multimap interpretation</h4> 8553 <p>Interpret each languageId as a multimap from a <em>fieldId</em> (language, script, region, variants) to a <strong>set</strong> of field values.</p> 8554 <p><em>Examples:</em></p> 8555 <a ></a><a ></a> 8556 <table class='simple'> 8557 <tbody> 8558 <tr> 8559 <td colspan="1" rowspan="2"><p> </p> 8560 <p><strong>Source</strong></p></td> 8561 <td colspan="4" rowspan="1"><p><strong>Fields</strong></p></td> 8562 </tr> 8563 <tr> 8564 <td colspan="1" rowspan="1"><p><strong>Language</strong></p></td> 8565 <td colspan="1" rowspan="1"><p><strong>Script</strong></p></td> 8566 <td colspan="1" rowspan="1"><p><strong>Region</strong></p></td> 8567 <td colspan="1" rowspan="1"><p><strong>Variants</strong></p></td> 8568 </tr> 8569 <tr> 8570 <td colspan="1" rowspan="1"><p>en-GB</p></td> 8571 <td colspan="1" rowspan="1"><p>{en}</p></td> 8572 <td colspan="1" rowspan="1"><p>{}</p></td> 8573 <td colspan="1" rowspan="1"><p>{GB}</p></td> 8574 <td colspan="1" rowspan="1"><p>{}</p></td> 8575 </tr> 8576 <tr> 8577 <td colspan="1" rowspan="1"><p>und-GB</p></td> 8578 <td colspan="1" rowspan="1"><p>{}</p></td> 8579 <td colspan="1" rowspan="1"><p>{}</p></td> 8580 <td colspan="1" rowspan="1"><p>{GB}</p></td> 8581 <td colspan="1" rowspan="1"><p>{}</p></td> 8582 </tr> 8583 <tr> 8584 <td colspan="1" rowspan="1"><p>ja-Latn-YU-hepburn-heploc</p></td> 8585 <td colspan="1" rowspan="1"><p>{ja}</p></td> 8586 <td colspan="1" rowspan="1"><p>{Latn}</p></td> 8587 <td colspan="1" rowspan="1"><p>{YU}</p></td> 8588 <td colspan="1" rowspan="1"><p>{hepburn, heploc}</p></td> 8589 </tr> 8590 </tbody> 8591 </table> 8592 <p> </p> 8593 <ul> 8594 <li>This can be represented as an abbreviated format: {L={ja}, S={Latn}, R={YU}, V={hepburn, heploc}}, skipping empty sets.</li> 8595 <li>“und” is a special language code that is treated as an empty set.</li> 8596 <li>Of course, only the Variants can contain more than one item: the others are either empty or contain exactly 1 item.</li> 8597 
</ul>
<h4 >2. Alias elements</h4>
<p>For the languageAlias elements, the <em>type</em> and <em>replacements</em> are languageIds.</p>
<p>For the scriptAlias, territoryAlias (also known as region), and variantAlias elements, the type and replacements are interpreted as languageIds <em>after</em> prefixing each with “und-”. Thus</p>
<code>&lt;territoryAlias type="AN" replacement="CW SX BQ" reason="deprecated"/&gt;</code>
<p>is interpreted as:</p>
<code>&lt;territoryAlias type="und-AN" replacement="und-CW und-SX und-BQ" reason="deprecated"/&gt;</code>
<p>Note that a territoryAlias may have multiple replacement values, separated by spaces in the text (such as replacement="und-CW und-SX und-BQ"); all other rules have exactly one replacement value.</p>
<p> </p>
<h4 >3. Matches</h4>
<p>A rule matches a source if and only if, for every field, the <em>source</em> field ⊇ the <em>type</em> field.</p>
<blockquote>
<p><em>Examples:</em></p>
<p>source=“ja-heploc-hepburn” and type=“und-hepburn”</p>
<table class='simple'>
<tbody>
<tr>
<td colspan="1" rowspan="1"><p>{ja} ⊇ {} </p></td>
<td colspan="1" rowspan="1"><p>success, und = {}</p></td>
</tr>
<tr>
<td colspan="1" rowspan="1"><p>{hepburn, heploc} ⊇ {hepburn}</p></td>
<td colspan="1" rowspan="1"><p><strong>success</strong></p></td>
</tr>
</tbody>
</table>
<p>so the rule matches the source.
(Note that the order of variants is immaterial to matching.)</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>source=“ja-hepburn” and type=“und-hepburn-heploc”</p>
<table class='simple'>
<tbody>
<tr>
<td colspan="1" rowspan="1"><p>{ja} ⊇ {}</p></td>
<td colspan="1" rowspan="1"><p>success, und = {}</p></td>
</tr>
<tr>
<td colspan="1" rowspan="1"><p>{hepburn} ⊉ {hepburn, heploc}</p></td>
<td colspan="1" rowspan="1"><p><strong>failure</strong></p></td>
</tr>
</tbody>
</table>
<p>so the rule does not match the source.</p></blockquote>
<h4>4. Replacement</h4>
<p>A matching rule is used to transform each field of the source as follows:</p>
<ul>
<li>if type.field ≠ {}
<ul>
<li>source.field = (source.field - type.field) ∪ replacement.field</li>
</ul>
</li>
<li>else if source.field = {} and replacement.field ≠ {}
<ul>
<li>source.field = replacement.field</li>
</ul>
</li>
</ul>
<p><em>Example:</em></p>
<blockquote><p>source=ja-Latn-fonipa-hepburn-heploc</p>
<p>rule=“&lt;languageAlias type="und-hepburn-heploc" replacement="und-alalc97"&gt;”</p>
<p>&nbsp;</p>
<p>result=“ja-Latn-alalc97-fonipa” <em>// note that the CLDR canonical order of variants is alphabetical</em></p></blockquote>
<h5>Territory Exception</h5>
<p>If the field = territory, and the replacement.field has more than one value, then look up the most likely territory* for the base language code (and script, if there is one). If that likely territory is in the list of replacements, use it. 
Otherwise, use the first territory in the list.</p>
<h4>5. Canonicalizing Syntax</h4>
<p>To canonicalize the syntax of <em>source</em>:</p>
<ul>
<li>Initial Script Subtag
<ul>
<li>If the first subtag has 4 letters, prepend "und-" to the source.</li>
<li>Note: Such identifiers are only for specialized use.</li>
</ul>
</li>
<li>Casing
<ul>
<li>Put any script subtag into title case (e.g., Hant)</li>
<li>Put any region subtag into uppercase (e.g., DE)</li>
<li>Put all other subtags into lowercase (e.g., en, fonipa)</li>
</ul>
</li>
<li>Order
<ul>
<li>Put any variants into alphabetical order (e.g., en-fonipa-scouse, not en-scouse-fonipa)</li>
<li>Put any extensions into alphabetical order by their singleton (e.g., en-t-xxx-u-yyy, not en-u-yyy-t-xxx)</li>
<li>Put all attributes into alphabetical order.</li>
<li>Put all &lt;keywords, tfields&gt; pairs into alphabetical order of their keys, within their respective extensions.</li>
<li>Remove any type or tfield value of "true"</li>
</ul>
</li>
<li>Separator
<ul>
<li>Replace '_' by '-'</li>
</ul>
</li>
</ul>
<h3>Preprocessing</h3>
<p>The data from supplementalMetadata is (logically) preprocessed as follows.</p>
<ol start="1">
<li>Load the rules from supplementalMetadata.xml, replacing '_' by '-', and adding “und-” as described in <em>Definition 2. Alias Elements</em>.</li>
<li>Capture all languageAlias rules where the <em>type</em> is an invalid languageId into a set of <strong>BCP47 LegacyRules</strong>. 
Example:
<ol>
<li>&lt;languageAlias type="i-mingo" replacement="see-x-i-mingo" reason="legacy"/&gt;</li>
</ol>
</li>
<li>Discard all rules where the <em>type</em> is an invalid languageId. Examples are
<ol>
<li>&lt;languageAlias type="i-mingo" replacement="see-x-i-mingo" reason="legacy"/&gt;</li>
<li>&lt;territoryAlias type="und-AAA" replacement="und-AA" reason="overlong"/&gt;</li>
</ol>
</li>
<li>Change the <em>type</em> and <em>replacement</em> values in the remaining rules into multimap rules, as per <em>Definition 1. Multimap Interpretation</em>.
<ol>
<li>Note that the “und” value disappears.</li>
</ol>
</li>
<li>Order the set of rules by
<ol>
<li>the size of the union of all field value sets, with largest size first</li>
<li>and then alphabetically by field.</li>
</ol>
</li>
<li>The result is the set of <strong>Alias Rules</strong></li>
</ol>
<p>&nbsp;</p>
<p>So using the examples above, we get the following order:</p>
<table class='simple'>
<tbody>
<tr>
<td colspan="1" rowspan="1"><p><strong>languageId</strong></p></td>
<td colspan="1" rowspan="1"><p><strong>size of union</strong></p></td>
<td colspan="1" rowspan="1"><p><strong>Alpha</strong></p></td>
</tr>
<tr>
<td colspan="1" rowspan="1"><p>{V={hepburn, heploc}}</p></td>
<td colspan="1" rowspan="1"><p>2</p></td>
<td colspan="1" rowspan="1"><p>n/a</p></td>
</tr>
<tr>
<td colspan="1" rowspan="1"><p>{L={en}, R={GB}}</p></td>
<td colspan="1" rowspan="1"><p>2</p></td>
<td colspan="1" rowspan="2"><p>en &lt; fr</p></td>
</tr>
<tr>
<td colspan="1" rowspan="1"><p>{L={fr}, R={CA}}</p></td>
<td colspan="1" rowspan="1"><p>2</p></td>
</tr>
<tr>
<td colspan="1" rowspan="1"><p>{R={CA}}</p></td>
<td colspan="1" rowspan="1"><p>1</p></td>
<td colspan="1" rowspan="1"><p>n/a</p></td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<blockquote><strong>Note: </strong>The secondary sort order in Preprocessing step 5.2 is only to ensure deterministic results when two rules “of the same length” could apply.</blockquote>
<h3>Processing LanguageIds</h3>
<p>To canonicalize a given <em>source</em>:</p>
<ol start="1">
<li>Canonicalize the syntax of <em>source</em> as per <em>Definition 5. Canonicalizing Syntax</em>.</li>
<li>Where the <em>source</em> could be an arbitrary BCP 47 language tag, first process as follows:
<ol>
<li>If the source is identical to one of the types in the BCP47 LegacyRules, replace the entire source by the replacement value.</li>
<li>Else if there is an extlang subtag, then apply Step 3 of <a href="https://tools.ietf.org/html/bcp47#section-4.5">https://tools.ietf.org/html/bcp47#section-4.5</a> to remove the extlang subtag (possibly adjusting the language subtag).
<ol>
<li>Don’t apply any of the other canonicalization steps in that section, however.</li>
</ol>
</li>
<li>Else if the first subtag is "x", prefix by "und-".</li>
<li><strong>Note: </strong>There are currently no valid 4-letter primary language subtags. 
While it is extremely unlikely that BCP47 would ever register them, if it does, then <i>languageAlias</i> mappings will be supplied for them, mapping to defined CLDR language subtags (from the idStatus="reserved" set).</li>
</ol>
</li>
<li>Find the first matching rule in <strong>Alias Rules</strong> (from <strong>Preprocessing</strong>)
<ol>
<li>If there are none, return <em>source</em></li>
</ol>
</li>
<li>Transform <em>source</em> according to that rule</li>
<li>Loop (go to step 3)</li>
</ol>
<h2>Processing LocaleIds</h2>
<p>The canonicalization of localeIds is done by first canonicalizing the languageId portion, then handling extensions in the following way:</p>
<ol start="1">
<li>Replace any <em>tlang</em> languageId value by its canonicalization.</li>
<li>Use the bcp47 data to replace keys, types, tfields, and tvalues by their canonical forms. See <strong>Section 3.6.4 U Extension Data Files</strong> and <strong>Section 3.7.1 T Extension Data Files</strong>. The matches are in the alias attribute value, while the canonical replacement is in the name attribute value. For example:
<ol>
<li>Because of the following bcp47 data:<br>
<code>&lt;key name="ms"…&gt;…&lt;type name="uksystem" … alias="imperial" … /&gt;…&lt;/key&gt;</code></li>
<li>We get the following transformation:<br>
<code>en-u-ms-imperial ⇒ en-u-ms-uksystem</code></li>
</ol>
</li>
<li>If there is an 'sd' or 'rg' key, replace any subdivision alias in its value in the same way, using subdivisionAlias data.</li>
</ol>
<h2>Optimizations</h2>
<p>The above algorithm is a logical statement of the process, but is not directly suited to production code. Production-level code can use many optimizations for efficiency while achieving the same result. For example, the Alias Rules can be further preprocessed to avoid indefinite looping, instead doing a rule lookup once per subtag. 
As another example, the small number of <strong>Territory Exceptions</strong> can be preprocessed to avoid the likely subtags processing.</p> 8804 <p> </p> 8805 8806 <hr> 8807 <h2><a name="References" href="#References" id= 8808 "References">References</a></h2> 8809 <table cellpadding="4" cellspacing="0" class="noborder" border= 8810 "0"> 8811 <tr> 8812 <th class="noborder" width="148">Ancillary Information</th> 8813 <td class="noborder" width="730"><i>To properly localize, 8814 parse, and format data requires ancillary information, 8815 which is not expressed in Locale Data Markup Language. Some 8816 of the formats for values used in Locale Data Markup 8817 Language are constructed according to external 8818 specifications. The sources for this data and/or formats 8819 include the following:<br> 8820 </i></td> 8821 </tr> 8822 <tr> 8823 <td class="noborder" width="148">[<a name="Bugs" href= 8824 "#Bugs" id="Bugs">Bugs</a>]</td> 8825 <td class="noborder" width="730">CLDR Bug Reporting 8826 form<br> 8827 <a href= 8828 "http://cldr.unicode.org/index/bug-reports">http://cldr.unicode.org/index/bug-reports</a></td> 8829 </tr> 8830 <tr> 8831 <td class="noborder" width="148">[<a name="Charts" href= 8832 "#Charts" id="Charts">Charts</a>]</td> 8833 <td class="noborder" width="730">The online code charts can 8834 be found at <a href= 8835 "https://unicode.org/charts/">https://unicode.org/charts/</a> 8836 An index to character names with links to the corresponding 8837 chart is found at <a href= 8838 "https://unicode.org/charts/charindex.html">https://unicode.org/charts/charindex.html</a></td> 8839 </tr> 8840 <tr> 8841 <td class="noborder" width="148">[<a name="DUCET" href= 8842 "#DUCET" id="DUCET">DUCET</a>]</td> 8843 <td class="noborder" width="730">The Default Unicode 8844 Collation Element Table (DUCET)<br> 8845 For the base-level collation, of which all the collation 8846 tables in this document are tailorings.<br> 8847 <a href= 8848 
"https://unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table"> 8849 https://unicode.org/reports/tr10/#Default_Unicode_Collation_Element_Table</a></td> 8850 </tr> 8851 <tr> 8852 <td class="noborder" width="148">[<a name="FAQ" href="#FAQ" 8853 id="FAQ">FAQ</a>]</td> 8854 <td class="noborder" valign="top" width="730">Unicode 8855 Frequently Asked Questions<br> 8856 <a href= 8857 "https://unicode.org/faq/">https://unicode.org/faq/<br></a> 8858 <i>For answers to common questions on technical 8859 issues.</i></td> 8860 </tr> 8861 <tr> 8862 <td class="noborder" width="148">[<a name="FCD" href="#FCD" 8863 id="FCD">FCD</a>]</td> 8864 <td class="noborder" width="730">As defined in UTN #5 8865 Canonical Equivalences in Applications<br> 8866 <a href= 8867 "https://unicode.org/notes/tn5/">https://unicode.org/notes/tn5/</a></td> 8868 </tr> 8869 <tr> 8870 <td class="noborder" width="148">[<a name="Glossary" href= 8871 "#Glossary" id="Glossary">Glossary</a>]</td> 8872 <td class="noborder" width="730">Unicode Glossary<a href= 8873 "https://unicode.org/glossary/"><br> 8874 https://unicode.org/glossary/<br></a> <i>For explanations of 8875 terminology used in this and other documents.</i></td> 8876 </tr> 8877 <tr> 8878 <td class="noborder" width="148">[<a name="JavaChoice" 8879 href="#JavaChoice" id="JavaChoice">JavaChoice</a>]</td> 8880 <td class="noborder" width="730">Java ChoiceFormat<br> 8881 <a href= 8882 "https://docs.oracle.com/javase/7/docs/api/java/text/ChoiceFormat.html"> 8883 https://docs.oracle.com/javase/7/docs/api/java/text/ChoiceFormat.html</a></td> 8884 </tr> 8885 <tr> 8886 <td class="noborder" width="148">[<a name="Olson" href= 8887 "#Olson" id="Olson">Olson</a>]</td> 8888 <td class="noborder" width="730">The <i>TZ</i>ID Database 8889 (aka Olson timezone database)<br> 8890 Time zone and daylight savings information.<br> 8891 <a href= 8892 "https://www.iana.org/time-zones">https://www.iana.org/time-zones</a><br> 8893 8894 For archived data, see <br> 8895 
<a href= 8896 "ftp://ftp.iana.org/tz/releases/">ftp://ftp.iana.org/tz/releases/</a></td> 8897 </tr> 8898 <tr> 8899 <td class="noborder" width="148">[<a name="Reports" href= 8900 "#Reports" id="Reports">Reports</a>]</td> 8901 <td class="noborder" width="730">Unicode Technical 8902 Reports<br> 8903 <a href= 8904 "https://unicode.org/reports/">https://unicode.org/reports/<br> 8905 </a> <i>For information on the status and development 8906 process for technical reports, and for a list of technical 8907 reports.</i></td> 8908 </tr> 8909 <tr> 8910 <td class="noborder" width="148">[<a name="Unicode" href= 8911 "#Unicode" id="Unicode">Unicode</a>]</td> 8912 <td class="noborder" width="730">The Unicode Consortium, <i>The Unicode Standard, Version 13.0.0</i><br> 8913 (Mountain View, CA: The Unicode Consortium, 2020. ISBN 978-1-936213-26-9)<br> 8914 <a href="https://www.unicode.org/versions/Unicode13.0.0/">https://www.unicode.org/versions/Unicode13.0.0/</a> 8915 </td> 8916 </tr> 8917 <tr> 8918 <td class="noborder" width="148">[<a name="Versions" href= 8919 "#Versions" id="Versions">Versions</a>]</td> 8920 <td class="noborder" width="730">Versions of the Unicode 8921 Standard<br> 8922 <a href= 8923 "https://www.unicode.org/versions/">https://www.unicode.org/versions/</a><br> 8924 8925 <i>For information on version numbering, and citing and 8926 referencing the Unicode Standard, the Unicode Character 8927 Database, and Unicode Technical Reports.</i></td> 8928 </tr> 8929 <tr> 8930 <td class="noborder" width="148">[<a name="XPath" href= 8931 "#XPath" id="XPath">XPath</a>]</td> 8932 <td class="noborder" width="730"><a href= 8933 "https://www.w3.org/TR/xpath/">https://www.w3.org/TR/xpath/</a></td> 8934 </tr> 8935 <tr> 8936 <th class="noborder" width="148">Other Standards</th> 8937 <td class="noborder" width="730"><i>Various standards 8938 define codes that are used as keys or values in Locale Data 8939 Markup Language. 
These include:</i></td> 8940 </tr> 8941 <tr> 8942 <td class="noborder">[<a name="BCP47" href="#BCP47" id= 8943 "BCP47">BCP47</a>]</td> 8944 <td class="noborder"> 8945 <a href= 8946 "https://www.rfc-editor.org/rfc/bcp/bcp47.txt">https://www.rfc-editor.org/rfc/bcp/bcp47.txt</a> 8947 <p>The Registry<br> 8948 <a href= 8949 "https://www.iana.org/assignments/language-subtag-registry"> 8950 https://www.iana.org/assignments/language-subtag-registry</a></p> 8951 </td> 8952 </tr> 8953 <tr> 8954 <td class="noborder" width="148">[<a name="ISO639" href= 8955 "#ISO639" id="ISO639">ISO639</a>]</td> 8956 <td class="noborder" width="730">ISO Language Codes<br> 8957 <a href= 8958 "https://www.loc.gov/standards/iso639-2/">https://www.loc.gov/standards/iso639-2/</a><br> 8959 8960 Actual List<br> 8961 <a href= 8962 "https://www.loc.gov/standards/iso639-2/langcodes.html">https://www.loc.gov/standards/iso639-2/langcodes.html</a></td> 8963 </tr> 8964 <tr> 8965 <td class="noborder" width="148">[<a name="ISO1000" href= 8966 "#ISO1000" id="ISO1000">ISO1000</a>]</td> 8967 <td class="noborder" width="730">ISO 1000: SI units and 8968 recommendations for the use of their multiples and of 8969 certain other units, International Organization for 8970 Standardization, 1992.<br> 8971 <a href= 8972 "https://www.iso.org/iso/catalogue_detail?csnumber=5448">https://www.iso.org/iso/catalogue_detail?csnumber=5448</a></td> 8973 </tr> 8974 <tr> 8975 <td class="noborder" width="148">[<a name="ISO3166" href= 8976 "#ISO3166" id="ISO3166">ISO3166</a>]</td> 8977 <td class="noborder" width="730">ISO Region Codes<br> 8978 <a href= 8979 "https://www.iso.org/iso/country_codes">https://www.iso.org/iso/country_codes</a><br> 8980 8981 Actual List<br> 8982 <a href= 8983 "https://www.iso.org/obp/ui/#search">https://www.iso.org/obp/ui/#search</a></td> 8984 </tr> 8985 <tr> 8986 <td class="noborder" width="148">[<a name="ISO4217" href= 8987 "#ISO4217" id="ISO4217">ISO4217</a>]</td> 8988 <td class="noborder" width="730"> 
8989 ISO Currency Codes<br> 8990 <a href= 8991 "https://www.iso.org/iso/home/standards/currency_codes.htm"> 8992 https://www.iso.org/iso/home/standards/currency_codes.htm</a> 8993 <p><i>(Note that as of this point, there are significant 8994 problems with this list. The supplemental data file 8995 contains the best compendium of currency information 8996 available.)</i></p> 8997 </td> 8998 </tr> 8999 <tr> 9000 <td class="noborder" width="148">[<a name="ISO8601" href= 9001 "#ISO8601" id="ISO8601">ISO8601</a>]</td> 9002 <td class="noborder" width="730">ISO Date and Time 9003 Format<br> 9004 <a href= 9005 "https://www.iso.org/iso/iso8601">https://www.iso.org/iso/iso8601</a></td> 9006 </tr> 9007 <tr> 9008 <td class="noborder" width="148">[<a name="ISO15924" href= 9009 "#ISO15924" id="ISO15924">ISO15924</a>]</td> 9010 <td class="noborder" width="730">ISO Script Codes<br> 9011 <a href= 9012 "https://www.unicode.org/iso15924/index.html">https://www.unicode.org/iso15924/index.html</a><br> 9013 9014 Actual List<br> 9015 <a href= 9016 "https://www.unicode.org/iso15924/codelists.html">https://www.unicode.org/iso15924/codelists.html</a></td> 9017 </tr> 9018 <tr> 9019 <td class="noborder" width="148">[<a name="LOCODE" href= 9020 "#LOCODE" id="LOCODE">LOCODE</a>]</td> 9021 <td class="noborder" width="730">United Nations Code for 9022 Trade and Transport Locations, commonly known as 9023 "UN/LOCODE"<br> 9024 <a href= 9025 "https://www.unece.org/cefact/locode/welcome.html">https://www.unece.org/cefact/locode/welcome.html</a><br> 9026 9027 Download at: <a href= 9028 "https://www.unece.org/cefact/codesfortrade/codes_index.htm"> https://www.unece.org/cefact/codesfortrade/codes_index.htm</a></td> 9029 </tr> 9030 <tr> 9031 <td class="noborder" width="148">[<a name="RFC6067" href= 9032 "#RFC6067" id="RFC6067">RFC6067</a>]</td> 9033 <td class="noborder" width="730">BCP 47 Extension U<br> 9034 <a href= 9035 
"https://www.ietf.org/rfc/rfc6067.txt">https://www.ietf.org/rfc/rfc6067.txt</a></td> 9036 </tr> 9037 <tr> 9038 <td class="noborder" width="148">[<a name="RFC6497" href= 9039 "#RFC6497" id="RFC6497">RFC6497</a>]</td> 9040 <td class="noborder" width="730">BCP 47 Extension T - 9041 Transformed Content<br> 9042 <a href= 9043 "https://www.ietf.org/rfc/rfc6497.txt">https://www.ietf.org/rfc/rfc6497.txt</a></td> 9044 </tr> 9045 <tr> 9046 <td class="noborder" width="148">[<a name="UNM49" href= 9047 "#UNM49" id="UNM49">UNM49</a>]</td> 9048 <td class="noborder" width="730"> 9049 UN M.49: UN Statistics Division 9050 <p>Country or area & region codes<br> 9051 <a href= 9052 "https://unstats.un.org/unsd/methods/m49/m49.htm">https://unstats.un.org/unsd/methods/m49/m49.htm</a></p> 9053 <p>Composition of macro geographical (continental) 9054 regions, geographical sub-regions, and selected economic 9055 and other groupings<br> 9056 <a href= 9057 "https://unstats.un.org/unsd/methods/m49/m49regin.htm">https://unstats.un.org/unsd/methods/m49/m49regin.htm</a></p> 9058 </td> 9059 </tr> 9060 <tr> 9061 <td class="noborder" width="148">[<a name="XMLSchema" href= 9062 "#XMLSchema" id="XMLSchema">XML Schema</a>]</td> 9063 <td class="noborder" width="730">W3C XML Schema<br> 9064 <a href= 9065 "https://www.w3.org/XML/Schema">https://www.w3.org/XML/Schema</a></td> 9066 </tr> 9067 <tr> 9068 <th class="noborder" width="148">General</th> 9069 <td class="noborder" width="730"><i>The following are 9070 general references from the text:</i></td> 9071 </tr> 9072 <tr> 9073 <td class="noborder" width="148">[<a name="ByType" href= 9074 "#ByType" id="ByType">ByType</a>]</td> 9075 <td class="noborder" width="730">CLDR Comparison Charts<br> 9076 <a href= 9077 "https://www.unicode.org/cldr/comparison_charts.html">https://www.unicode.org/cldr/comparison_charts.html</a></td> 9078 </tr> 9079 <tr> 9080 <td class="noborder" width="148">[<a name="Calendars" href= 9081 "#Calendars" id="Calendars">Calendars</a>]</td> 
9082 <td class="noborder" width="730">Calendrical Calculations: 9083 The Millennium Edition by Edward M. Reingold, Nachum 9084 Dershowitz; Cambridge University Press; Book and CD-ROM 9085 edition (July 1, 2001); ISBN: 0521777526. Note that the 9086 algorithms given in this book are copyrighted.</td> 9087 </tr> 9088 <tr> 9089 <td class="noborder" width="148">[<a name="Comparisons" 9090 href="#Comparisons" id="Comparisons">Comparisons</a>]</td> 9091 <td class="noborder" width="730">Comparisons between locale 9092 data from different sources<br> 9093 <a href= 9094 "https://unicode-org.github.io/cldr-staging/charts/38/supplemental/dtd_deltas.html">https://unicode-org.github.io/cldr-staging/charts/38/supplemental/dtd_deltas.html</a></td> 9095 </tr> 9096 <tr> 9097 <td class="noborder" width="148">[<a name="CurrencyInfo" 9098 href="#CurrencyInfo" id= 9099 "CurrencyInfo">CurrencyInfo</a>]</td> 9100 <td class="noborder" width="730">UNECE Currency Data<br> 9101 <a href= 9102 "https://www.currency-iso.org/en/home/tables.html">https://www.currency-iso.org/en/home/tables.html</a></td> 9103 </tr> 9104 <tr> 9105 <td class="noborder" width="148">[<a name="DataFormats" 9106 href="#DataFormats" id="DataFormats">DataFormats</a>]</td> 9107 <td class="noborder" width="730">CLDR Translation 9108 Guidelines<br> 9109 <a href= 9110 "http://cldr.unicode.org/translation">http://cldr.unicode.org/translation</a></td> 9111 </tr> 9112 <tr> 9113 <td class="noborder" width="148">[<a name="LDML" href= 9114 "#LDML" id="LDML">Example</a>]</td> 9115 <td class="noborder" width="730">A sample in Locale Data 9116 Markup Language<br> 9117 <a href= 9118 "https://unicode.org/cldr/dtd/1.1/ldml-example.xml">https://unicode.org/cldr/dtd/1.1/ldml-example.xml</a></td> 9119 </tr> 9120 <tr> 9121 <td class="noborder" width="148">[<a name="ICUCollation" 9122 href="#ICUCollation" id= 9123 "ICUCollation">ICUCollation</a>]</td> 9124 <td class="noborder" width="730">ICU rule syntax<br> 9125 <a href= 9126 
"https://unicode-org.github.io/icu/userguide/collation/customization/"> 9127 https://unicode-org.github.io/icu/userguide/collation/customization/</a></td> 9128 </tr> 9129 <tr> 9130 <td class="noborder" width="148">[<a name="ICUTransforms" 9131 href="#ICUTransforms" id= 9132 "ICUTransforms">ICUTransforms</a>]</td> 9133 <td class="noborder" width="730">Transforms<br> 9134 <a href= 9135 "https://unicode-org.github.io/icu/userguide/transforms/"> 9136 https://unicode-org.github.io/icu/userguide/transforms/</a><br> 9137 9138 Transforms Demo<br> 9139 <a href= 9140 "http://demo.icu-project.org/icu-bin/translit/">http://demo.icu-project.org/icu-bin/translit/</a></td> 9141 </tr> 9142 <tr> 9143 <td class="noborder" width="148">[<a name="ICUUnicodeSet" 9144 href="#ICUUnicodeSet" id= 9145 "ICUUnicodeSet">ICUUnicodeSet</a>]</td> 9146 <td class="noborder" width="730">ICU UnicodeSet<br> 9147 <a href= 9148 "https://unicode-org.github.io/icu/userguide/strings/unicodeset.html">https://unicode-org.github.io/icu/userguide/strings/unicodeset.html<br> 9149 </a> API<br> 9150 <a href= 9151 "https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/UnicodeSet.html"> 9152 https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/UnicodeSet.html</a></td> 9153 </tr> 9154 <tr> 9155 <td class="noborder" width="148">[<a name="ITUE164" href= 9156 "#ITUE164" id="ITUE164">ITUE164</a>]</td> 9157 <td class="noborder" width="730">International 9158 Telecommunication Union: List Of ITU Recommendation E.164 9159 Assigned Country Codes<br> 9160 available at <a href= 9161 "https://www.itu.int/opb/publications.aspx?parent=T-SP&view=T-SP2"> 9162 https://www.itu.int/opb/publications.aspx?parent=T-SP&view=T-SP2</a></td> 9163 </tr> 9164 <tr> 9165 <td class="noborder" width="148">[<a name="LocaleExplorer" 9166 href="#LocaleExplorer" id= 9167 "LocaleExplorer">LocaleExplorer</a>]</td> 9168 <td class="noborder" width="730">ICU Locale Explorer<br> 9169 <a href= 9170 
"http://demo.icu-project.org/icu-bin/locexp">http://demo.icu-project.org/icu-bin/locexp</a></td> 9171 </tr> 9172 <tr> 9173 <td class="noborder" width="148">[<a name="localeProject" 9174 href="#localeProject" id= 9175 "localeProject">LocaleProject</a>]</td> 9176 <td class="noborder" width="730">Common Locale Data 9177 Repository Project<br> 9178 <a href= 9179 "https://unicode.org/cldr/">https://unicode.org/cldr/</a></td> 9180 </tr> 9181 <tr> 9182 <td class="noborder" width="148">[<a name="NamingGuideline" 9183 href="#NamingGuideline" id= 9184 "NamingGuideline">NamingGuideline</a>]</td> 9185 <td class="noborder" width="730">OpenI18N Locale Naming 9186 Guideline<br> 9187 formerly at 9188 https://www.openi18n.org/docs/text/LocNameGuide-V10.txt</td> 9189 </tr> 9190 <tr> 9191 <td class="noborder" width="148">[<a name="RBNF" href= 9192 "#RBNF" id="RBNF">RBNF</a>]</td> 9193 <td class="noborder" width="730">Rule-Based Number 9194 Format<br> 9195 <a href= 9196 "https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1RuleBasedNumberFormat.html"> 9197 https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1RuleBasedNumberFormat.html</a></td> 9198 </tr> 9199 <tr> 9200 <td class="noborder" width="148">[<a name="RBBI" href= 9201 "#RBBI" id="RBBI">RBBI</a>]</td> 9202 <td class="noborder" width="730">Rule-Based Break 9203 Iterator<br> 9204 <a href= 9205 "https://unicode-org.github.io/icu/userguide/boundaryanalysis"> 9206 https://unicode-org.github.io/icu/userguide/boundaryanalysis</a></td> 9207 </tr> 9208 <tr> 9209 <td class="noborder" width="148">[<a name="UCAChart" href= 9210 "#UCAChart" id="UCAChart">UCAChart</a>]</td> 9211 <td class="noborder" width="730">Collation Chart<a href= 9212 "https://unicode.org/charts/collation/"><br> 9213 https://unicode.org/charts/collation/</a></td> 9214 </tr> 9215 <tr> 9216 <td class="noborder" width="148">[<a name="UTCInfo" href= 9217 "#UTCInfo" id="UTCInfo">UTCInfo</a>]</td> 9218 <td class="noborder" 
width="730">NIST Time and Frequency 9219 Division Home Page<br> 9220 <a href="https://tf.nist.gov/">https://tf.nist.gov/<br></a> 9221 U.S. Naval Observatory: What is Universal Time?<br> 9222 <a href= 9223 "https://www.usno.navy.mil/USNO/time/master-clock/systems-of-time">https://www.usno.navy.mil/USNO/time/master-clock/systems-of-time</a></td> 9224 </tr> 9225 <tr> 9226 <td class="noborder" width="148">[<a name="WindowsCulture" 9227 href="#WindowsCulture" id= 9228 "WindowsCulture">WindowsCulture</a>]</td> 9229 <td class="noborder" width="730">Windows Culture Info 9230 (with mappings from [<a href= 9231 "#BCP47">BCP47</a>]-style codes to LCIDs)<br> 9232 <a href= 9233 "https://msdn.microsoft.com/en-us/library/system.globalization.cultureinfo(vs.71).aspx"> 9234 http://msdn2.microsoft.com/en-us/library/system.globalization.cultureinfo(vs.71).aspx</a></td> 9235 </tr> 9236 </table> 9237 <h2><a name="Acknowledgments" href="#Acknowledgments" id= 9238 "Acknowledgments">Acknowledgments</a></h2> 9239 <p>Special thanks to the following people for their continuing 9240 overall contributions to the CLDR project, and for their 9241 specific contributions in the following areas. These 9242 descriptions only touch on the many contributions that they 9243 have made.</p> 9244 <ul> 9245 <li>Mark 9246 Davis for creating the initial version of LDML, and 9247 adding to and maintaining this specification, and for his 9248 work on the LDML code and tests, much of the supplemental 9249 data and overall structure, and transforms and 9250 keyboards.</li> 9251 <li>John Emmons for the POSIX conversion tool and 9252 metazones.</li> 9253 <li>Deborah Goldsmith for her contributions to LDML 9254 architecture and this specification.</li> 9255 <li>Chris Hansten for coordinating and managing data 9256 submissions and vetting.</li> 9257 <li>Erkki Kolehmainen and his team for their work on 9258 Finnish.</li> 9259 <li>Steven R. 
Loomis for development of the survey tool and 9260 database management.</li> 9261 <li>Peter Nugent for his contributions to the POSIX tool and 9262 from Open Office, and for coordinating and managing data 9263 submissions and vetting.</li> 9264 <li>George Rhoten for his work on currencies.</li> 9265 <li>Roozbeh Pournader (روزبه پورنادر) for his work on South 9266 Asian countries.</li> 9267 <li>Ram Viswanadha (రఘురామ్ విశ్వనాధ) for all of his work on 9268 LDML code and data integration, and for coordinating and 9269 managing data submissions and vetting.</li> 9270 <li>Vladimir Weinstein (Владимир Вајнштајн) for his work on 9271 collation.</li> 9272 <li>Yoshito Umaoka (馬岡 由人) for his work on the timezone 9273 architecture.</li> 9274 <li>Rick McGowan for his work gathering language, script and 9275 region data.</li> 9276 <li>Xiaomei Ji (吉晓梅) for her work on time intervals and 9277 plural formatting.</li> 9278 <li>David Bertoni for his contributions to the conversion 9279 tools.</li> 9280 <li>Mike Tardif for reviewing this specification and for 9281 coordinating and vetting data submissions.</li> 9282 <li>Peter Edberg for work on this specification, 9283 monthPatterns, cyclicNameSets, contextTransforms and other 9284 items.</li> 9285 <li>Raymond Wainman and Cibu Johny for their work on 9286 keyboards.</li> 9287 <li>Jennifer Chye for her contributions to the conversion 9288 tools.</li> 9289 <li>Markus Scherer for a major rewrite of Part 5, Collation.</li> 9290 <li><a href="https://www.sffc.xyz/">Shane Carr</a> for his work on numbers and measurement units.</li> 9291 <li>Robin Leroy for his work on compact plurals: Part 3, Section 5, <a href="tr35-numbers.html#Language_Plural_Rules">Language Plural 9292 Rules</a></li> 9293 </ul> 9294 <p>Other contributors to CLDR are listed on the <a href= 9295 "https://www.unicode.org/cldr/">CLDR Project Page</a>.</p> 9296 9297 9298 <h2><a name="Modifications" href="#Modifications" id= 9299 "Modifications">Modifications</a></h2> 9300 
9301 <p><b>Revision 61</b></p> 9302 <ul> 9303 <li><b>Reissued</b> for CLDR 38.</li> 9304 9305 <li><strong>Part 1: <a href="tr35.html#Contents">Core</a> (languages, locales, basic structure)</strong> 9306 <ul> 9307 <li><strong>Section 3.2.1 <a href="#Canonical_Unicode_Locale_Identifiers">Canonical Unicode Locale Identifiers</a></strong>: replaced text by a reference to <strong>Annex C. <a href="#LocaleId_Canonicalization" >LocaleId Canonicalization</a></strong> 9308 <li><strong>Section 3.3.1 <a href= 9309 "#BCP_47_Language_Tag_Conversion" >BCP 47 Language Tag 9310 Conversion</a>:</strong> replaced text by a reference to <strong>Annex C. <a href="#LocaleId_Canonicalization" >LocaleId Canonicalization</a></strong></li> 9311 <li><strong>Section 3.6.1 <a href="#Key_And_Type_Definitions_" >Key And Type Definitions</a></strong>: 9312 added new key “dx”, for <a href="#UnicodeDictionaryBreakExclusionIdentifier" >Unicode Dictionary Break Exclusion Identifier</a>.</li> 9313 <li><strong>Section 3.6.4 <a href="#Unicode_Locale_Extension_Data_Files" >U Extension Data Files</a></strong>: 9314 added description of <a href="#SCRIPT_CODE" >SCRIPT_CODE</a> value for key “dx”.</li> 9315 <li><strong>Section 4.1.2 <a href="#Lateral_Inheritance">Lateral Inheritance</a>: </strong>specified lateral inheritance in more detail, added case and gender.</li> 9316 <li><strong>Annex C. <a href="#LocaleId_Canonicalization" >LocaleId Canonicalization</a></strong> 9317 <ul> 9318 <li>Added new Annex, replacing text in <strong>Section 3.2.1 <a href="#Canonical_Unicode_Locale_Identifiers">Canonical Unicode Locale Identifiers</a></strong> and <strong>Section 3.3.1 <a href= 9319 "#BCP_47_Language_Tag_Conversion" >BCP 47 Language Tag 9320 Conversion</a></strong></li> 9321 <li>Cleans up ambiguities in the previous specification of canonicalization. 
              (This was done in concert with fixes to the alias data to work better with the specification.)</li>
            </ul>
          </li>
        </ul>
      </li>
      <li><strong>Part 2: <a href="tr35-general.html#Contents">General</a> (display names &amp; transforms, etc.)</strong>
        <ul>
          <li><strong>Section 6 <a href="tr35-general.html#Unit_Elements">Unit Elements</a></strong>
            <ul>
              <li>Added new element compoundUnitPattern1</li>
              <li>Added case attribute to compoundUnitPattern</li>
              <li>Provided full description of compound unit components</li>
            </ul>
          </li>

          <li><strong>Section 14.2 <a href="tr35-general.html#Character_Labels">Annotations Character Labels</a></strong>
            <ul>
              <li>Added new characterLabelPattern type attribute values subscript and superscript.</li>
            </ul>
          </li>

          <li><strong>Section 16 <a href="tr35-general.html#Grammatical_Derivations">Grammatical Derivations</a></strong>: new section.</li>
        </ul>
      </li>
      <li><strong>Part 3: <a href="tr35-numbers.html#Contents">Numbers</a> (number &amp; currency formatting)</strong>
        <ul>
          <li><strong>Section 2.3 <a href="tr35-numbers.html#Number_Symbols">Number Symbols</a>:</strong>
          added approximatelySign.</li>
          <li><strong>Section 2.6 <a href="tr35-numbers.html#Minimal_Pairs">Minimal Pairs</a>:</strong> added case and
          gender minimal pairs.
          Removed the alt/draft ATTLIST since those are documented elsewhere and just obfuscate
          the text.</li>
          <li><strong>Section 5 <a href="tr35-numbers.html#Language_Plural_Rules">Language Plural Rules</a>:</strong>
          added the 'e' operand for use in certain compact number formatting.</li>
        </ul>
      </li>
      <li><strong>Part 6: <a href="tr35-info.html#Contents">Supplemental</a> (supplemental data)</strong>
        <ul>
          <li><strong>Section 14 <a href="tr35-info.html#Unit_Preferences">Unit Preferences</a></strong>: defined the
          userPreferences skeleton more precisely.</li>
        </ul>
      </li>
      <li><strong>Throughout: </strong>Where possible, use “legacy” (for language tag or unit) instead of “grandfathered”.</li>
    </ul>


    <p> </p>

    <p>Modifications in previous versions are listed in those
    respective versions. Click on <strong>Previous Version</strong>
    in the header until you get to the desired version.</p>
    <hr>
    <p class="copyright">Copyright © 2001–2020 Unicode, Inc. All
    Rights Reserved. The Unicode Consortium makes no expressed or
    implied warranty of any kind, and assumes no liability for
    errors or omissions. No liability is assumed for incidental and
    consequential damages in connection with or arising out of the
    use of the information or programs contained or accompanying
    this technical report. The Unicode <a href=
    "https://unicode.org/copyright.html">Terms of Use</a> apply.</p>
    <p class="copyright">Unicode and the Unicode logo are
    trademarks of Unicode, Inc., and are registered in some
    jurisdictions.</p>
  </div>
</body>
</html>