Internet Engineering Task Force                                 M. Davis
Internet-Draft                                                    Google
Intended status: Informational                               A. Phillips
Expires: June 7, 2012                                             Lab126
                                                               Y. Umaoka
                                                                     IBM
                                                                 C. Falk
                                                       Infinite Automata
                                                        December 5, 2011


                BCP 47 Extension T - Transformed Content
                      draft-davis-t-langtag-ext-07

Abstract

   This document specifies an Extension to BCP 47 which provides subtags
   for specifying the source language or script of transformed content,
   including content that has been transliterated, transcribed, or
   translated, or in some other way influenced by the source. It also
   provides for additional information used for identification.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF). Note that other groups may also distribute
   working documents as Internet-Drafts. The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 7, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.



Davis, et al.             Expires June 7, 2012                  [Page 1]

Internet-Draft            BCP 47 Extension T               December 2011


Table of Contents

   1. Introduction
     1.1. Requirements Language
   2. BCP47 Required Information
     2.1. Overview
     2.2. Structure
     2.3. Canonicalization
     2.4. BCP47 Registration Form
     2.5. Field Definitions
     2.6. Registration of Field Subtags
     2.7. Registration of Additional Fields
     2.8. Committee Responses to Registration Proposals
     2.9. Machine-Readable Data
   3. Acknowledgements
   4. IANA Considerations
   5. Security Considerations
   6. References
     6.1. Normative References
     6.2. Informative References
   Authors' Addresses


1. Introduction

   [BCP47] permits the definition and registration of language tag
   extensions "that contain a language component and are compatible with
   applications that understand language tags".
   This document defines an extension for specifying the source of
   content that has been transformed, including text that has been
   transliterated, transcribed, or translated, or in some other way
   influenced by the source. It may also be used in queries to request
   content that has been transformed. The "singleton" identifier for
   this extension is 't'.

   Language tags, as defined by [BCP47], are useful for identifying the
   language of content. There are mechanisms for specifying variant
   subtags for special purposes. However, these variants are
   insufficient for specifying content that has undergone
   transformations, including content that has been transliterated,
   transcribed, or translated. The correct interpretation of the
   content may depend upon knowledge of the conventions used for the
   transformation.

   Suppose that the names of Italian or Russian cities on a map are
   transcribed for Japanese users. Each name needs to be transliterated
   into katakana using rules appropriate for the specific source and
   target language. When tagging such data, it is important to be able
   to indicate not only the resulting content language ("ja" in this
   case), but also the source language.

   Transforms such as transliterations may vary depending not only on
   the source and target script, but also on the source and target
   language. Thus the Russian <U+041F U+0443 U+0442 U+0438 U+043D>
   (which corresponds to the Cyrillic <PE, U, TE, I, EN>) transliterates
   into "Putin" in English but "Poutine" in French. The extension could
   be used to indicate a desired mechanical transformation in an API, or
   to tag data that has been converted (mechanically or by hand)
   according to a transliteration method.

   In addition, many different conventions have arisen for how to
   transform text, even between the same languages and scripts.
   For example, "Gaddafi" is commonly transliterated from Arabic to
   English as any of (G/Q/K/Kh)a(d/dh/dd/dhdh/th/zz)af(i/y). Some
   examples of standardized conventions used for transcribing or
   transliterating text include:

   a.  United Nations Group of Experts on Geographical Names (UNGEGN)

   b.  US Library of Congress (LOC)

   c.  US Board on Geographic Names (BGN)

   d.  Korean Ministry of Culture, Sports and Tourism (MCST)

   e.  International Organization for Standardization (ISO)

   The usage of this extension is not limited to formal transformations,
   and may include other instances where the content is in some other
   way influenced by the source. For example, this extension could be
   used to designate a request for a speech recognizer that is tailored
   specifically for second-language speakers who share a particular
   first language (e.g., a recognizer for "English spoken with a Chinese
   accent").

1.1. Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.


2. BCP47 Required Information

2.1. Overview

   Identification of transformed content can be done using the 't'
   extension defined in this document. This extension is formed by the
   't' singleton followed by a sequence of subtags that would form a
   language tag as defined by [BCP47]. This allows for the source
   language or script to be specified to the degree of precision
   required. There are restrictions on the sequence of subtags. They
   MUST form a regular, valid, canonical language tag, and MUST NOT
   include either extensions or private use sequences introduced by the
   singleton 'x'.
   Where only the script is relevant (such as when identifying a
   script-to-script transliteration), 'und' is used as the primary
   language subtag.

   For example:

   +---------------------+---------------------------------------------+
   | Language Tag        | Description                                 |
   +---------------------+---------------------------------------------+
   | ja-t-it             | The content is Japanese, transformed from   |
   |                     | Italian.                                    |
   | ja-Kana-t-it        | The content is Japanese Katakana,           |
   |                     | transformed from Italian.                   |
   | und-Latn-t-und-cyrl | The content is in the Latin script,         |
   |                     | transformed from the Cyrillic script.       |
   +---------------------+---------------------------------------------+

   Note that the sequence of subtags governed by 't' cannot contain a
   singleton (a single-character subtag), because that would start a new
   extension. For example, the tag "ja-t-i-ami" does not indicate that
   the source is in "i-ami", because "i-ami" is not a regular language
   tag in [BCP47]. That tag would instead be parsed as an empty 't'
   extension followed by an 'i' extension, and since the 't' extension
   MUST have at least one subtag, it is not valid.

   The 't' extension is not intended for use in structured data that
   already provides separate source and target language identifiers.
   For example, this is the case in localization interchange formats
   such as XLIFF. In such cases, it would be inappropriate to use
   "ja-t-it" for the target language tag because the source language tag
   "it" would already be present in the data. Instead one would use the
   language tag "ja".

   As noted earlier, it is sometimes necessary to indicate additional
   information about a transformation.
   This additional information is optionally supplied after the source
   in a series of one or more fields, where each field consists of a
   field separator subtag followed by one or more non-separator subtags.
   Each field separator subtag consists of a single letter followed by a
   single digit.

   A transformation mechanism is an optional field that indicates the
   specification used for the transformation, such as "UNGEGN" for the
   United Nations Group of Experts on Geographical Names
   transliterations and transcriptions. It uses the 'm0' field
   separator followed by one or more mechanism subtags.

   For example:

   +------------------------------------+------------------------------+
   | Language Tag                       | Description                  |
   +------------------------------------+------------------------------+
   | und-Cyrl-t-und-latn-m0-ungegn-2007 | The content is in Cyrillic,  |
   |                                    | transformed from Latin,      |
   |                                    | according to a UNGEGN        |
   |                                    | specification dated 2007.    |
   +------------------------------------+------------------------------+

   The field separator subtags such as 'm0' were chosen because they are
   short, visually distinctive, and cannot occur in a language tag
   (except within an extension or after 'x'), thus eliminating the
   potential for collision or confusion with the source language tag.

   The field subtags are defined by Section 3 [1] of Unicode Technical
   Standard #35: Unicode Locale Data Markup Language [UTS35] (LDML), the
   main specification for the Unicode Common Locale Data Repository
   (CLDR) project. As required by BCP 47, subtags follow the language
   tag ABNF and other rules for the formation of language tags and
   subtags, are restricted to the ASCII letters and digits, are not case
   sensitive, and do not exceed eight characters in length.
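
   The two lexical rules in this section, that a field separator is a
   single letter followed by a single digit and that any singleton ends
   the 't' extension, are enough to split a tag into its source language
   and its fields. The following Python sketch illustrates this under
   stated assumptions: the function name split_t_extension is
   hypothetical (it is not part of CLDR or any registry API), the input
   is assumed to be a well-formed BCP 47 tag, and no subtag is checked
   against the IANA registry or [UTS35].

```python
import re


def split_t_extension(tag):
    """Split the 't' extension of a BCP 47 tag into (source, fields).

    Hypothetical helper for illustration: it assumes `tag` is already
    well-formed and does not check any subtag against a registry.
    Returns None when the tag has no 't' extension.
    """
    subtags = tag.lower().split("-")  # subtags are not case sensitive

    # Find the 't' singleton. Skip the primary language subtag, and stop
    # at 'x', since everything after 'x' is private use, not an extension.
    start = None
    for i in range(1, len(subtags)):
        if subtags[i] == "x":
            break
        if subtags[i] == "t":
            start = i + 1
            break
    if start is None:
        return None

    # The extension ends at the next singleton (one-character subtag),
    # which would begin a different extension.
    end = start
    while end < len(subtags) and len(subtags[end]) > 1:
        end += 1

    source, fields = [], []
    for sub in subtags[start:end]:
        if re.fullmatch(r"[a-z][0-9]", sub):  # field separator, e.g. 'm0'
            fields.append((sub, []))
        elif not fields:
            source.append(sub)  # still inside the source language tag
        else:
            fields[-1][1].append(sub)  # subtag of the current field
    return source, fields
```

   For example, split_t_extension("und-Cyrl-t-und-latn-m0-ungegn-2007")
   returns (['und', 'latn'], [('m0', ['ungegn', '2007'])]), while
   split_t_extension("ja-t-i-ami") returns two empty lists, reflecting
   that the 't' extension in that tag is empty.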

   EDITORIAL NOTE: This new facility has been accepted by the Unicode
   CLDR committee for incorporation into the next versions of CLDR and
   LDML, parallel with the structure of the 'u' extension [RFC6067], for
   which the Unicode Consortium is already the maintaining authority.
   The data and specification will be available by the time this
   Internet-Draft has been approved.

   The LDML specification is available over the Internet at no cost,
   under a royalty-free license at http://unicode.org/copyright.html.
   LDML is versioned, and each version of LDML is numbered, dated, and
   stable. Extension subtags, once defined by LDML, are never retracted
   or substantially changed in meaning.

   The maintaining authority for the 't' extension is the Unicode
   Consortium:

   +---------------+---------------------------------------------------+
   | Item          | Value                                             |
   +---------------+---------------------------------------------------+
   | Name          | Unicode Consortium                                |
   | Contact Email | cldr-contact@unicode.org                          |
   | Discussion    | cldr-users@unicode.org                            |
   | List Email    |                                                   |
   | URL Location  | cldr.unicode.org                                  |
   | Specification | Unicode Technical Standard #35 Unicode Locale     |
   |               | Data Markup Language (LDML),                      |
   |               | http://unicode.org/reports/tr35/                  |
   | Section       | Section 3 Unicode Language and Locale Identifiers |
   +---------------+---------------------------------------------------+

2.2. Structure

   The subtags in the 't' extension are of the following form:

   t-ext= "t"                          ; Extension
          (("-" lang *("-" field))     ; Source + optional field(s)
           / 1*("-" field))            ; Field(s) only (no source)

   lang= language                      ; BCP47, with restrictions
         ["-" script]
         ["-" region]
         *("-" variant)

   field= sep 1*("-" 3*8alphanum)      ; With restrictions

   sep= ALPHA DIGIT                    ; Subtag separators
   alphanum= ALPHA / DIGIT

   where the <language>, <script>, <region>, and <variant> rules are
   specified in [BCP47], and the <ALPHA> and <DIGIT> rules are specified
   in [RFC5234].

   Description and restrictions:

   a.  The 't' extension MUST have at least one subtag.

   b.  The 't' extension normally starts with a source language tag,
       which MUST be a regular, canonical language tag as specified by
       [BCP47]. Tags described by the 'irregular' production in BCP 47
       MUST NOT be used to form the language tag. The source language
       tag MAY be omitted: some field values do not require it.

   c.  There is optionally a sequence of fields, where each field has a
       separator followed by a sequence of one or more subtags. The same
       field separator MUST NOT appear more than once in the language
       tag.

   d.  The order of the fields in a 't' extension is not significant.
       The order of subtags within a field is significant. (See
       Section 2.3, Canonicalization.)

   e.  The 't' subtag fields are defined by Section 3 [1] of Unicode
       Technical Standard #35: Unicode Locale Data Markup Language
       [UTS35].

2.3. Canonicalization

   As required by [BCP47], the use of uppercase or lowercase letters is
   not significant in the subtags used in this extension. The canonical
   form for all subtags in the extension is lowercase, with the fields
   ordered alphabetically by their separators. The order of subtags
   within a field is significant, and MUST NOT be changed in the process
   of canonicalizing.
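
   The restrictions of Section 2.2 and the canonicalization rule above
   can be applied in one pass over the subtags that follow "t-". The
   Python sketch below is illustrative only and assumes the input is
   just the extension body, without the leading "t-". The function name
   canonicalize_t is hypothetical, and the sketch checks only the
   structural rules stated in this document (at least one subtag,
   letter-digit separators, field subtags of 3 to 8 alphanumerics, no
   repeated separator, no empty field); it does not check whether the
   source language tag is valid under [BCP47].

```python
import re

SEPARATOR = re.compile(r"[a-z][0-9]")        # sep = ALPHA DIGIT, e.g. "m0"
FIELD_SUBTAG = re.compile(r"[a-z0-9]{3,8}")  # 3*8alphanum


def canonicalize_t(ext):
    """Return the canonical form of the subtags following 't-'.

    Hypothetical helper: lowercases every subtag, checks the structural
    restrictions of Section 2.2, and sorts fields by separator while
    keeping the subtag order inside each field unchanged.
    """
    subtags = ext.lower().split("-")
    if not all(subtags):
        raise ValueError("the 't' extension needs at least one subtag "
                         "and may not contain empty subtags")

    source, fields, current = [], {}, None
    for sub in subtags:
        if SEPARATOR.fullmatch(sub):
            if sub in fields:
                raise ValueError(f"duplicate field separator {sub!r}")
            current = sub
            fields[current] = []
        elif current is None:
            source.append(sub)  # part of the source language tag
        elif FIELD_SUBTAG.fullmatch(sub):
            fields[current].append(sub)
        else:
            raise ValueError(f"invalid field subtag {sub!r}")

    for sep, values in fields.items():
        if not values:
            raise ValueError(f"field {sep!r} has no subtags")

    parts = list(source)
    for sep in sorted(fields):     # fields ordered alphabetically by separator
        parts.append(sep)
        parts.extend(fields[sep])  # order within a field is preserved
    return "-".join(parts)
```

   With this sketch, canonicalize_t("und-Latn-M0-ungegn-2007") returns
   "und-latn-m0-ungegn-2007". Because only 'm0' is currently defined, a
   second separator such as 'm2' is hypothetical here, but it shows the
   ordering rule: canonicalize_t("it-m2-abc-m0-ungegn") returns
   "it-m0-ungegn-m2-abc", with the fields sorted by separator and each
   field's subtags left in their original order.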


2.4. BCP47 Registration Form

   Per RFC 5646, Section 3.7 [BCP47]:

     %%
     Identifier: t
     Description: Specifying Transformed Content
     Comments: Subtags for the identification of content that has been
       transformed, including but not limited to:
       transliteration, transcription, and translation.
     Added: 2010-mm-dd
     RFC: [TBD]
     Authority: Unicode Consortium
     Contact_Email: cldr-contact@unicode.org
     Mailing_List: cldr-users@unicode.org
     URL: http://www.unicode.org/Public/cldr/latest/core.zip
     %%

2.5. Field Definitions

   Assignment of 't' field subtags is determined by the Unicode CLDR
   Technical Committee, in accordance with the policies and procedures
   in http://www.unicode.org/consortium/tc-procedures.html, and subject
   to the Unicode Consortium Policies at
   http://www.unicode.org/policies/policies.html.

   Assignments that can be made by successive versions of LDML [UTS35]
   by the Unicode Consortium without requiring a new RFC include:

   o  The allocation of new field separator subtags for use after the
      't' extension.

   o  The allocation of subtags valid after a field separator subtag.

   o  The addition of subtag aliases and descriptions.

   o  The modification of subtag descriptions.

   Changes to the syntax or meaning of the 't' extension would require a
   new RFC that obsoletes this document; such an RFC would break
   stability, and would thus be contrary to the policies of the Unicode
   Consortium.

   At the time this document was published, one field was specified in
   [UTS35]: the transform mechanism. That field is summarized here:

   a.  The transform mechanism consists of a sequence of subtags
       starting with the 'm0' separator followed by one or more
       mechanism subtags. Each mechanism subtag has a length of 3 to 8
       alphanumeric characters. The sequence as a whole provides an
       identification of the specification for the transform, such as
       the mechanism subtag 'ungegn' in "und-Cyrl-t-und-latn-m0-ungegn".
       In many cases, only one mechanism subtag is necessary, but
       multiple subtags MAY be defined in [UTS35] where necessary.

   b.  Any purely numeric subtag is a representation of a date in the
       Gregorian calendar. It MAY occur in any mechanism field, but it
       SHOULD only be used where necessary. If it does occur:

       *  it MUST occur as the final subtag in the field

       *  it MUST NOT be the only subtag in the field

       *  it MUST only consist of a sequence of digits of the form YYYY,
          YYYYMM, or YYYYMMDD

       *  it SHOULD be as short as possible

       Note: The format is related to that of [RFC3339], but is not the
       same. The RFC 3339 full-date form cannot be used because it
       contains hyphens, which would be read as subtag separators. The
       offset ("Z") is not used because the date is a publication date
       (also known as a 'floating date'). For more information, see
       Section 3.3, Floating Time, in [W3C-TimeZones].

   c.  Examples:

       *  20110623 represents June 23rd, 2011.

       *  There are three dated versions of the UNGEGN transliteration
          specification for Hebrew to Latin. They can be represented by
          the following language tags:

          +  und-Latn-t-und-hebr-m0-ungegn-1972

          +  und-Latn-t-und-hebr-m0-ungegn-1977

          +  und-Latn-t-und-hebr-m0-ungegn-2007

       *  Suppose that the BGN transliteration specification for
          Cyrillic to Latin had three versions, dated June 11th, 1999;
          December 30th, 1999; and May 1st, 2011. In that case, the
          corresponding first two date subtags would require months to
          be distinctive (199906 and 199912), but the last subtag would
          only require the year (2011).

   d.  Some mechanisms may use a versioning system that is not
       distinguished by date, or not by date alone. In such cases, the
       version will be of a form specified by [UTS35] for that
       mechanism. For example, if the mechanism XXX uses versions of
       the form v21a, then a tag could look like "ja-t-it-m0-xxx-v21a".
       If there are multiple subversions distinguished by date, then a
       tag could look like "ja-t-it-m0-xxx-v21a-2007".

   A language tag with the 't' extension MAY be used to request a
   specific transform of content. In such a case, the recipient SHOULD
   return content that corresponds as closely as feasible to the
   requested transform, including the specification of the mechanism.
   For example, if the request is ja-t-it-m0-xxx-v21a-2007, and the
   recipient has content corresponding to both ja-t-it-m0-xxx-v21a and
   ja-t-it-m0-xxx-v21b-2009, then the v21a version would be preferred.
   As is the case for language matching as discussed in [BCP47],
   different implementations MAY have different measures of "closeness".

2.6. Registration of Field Subtags

   Registration of transform mechanisms is requested by filing a ticket
   at cldr.unicode.org [2]. The proposal in the ticket MUST contain the
   following information:

   +-------------+-----------------------------------------------------+
   | Item        | Description                                         |
   +-------------+-----------------------------------------------------+
   | Subtag      | The proposed mechanism subtag (or subtag sequence). |
   | Description | A description of the proposed mechanism; that       |
   |             | description MUST be sufficient to distinguish it    |
   |             | from other mechanisms in use.                       |
   | Version     | If versioning for the mechanism is not done         |
   |             | according to date, then a description of the        |
   |             | versioning conventions used for the mechanism.      |
   +-------------+-----------------------------------------------------+

   Proposals for clarifications of descriptions or additional aliases
   may also be requested by filing a ticket.

   The committee MAY define a template for submissions that requests
   more information, if it is found that such information would be
   useful in evaluating proposals.

2.7. Registration of Additional Fields

   In the event that it proves necessary to add an additional field
   (such as 'm2'), it can be requested by filing a ticket at
   cldr.unicode.org [2]. The proposal in the ticket MUST contain a full
   description of the proposed field semantics and subtag syntax, and
   MUST conform to the ABNF syntax for "field" presented in
   Section 2.2.

2.8. Committee Responses to Registration Proposals

   The committee MUST post each proposal publicly within 2 weeks after
   reception, to allow for comments. The committee must respond
   publicly to each proposal within 4 weeks after reception.

   The response MAY:

   o  request more information or clarification

   o  accept the proposal, optionally with modifications to the subtag
      or description

   o  reject the proposal, because of significant objections raised on
      the mailing list or due to problems with constraints in this
      document or in [UTS35]

   Accepted tickets result in a new entry in the machine-readable CLDR
   BCP47 data, or in the case of a clarified description, modifications
   to the description attribute value for an existing entry.

2.9. Machine-Readable Data

   EDITORIAL NOTE: The following parallels the structure used for the
   'u' extension [RFC6067], for which the Unicode Consortium is the
   maintaining authority. The data and specification will be available
   by the time this Internet-Draft has been approved. The description
   field is in the process of being added to CLDR.

   Beginning with CLDR version 1.7.2, machine-readable files are
   available listing the data defined for BCP47 extensions for each
   successive version of [UTS35]. These releases are listed on
   http://cldr.unicode.org/index/downloads. Each release has an
   associated data directory of the form
   "http://unicode.org/Public/cldr/<version>", where "<version>" is
   replaced by the release number. For example, for version 1.7.2, the
   "core.zip" file is located at
   http://unicode.org/Public/cldr/1.7.2/core.zip [3]. The most recent
   version is always identified by the version "latest" and can be
   accessed by the URL in Section 2.4.

   Inside the "core.zip" file, the directory "common/bcp47" contains the
   data files listing the valid attributes, keys, and types for each
   successive version of [UTS35]. Each data file lists the keys and
   types relevant to that topic. For example, transform.xml contains
   the subtags (types) for the 't' mechanisms.

   The XML structure lists the keys, such as <key extension="t"
   name="m0" description="Transliteration extension mechanism">, with
   subelements for the types, such as <type name="ungegn"
   description="United Nations Group of Experts on Geographical
   Names"/>.
   The currently defined attributes for the mechanisms include:

   +-------------+-------------------------------+---------------------+
   | Attribute   | Description                   | Examples            |
   +-------------+-------------------------------+---------------------+
   | name        | The name of the mechanism,    | UNGEGN, ALALC       |
   |             | limited to 3-8 characters (or |                     |
   |             | sequences of them).           |                     |
   | description | A description of the name,    | United Nations      |
   |             | with all and only that        | Group of Experts on |
   |             | information necessary to      | Geographical Names; |
   |             | distinguish one name from     | American Library    |
   |             | others with which it might be | Association-Library |
   |             | confused. Descriptions are    | of Congress         |
   |             | not intended to provide       |                     |
   |             | general background            |                     |
   |             | information.                  |                     |
   | since       | Indicates the first version   | 1.9, 2.0.1          |
   |             | of CLDR where the name        |                     |
   |             | appears. (Required for new    |                     |
   |             | items.)                       |                     |
   | alias       | Alternative name of the key   |                     |
   |             | or type, not limited in       |                     |
   |             | number of characters.         |                     |
   |             | Aliases are intended for      |                     |
   |             | backwards compatibility, not  |                     |
   |             | to provide all possible       |                     |
   |             | alternate names or            |                     |
   |             | designations. (Optional)      |                     |
   +-------------+-------------------------------+---------------------+

   The file for the transform extension is "transform.xml". The initial
   version of that file contains the following information.

     <key extension="t" name="m0" description=
         "Transliteration extension mechanism">
       <type name="ungegn" description=
           "United Nations Group of Experts on Geographical Names"/>
       <type name="alaloc" description=
           "American Library Association-Library of Congress"/>
       <type name="bgn" description=
           "US Board on Geographic Names"/>
       <type name="mcst" description=
           "Korean Ministry of Culture, Sports and Tourism"/>
       <type name="iso" description=
           "International Organization for Standardization"/>
       <type name="din" description=
           "Deutsches Institut fuer Normung"/>
       <type name="gost" description=
           "Euro-Asian Council for Standardization, Metrology
            and Certification"/>
     </key>

   To get the version information in XML when working with the data
   files, the XML parser must be validating. When the 'core.zip' file
   is unzipped, the 'dtd' directory will be at the same level as the
   'bcp47' directory; that is required for correct validation. For each
   release after CLDR 1.8, types introduced in that release are also
   marked in the data files by the XML attribute "since", such as in the
   following example:

     <type name="adp" since="1.9"/>

   The data is also currently maintained in a source code repository,
   with each release tagged, for viewing directly without unzipping.
   For example, see:

   o  http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/

   o  http://unicode.org/repos/cldr/tags/release-1-8/common/bcp47/

   For more information, see
   http://cldr.unicode.org/index/bcp47-extension.


3. Acknowledgements

   Thanks to John Emmons and the rest of the Unicode CLDR Technical
   Committee for their work in developing the BCP 47 subtags for LDML.


4. IANA Considerations

   This document will require IANA to insert the record of Section 2.4
   into the Language Extensions Registry, according to Section 3.7,
   "Extensions and the Extensions Registry", of "Tags for Identifying
   Languages" [BCP47]. Per Section 5.2 of [BCP47], there might be
   occasional (rare) requests by the Unicode Consortium (the "Authority"
   listed in the record) for maintenance of this record. Changes that
   can be submitted to IANA without the publication of a new RFC are
   limited to modification of the Comments, Contact_Email, Mailing_List,
   and URL fields. Any such requested changes MUST use the domain
   'unicode.org' in any new addresses or URIs, MUST explicitly cite this
   document (so that IANA can reference these requirements), and MUST
   originate from the 'unicode.org' domain. The domain or authority can
   only be changed via a new RFC.

   This document does not require IANA to create or maintain a new
   registry or otherwise impact IANA.


5. Security Considerations

   The security considerations for this extension are the same as those
   for [BCP47]. See RFC 5646, Section 6, Security Considerations
   [BCP47].


6. References

6.1. Normative References

   [BCP47]    Phillips, A., Ed. and M. Davis, Ed., "Tags for Identifying
              Languages", BCP 47, RFC 5646, September 2009.

   [RFC5234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", STD 68, RFC 5234, January 2008.

   [RFC6067]  Davis, M., Ed., Phillips, A., Ed., and Y. Umaoka, Ed.,
              "BCP 47 Extension U", RFC 6067, September 2010.

   [UTS35]    Davis, M., "Unicode Technical Standard #35: Locale Data
              Markup Language (LDML)", December 2007,
              <http://www.unicode.org/reports/tr35/>.

6.2. Informative References

   [RFC3339]  Klyne, G. and C. Newman, "Date and Time on the Internet:
              Timestamps", RFC 3339, July 2002.

   [W3C-TimeZones]
              Phillips, A., Ed., "W3C Working Group Note: Working with
              Time Zones", July 2011,
              <http://www.w3.org/TR/2011/NOTE-timezone-20110705/>.

   [ldml-registry]
              "Registry for Common Locale Data Repository tag elements",
              September 2009.

URIs

   [1]  <http://unicode.org/reports/tr35/>

   [2]  <http://cldr.unicode.org/>

   [3]  <http://unicode.org/Public/cldr/1.7.2/>


Authors' Addresses

   Mark Davis
   Google

   Email: mark@macchiato.com


   Addison Phillips
   Lab126

   Email: addison@lab126.com


   Yoshito Umaoka
   IBM

   Email: yoshito_umaoka@us.ibm.com


   Courtney Falk
   Infinite Automata

   Email: court@infiauto.com