<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY RFC4646 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4646.xml">
<!ENTITY rfc5646 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5646.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc strict="yes" ?>
<?rfc toc="yes"?>
<?rfc tocdepth="4"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<rfc category="info" docName="draft-davis-t-langtag-ext-08" ipr="trust200902"
     submissionType="independent">
  <front>
    <title abbrev="BCP 47 Extension T">BCP 47 Extension T - Transformed Content</title>

    <author fullname="Mark Davis" initials="M.E." surname="Davis">
      <organization>Google</organization>
      <address>
        <email>mark@macchiato.com</email>
      </address>
    </author>

    <author fullname="Addison Phillips" initials="A" surname="Phillips">
      <organization>Lab126</organization>
      <address>
        <email>addison@lab126.com</email>
      </address>
    </author>

    <author initials="Y" surname="Umaoka" fullname="Yoshito Umaoka">
      <organization abbrev="IBM">IBM</organization>
      <address>
        <email>yoshito_umaoka@us.ibm.com</email>
      </address>
    </author>

    <author initials="C" surname="Falk" fullname="Courtney Falk">
      <organization abbrev="Infinite Automata">Infinite Automata</organization>
      <address>
        <email>court@infiauto.com</email>
      </address>
    </author>

    <date month="December" year="2011" day="6" />

    <!-- Meta-data Declarations -->

    <area>General</area>

    <workgroup>Internet Engineering Task Force</workgroup>

    <keyword>locale</keyword>
    <keyword>bcp 47</keyword>

    <!-- Keywords will be incorporated into HTML output files in a meta tag
         but they have no effect on text or nroff output. If you submit your draft
         to the RFC Editor, the keywords will be used for the search engine.
    -->

    <abstract>
      <t>
        This document specifies an Extension to BCP 47 which provides subtags
        for specifying the source language or script of transformed content,
        including content that has been transliterated, transcribed, or
        translated, or in some other way influenced by the source. It also
        provides for additional information used for identification.
      </t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>
        <xref target="BCP47"></xref>
        permits the definition and registration of language tag extensions
        "that contain a language component and are compatible with
        applications that understand language tags". This document defines an
        extension for specifying the source of content that has been transformed,
        including text that has been transliterated, transcribed, or
        translated, or in some other way influenced by the source.
        It may be used in queries to request content that has been transformed.
        The "singleton" identifier for this extension is 't'.
      </t>
      <t>
        Language tags, as defined by
        <xref target="BCP47"></xref>, are useful for identifying the language of content.
        There are mechanisms for specifying variant subtags for special purposes.
        However, these variants are insufficient for specifying content that has
        undergone transformations, including content that has been
        transliterated, transcribed, or translated.
        The correct interpretation of the content may depend upon knowledge of the conventions used for the transformation.
      </t>
      <t>
        Suppose that Italian or Russian
        cities on a map are transcribed for Japanese users. Each name needs to be
        transliterated into katakana using rules appropriate for the specific
        source and target language.
        When tagging such data, it is important
        to be able to indicate not only the resulting content language ("ja"
        in this case), but also the source language.</t>
      <t>Transforms such as transliterations may vary depending not only on the
        source and target script, but also on the source and target language.
        Thus the
        Russian &lt;U+041F U+0443 U+0442 U+0438 U+043D&gt; (which corresponds to
        the Cyrillic &lt;PE, U, TE, I, EN&gt;) transliterates into "Putin" in
        English but "Poutine" in French. The identifier could be used to indicate
        a desired mechanical transformation in an API, or could be used to tag
        data that has been converted (mechanically or by hand) according to a
        transliteration method.</t>
      <t>
        In addition, many different conventions have arisen for how to transform text, even between the same languages and scripts.
        For example, "Gaddafi" is commonly transliterated from Arabic to English as any of (G/Q/K/Kh)a(d/dh/dd/dhdh/th/zz)af(i/y).
        Some examples of standardized conventions used for transcribing or transliterating text include:
        <list style="letters">
          <t>United Nations Group of Experts on Geographical Names (UNGEGN)</t>
          <t>US Library of Congress (LOC)</t>
          <t>US Board on Geographic Names (BGN)</t>
          <t>Korean Ministry of Culture, Sports and Tourism (MCST)</t>
          <t>International Organization for Standardization (ISO)</t>
        </list>
      </t>
      <t>The usage of this extension is not limited to formal transformations,
        and may include other instances where the content is in some other way influenced by the source.
        For example, this extension could be used to designate a request for a speech recognizer
        that is tailored specifically for second-language speakers who are
        first-language speakers of a particular language (e.g.,
        a recognizer for "English spoken with a Chinese accent").</t>
      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
          "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
          document are to be interpreted as described in RFC 2119.</t>
      </section>
    </section>

    <?rfc needLines="8" ?>

    <section title="BCP47 Required Information">
      <section title="Overview">
        <t>
          Identification of transformed content can be done using the 't' extension
          defined in this document.
          This extension is formed by the 't'
          singleton followed by a sequence of subtags that would form a
          language tag as defined by
          <xref target="BCP47"></xref>.
          This allows for the source language or script to be specified to
          the degree of precision required.
          There are restrictions on the sequence of subtags.
          They MUST form a regular, valid, canonical language tag, and
          MUST NOT include either extensions or private use
          sequences introduced by the singleton 'x'.
          Where only the script is relevant (such as when identifying a
          script-to-script transliteration), 'und' is used for the primary
          language subtag.
        </t>
        <t>For example:</t>
        <texttable>
          <ttcol>Language Tag</ttcol>

          <ttcol>Description</ttcol>

          <c>ja-t-it</c>

          <c>The content is Japanese, transformed from Italian.</c>

          <c>ja-Kana-t-it</c>

          <c>The content is Japanese Katakana, transformed from Italian.</c>

          <c>und-Latn-t-und-cyrl</c>

          <c>The content is in the Latin script, transformed from the Cyrillic
            script.</c>

        </texttable>
        <t>
          Note that the sequence of subtags governed by 't' cannot contain a
          singleton (a single-character subtag), because that would start a
          new extension.
          For example, the tag "ja-t-i-ami" does not indicate
          that the source is in "i-ami", because "i-ami" is not a
          regular language tag in
          <xref target="BCP47"></xref>. That tag would express an empty 't' extension followed by an 'i'
          extension.
        </t>
        <t>The 't' extension is not intended for use in structured data that already provides
          separate source and target language identifiers.
          For example, this is the case in localization interchange formats such as XLIFF.
          In such cases, it would be inappropriate to use "ja-t-it" for the target language tag because the source language tag
          "it" would already be present in the data. Instead, one would use the language tag "ja".
        </t>
        <t>As noted earlier, it is sometimes necessary to indicate additional
          information about a transformation.
          This additional information is optionally supplied after the source in a series of one or more fields,
          where each field consists of a field separator subtag followed by one or more non-separator subtags.
          Each field separator subtag consists of a single letter followed by a single digit.
        </t>
        <t>A transformation mechanism is an optional field that indicates the
          specification used for the transformation, such as "UNGEGN" for the
          United Nations Group of Experts on Geographical Names
          transliterations and transcriptions. It uses the 'm0' field separator followed by certain subtags.
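          <figure>
            <preamble>As an illustrative sketch only (it is not part of this specification), the following Python function splits the 't' extension of a tag into its source language tag and its fields, using the separator rule described above (a single letter followed by a single digit). The name split_t_extension and the shape of its return value are assumptions made for this example, and the sketch does not validate the source against BCP 47.</preamble>
            <artwork type="python">
import re

# A field separator is one ASCII letter followed by one digit (e.g. "m0").
SEPARATOR = re.compile(r"[a-z][0-9]")


def split_t_extension(tag):
    """Return (source, fields) for the 't' extension of a language tag.

    source is the source language tag as a lowercase string ("" if absent);
    fields maps each separator to the ordered list of subtags after it.
    Hypothetical helper: the name and return shape are not from any spec.
    """
    subtags = tag.lower().split("-")
    source, fields, current = [], {}, None
    in_t = False
    for subtag in subtags[1:]:          # a singleton is never the first subtag
        if len(subtag) == 1:
            if in_t or subtag == "x":
                break                   # next singleton ends 't'; 'x' is private use
            in_t = subtag == "t"
        elif in_t:
            if SEPARATOR.fullmatch(subtag):
                if subtag in fields:
                    raise ValueError("duplicate field separator: " + subtag)
                current = fields.setdefault(subtag, [])
            elif current is None:
                source.append(subtag)   # still inside the source language tag
            else:
                current.append(subtag)  # order within a field is preserved
    for separator, values in fields.items():
        if not values:
            raise ValueError("empty field: " + separator)
    return "-".join(source), fields
            </artwork>
            <postamble>For "und-Cyrl-t-und-latn-m0-ungegn-2007", this sketch yields the source "und-latn" and a single 'm0' field holding "ungegn" and "2007"; for "ja-t-i-ami" it yields an empty source and no fields, because the singleton 'i' ends the 't' extension.</postamble>
          </figure>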
        </t>
        <t>For example:</t>
        <texttable>
          <ttcol>Language Tag</ttcol>

          <ttcol>Description</ttcol>

          <c>und-Cyrl-t-und-latn-m0-ungegn-2007</c>

          <c>The content is in the Cyrillic script, transformed from the Latin
            script, according to a UNGEGN specification dated 2007.</c>

        </texttable>
        <t>The field separator subtags, such as 'm0', were chosen because they are
          short, visually distinctive, and cannot occur in a language subtag
          (outside of an extension and after 'x'),
          thus eliminating the potential for collision or confusion with the
          source language tag.</t>
        <t>
          The field subtags are defined by
          <eref target="http://unicode.org/reports/tr35/">Section 3</eref>
          of
          <xref target="UTS35">Unicode
          Technical Standard #35: Unicode Locale Data
          Markup Language</xref> (LDML), the main specification for the Unicode
          Common Locale Data Repository (CLDR) project.
          As required by BCP 47, subtags follow the language tag ABNF and
          other rules for the formation of language tags and subtags, are
          restricted to the ASCII letters and digits, are not case sensitive,
          and do not exceed eight characters in length.
        </t>
        <t>
          EDITORIAL NOTE: This new facility has been accepted by the Unicode
          CLDR committee for incorporation into the next versions of CLDR and LDML, parallel
          with the structure of the 'u' extension
          <xref target="RFC6067"></xref>,
          for which it is already the maintaining authority.
          The data and specification will be available by the time this
          Internet-Draft has been approved.
        </t>
        <t>The LDML specification is available over the Internet and at no cost, and
          is available via a royalty-free license at
          http://unicode.org/copyright.html. LDML is versioned, and each
          version of LDML is numbered, dated, and stable. Extension subtags,
          once defined by LDML, are never retracted or substantially changed in meaning.
        </t>
        <t>The maintaining authority for the 't' extension is the Unicode
          Consortium:</t>

        <texttable>
          <ttcol>Item</ttcol>

          <ttcol>Value</ttcol>

          <c>Name</c>

          <c>Unicode Consortium</c>

          <c>Contact Email</c>

          <c>cldr-contact@unicode.org</c>

          <c>Discussion List Email</c>

          <c>cldr-users@unicode.org</c>

          <c>URL Location</c>

          <c>cldr.unicode.org</c>

          <c>Specification</c>

          <c>Unicode Technical Standard #35 Unicode Locale Data Markup
            Language (LDML), http://unicode.org/reports/tr35/</c>
          <c>Section</c>

          <c>Section 3 Unicode Language and Locale Identifiers</c>
        </texttable>
      </section>
      <section title="Structure" anchor="structure">
        <t>The subtags in the 't' extension are of the following form:</t>
<figure>
<artwork type='abnf'>
t-ext    = "t"                      ; Extension
           (("-" lang *("-" field)) ; Source + optional field(s)
           / 1*("-" field))         ; Field(s) only (no source)

lang     = language                 ; BCP47, with restrictions
           ["-" script]
           ["-" region]
           *("-" variant)

field    = sep 1*("-" 3*8alphanum)  ; With restrictions

sep      = ALPHA DIGIT              ; Subtag separators
alphanum = ALPHA / DIGIT
</artwork>
</figure>
        <t>where the &lt;language&gt;, &lt;script&gt;, &lt;region&gt;, and &lt;variant&gt; rules are specified in <xref target="BCP47"></xref>,
          and the &lt;ALPHA&gt; and &lt;DIGIT&gt; rules are specified in <xref target="RFC5234"></xref>.</t>
        <t>Description and restrictions:
          <list style="letters">
            <t>The 't' extension MUST have at least one subtag.</t>
            <t>
              The 't' extension normally starts with a source language tag,
              which MUST be a regular, canonical language tag as specified by
              <xref target="BCP47"></xref>.
              Tags described by the 'irregular' production in BCP 47 MUST NOT
              be used to form the language tag.
              The source language tag MAY be omitted: some field values do not
              require it.
            </t>
            <t>There is optionally a sequence of fields, where each field has a
              separator followed by a sequence of one or more subtags.
              Two identical field
              separators MUST NOT be present in the language tag.</t>
            <t>
              The order of the fields in a 't' extension is not significant. The order of subtags within a field is significant.
              (See
              <xref target='canonicalization' />
              Canonicalization.)
            </t>
            <t>
              The 't' subtag fields are defined by
              <eref target="http://unicode.org/reports/tr35/">Section 3</eref>
              of
              <xref target="UTS35">Unicode
              Technical Standard #35: Unicode Locale
              Data Markup Language</xref>.
            </t>
          </list>
        </t>
      </section>
      <section title="Canonicalization" anchor="canonicalization">
        <t>As required by
          <xref target="BCP47"></xref>, the use of uppercase or lowercase letters is not significant in
          the subtags used in this extension. The canonical form for all
          subtags in the extension is lowercase, with the fields ordered by
          the separators, alphabetically.
          The order of subtags within a field is significant, and MUST NOT be changed in the process of canonicalizing.</t>
      </section>
      <section title="BCP47 Registration Form" anchor="regform">
        <t>
          Per
          <xref target="BCP47">RFC 5646, Section 3.7</xref>:
        </t>
        <figure>
          <artwork>
%%
Identifier: t
Description: Specifying Transformed Content
Comments: Subtags for the identification of content that has been
 transformed, including but not limited to:
 transliteration, transcription, and translation.
Added: 2010-mm-dd
RFC: [TBD]
Authority: Unicode Consortium
Contact_Email: cldr-contact@unicode.org
Mailing_List: cldr-users@unicode.org
URL: http://www.unicode.org/Public/cldr/latest/core.zip
%% </artwork>
        </figure>

      </section>
      <section title="Field Definitions" anchor="summary">
        <t>Assignment of 't' field subtags is determined by the Unicode CLDR
          Technical Committee, in accordance with the policies and procedures
          in
          <eref target="http://www.unicode.org/consortium/tc-procedures.html">http://www.unicode.org/consortium/tc-procedures.html</eref>,
          and subject to the Unicode Consortium Policies on
          <eref target="http://www.unicode.org/policies/policies.html">http://www.unicode.org/policies/policies.html</eref>.</t>
        <t>
          Assignments that can be made by successive versions of
          <xref target="UTS35">LDML</xref>
          by the Unicode Consortium without requiring a new RFC include:
          <list style="symbols">
            <t>The allocation of new field separator subtags for use after the 't' extension.</t>
            <t>The allocation of subtags valid after a field separator subtag.</t>
            <t>The addition of subtag aliases and descriptions.</t>
            <t>The modification of subtag descriptions.</t>
          </list>
          Changes to the syntax or meaning of the 't' extension would require a new
          RFC that obsoletes this document; such an RFC would break stability, and
          would thus be contrary to the policies of the Unicode Consortium.
        </t>
        <t>
          At the time this document was published, one field was specified in
          <xref target="UTS35"></xref>: the transform mechanism.
          That field is summarized here:
          <list style="letters">
            <t>
              The transform mechanism consists of a sequence of subtags
              starting with the 'm0' separator followed by one or more
              mechanism subtags.
              Each mechanism subtag has a length of 3 to 8 alphanumeric
              characters.
              The sequence as a whole provides an identification of the
              specification for the transform, such as the
              mechanism subtag 'ungegn' in
              "und-Cyrl-t-und-latn-m0-ungegn".
              In many cases, only one mechanism subtag is necessary, but
              multiple subtags MAY be defined in
              <xref target="UTS35"></xref>
              where necessary.
            </t>
            <t>
              Any purely numeric subtag is a representation of a date in the
              Gregorian calendar.
              It MAY occur in any mechanism field, but it SHOULD only be used where necessary.
              If it does occur:
              <list style="symbols">
                <t>it MUST occur as the final subtag in the field</t>
                <t>it MUST NOT be the only subtag in the field</t>
                <t>it MUST consist only of a sequence of digits of the form YYYY,
                  YYYYMM, or YYYYMMDD</t>
                <t>it SHOULD be as short as possible</t>
                <t>Note: The format is related to that of <xref target="RFC3339"></xref>, but is not the same.
                  The RFC 3339 full-date cannot be used because it contains hyphens. The offset ("Z") is not used
                  because the date is a publication date (aka 'floating date'). For more information, see
                  Section 3.3, Floating Time in
                  <xref target="W3C-TimeZones"></xref>.</t>
              </list>
              Examples:
              <list style="symbols">
                <t>20110623 represents June 23rd, 2011.</t>
                <t>There are three dated versions of the UNGEGN transliteration
                  specification for Hebrew to Latin. Because the content is Latin and the source is Hebrew,
                  they can be represented by the following language tags:
                  <list style="symbols">
                    <t>und-Latn-t-und-hebr-m0-ungegn-1972</t>
                    <t>und-Latn-t-und-hebr-m0-ungegn-1977</t>
                    <t>und-Latn-t-und-hebr-m0-ungegn-2007</t>
                  </list>
                </t>
                <t>Suppose that the BGN transliteration
                  specification for Cyrillic to Latin had three versions,
                  dated
                  June 11th, 1999; December 30th, 1999; and May 1st, 2011.
                  In that case, the date subtags for the first two versions
                  would need to include the month to be distinct (199906 and
                  199912), but the last subtag would only require the year (2011).</t>
              </list>
            </t>
            <t>
              Some mechanisms may use a versioning system that is not
              distinguished by date, or not by date alone.
              In the latter case,
              the version will be of a form specified by
              <xref target="UTS35"></xref>
              for that mechanism.
              For example, if the mechanism XXX uses
              versions of the form v21a,
              then a tag could look like
              "ja-t-it-m0-xxx-v21a". If there are
              multiple subversions distinguished by date,
              then a tag could look like
              "ja-t-it-m0-xxx-v21a-2007".
            </t>
          </list>

        </t>
        <t>A language tag with the 't' extension MAY be used to request a specific transform of content.
          In such a case, the recipient SHOULD return content that corresponds
          as closely as feasible to the requested transform, including the specification of the mechanism.
          For example, if the request is "ja-t-it-m0-xxx-v21a-2007",
          and the recipient has content corresponding to both "ja-t-it-m0-xxx-v21a" and "ja-t-it-m0-xxx-v21b-2009", then the v21a version would be preferred.
          As is the case for language matching as discussed in <xref target="BCP47"></xref>,
          different implementations MAY have different measures of "closeness".</t>
      </section>
      <section title="Registration of Field Subtags" anchor="registration">
        <t>Registration of transform mechanisms is requested by filing a ticket at
          <eref target="http://cldr.unicode.org/">cldr.unicode.org</eref>.
          The proposal in the ticket MUST contain the following information:</t>
        <texttable>
          <ttcol>Item</ttcol>
          <ttcol>Description</ttcol>
          <c>Subtag</c>
          <c>The proposed mechanism subtag (or subtag sequence).</c>
          <c>Description</c>
          <c>A description of the proposed mechanism; that description MUST be sufficient to distinguish it from other mechanisms in use.</c>
          <c>Version</c>
          <c>If versioning for the mechanism is not done according to date, then a description of the versioning conventions used for the mechanism.</c>
        </texttable>
        <t>Clarifications of descriptions or additional aliases may also be requested by filing a ticket.</t>
        <t>The committee MAY define a template for submissions that requests more information,
          if it is found that such information would be useful in evaluating proposals.</t>
      </section>
      <section title="Registration of Additional Fields" anchor="field-registration">
        <t>In the event that it proves necessary to add an additional field (such as 'm2'),
          it can be requested by filing a ticket at
          <eref target="http://cldr.unicode.org/">cldr.unicode.org</eref>.
          The proposal in the ticket MUST contain a full description of the
          proposed field semantics and subtag syntax,
          and MUST conform to the ABNF syntax for "field" presented in <xref target="structure" />.</t>
      </section>
      <section title="Committee Responses to Registration Proposals" anchor="committee-responses">
        <t>The committee MUST post each proposal publicly within 2 weeks after reception,
          to allow for comments.
          The committee must respond publicly to each proposal within 4 weeks after reception.</t>
        <t>The response MAY:
          <list style="symbols">
            <t>request more information or clarification</t>
            <t>accept the proposal, optionally with modifications to the subtag or description</t>
            <t>reject the proposal, because of significant objections raised on the mailing list or
              due to problems with constraints in this document or in <xref target="UTS35"></xref></t>
          </list>
        </t>
        <t>Accepted tickets result in a new entry in the machine-readable CLDR BCP47 data,
          or in the case of a clarified description,
          modifications to the description attribute value for an existing entry.</t>
      </section>
      <section title="Machine-Readable Data" anchor="machine-readable">
        <t>
          EDITORIAL NOTE: The following parallels the structure used for the
          'u' extension
          <xref target="RFC6067"></xref>,
          for which the Unicode Consortium is the maintaining authority.
          The data and specification will be available by the time this
          Internet-Draft has been approved.
          The description field is in the process of being added to CLDR.
        </t>
        <t>
          Beginning with CLDR version 1.7.2, machine-readable files are
          available listing the data defined for BCP47 extensions for each
          successive version of
          <xref target="UTS35"></xref>. These releases are listed on
          <eref target="http://cldr.unicode.org/index/downloads">http://cldr.unicode.org/index/downloads</eref>.
          Each release has an associated data directory of the form
          "http://unicode.org/Public/cldr/&lt;version&gt;", where
          "&lt;version&gt;" is replaced by the release number. For example,
          for version 1.7.2, the "core.zip" file is located at
          <eref target="http://unicode.org/Public/cldr/1.7.2/">http://unicode.org/Public/cldr/1.7.2/core.zip</eref>.
          The most
          recent version is always identified by the version "latest" and can
          be accessed by the URL in
          <xref target="regform"></xref>.</t>
        <t>Inside the "core.zip" file, the directory "common/bcp47" contains the
          data files listing the valid attributes, keys, and types for each successive version of <xref target="UTS35"></xref>.
          Each data file lists the keys and types relevant to that topic. For example, mechanism.xml contains the subtags (types) for the 't' mechanisms.</t>
        <t>The XML structure lists the keys, such as &lt;key extension="t" name="m0" alias="collation" description="Transliteration extension mechanism"&gt;, with subelements for the types,
          such as &lt;type name="ungegn" description="United Nations Group of Experts on Geographical Names"/&gt;. The currently defined attributes for the mechanisms include:</t>
        <texttable>
          <ttcol>Attribute</ttcol>
          <ttcol>Description</ttcol>
          <ttcol>Examples</ttcol>

          <c>name</c>
          <c>The name of the mechanism, limited to 3-8 characters (or sequences of them).</c>
          <c>UNGEGN, ALALC</c>

          <c>description</c>
          <c>A description of the name, with all and only that information necessary to distinguish one name
            from others with which it might be confused. Descriptions are not intended to provide general background information.</c>
          <c>United Nations Group of Experts on Geographical Names; American Library Association-Library of Congress</c>

          <c>since</c>
          <c>Indicates the first version of CLDR where the name appears. (Required for new items.)</c>
          <c>1.9, 2.0.1</c>

          <c>alias</c>
          <c>Alternative name of the key or type, not limited in number of characters. Aliases are intended for backwards compatibility,
            not to provide all possible alternate names or designations. (Optional)</c>
          <c></c>

        </texttable>
        <t>The file for the transform extension is "transform.xml".
          The initial version of that file contains the following information.</t>
        <figure><artwork><![CDATA[
<key extension="t" name="m0" description=
    "Transliteration extension mechanism">
  <type name="ungegn" description=
      "United Nations Group of Experts on Geographical Names"/>
  <type name="alaloc" description=
      "American Library Association-Library of Congress"/>
  <type name="bgn" description=
      "US Board on Geographic Names"/>
  <type name="mcst" description=
      "Korean Ministry of Culture, Sports and Tourism"/>
  <type name="iso" description=
      "International Organization for Standardization"/>
  <type name="din" description=
      "Deutsches Institut fuer Normung"/>
  <type name="gost" description=
      "Euro-Asian Council for Standardization, Metrology
      and Certification"/>
</key>
]]></artwork></figure>
        <t>
          To get the version information in XML when working with the data
          files, the XML parser must be validating. When the 'core.zip' file
          is unzipped, the 'dtd' directory will be at the same level as the
          'bcp47' directory; that is required for correct validation. For
          each release after CLDR 1.8, types introduced in that release are
          also marked in the data files by the XML attribute "since", such as
          in the following example:
          <figure>
            <artwork><![CDATA[<type name="adp" since="1.9"/>]]></artwork>
          </figure>
        </t>
        <t>
          The data is also currently maintained in a source code repository,
          with each release tagged, for viewing directly without unzipping.
          For example, see:
          <list style="symbols">
            <t>http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/</t>
            <t>http://unicode.org/repos/cldr/tags/release-1-8/common/bcp47/</t>
          </list>
        </t>
        <t>For more information, see
          <eref target="http://cldr.unicode.org/index/bcp47-extension">http://cldr.unicode.org/index/bcp47-extension</eref>.</t>
      </section>
    </section>
    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>Thanks to John Emmons and the rest of the Unicode CLDR Technical
        Committee for their work in developing the BCP 47 subtags
        for LDML.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>
        This document will require IANA to insert the record of
        <xref target="regform"></xref>
        into the Language Extensions Registry, according to
        Section 3.7, Extensions and the Extensions Registry of "Tags for
        Identifying Languages" in
        <xref target="BCP47"></xref>. Per Section 5.2 of
        <xref target="BCP47"></xref>, there might be occasional (rare) requests by the Unicode
        Consortium (the "Authority" listed in the record) for maintenance of
        this record. Changes that can be submitted to IANA without the
        publication of a new RFC are limited to modification of the
        Comments, Contact_Email, Mailing_List, and URL fields. Any such
        requested changes MUST use the domain 'unicode.org' in any new
        addresses or URIs, MUST explicitly cite this document (so that IANA
        can reference these requirements), and MUST originate from the
        'unicode.org' domain. The domain or authority can only be changed
        via a new RFC.
      </t>
      <t>This document does not require IANA to create or maintain a new
        registry or otherwise impact IANA.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>
        The security considerations for this extension are the same as those
        for
        <xref target="BCP47"></xref>. See
        <xref target="BCP47">RFC 5646, Section 6, Security Considerations</xref>.
      </t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <reference anchor="UTS35" target="http://www.unicode.org/reports/tr35/">
        <front>
          <title abbrev="LDML">
            Unicode Technical Standard #35: Locale Data
            Markup Language (LDML)
          </title>
          <author initials="M" surname="Davis" fullname="Mark Davis">
            <organization>Unicode Consortium</organization>
          </author>
          <date day="21" month="December" year="2007" />
        </front>
      </reference>
      <reference anchor="BCP47">
        <front>
          <title abbrev="BCP47">Tags for the Identification of Language (BCP47)</title>
          <author initials="M.E." surname="Davis" fullname="Mark Davis"
                  role="editor">
            <organization>Google</organization>
          </author>
          <author initials="A." surname="Phillips" fullname="Addison Phillips"
                  role="editor">
            <organization>Lab126</organization>
          </author>
          <date month="September" year="2009" />
        </front>
      </reference>
      <reference anchor="RFC6067">
        <front>
          <title abbrev="RFC6067">BCP 47 Extension U</title>
          <author initials="M.E." surname="Davis" fullname="Mark Davis"
                  role="editor">
            <organization>Google</organization>
          </author>
          <author initials="A." surname="Phillips" fullname="Addison Phillips"
                  role="editor">
            <organization>Lab126</organization>
          </author>
          <author initials="Y."
                  surname="Umaoka" fullname="Yoshito Umaoka"
                  role="editor">
            <organization>IBM</organization>
          </author>
          <date month="September" year="2010" />
        </front>
      </reference>
      <reference anchor="RFC5234">
        <front>
          <title>Augmented BNF for Syntax Specifications: ABNF</title>
          <author surname="Crocker" fullname="Dave Crocker"
                  role="editor">
            <organization>Brandenburg InternetWorking</organization>
          </author>
          <author surname="Overell" fullname="Paul Overell">
            <organization>THUS plc.</organization>
          </author>
          <date year="2008" />
          <abstract>
            <t> Internet technical specifications often need to define a formal
              syntax. Over the years, a modified version of Backus-Naur Form
              (BNF), called Augmented BNF (ABNF), has been popular among many
              Internet specifications. The current specification documents ABNF.
              It balances compactness and simplicity with reasonable
              representational power. The differences between standard BNF and
              ABNF involve naming rules, repetition, alternatives,
              order-independence, and value ranges.
              This specification also supplies
              additional rule definitions and encoding for a core lexical analyzer
              of the type common to several Internet specifications.</t>
          </abstract>
        </front>
      </reference>
    </references>
    <references title="Informative References">
      <reference anchor="ldml-registry">
        <front>
          <title>Registry for Common Locale Data Repository tag elements</title>
          <author fullname="Unicode Consortium"></author>
          <date year="2009" month="September" />
        </front>
      </reference>
      <reference anchor="W3C-TimeZones" target="http://www.w3.org/TR/2011/NOTE-timezone-20110705/">
        <front>
          <title>W3C Working Group Note: Working with Time Zones</title>
          <author surname="Phillips" fullname="Addison Phillips" role="editor">
            <organization>W3C</organization>
          </author>
          <date year="2011" month="July" />
        </front>
      </reference>
      <reference anchor="RFC3339">
        <front>
          <title>Date and Time on the Internet: Timestamps</title>
          <author surname="Klyne" fullname="Graham Klyne"
                  role="editor">
            <organization>Clearswift Corporation</organization>
          </author>
          <author surname="Newman" fullname="Chris Newman"
                  role="editor">
            <organization>Sun Microsystems</organization>
          </author>
          <date year="2002" />
          <abstract>
            <t> This document specifies an Internet standards track protocol for the
              Internet community, and requests discussion and suggestions for
              improvements. Please refer to the current edition of the "Internet
              Official Protocol Standards" (STD 1) for the standardization state
              and status of this protocol. Distribution of this memo is unlimited.
            </t>
          </abstract>
        </front>
      </reference>
    </references>

  </back>
</rfc>