Internet Engineering Task Force                                 M. Davis
Internet-Draft                                                    Google
Intended status: Informational                               A. Phillips
Expires: June 7, 2012                                             Lab126
                                                               Y. Umaoka
                                                                     IBM
                                                                 C. Falk
                                                       Infinite Automata
                                                        December 5, 2011


                BCP 47 Extension T - Transformed Content
                      draft-davis-t-langtag-ext-07

Abstract

   This document specifies an Extension to BCP 47 which provides subtags
   for specifying the source language or script of transformed content,
   including content that has been transliterated, transcribed, or
   translated, or in some other way influenced by the source.  It also
   provides for additional information used for identification.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on June 7, 2012.

Copyright Notice

   Copyright (c) 2011 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.


Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  BCP47 Required Information
     2.1.  Overview
     2.2.  Structure
     2.3.  Canonicalization
     2.4.  BCP47 Registration Form
     2.5.  Field Definitions
     2.6.  Registration of Field Subtags
     2.7.  Registration of Additional Fields
     2.8.  Committee Responses to Registration Proposals
     2.9.  Machine-Readable Data
   3.  Acknowledgements
   4.  IANA Considerations
   5.  Security Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Authors' Addresses


1.  Introduction

   [BCP47] permits the definition and registration of language tag
   extensions "that contain a language component and are compatible with
   applications that understand language tags".  This document defines
   an extension for specifying the source of content that has been
   transformed, including text that has been transliterated,
   transcribed, or translated, or in some other way influenced by the
   source.  It may be used in queries to request content that has been
   transformed.  The "singleton" identifier for this extension is 't'.

   Language tags, as defined by [BCP47], are useful for identifying the
   language of content.  There are mechanisms for specifying variant
   subtags for special purposes.  However, these variants are
   insufficient for specifying content that has undergone
   transformations, including content that has been transliterated,
   transcribed, or translated.  The correct interpretation of the
   content may depend upon knowledge of the conventions used for the
   transformation.

   Suppose that Italian or Russian cities on a map are transcribed for
   Japanese users.  Each name needs to be transliterated into katakana
   using rules appropriate for the specific source and target language.
   When tagging such data, it is important to be able to indicate not
   only the resulting content language ("ja" in this case), but also the
   source language.

   Transforms such as transliterations may vary depending not only on
   the source and target script, but also on the source and target
   language.  Thus the Russian <U+041F U+0443 U+0442 U+0438 U+043D>
   (which corresponds to the Cyrillic <PE, U, TE, I, EN>)
   transliterates into "Putin" in English but "Poutine" in French.  The
   identifier could be used to indicate a desired mechanical
   transformation in an API, or could be used to tag data that has been
   converted (mechanically or by hand) according to a transliteration
   method.

   In addition, many different conventions have arisen for how to
   transform text, even between the same languages and scripts.  For
   example, "Gaddafi" is commonly transliterated from Arabic to English
   as any of (G/Q/K/Kh)a(d/dh/dd/dhdh/th/zz)af(i/y).  Some examples of
   standardized conventions used for transcribing or transliterating
   text include:

   a.  United Nations Group of Experts on Geographical Names (UNGEGN)

   b.  US Library of Congress (LOC)

   c.  US Board on Geographic Names (BGN)

   d.  Korean Ministry of Culture, Sports and Tourism (MCST)

   e.  International Organization for Standardization (ISO)

   The usage of this extension is not limited to formal transformations,
   and may include other instances where the content is in some other
   way influenced by the source.  For example, this extension could be
   used to designate a request for a speech recognizer that is tailored
   specifically for second-language speakers who are first-language
   speakers of a particular language (e.g., a recognizer for "English
   spoken with a Chinese accent").

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.


2.  BCP47 Required Information

2.1.  Overview

   Identification of transformed content can be done using the 't'
   extension defined in this document.  This extension is formed by the
   't' singleton followed by a sequence of subtags that would form a
   language tag as defined by [BCP47].  This allows for the source
   language or script to be specified to the degree of precision
   required.  There are restrictions on the sequence of subtags.  They
   MUST form a regular, valid, canonical language tag, and MUST NOT
   include extensions or private use sequences introduced by the
   singleton 'x'.  Where only the script is relevant (such as
   identifying a script-script transliteration), 'und' is used for the
   primary language subtag.

   For example:

   +---------------------+---------------------------------------------+
   | Language Tag        | Description                                 |
   +---------------------+---------------------------------------------+
   | ja-t-it             | The content is Japanese, transformed from   |
   |                     | Italian.                                    |
   | ja-Kana-t-it        | The content is Japanese Katakana,           |
   |                     | transformed from Italian.                   |
   | und-Latn-t-und-cyrl | The content is in the Latin script,         |
   |                     | transformed from the Cyrillic script.       |
   +---------------------+---------------------------------------------+

   Note that the sequence of subtags governed by 't' cannot contain a
   singleton (a single-character subtag), because that would start a new
   extension.  For example, the tag "ja-t-i-ami" does not indicate that
   the source is in "i-ami", because "i-ami" is not a regular language
   tag in [BCP47].  That tag would express an empty 't' extension
   followed by an 'i' extension.

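   The boundary rule in the note above can be sketched in Python.  This
   is an illustrative sketch only, not part of the extension's
   definition: the function name split_t_extension is hypothetical, and
   the code only locates where the 't' extension starts and stops.  It
   does not check the tag against [BCP47] or the subtag registry.

```python
def split_t_extension(tag):
    """Return (subtags before 't', subtags governed by 't').

    The second element is None when the tag has no 't' extension.
    Hypothetical helper: it performs no validation of the tag.
    """
    subtags = tag.lower().split("-")
    if subtags[0] == "x":
        # A tag that is entirely private use has no extensions.
        return subtags, None
    for i, sub in enumerate(subtags[1:], start=1):
        if sub == "x":
            # Everything after 'x' is private use, not an extension.
            break
        if sub == "t":
            rest = subtags[i + 1:]
            # The extension ends at the next single-character subtag.
            end = next(
                (j for j, s in enumerate(rest) if len(s) == 1),
                len(rest),
            )
            return subtags[:i], rest[:end]
    return subtags, None
```

   With this sketch, "ja-Kana-t-it" yields the governed subtags
   ["it"], while "ja-t-i-ami" yields an empty list, because the
   singleton "i" ends the 't' extension immediately.
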
   The 't' extension is not intended for use in structured data that
   already provides separate source and target language identifiers.
   For example, this is the case in localization interchange formats
   such as XLIFF.  In such cases, it would be inappropriate to use
   "ja-t-it" for the target language tag because the source language
   tag "it" would already be present in the data.  Instead one would use
   the language tag "ja".

   As noted earlier, it is sometimes necessary to indicate additional
   information about a transformation.  This additional information is
   optionally supplied after the source in a series of one or more
   fields, where each field consists of a field separator subtag
   followed by one or more non-separator subtags.  Each field separator
   subtag consists of a single letter followed by a single digit.

   A transformation mechanism is an optional field that indicates the
   specification used for the transformation, such as "UNGEGN" for the
   United Nations Group of Experts on Geographical Names
   transliterations and transcriptions.  It uses the 'm0' field
   separator followed by certain subtags.

   For example:

   +------------------------------------+------------------------------+
   | Language Tag                       | Description                  |
   +------------------------------------+------------------------------+
   | und-Cyrl-t-und-latn-m0-ungegn-2007 | The content is in Cyrillic,  |
   |                                    | transformed from Latin,      |
   |                                    | according to a UNGEGN        |
   |                                    | specification dated 2007.    |
   +------------------------------------+------------------------------+

   The field separator subtags such as 'm0' were chosen because they are
   short, visually distinctive, and cannot occur in a language subtag
   (outside of an extension and after 'x'), thus eliminating the
   potential for collision or confusion with the source language tag.

   The field subtags are defined by Section 3 [1] of Unicode Technical
   Standard #35: Unicode Locale Data Markup Language [UTS35] (LDML), the
   main specification for the Unicode Common Locale Data Repository
   (CLDR) project.  As required by BCP 47, subtags follow the language
   tag ABNF and other rules for the formation of language tags and
   subtags, are restricted to the ASCII letters and digits, are not case
   sensitive, and do not exceed eight characters in length.

   EDITORIAL NOTE: This new facility has been accepted by the Unicode
   CLDR committee for incorporation into the next versions of CLDR and
   LDML, parallel with the structure of the 'u' extension [RFC6067], for
   which it is already the maintaining authority.  The data and
   specification will be available by the time this Internet-Draft has
   been approved.

   The LDML specification is available over the Internet and at no cost,
   and is available via a royalty-free license at
   http://unicode.org/copyright.html.  LDML is versioned, and each
   version of LDML is numbered, dated, and stable.  Extension subtags,
   once defined by LDML, are never retracted or substantially changed in
   meaning.

   The maintaining authority for the 't' extension is the Unicode
   Consortium:

   +---------------+---------------------------------------------------+
   | Item          | Value                                             |
   +---------------+---------------------------------------------------+
   | Name          | Unicode Consortium                                |
   | Contact Email | cldr-contact@unicode.org                          |
   | Discussion    | cldr-users@unicode.org                            |
   | List Email    |                                                   |
   | URL Location  | cldr.unicode.org                                  |
   | Specification | Unicode Technical Standard #35 Unicode Locale     |
   |               | Data Markup Language (LDML),                      |
   |               | http://unicode.org/reports/tr35/                  |
   | Section       | Section 3 Unicode Language and Locale Identifiers |
   +---------------+---------------------------------------------------+

2.2.  Structure

   The subtags in the 't' extension are of the following form:

   t-ext=    "t"                      ; Extension
             (("-" lang *("-" field)) ; Source + optional field(s)
             / 1*("-" field))         ; Field(s) only (no source)

   lang=     language                 ; BCP47, with restrictions
             ["-" script]
             ["-" region]
             *("-" variant)

   field=    sep 1*("-" 3*8alphanum)  ; With restrictions

   sep=      ALPHA DIGIT              ; Subtag separators
   alphanum= ALPHA / DIGIT

   where the <language>, <script>, <region>, and <variant> rules are
   specified in [BCP47], and the <ALPHA> and <DIGIT> rules are specified
   in [RFC5234].

   Description and restrictions:

   a.  The 't' extension MUST have at least one subtag.

   b.  The 't' extension normally starts with a source language tag,
       which MUST be a regular, canonical language tag as specified by
       [BCP47].  Tags described by the 'irregular' production in BCP 47
       MUST NOT be used to form the language tag.  The source language
       tag MAY be omitted: some field values do not require it.

   c.  There is optionally a sequence of fields, where each field has a
       separator followed by a sequence of one or more subtags.  Two
       identical field separators MUST NOT be present in the language
       tag.

   d.  The order of the fields in a 't' extension is not significant.
       The order of subtags within a field is significant.  (See
       Section 2.3 Canonicalization.)

   e.  The 't' subtag fields are defined by Section 3 [1] of Unicode
       Technical Standard #35: Unicode Locale Data Markup Language
       [UTS35].

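   As an illustration, the ABNF and restrictions a and c above can be
   approximated with a regular expression in Python.  This is a
   simplified sketch under stated assumptions: it checks well-formedness
   only, omits the extlang production of [BCP47], and does not verify
   that the source is a valid, canonical tag or that the field subtags
   are defined in [UTS35].  The names T_EXT and is_well_formed_t_ext are
   hypothetical.

```python
import re

ALPHANUM = r"[a-z0-9]"

# Simplified <lang>: language, optional script and region, any variants.
# Assumption: extlang is omitted, since canonical tags do not need it.
LANG = (
    r"[a-z]{2,8}"
    r"(?:-[a-z]{4})?"
    r"(?:-(?:[a-z]{2}|[0-9]{3}))?"
    rf"(?:-(?:{ALPHANUM}{{5,8}}|[0-9]{ALPHANUM}{{3}}))*"
)

# <field>: a separator (ALPHA DIGIT) plus one or more 3*8alphanum subtags.
FIELD = rf"[a-z][0-9](?:-{ALPHANUM}{{3,8}})+"

# <t-ext>: a source with optional fields, or one or more fields alone.
T_EXT = re.compile(rf"t(?:-{LANG}(?:-{FIELD})*|(?:-{FIELD})+)")


def is_well_formed_t_ext(ext):
    """Check the extension (starting at 't') against the sketch grammar."""
    ext = ext.lower()
    if not T_EXT.fullmatch(ext):
        return False
    # Restriction c: no two identical field separators.
    seps = [s for s in ext.split("-") if len(s) == 2 and s[1].isdigit()]
    return len(seps) == len(set(seps))
```

   Under these assumptions, "t-und-latn-m0-ungegn-2007" and
   "t-m0-ungegn" are accepted, while "t" alone, "t-it-m0" (a separator
   with no following subtag), and a tag that repeats 'm0' are rejected.
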
2.3.  Canonicalization

   As required by [BCP47], the use of uppercase or lowercase letters is
   not significant in the subtags used in this extension.  The canonical
   form for all subtags in the extension is lowercase, with the fields
   ordered by the separators, alphabetically.  The order of subtags
   within a field is significant, and MUST NOT be changed in the process
   of canonicalizing.

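   The canonicalization rule can be sketched in Python as follows.  This
   is a minimal sketch, not a normative algorithm: it assumes the input
   is an already well-formed 't' extension (starting at the singleton),
   and the function name canonicalize_t_ext is hypothetical.  The 'k0'
   separator used in the usage note is likewise hypothetical, since only
   'm0' is described in this document.

```python
def canonicalize_t_ext(ext):
    """Lowercase the extension and order its fields by separator."""
    subtags = ext.lower().split("-")
    source, fields = [], []
    for sub in subtags[1:]:  # skip the leading 't'
        if len(sub) == 2 and sub[0].isalpha() and sub[1].isdigit():
            fields.append([sub])      # a separator opens a new field
        elif fields:
            fields[-1].append(sub)    # order within a field is kept
        else:
            source.append(sub)        # still inside the source tag
    fields.sort(key=lambda field: field[0])
    return "-".join(["t"] + source + [s for f in fields for s in f])
```

   For example, "T-und-Latn-M0-UNGEGN-2007" becomes
   "t-und-latn-m0-ungegn-2007", and "t-it-m0-xxx-v21a-k0-abc" becomes
   "t-it-k0-abc-m0-xxx-v21a": the fields are reordered, but "xxx-v21a"
   stays in its original order.
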
2.4.  BCP47 Registration Form

   Per RFC 5646, Section 3.7 [BCP47]:

   %%
   Identifier: t
   Description: Specifying Transformed Content
   Comments: Subtags for the identification of content that has been
   transformed, including but not limited to:
   transliteration, transcription, and translation.
   Added: 2010-mm-dd
   RFC: [TBD]
   Authority: Unicode Consortium
   Contact_Email: cldr-contact@unicode.org
   Mailing_List: cldr-users@unicode.org
   URL: http://www.unicode.org/Public/cldr/latest/core.zip
   %%

2.5.  Field Definitions

   Assignment of 't' field subtags is determined by the Unicode CLDR
   Technical Committee, in accordance with the policies and procedures
   in http://www.unicode.org/consortium/tc-procedures.html, and subject
   to the Unicode Consortium Policies at
   http://www.unicode.org/policies/policies.html.

   Assignments that can be made by successive versions of LDML [UTS35]
   by the Unicode Consortium without requiring a new RFC include:

   o  The allocation of new field separator subtags for use after the
      't' extension.

   o  The allocation of subtags valid after a field separator subtag.

   o  The addition of subtag aliases and descriptions.

   o  The modification of subtag descriptions.

   Changes to the syntax or meaning of the 't' extension would require a
   new RFC that obsoletes this document; such an RFC would break
   stability, and would thus be contrary to the policies of the Unicode
   Consortium.

   At the time this document was published, one field was specified in
   [UTS35]: the transform mechanism.  That field is summarized here:

   a.  The transform mechanism consists of a sequence of subtags
       starting with the 'm0' separator followed by one or more
       mechanism subtags.  Each mechanism subtag has a length of 3 to 8
       alphanumeric characters.  The sequence as a whole provides an
       identification of the specification for the transform, such as
       the mechanism subtag 'ungegn' in "und-Cyrl-t-und-latn-m0-ungegn".
       In many cases, only one mechanism subtag is necessary, but
       multiple subtags MAY be defined in [UTS35] where necessary.

   b.  Any purely numeric subtag is a representation of a date in the
       Gregorian calendar.  It MAY occur in any mechanism field, but it
       SHOULD only be used where necessary.  If it does occur:

       *  it MUST occur as the final subtag in the field

       *  it MUST NOT be the only subtag in the field

       *  it MUST only consist of a sequence of digits of the form YYYY,
          YYYYMM, or YYYYMMDD

       *  it SHOULD be as short as possible

       Note: The format is related to that of [RFC3339], but is not the
       same.  The RFC 3339 full-date won't work because it uses hyphens.
       The offset ("Z") is not used because the date is a publication
       date (aka 'floating date').  For more information, see Section
       3.3, Floating Time in [W3C-TimeZones].

   c.  Examples:

       *  20110623 represents June 23rd, 2011.

       *  There are 3 dated versions of the UNGEGN transliteration
          specification for Hebrew to Latin.  They can be represented by
          the following language tags:

          +  und-Hebr-t-und-Latn-m0-ungegn-1972

          +  und-Hebr-t-und-Latn-m0-ungegn-1977

          +  und-Hebr-t-und-Latn-m0-ungegn-2007

       *  Suppose that the BGN transliteration specification for
          Cyrillic to Latin had three versions, dated June 11th, 1999;
          Dec 30th, 1999; and May 1st, 2011.  In that case, the
          corresponding first two DATE subtags would require months to
          be distinctive (199906 and 199912), but the last subtag would
          only require the year (2011).

   d.  Some mechanisms may use a versioning system that is not
       distinguished by date, or not by date alone.  In the latter case,
       the version will be of a form specified by [UTS35] for that
       mechanism.  For example, if the mechanism XXX uses versions of
       the form v21a, then a tag could look like "ja-t-it-m0-xxx-v21a".
       If there are multiple subversions distinguished by date, then a
       tag could look like "ja-t-it-m0-xxx-v21a-2007".

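   The MUST-level date rules in item b can be sketched in Python.  This
   is an illustrative sketch only; the helper names parse_date_subtag
   and date_position_ok are hypothetical, and the "as short as possible"
   recommendation is not checked because it depends on which other
   versions of a mechanism exist.

```python
from datetime import date


def parse_date_subtag(subtag):
    """Return (year, month, day), using None for absent parts.

    Returns None if the subtag is not YYYY, YYYYMM, or YYYYMMDD, or if
    it does not name a real Gregorian calendar date.
    """
    if not (subtag.isascii() and subtag.isdigit()):
        return None
    if len(subtag) not in (4, 6, 8):
        return None
    year = int(subtag[:4])
    month = int(subtag[4:6]) if len(subtag) >= 6 else None
    day = int(subtag[6:8]) if len(subtag) == 8 else None
    try:
        # Absent parts default to 1 purely for the calendar check.
        date(year, 1 if month is None else month, 1 if day is None else day)
    except ValueError:
        return None
    return (year, month, day)


def date_position_ok(field_subtags):
    """Check a field's subtags (without the separator) against item b."""
    numeric = [i for i, s in enumerate(field_subtags) if s.isdigit()]
    if not numeric:
        return True  # a date subtag is optional
    return (
        numeric == [len(field_subtags) - 1]   # final subtag only
        and len(field_subtags) > 1            # not the only subtag
        and parse_date_subtag(field_subtags[-1]) is not None
    )
```

   For instance, the subtags of "m0-ungegn-2007" pass this check, while
   a field consisting only of "2007", or one where the date precedes
   the mechanism subtag, does not.
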
   A language tag with the 't' extension MAY be used to request a
   specific transform of content.  In such a case, the recipient SHOULD
   return content that corresponds as closely as feasible to the
   requested transform, including the specification of the mechanism.
   For example, if the request is ja-t-it-m0-xxx-v21a-2007, and the
   recipient has content corresponding to both ja-t-it-m0-xxx-v21a and
   ja-t-it-m0-xxx-v21b-2009, then the v21a version would be preferred.
   As is the case for language matching as discussed in [BCP47],
   different implementations MAY have different measures of "closeness".

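   One possible measure of closeness, shown here only as an assumption
   and not as anything this document prescribes, is the number of
   leading subtags a candidate shares with the request.  The function
   names common_prefix_len and best_match are hypothetical.

```python
def common_prefix_len(a, b):
    """Count the leading subtags that two tags share, ignoring case."""
    count = 0
    for x, y in zip(a.lower().split("-"), b.lower().split("-")):
        if x != y:
            break
        count += 1
    return count


def best_match(requested, available):
    """Pick the available tag sharing the longest subtag prefix."""
    return max(available, key=lambda tag: common_prefix_len(requested, tag))
```

   With the example above, a request for "ja-t-it-m0-xxx-v21a-2007"
   shares six leading subtags with "ja-t-it-m0-xxx-v21a" and five with
   "ja-t-it-m0-xxx-v21b-2009", so this sketch selects the v21a content.
   Other implementations may reasonably weigh the subtags differently.
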
2.6.  Registration of Field Subtags

   Registration of transform mechanisms is requested by filing a ticket
   at cldr.unicode.org [2].  The proposal in the ticket MUST contain the
   following information:

   +-------------+-----------------------------------------------------+
   | Item        | Description                                         |
   +-------------+-----------------------------------------------------+
   | Subtag      | The proposed mechanism subtag (or subtag sequence). |
   | Description | A description of the proposed mechanism; that       |
   |             | description MUST be sufficient to distinguish it    |
   |             | from other mechanisms in use.                       |
   | Version     | If versioning for the mechanism is not done         |
   |             | according to date, then a description of the        |
   |             | versioning conventions used for the mechanism.      |
   +-------------+-----------------------------------------------------+

   Proposals for clarifications of descriptions or additional aliases
   may also be requested by filing a ticket.

   The committee MAY define a template for submissions that requests
   more information, if it is found that such information would be
   useful in evaluating proposals.

2.7.  Registration of Additional Fields

   In the event that it proves necessary to add an additional field
   (such as 'm2'), it can be requested by filing a ticket at
   cldr.unicode.org [2].  The proposal in the ticket MUST contain a full
   description of the proposed field semantics and subtag syntax, and
   MUST conform to the ABNF syntax for "field" presented in
   Section 2.2.

2.8.  Committee Responses to Registration Proposals

   The committee MUST post each proposal publicly within 2 weeks after
   reception, to allow for comments.  The committee must respond
   publicly to each proposal within 4 weeks after reception.

   The response MAY:

   o  request more information or clarification

   o  accept the proposal, optionally with modifications to the subtag
      or description

   o  reject the proposal, because of significant objections raised on
      the mailing list or due to problems with constraints in this
      document or in [UTS35]

   Accepted tickets result in a new entry in the machine-readable CLDR
   BCP47 data, or in the case of a clarified description, modifications
   to the description attribute value for an existing entry.

2.9.  Machine-Readable Data

   EDITORIAL NOTE: The following parallels the structure used for the
   'u' extension [RFC6067], for which the Unicode Consortium is the
   maintaining authority.  The data and specification will be available
   by the time this Internet-Draft has been approved.  The description
   field is in the process of being added to CLDR.

   Beginning with CLDR version 1.7.2, machine-readable files are
   available listing the data defined for BCP47 extensions for each
   successive version of [UTS35].  These releases are listed on
   http://cldr.unicode.org/index/downloads.  Each release has an
   associated data directory of the form
   "http://unicode.org/Public/cldr/<version>", where "<version>" is
   replaced by the release number.  For example, for version 1.7.2, the
   "core.zip" file is located at
   http://unicode.org/Public/cldr/1.7.2/core.zip [3].  The most recent
   version is always identified by the version "latest" and can be
   accessed by the URL in Section 2.4.

   Inside the "core.zip" file, the directory "common/bcp47" contains the
   data files listing the valid attributes, keys, and types for each
   successive version of [UTS35].  Each data file lists the keys and
   types relevant to that topic.  For example, transform.xml contains
   the subtags (types) for the 't' mechanisms.

   The XML structure lists the keys, such as <key extension="t"
   name="m0" alias="collation" description="Transliteration extension
   mechanism">, with subelements for the types, such as <type
   name="ungegn" description="United Nations Group of Experts on
   Geographical Names"/>.  The currently defined attributes for the
   mechanisms include:

   +-------------+-------------------------------+---------------------+
   | Attribute   | Description                   | Examples            |
   +-------------+-------------------------------+---------------------+
   | name        | The name of the mechanism,    | UNGEGN, ALALOC      |
   |             | limited to 3-8 characters (or |                     |
   |             | sequences of them).           |                     |
   | description | A description of the name,    | United Nations      |
   |             | with all and only that        | Group of Experts on |
   |             | information necessary to      | Geographical Names; |
   |             | distinguish one name from     | American Library    |
   |             | others with which it might be | Association-Library |
   |             | confused.  Descriptions are   | of Congress         |
   |             | not intended to provide       |                     |
   |             | general background            |                     |
   |             | information.                  |                     |
   | since       | Indicates the first version   | 1.9, 2.0.1          |
   |             | of CLDR where the name        |                     |
   |             | appears.  (Required for new   |                     |
   |             | items.)                       |                     |
   | alias       | Alternative name of the key   |                     |
   |             | or type, not limited in       |                     |
   |             | number of characters.         |                     |
   |             | Aliases are intended for      |                     |
   |             | backwards compatibility, not  |                     |
   |             | to provide all possible       |                     |
   |             | alternate names or            |                     |
   |             | designations.  (Optional)     |                     |
   +-------------+-------------------------------+---------------------+

   The file for the transform extension is "transform.xml".  The initial
   version of that file contains the following information.

   <key extension="t" name="m0" description=
         "Transliteration extension mechanism">
      <type name="ungegn" description=
         "United Nations Group of Experts on Geographical Names"/>
      <type name="alaloc" description=
         "American Library Association-Library of Congress"/>
      <type name="bgn" description=
         "US Board on Geographic Names"/>
      <type name="mcst" description=
         "Korean Ministry of Culture, Sports and Tourism"/>
      <type name="iso" description=
         "International Organization for Standardization"/>
      <type name="din" description=
         "Deutsches Institut fuer Normung"/>
      <type name="gost" description=
         "Euro-Asian Council for Standardization, Metrology
          and Certification"/>
   </key>

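   As a rough illustration, a well-formed fragment shaped like the
   listing above can be read with Python's standard xml.etree.ElementTree
   module.  This is a sketch under assumptions: the SAMPLE string is an
   abbreviated copy made for this example, the function name
   mechanism_types is hypothetical, and the enclosing structure of the
   real data file is not shown here.  ElementTree is not a validating
   parser, so attribute values supplied only by the DTD would not be
   visible to this code.

```python
import xml.etree.ElementTree as ET

# Abbreviated, well-formed sample modeled on the listing (assumption).
SAMPLE = """
<key extension="t" name="m0"
     description="Transliteration extension mechanism">
   <type name="ungegn" description=
      "United Nations Group of Experts on Geographical Names"/>
   <type name="bgn" description=
      "US Board on Geographic Names"/>
</key>
"""


def mechanism_types(xml_text):
    """Map each <type> name under a <key> element to its description."""
    key = ET.fromstring(xml_text.strip())
    return {t.get("name"): t.get("description") for t in key.findall("type")}
```

   Calling mechanism_types(SAMPLE) returns a dictionary keyed by the
   mechanism subtags "ungegn" and "bgn", each mapped to its description
   string.
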
   To get the version information in XML when working with the data
   files, the XML parser must be validating.  When the 'core.zip' file
   is unzipped, the 'dtd' directory will be at the same level as the
   'bcp47' directory; that is required for correct validation.  For each
   release after CLDR 1.8, types introduced in that release are also
   marked in the data files by the XML attribute "since", such as in the
   following example:
   <type name="adp" since="1.9"/>

   The data is also currently maintained in a source code repository,
   with each release tagged, for viewing directly without unzipping.
   For example, see:

   o  http://unicode.org/repos/cldr/tags/release-1-7-2/common/bcp47/

   o  http://unicode.org/repos/cldr/tags/release-1-8/common/bcp47/

   For more information, see
   http://cldr.unicode.org/index/bcp47-extension.


3.  Acknowledgements

   Thanks to John Emmons and the rest of the Unicode CLDR Technical
   Committee for their work in developing the BCP 47 subtags for LDML.


4.  IANA Considerations

   This document will require IANA to insert the record of Section 2.4
   into the Language Extensions Registry, according to Section 3.7,
   Extensions and the Extensions Registry of "Tags for Identifying
   Languages" in [BCP47].  Per Section 5.2 of [BCP47], there might be
   occasional (rare) requests by the Unicode Consortium (the "Authority"
   listed in the record) for maintenance of this record.  Changes that
   can be submitted to IANA without the publication of a new RFC are
   limited to modification of the Comments, Contact_Email, Mailing_List,
   and URL fields.  Any such requested changes MUST use the domain
   'unicode.org' in any new addresses or URIs, MUST explicitly cite this
   document (so that IANA can reference these requirements), and MUST
   originate from the 'unicode.org' domain.  The domain or authority can
   only be changed via a new RFC.

   This document does not require IANA to create or maintain a new
   registry or otherwise impact IANA.


5.  Security Considerations

   The security considerations for this extension are the same as those
   for [BCP47].  See RFC 5646, Section 6, Security Considerations
   [BCP47].


6.  References

6.1.  Normative References

   [BCP47]    Davis, M., Ed. and A. Phillips, Ed., "Tags for the
              Identification of Language (BCP47)", September 2009.

   [RFC5234]  Crocker, Ed., "Augmented BNF for Syntax Specifications:
              ABNF", 2008.

   [RFC6067]  Davis, M., Ed., Phillips, A., Ed., and Y. Umaoka, Ed.,
              "BCP 47 Extension U", September 2010.

   [UTS35]    Davis, M., "Unicode Technical Standard #35: Locale Data
              Markup Language (LDML)", December 2007,
              <http://www.unicode.org/reports/tr35/>.

6.2.  Informative References

   [RFC3339]  Klyne, Ed. and Newman, Ed., "Date and Time on the
              Internet: Timestamps", 2002.

   [W3C-TimeZones]
              Phillips, Ed., "W3C Working Group Note: Working with Time
              Zones", July 2011,
              <http://www.w3.org/TR/2011/NOTE-timezone-20110705/>.

   [ldml-registry]
              "Registry for Common Locale Data Repository tag elements",
              September 2009.

URIs

   [1]  <http://unicode.org/reports/tr35/>

   [2]  <http://cldr.unicode.org/>

   [3]  <http://unicode.org/Public/cldr/1.7.2/>


Authors' Addresses

   Mark Davis
   Google

   Email: mark@macchiato.com


   Addison Phillips
   Lab126

   Email: addison@lab126.com


   Yoshito Umaoka
   IBM

   Email: yoshito_umaoka@us.ibm.com


   Courtney Falk
   Infinite Automata

   Email: court@infiauto.com
836
837
838
839Davis, et al.             Expires June 7, 2012                 [Page 15]
840
841