<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Language" content="en-us">
<link rel="stylesheet" href="http://www.unicode.org/reports/reports.css"
	type="text/css">
<title>UTS #35: Unicode LDML: General</title>
<style type="text/css">
<!--
.dtd {
	font-family: monospace;
	font-size: 90%;
	background-color: #CCCCFF;
	border-style: dotted;
	border-width: 1px;
}

.xmlExample {
	font-family: monospace;
	font-size: 80%
}

.blockedInherited {
	font-style: italic;
	font-weight: bold;
	border-style: dashed;
	border-width: 1px;
	background-color: #FF0000
}

.inherited {
	font-weight: bold;
	border-style: dashed;
	border-width: 1px;
	background-color: #00FF00
}

.element {
	font-weight: bold;
	color: red;
}

.attribute {
	font-weight: bold;
	color: maroon;
}

.attributeValue {
	font-weight: bold;
	color: blue;
}

li, p {
	margin-top: 0.5em;
	margin-bottom: 0.5em
}

h2, h3, h4, table {
	margin-top: 1.5em;
	margin-bottom: 0.5em;
}
-->
</style>
</head>

<body>

	<table class="header" width="100%">
		<tr>
			<td class="icon"><a href="http://unicode.org"> <img
					alt="[Unicode]" src="http://unicode.org/webscripts/logo60s2.gif"
					width="34" height="33"
					style="vertical-align: middle; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px; border-top-width: 0px;"></a>&nbsp;
				<a class="bar" href="http://www.unicode.org/reports/">Technical
					Reports</a></td>
		</tr>
		<tr>
			<td class="gray">&nbsp;</td>
		</tr>
	</table>
	<div class="body">
		<h2 style="text-align: center">
			Unicode Technical
			Standard #35
		</h2>
		<h1 style="text-align: center">
			Unicode Locale Data Markup Language (LDML)<br> Part 2: General
		</h1>

		<!-- At least the first row of this header table should be identical across the parts of this UTS. -->
		<table border="1" cellpadding="2" cellspacing="0" class="wide">
			<tr>
				<td>Version</td>
				<td>34</td>
			</tr>
			<tr>
				<td>Editors</td>
				<td>Yoshito Umaoka (<a href="mailto:yoshito_umaoka@us.ibm.com">yoshito_umaoka@us.ibm.com</a>)
					and <a href="tr35.html#Acknowledgments">other CLDR committee
						members</a></td>
			</tr>
		</table>

		<p>
			For the full header, summary, and status, see <a href="tr35.html">
				Part 1: Core</a>
		</p>

		<h3>
			<i>Summary</i>
		</h3>
		<p>
			This document describes parts of an XML format (<i>vocabulary</i>)
			for the exchange of structured locale data. This format is used in
			the <a href="http://cldr.unicode.org/">Unicode Common Locale Data
				Repository</a>.
		</p>

		<p>
			This is a partial document, describing general parts of the LDML:
			display names &amp; transforms, etc. For the other parts of the LDML
			see the <a href="tr35.html">main LDML document</a> and the links
			above.
		</p>

		<h3>
			<i>Status</i>
		</h3>

		<!-- NOT YET APPROVED
		<p>
				<i class="changed">This is a<b><font color="#ff3333">
				draft </font></b>document which may be updated, replaced, or superseded by
				other documents at any time. Publication does not imply endorsement
				by the Unicode Consortium. This is not a stable document; it is
				inappropriate to cite this document as other than a work in
				progress.
			</i>
		</p>
		 END NOT YET APPROVED -->
		<!-- APPROVED -->
		<p>
			<i>This document has been reviewed by Unicode members and other
				interested parties, and has been approved for publication by the
				Unicode Consortium. This is a stable document and may be used as
				reference material or cited as a normative reference by other
				specifications.</i>
		</p>
		<!-- END APPROVED -->

		<blockquote>
			<p>
				<i><b>A Unicode Technical Standard (UTS)</b> is an independent
					specification. Conformance to the Unicode Standard does not imply
					conformance to any UTS.</i>
			</p>
		</blockquote>
		<p>
			<i>Please submit corrigenda and other comments with the CLDR bug
				reporting form [<a href="tr35.html#Bugs">Bugs</a>]. Related
				information that is useful in understanding this document is found
				in the <a href="tr35.html#References">References</a>. For the latest
				version of the Unicode Standard see [<a href="tr35.html#Unicode">Unicode</a>].
				For a list of current Unicode Technical Reports see [<a
				href="tr35.html#Reports">Reports</a>]. For more information about
				versions of the Unicode Standard, see [<a href="tr35.html#Versions">Versions</a>].
			</i>
		</p>

		<!-- This section of Parts should be identical in all of the parts of this UTS. -->
		<h2>
			<a name="Parts" href="#Parts">Parts</a>
		</h2>
		<p>The LDML specification is divided into the following parts:</p>
		<ul class="toc">
			<li>Part 1: <a href="tr35.html#Contents">Core</a> (languages,
				locales, basic structure)
			</li>
			<li>Part 2: <a href="tr35-general.html#Contents">General</a>
				(display names &amp; transforms, etc.)
			</li>
			<li>Part 3: <a href="tr35-numbers.html#Contents">Numbers</a>
				(number &amp; currency formatting)
			</li>
			<li>Part 4: <a href="tr35-dates.html#Contents">Dates</a> (date,
				time, time zone formatting)
			</li>
			<li>Part 5: <a href="tr35-collation.html#Contents">Collation</a>
				(sorting, searching, grouping)
			</li>
			<li>Part 6: <a href="tr35-info.html#Contents">Supplemental</a>
				(supplemental data)
			</li>
			<li>Part 7: <a href="tr35-keyboards.html#Contents">Keyboards</a>
				(keyboard mappings)
			</li>
		</ul>
		<h2>
			<a name="Contents" href="#Contents">Contents of Part 2, General</a>
		</h2>
		<!-- START Generated TOC: CheckHtmlFiles -->
		<ul class="toc">
			<li>1 <a href="#Display_Name_Elements">Display Name Elements</a></li>
			<li>2 <a href="#Layout_Elements">Layout Elements</a></li>
			<li>3 <a href="#Character_Elements">Character Elements</a>
				<ul class="toc">
					<li>3.1 <a href="#Exemplars">Exemplars</a>
						<ul class="toc">
							<li>3.1.1 <a href="#ExemplarSyntax">Exemplar Syntax</a></li>
							<li>3.1.2 <a href="#Restrictions">Restrictions</a></li>
						</ul>
					</li>
					<li>3.2 <a href="#Character_Mapping">Mapping</a></li>
					<li>3.3 <a href="#IndexLabels">Index Labels</a></li>
					<li>3.4 <a href="#Ellipsis">Ellipsis</a></li>
					<li>3.5 <a href="#Character_More_Info">More Information</a></li>
					<li>3.6 <a href="#Character_Parse_Lenient">Parse Lenient</a></li>
				</ul>
			</li>
			<li>4 <a href="#Delimiter_Elements">Delimiter Elements</a></li>
			<li>5 <a href="#Measurement_System_Data">Measurement System
					Data</a>
				<ul class="toc">
					<li>5.1 <a href="#Measurement_Elements">Measurement
							Elements (deprecated)</a></li>
				</ul>
			</li>
			<li>6 <a href="#Unit_Elements">Unit Elements</a>
				<ul class="toc">
					<li>6.1 <a href="#perUnitPatterns">per Unit patterns</a></li>
					<li>6.2 <a href="#Unit_Sequences">Unit Sequences</a></li>
					<li>6.3 <a href="#durationUnit">durationUnit</a></li>
					<li>6.4 <a href="#coordinateUnit">coordinateUnit</a></li>
					<li>6.5 <a href="#Territory_Based_Unit_Preferences">Territory-Based
							Unit Preferences</a></li>
				</ul>
			</li>
			<li>7 <a href="#POSIX_Elements">POSIX Elements</a></li>
			<li>8 <a href="#Reference_Elements">Reference Element</a></li>
			<li>9 <a href="#Segmentations">Segmentations</a>
				<ul class="toc">
					<li>9.1 <a href="#Segmentation_Inheritance">Segmentation
							Inheritance</a></li>
					<li>9.2 <a href="#Segmentation_Exceptions">Segmentation
							Suppressions</a></li>
				</ul>
			</li>
			<li>10 <a href="#Transforms">Transforms</a>
				<ul class="toc">
					<li>10.1 <a href="#Inheritance">Inheritance</a>
						<ul class="toc">
							<li>10.1.1 <a href="#Pivots">Pivots</a></li>
						</ul>
					</li>
					<li>10.2 <a href="#Variants">Variants</a></li>
					<li>10.3 <a href="#Transform_Rules_Syntax">Transform Rules
							Syntax</a>
						<ul class="toc">
							<li>10.3.1 <a href="#Dual_Rules">Dual Rules</a></li>
							<li>10.3.2 <a href="#Context">Context</a></li>
							<li>10.3.3 <a href="#Revisiting">Revisiting</a></li>
							<li>10.3.4 <a href="#Example">Example</a></li>
							<li>10.3.5 <a href="#Rule_Syntax">Rule Syntax</a></li>
							<li>10.3.6 <a href="#Transform_Rules">Transform Rules</a></li>
							<li>10.3.7 <a href="#Variable_Definition_Rules">Variable
									Definition Rules</a></li>
							<li>10.3.8 <a href="#Filter_Rules">Filter Rules</a></li>
							<li>10.3.9 <a href="#Conversion_Rules">Conversion Rules</a></li>
							<li>10.3.10 <a
								href="#Intermixing_Transform_Rules_and_Conversion_Rules">Intermixing
									Transform Rules and Conversion Rules</a></li>
							<li>10.3.11 <a href="#Inverse_Summary">Inverse Summary</a></li>
						</ul>
					</li>
				</ul>
			</li>
			<li>11 <a href="#ListPatterns">List Patterns</a>
				<ul class="toc">
					<li>11.1 <a href="#List_Gender">Gender of Lists</a></li>
				</ul>
			</li>
			<li>12 <a href="#Context_Transform_Elements">ContextTransform
					Elements</a>
				<ul class="toc">
					<li>Table: <a
						href="#contextTransformUsage_type_attribute_values">Element
							contextTransformUsage type attribute values</a></li>
				</ul>
			</li>
			<li>13 <a href="#Choice_Patterns">Choice Patterns</a></li>
			<li>14 <a href="#Annotations">Annotations and Labels</a>
				<ul class="toc">
					<li>14.1 <a href="#SynthesizingNames">Synthesizing Sequence Names</a></li>
					<li>14.2 <a href="#Character_Labels">Annotations Character Labels</a></li>
					<li>14.3 <a href="#Typographic_Names">Typographic Names</a></li>
				</ul>
			</li>
		</ul>
		<!-- END Generated TOC: CheckHtmlFiles -->
		<h2>
			1 <a name="Display_Name_Elements" href="#Display_Name_Elements">Display
				Name Elements</a>
		</h2>
		<p class="dtd">&lt;!ELEMENT localeDisplayNames ( alias | (
			localeDisplayPattern?, languages?, scripts?, territories?,
			subdivisions?, variants?, keys?, types?, transformNames?,
			measurementSystemNames?, codePatterns?, special* ) )&gt;</p>
		<p>
			Display names for languages, scripts, territories, variants, and
			other codes in this locale are supplied by this element. They supply
			localized names for these items for use in user interfaces for
			various purposes, such as displaying menu lists or displaying a
			language name in a dialog. Capitalization should follow the
			conventions used in the middle of running text; the
			&lt;contextTransforms&gt; element may be used to specify the
			appropriate capitalization for other contexts (see <i>Section 12
				<a href="#Context_Transform_Elements">ContextTransform Elements</a>
			</i>). Examples are given below.
		</p>

		<blockquote>
			<p class="note">
				<b>Note:</b> The "<span style="color: blue">en</span>" locale may
				contain translated names for deprecated codes for debugging
				purposes. Translation of deprecated codes into other languages is
				discouraged.
			</p>
		</blockquote>

		<p>Where present, the display names must be unique; that is, two
			distinct codes must not have the same display name. (There is one
			exception: for time zones whose parsing results would give the same
			GMT offset, the standard and daylight display names can be the same
			across different time zone IDs.)</p>
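<p>As an illustration of the uniqueness requirement, the following Python sketch reports every display name that is shared by more than one code. The function name find_duplicate_display_names and the plain code-to-name dictionary are assumptions made for this example; they are not part of LDML or of any CLDR tool, and the sketch does not model the time zone exception.</p>

```python
from collections import defaultdict


def find_duplicate_display_names(names):
    """Return {display_name: sorted codes} for each display name used by
    two or more distinct codes.

    `names` is a plain dict mapping a code (e.g. "GB") to its display name.
    Hypothetical helper; the time zone exception is not modeled.
    """
    codes_by_name = defaultdict(list)
    for code, display_name in names.items():
        codes_by_name[display_name].append(code)
    return {
        display_name: sorted(codes)
        for display_name, codes in codes_by_name.items()
        if len(codes) > 1
    }


# Hypothetical territory data with a deliberate collision.
territories = {"GB": "United Kingdom", "UK": "United Kingdom", "US": "United States"}
print(find_duplicate_display_names(territories))
# {'United Kingdom': ['GB', 'UK']}
```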

		<p>
			Any translations should follow customary practice for the locale in
			question. For more information, see [<a href="tr35.html#DataFormats">Data
				Formats</a>].
		</p>

		<p class="element2">&lt;localeDisplayPattern&gt;</p>

		<p class="dtd">&lt;!ELEMENT localeDisplayPattern ( alias |
			(localePattern*, localeSeparator*, localeKeyTypePattern*, special*) )
			&gt;</p>

		<p>This element is used for compound language (locale) IDs such as
			"pt_BR", which contain additional subtags beyond the initial language
			code. When the &lt;languages&gt; data does not explicitly specify a
			display name such as "Brazilian Portuguese" for a given compound
			language ID, the patterns in this element are used to compose a
			display name such as "Portuguese (Brazil)" from the display names of
			the subtags.</p>

		<p>It includes three sub-elements:</p>
		<ul>
			<li>The &lt;localePattern&gt; element specifies a pattern such
				as "{0} ({1})" in which {0} is replaced by the display name for the
				primary language subtag and {1} is replaced by a list of the display
				names for the remaining subtags.</li>
			<li>The &lt;localeSeparator&gt; element specifies a pattern such
				as "{0}, {1}" used when appending a subtag display name to the list
				in the &lt;localePattern&gt; subpattern {1} above. If that list
				includes more than one display name, then &lt;localeSeparator&gt;
				subpattern {1} represents a new display name to be appended to the
				current list in {0}. <em>Note: Before CLDR 24, the
					&lt;localeSeparator&gt; element specified a separator string such
					as ", ", not a pattern.</em>
			</li>
			<li>The &lt;localeKeyTypePattern&gt; element specifies the
				pattern used to display key-type pairs, such as "{0}: {1}".</li>
		</ul>

		<p>For example, for the locale identifier
			zh_Hant_CN_co_pinyin_cu_USD, the display would be "Chinese
			(Traditional, China, Pinyin Sort Order, Currency: USD)". The key-type
			for co_pinyin does not use the &lt;localeKeyTypePattern&gt; because
			there is a translation for the key-type in English:</p>

		<blockquote>
			<p>&lt;type type="pinyin" key="collation"&gt;Pinyin Sort
				Order&lt;/type&gt;</p>
		</blockquote>
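<p>The composition described above can be sketched in Python. This is a minimal illustration, assuming that the patterns use only the positional placeholders {0} and {1} (so Python's str.format can fill them; apostrophe quoting in patterns is not handled) and that the caller has already looked up the display names of the subtags. The function and parameter names are hypothetical. A type without a translated key-type name is shown as-is, matching the "Currency: USD" part of the English example.</p>

```python
def format_locale_display_name(language_name, subtag_names, keywords,
                               key_names, key_type_names,
                               locale_pattern="{0} ({1})",
                               locale_separator="{0}, {1}",
                               key_type_pattern="{0}: {1}"):
    """Compose a locale display name from already-localized pieces.

    language_name  -- display name of the language subtag, e.g. "Chinese"
    subtag_names   -- display names of script/territory/variant subtags, in order
    keywords       -- (key, type) pairs from the locale ID, e.g. ("collation", "pinyin")
    key_names      -- dict: key -> display name of the key, e.g. "Currency"
    key_type_names -- dict: (key, type) -> translated name, e.g. "Pinyin Sort Order"
    """
    qualifiers = list(subtag_names)
    for key, type_value in keywords:
        translated = key_type_names.get((key, type_value))
        if translated is not None:
            # A translated key-type is used as is.
            qualifiers.append(translated)
        else:
            # Otherwise fall back to localeKeyTypePattern.
            key_label = key_names.get(key, key)
            qualifiers.append(key_type_pattern.format(key_label, type_value))
    if not qualifiers:
        return language_name
    # localeSeparator is a pattern: {0} is the list so far, {1} the new item.
    joined = qualifiers[0]
    for item in qualifiers[1:]:
        joined = locale_separator.format(joined, item)
    return locale_pattern.format(language_name, joined)


print(format_locale_display_name(
    "Chinese", ["Traditional", "China"],
    [("collation", "pinyin"), ("currency", "USD")],
    key_names={"currency": "Currency"},
    key_type_names={("collation", "pinyin"): "Pinyin Sort Order"}))
# Chinese (Traditional, China, Pinyin Sort Order, Currency: USD)
```

<p>Folding the list with the separator pattern, rather than joining with a fixed string, mirrors the CLDR 24 change noted above, where &lt;localeSeparator&gt; became a pattern.</p>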

		<p class="element2">&lt;languages&gt;</p>

		<p>
			This contains a list of elements that provide the user-translated
			names for language codes, as described in <i> <a
				href="tr35.html#Unicode_Language_and_Locale_Identifiers">Section
					3, Unicode Language and Locale Identifiers</a></i>.
		</p>

		<blockquote>
			<pre>&lt;language type="<span style="color: blue">ab</span>"&gt;<span
					style="color: blue">Abkhazian</span>&lt;/language&gt;
&lt;language type="<span style="color: blue">aa</span>"&gt;<span
					style="color: blue">Afar</span>&lt;/language&gt;
&lt;language type="<span style="color: blue">af</span>"&gt;<span
					style="color: blue">Afrikaans</span>&lt;/language&gt;
&lt;language type="<span style="color: blue">sq</span>"&gt;<span
					style="color: blue">Albanian</span>&lt;/language&gt;
</pre>
		</blockquote>
		<p>There should be no expectation that the list of
			languages with translated names be complete: there are thousands of
			languages that could have translated names. For debugging purposes or
			comparison, when a language display name is missing, the Description
			field of the language subtag registry can be used to supply a
			fallback English user-readable name.</p>
		<p>The type can actually be any locale ID as specified above. The
			set of locale IDs that have explicit translations is not fixed, and
			depends on the locale. For example, one language might translate the
			following locale IDs, while another falls back on the normal
			composition.</p>

		<table border="1" cellpadding="4" cellspacing="0">
			<tr>
				<th width="33%">type</th>
				<th width="33%">translation</th>
				<th width="34%">composition</th>
			</tr>
			<tr>
				<td width="33%">nl_BE</td>
				<td width="33%">Flemish</td>
				<td width="34%">Dutch (Belgium)</td>
			</tr>
			<tr>
				<td width="33%">zh_Hans</td>
				<td width="33%">Simplified Chinese</td>
				<td width="34%">Chinese (Simplified)</td>
			</tr>
			<tr>
				<td width="33%">en_GB</td>
				<td width="33%">British English</td>
				<td width="34%">English (United Kingdom)</td>
			</tr>
		</table>

		<p>Thus, when a display name is formed for a complete locale ID, the
			longest match among the &lt;languages&gt; types is used, and the
			remaining fields (if any) are added using composition.</p>
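<p>The longest-match rule can be sketched in Python as follows. This sketch only tries successively shorter prefixes of an underscore-separated ID (for example zh_Hans_SG, then zh_Hans, then zh); that is a simplifying assumption, since an implementation might also try other subtag combinations, such as language plus region without the script. The names LANGUAGES and longest_language_match are hypothetical, and the data is based on the table above, with nl_BE deliberately left untranslated so that it falls back to composition.</p>

```python
# Hypothetical languages data, based on the table above.
LANGUAGES = {
    "en": "English",
    "en_GB": "British English",
    "nl": "Dutch",
    "zh": "Chinese",
    "zh_Hans": "Simplified Chinese",
}


def longest_language_match(locale_id, languages=LANGUAGES):
    """Return (display name, remaining subtags) for the longest prefix of
    `locale_id` that has an explicit languages entry, or (None, all
    subtags) when even the bare language code is missing."""
    subtags = locale_id.split("_")
    # Try the whole ID first, then drop trailing subtags one at a time.
    for end in range(len(subtags), 0, -1):
        candidate = "_".join(subtags[:end])
        if candidate in languages:
            return languages[candidate], subtags[end:]
    return None, subtags


print(longest_language_match("en_GB"))       # ('British English', [])
print(longest_language_match("zh_Hans_SG"))  # ('Simplified Chinese', ['SG'])
print(longest_language_match("nl_BE"))       # ('Dutch', ['BE'])
```

<p>The remaining subtags (SG or BE here) would then be given their own display names and appended using the &lt;localePattern&gt; and &lt;localeSeparator&gt; patterns, giving for example "Dutch (Belgium)".</p>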

		<p>Alternate short forms may be provided for some languages (and
			for territories and other display names), for example:</p>

		<blockquote>
			<pre>&lt;language type="<span style="color: blue">az</span>"&gt;<span
					style="color: blue">Azerbaijani</span>&lt;/language&gt;
&lt;language type="<span style="color: blue">az</span>" alt="<span
					style="color: blue">short</span>"&gt;<span style="color: blue">Azeri</span>&lt;/language&gt;
&lt;language type="<span style="color: blue">en_GB</span>"&gt;<span
					style="color: blue">British English</span>&lt;/language&gt;
&lt;language type="<span style="color: blue">en_GB</span>" alt="<span
					style="color: blue">short</span>"&gt;<span style="color: blue">U.K. English</span>&lt;/language&gt;
&lt;language type="<span style="color: blue">en_US</span>"&gt;<span
					style="color: blue">American English</span>&lt;/language&gt;
&lt;language type="<span style="color: blue">en_US</span>" alt="<span
					style="color: blue">short</span>"&gt;<span style="color: blue">U.S. English</span>&lt;/language&gt;
</pre>
		</blockquote>
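<p>A consumer that wants the short form might look it up and fall back to the default form when no alt="short" value exists. The following Python sketch assumes the names have already been loaded into a dictionary keyed by (type, alt), with None standing for the default form; that layout, the af entry without a short form, and the function name display_name are all hypothetical rather than a CLDR API.</p>

```python
# Hypothetical in-memory form of the language entries above:
# (type, alt) -> display name, with alt=None for the default form.
LANGUAGE_NAMES = {
    ("az", None): "Azerbaijani",
    ("az", "short"): "Azeri",
    ("en_GB", None): "British English",
    ("en_GB", "short"): "U.K. English",
    ("af", None): "Afrikaans",
}


def display_name(code, alt=None, names=LANGUAGE_NAMES):
    """Return the requested alternate if present, else the default form,
    else None."""
    if alt is not None and (code, alt) in names:
        return names[(code, alt)]
    return names.get((code, None))


print(display_name("az", "short"))  # Azeri
print(display_name("az"))           # Azerbaijani
print(display_name("af", "short"))  # Afrikaans (no short form, default used)
```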

		<p class="element2">&lt;scripts&gt;</p>

		<p>
			This element can contain any number of script elements. Each script
			element provides the localized name for a script code, as described
			in <i> <a
				href="tr35.html#Unicode_Language_and_Locale_Identifiers">Section
					3, Unicode Language and Locale Identifiers</a>
			</i> (see also <i>UAX #24: Script Names</i> [<a
				href="http://www.unicode.org/reports/tr41/#UAX24">UAX24</a>]). For
			example, in the language of this locale, the name for the Latin
			script might be "Romana", and the name for the Cyrillic script might
			be "Kyrillica". That would be expressed as follows:
		</p>

		<blockquote>
			<pre>&lt;script type="<span style="color: blue">Latn</span>"&gt;<span
					style="color: blue">Romana</span>&lt;/script&gt;
&lt;script type="<span style="color: blue">Cyrl</span>"&gt;<span
					style="color: blue">Kyrillica</span>&lt;/script&gt;
</pre>
		</blockquote>

		<p>The script names are most commonly used in conjunction with a
			language name, using the &lt;localePattern&gt; combining pattern, and
			the default form of the script name should be suitable for such use.
			When a script name requires a different form for stand-alone use,
			this can be specified using the "stand-alone" alternate:</p>

		<blockquote>
			<pre>&lt;script type="<span style="color: blue">Hans</span>"&gt;<span
					style="color: blue">Simplified</span>&lt;/script&gt;
&lt;script type="<span style="color: blue">Hans</span>" alt="<span
					style="color: blue">stand-alone</span>"&gt;<span
					style="color: blue">Simplified Han</span>&lt;/script&gt;
&lt;script type="<span style="color: blue">Hant</span>"&gt;<span
					style="color: blue">Traditional</span>&lt;/script&gt;
&lt;script type="<span style="color: blue">Hant</span>" alt="<span
					style="color: blue">stand-alone</span>"&gt;<span
					style="color: blue">Traditional Han</span>&lt;/script&gt;
</pre>
		</blockquote>

		<p>This will produce results such as the following:</p>
		<ul>
			<li>Display name of language + script, using
				&lt;localePattern&gt;: “Chinese (Simplified)”</li>
			<li>Display name of script alone, using the "stand-alone" alternate:
				“Simplified Han”</li>
		</ul>
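<p>The two results listed above can be reproduced with a small Python sketch: the default script name is substituted into the combining pattern, while the stand-alone alternate (falling back to the default form when there is none) is used for the script on its own. The dictionary layout and the function names are hypothetical, and the Latn entry is sample English data added only to show the fallback.</p>

```python
# Hypothetical (type, alt) -> name table for the script entries above.
SCRIPT_NAMES = {
    ("Hans", None): "Simplified",
    ("Hans", "stand-alone"): "Simplified Han",
    ("Hant", None): "Traditional",
    ("Hant", "stand-alone"): "Traditional Han",
    ("Latn", None): "Latin",  # no stand-alone alternate
}


def language_with_script(language_name, script, locale_pattern="{0} ({1})"):
    # Inside the combining pattern, the default (non-alt) form is used.
    return locale_pattern.format(language_name, SCRIPT_NAMES[(script, None)])


def script_alone(script):
    # On its own, prefer alt="stand-alone" and fall back to the default form.
    return SCRIPT_NAMES.get((script, "stand-alone"), SCRIPT_NAMES[(script, None)])


print(language_with_script("Chinese", "Hans"))  # Chinese (Simplified)
print(script_alone("Hans"))                     # Simplified Han
print(script_alone("Latn"))                     # Latin
```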

		<p class="element2">&lt;territories&gt;</p>

		<p>
			This contains a list of elements that provide the user-translated
			names for territory codes, as described in <i> <a
				href="tr35.html#Unicode_Language_and_Locale_Identifiers">Section
					3, Unicode Language and Locale Identifiers</a></i>.
		</p>

		<blockquote>
			<pre>&lt;territory type="<span style="color: blue">AD</span>"&gt;<span
					style="color: blue">Andorra</span>&lt;/territory&gt;
&lt;territory type="<span style="color: blue">AF</span>"&gt;<span
					style="color: blue">Afghanistan</span>&lt;/territory&gt;
&lt;territory type="<span style="color: blue">AL</span>"&gt;<span
					style="color: blue">Albania</span>&lt;/territory&gt;
&lt;territory type="<span style="color: blue">AO</span>"&gt;<span
					style="color: blue">Angola</span>&lt;/territory&gt;
&lt;territory type="<span style="color: blue">DZ</span>"&gt;<span
					style="color: blue">Algeria</span>&lt;/territory&gt;
&lt;territory type="<span style="color: blue">GB</span>"&gt;<span
					style="color: blue">United Kingdom</span>&lt;/territory&gt;
&lt;territory type="<span style="color: blue">GB</span>" alt="<span
					style="color: blue">short</span>"&gt;<span style="color: blue">U.K.</span>&lt;/territory&gt;
&lt;territory type="<span style="color: blue">US</span>"&gt;<span
					style="color: blue">United States</span>&lt;/territory&gt;
&lt;territory type="<span style="color: blue">US</span>" alt="<span
					style="color: blue">short</span>"&gt;<span style="color: blue">U.S.</span>&lt;/territory&gt;
</pre>
		</blockquote>

		<p class="element2">&lt;variants&gt;</p>

		<p>
			This contains a list of elements that provide the user-translated
			names for the <i>variant_code</i> values described in <i> <a
				href="tr35.html#Unicode_Language_and_Locale_Identifiers">Section
					3, Unicode Language and Locale Identifiers</a>
			</i>.
		</p>

		<blockquote>
			<pre>&lt;variant type="<span style="color: blue">nynorsk</span>"&gt;<span
					style="color: blue">Nynorsk</span>&lt;/variant&gt;
</pre>
		</blockquote>

		<p class="element2">&lt;keys&gt;</p>

		<p>
			This contains a list of elements that provide the user-translated
			names for the <i>key</i> values described in <i> <a
				href="tr35.html#Unicode_Language_and_Locale_Identifiers">Section
					3, Unicode Language and Locale Identifiers</a></i>.
		</p>

		<blockquote>
			<pre>&lt;key type="<span style="color: blue">collation</span>"&gt;<span
					style="color: blue">Sortierung</span>&lt;/key&gt;
</pre>
		</blockquote>

		<p class="element2">&lt;types&gt;</p>

		<p>
			This contains a list of elements that provide the user-translated
			names for the <i>type</i> values described in <i> <a
				href="tr35.html#Unicode_Language_and_Locale_Identifiers">Section
					3, Unicode Language and Locale Identifiers</a>
			</i>. Since the translation of an option name may depend on the <i>key</i>
			it is used with, the key may optionally be supplied as well.
		</p>

		<blockquote>
			<pre>&lt;type type="<span style="color: blue">phonebook</span>" key="<span
					style="color: blue">collation</span>"&gt;<span style="color: blue">Telefonbuch</span>&lt;/type&gt;
</pre>
		</blockquote>

		<p class="element2">&lt;measurementSystemNames&gt;</p>

		<p>
			This contains a list of elements that provide the user-translated
			names for systems of measurement. The types currently supported are "<span
				style="color: blue">US</span>", "<span style="color: blue">metric</span>",
			and "<span style="color: blue">UK</span>".
		</p>

		<blockquote>
			<pre>&lt;measurementSystemName type="<span style="color: blue">US</span>"&gt;<span
					style="color: blue">U.S.</span>&lt;/measurementSystemName&gt;
</pre>
		</blockquote>

		<p class="note">
			<b>Note:</b> In the future, we may need to add display names for the
			particular measurement units (millimeter versus millimetre versus
			the corresponding Greek, Russian, and other forms), and a message
			format for positioning those with respect to numbers. For example,
			"{number} {unitName}" in some languages, but "{unitName} {number}"
			in others.
		</p>

		<p class="element2">&lt;transformNames&gt;</p>

		<p>This contains a list of elements that provide the user-translated
			names for transforms.</p>

		<blockquote>
			<pre>&lt;transformName type="<span style="color: blue">Numeric</span>"&gt;<span
					style="color: blue">Numeric</span>&lt;/transformName&gt;
</pre>
		</blockquote>

		<p class="element2">&lt;codePatterns&gt;</p>

		<blockquote>
			<pre>&lt;codePattern type="<span style="color: blue">language</span>"&gt;<span
					style="color: blue">Language: {0}</span>&lt;/codePattern&gt;
</pre>
		</blockquote>
		<p class="dtd">
			&lt;!ELEMENT subdivisions ( alias | ( subdivision | special )* ) &gt;<br>
			&lt;!ELEMENT subdivision ( #PCDATA )&gt;
		</p>
		<p>Note that the subdivision names are in separate files, in the
			subdivisions/ directory. The type values are the fully qualified
			subdivision codes. For example:</p>
		<p class="xmlExample">
			&lt;subdivision type=&quot;AL-04&quot;&gt;Fier
			County&lt;/subdivision&gt;<br> &lt;subdivision
			type=&quot;AL-FR&quot;&gt;Fier&lt;/subdivision&gt; &lt;!-- in AL-04 :
			Fier County --&gt;<br> &lt;subdivision
			type=&quot;AL-LU&quot;&gt;Lushnjë&lt;/subdivision&gt; &lt;!-- in
			AL-04 : Fier County --&gt;<br> &lt;subdivision
			type=&quot;AL-MK&quot;&gt;Mallakastër&lt;/subdivision&gt; &lt;!-- in
			AL-04 : Fier County --&gt;
		</p>
		<p>
			See also <strong>Part 6</strong> <em>Section 2.1.1 <a
				href="tr35-info.html#Subdivision_Containment">Subdivision
					Containment</a></em>.
		</p>


		<h2>
			2 <a name="Layout_Elements" href="#Layout_Elements">Layout
				Elements</a>
		</h2>


		<p class="dtd">&lt;!ELEMENT layout ( alias | (orientation*,
			inList*, inText*, special*) ) &gt;</p>
		<p>This top-level element specifies general layout features. It
			currently has only one non-deprecated element, &lt;orientation&gt;
			(&lt;inList&gt; and &lt;inText&gt; are deprecated, and
			&lt;special&gt; is always permitted).</p>

		<p class="dtd">
			&lt;!ELEMENT orientation ( characterOrder*, lineOrder*, special* )
			&gt;<br> &lt;!ELEMENT characterOrder ( #PCDATA ) &gt;<br>
			&lt;!ELEMENT lineOrder ( #PCDATA ) &gt;
		</p>

		<p>The lineOrder and characterOrder elements specify the default
			general ordering of lines within a page, and characters within a
			line. The possible values are:</p>

		<table>
			<tr>
				<th>Direction</th>
				<th>Value</th>
			</tr>
			<tr>
				<td rowspan="2">Vertical</td>
				<td>top-to-bottom</td>
			</tr>
			<tr>
				<td>bottom-to-top</td>
			</tr>
			<tr>
				<td rowspan="2">Horizontal</td>
				<td>left-to-right</td>
			</tr>
			<tr>
				<td>right-to-left</td>
			</tr>
		</table>

		<p>
			If the value of lineOrder is one of the vertical values, then the
			value of characterOrder must be one of the horizontal values, and
			vice versa. For example, for English the lines are top-to-bottom, and
			the characters are left-to-right. For Mongolian (in the Mongolian
			script) the lines are left-to-right, and the characters are
			top-to-bottom. This does not override the ordering behavior of
			bidirectional text; it does, however, supply the paragraph direction
			for that text (for more information, see <i>UAX #9: Unicode
			Bidirectional Algorithm</i>
			[<a href="http://www.unicode.org/reports/tr41/#UAX9">UAX9</a>]).
		</p>
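<p>The rule that one of lineOrder and characterOrder must be vertical and the other horizontal is easy to check mechanically. The following Python sketch is a hypothetical validator, not part of any CLDR tool; it takes the two element values as plain strings.</p>

```python
VERTICAL = {"top-to-bottom", "bottom-to-top"}
HORIZONTAL = {"left-to-right", "right-to-left"}


def check_orientation(line_order, character_order):
    """Raise ValueError unless both values are known and lie on different axes."""
    for value in (line_order, character_order):
        if value not in VERTICAL | HORIZONTAL:
            raise ValueError(f"unknown orientation value: {value!r}")
    if (line_order in VERTICAL) == (character_order in VERTICAL):
        raise ValueError(
            f"lineOrder {line_order!r} and characterOrder {character_order!r} "
            "must be one vertical and one horizontal value")


check_orientation("top-to-bottom", "left-to-right")  # English: accepted
check_orientation("right-to-left", "top-to-bottom")  # a vertical layout: accepted
try:
    check_orientation("top-to-bottom", "bottom-to-top")  # both vertical
except ValueError as err:
    print(err)
```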

		<p>For dates, times, and other data to appear in the right order,
			the display for them should be set to the orientation of the locale.</p>

		<p>&lt;inList&gt; (deprecated)</p>

		<p>
			The &lt;inList&gt; element is deprecated and has been superseded by
			the &lt;contextTransforms&gt; element; see <i>Section 12 <a
				href="#Context_Transform_Elements">ContextTransform Elements</a>
			</i>.
		</p>

		<p>This element controls whether display names (language,
			territory, etc.) are title cased in GUI menu lists and the like. It is
			only used in languages where the normal display is lower case, but
			title case is used in lists. There are two options:</p>

		<pre>&lt;inList casing="titlecase-words"&gt;</pre>
		<pre>&lt;inList casing="titlecase-firstword"&gt;</pre>

		<p>
			In both cases, the title case operation is the default title case
			function defined by Chapter 3 of <i>[<a href="tr35.html#Unicode">Unicode</a>]
			</i>. In the second case, only the first word (using the word boundaries
			for that locale) will be title cased. The results can be fine-tuned
			by using alt="list" on any element where titlecasing as defined by
			the Unicode Standard would produce the wrong value. For example,
			suppose that "turc de Crimée" is a value, and its title-cased form
			should be "Turc de Crimée" rather than the "Turc De Crimée" that
			titlecase-words would produce. That can be expressed using an
			alt="list" value.
		</p>
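<p>The two casing options and the alt="list" override can be sketched in Python. This is a simplification, assuming that words are separated by single spaces rather than by the locale's word boundaries; each word's first character gets Python's titlecase mapping and the rest of the word is lowercased, which approximates the default title case operation for a single word. The function names are hypothetical.</p>

```python
def titlecase_word(word):
    # First character -> titlecase mapping, rest -> lowercase (an
    # approximation of default title casing for a single word).
    return word[:1].title() + word[1:].lower()


def in_list_form(value, casing, list_override=None):
    """Return the menu-list form of a display name.

    casing        -- "titlecase-words" or "titlecase-firstword"
    list_override -- the alt="list" value for this item, if the locale has one
    """
    if list_override is not None:
        return list_override  # explicit data wins over the algorithm
    words = value.split(" ")
    if casing == "titlecase-words":
        return " ".join(titlecase_word(w) for w in words)
    if casing == "titlecase-firstword":
        return " ".join([titlecase_word(words[0])] + words[1:])
    raise ValueError(f"unknown casing: {casing!r}")


print(in_list_form("turc de Crimée", "titlecase-words"))  # Turc De Crimée
print(in_list_form("turc de Crimée", "titlecase-words",
                   list_override="Turc de Crimée"))        # Turc de Crimée
```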

		<p>&lt;inText&gt; (deprecated)</p>

		<p>
			The &lt;inText&gt; element is deprecated and has been superseded by
			the &lt;contextTransforms&gt; element; see <i>Section 12 <a
				href="#Context_Transform_Elements">ContextTransform Elements</a>
			</i>.
		</p>

		<p>This element indicates the casing of the data in the category
			identified by the inText type attribute, when that data is written in
			running text or as it would appear in a dictionary. For example:</p>

		<pre>&lt;inText type="languages"&gt;lowercase-words&lt;/inText&gt;</pre>

		<p>indicates that language names embedded in text are normally
			written in lower case. The possible values and their meanings are:</p>

		<ul>
			<li>titlecase-words: all words in the phrase should be title
				case</li>
			<li>titlecase-firstword: the first word should be title case</li>
			<li>lowercase-words: all words in the phrase should be lower
				case</li>
			<li>mixed: a mixture of upper and lower case is permitted;
				generally used when the correct value is unknown.</li>
		</ul>


		<h2>
			3 <a name="Character_Elements" href="#Character_Elements">Character
				Elements</a>
		</h2>


		<p class="dtd">&lt;!ELEMENT characters ( alias | ( exemplarCharacters*, ellipsis*, moreInformation*, stopwords*, indexLabels*, mapping*, parseLenients*, special* ) ) &gt;</p>
		<p>
			The &lt;characters&gt; element provides optional information about
			characters that are in common use in the locale, and information that
			can be helpful in picking resources or data appropriate for the
			locale, such as when choosing among character encodings that are
			typically used to transmit data in the language of the locale. It may
			also be used to help reduce confusability issues: see [<a
				href="http://www.unicode.org/reports/tr41/#UTR36">UTR39</a>]. It
			typically occurs only in a language locale, not in a
			language/territory locale. The stopwords element is an experimental
			feature and should not be used.
		</p>
		<h3>
			3.1 <a name="Exemplars" href="#Exemplars">Exemplars</a>
		</h3>

		<p>Exemplars are characters used by a language, separated into
			different categories. The following table provides a summary, with
			more details below.</p>
		<table>
			<tr>
				<th scope="col">Type</th>
				<th scope="col">Description</th>
				<th scope="col">Examples</th>
			</tr>
			<tr>
				<td>main / standard</td>
				<td>Main letters used in the language</td>
				<td style="font-family: Georgia, 'Times New Roman', Times, serif">a-z
					å æ ø</td>
			</tr>
			<tr>
				<td><span class="element2">auxiliary</span></td>
				<td>Additional characters for common foreign words and technical
					usage</td>
				<td style="font-family: Georgia, 'Times New Roman', Times, serif">
					à ă â å ä ã ā æ ç é è ĕ ê ë ē í ì ĭ î ï ī ñ ó ò ŏ ô ö ø ō œ ú ù ŭ û
					ü ū ÿ</td>
			</tr>
			<tr>
				<td><span class="element2">index</span></td>
				<td>Characters for the header of an index</td>
				<td style="font-family: Georgia, 'Times New Roman', Times, serif">A
					B C D E F G H I J K L M N O P Q R S T U V W X Y Z</td>
			</tr>
			<tr>
				<td>punctuation</td>
				<td>Common punctuation</td>
				<td style="font-family: Georgia, 'Times New Roman', Times, serif">-
					‐ – — , ; \: ! ? . … “ ” ‘ ’ ( ) [ ] § @ * / &amp; # † ‡ ′ ″</td>
			</tr>
			<tr>
				<td>numbers</td>
				<td>The characters needed to display the common number formats: decimal, percent, and currency.</td>
				<td style="font-family: Georgia, 'Times New Roman', Times, serif">[\u061C\u200E \- , ٫ ٬ . % ٪ ‰ ؉ + 0٠ 1١ 2٢ 3٣ 4٤ 5٥ 6٦ 7٧ 8٨ 9٩]</td>
			</tr>
		</table>
		<p>
			The basic exemplar character sets (main and auxiliary) contain the
			commonly used letters for a given modern form of a language, which
			can be used for testing and for determining the appropriate repertoire of
			letters for charset conversion or collation. ("Letter" is interpreted
			broadly, as anything having the property Alphabetic in [<a
				href="http://unicode.org/reports/tr41/#UAX44">UAX44</a>], which also
			includes syllabaries and ideographs.) It is not a complete set of
			letters used for a language, nor should it be considered to apply to
			multiple languages in a particular country. Punctuation and other
			symbols should not be included in the main and auxiliary sets. In
			particular, format characters like CGJ are not included.
		</p>
		<p>
			There are five sets altogether: main, auxiliary, punctuation, numbers, and
			index. The <i>main</i> set should contain the minimal set required
			for users of the language, while the <i>auxiliary</i> exemplar set is
			designed to encompass additional characters: those non-native or
			historical characters that would customarily occur in common
			publications, dictionaries, and so on. Major style guidelines are
			good references for the auxiliary set. So, for example, if Irish
			newspapers and magazines would commonly have Danish names using å,
			then it would be appropriate to include å in the auxiliary exemplar
			characters, but not in the main exemplar set.
			Thus English has the following:
		</p>

		<p>
			&lt;exemplarCharacters&gt;[a b c d e f g h i j k l m n o p q r s t u
			v w x y z]&lt;/exemplarCharacters&gt;<br> &lt;exemplarCharacters
			type="auxiliary"&gt;[á à ă â å ä ã ā æ ç é è ĕ ê ë ē í ì ĭ î ï ī ñ ó
			ò ŏ ô ö ø ō œ ú ù ŭ û ü ū ÿ]&lt;/exemplarCharacters&gt;
		</p>

		<p>For a given language, a few factors help determine whether a
			character belongs in the auxiliary set instead of the main set:</p>

		<ul>
			<li>The character is not available on all normal keyboards.</li>
			<li>It is acceptable to always use spellings that avoid that
				character.</li>
		</ul>

		<p>For example, the exemplar character set for en (English) is the
			set [a-z]. This set does not contain the accented letters that are
			sometimes seen in words like "résumé" or "naïve", because it is
			acceptable in common practice to spell those words without the
			accents. The exemplar character set for fr (French), on the other
			hand, must contain those characters: [a-z é è ù ç à â ê î ô û æ œ ë ï
			ÿ]. The main set typically includes those letters commonly regarded
			as the language's "alphabet".</p>

		<p>
			The <em>punctuation</em> set consists of common punctuation
			characters that are used with the language (corresponding to main and
			auxiliary). Symbols may also be included where they are common in
			plain text, such as ©. It does not include characters with narrow
			technical usage, such as dictionary punctuation/symbols or copy-edit
			symbols. For example, English would have something like the
			following:
		</p>

		<blockquote>
			- ‐ – — <br> , ; : ! ? . … <br> ' &lsquo; &rsquo; " &ldquo;
			&rdquo; ′ ″ <br> ( ) [ ] { } ⟨ ⟩<br> © ® ™ @ &amp; ° ‧ · / #
			% ¶ § * † ‡<br> + − ± × ÷ &lt; ≤ = ≅ ≥ &gt; √<br>
		</blockquote>

		<p>
			The numbers exemplar set does not currently include lesser-used characters, such as those used in exponential notation (3.1 × 10²³), ∞, or NaN. Nor does it contain unit or currency symbols such as $, ¥, ₹,… It does contain %, because that occurs in the percent format. It may contain some special formatting characters like the RLM. A full list of the currency symbols used with a locale is in the &lt;currencies&gt; element, while the units can be obtained from the &lt;units&gt; element (both using inheritance, of course). The digits used in each numbering system are accessed in
			numberingSystems.xml. For more information, see <em><strong>Part
					3: <a href="tr35-numbers.html#Contents">Numbers</a></strong>, Section 2 <a href="tr35-numbers.html#Number_Elements">Number
		Elements</a></em>.</p>
        <p> <em>Examples for zh.xml:</em> </p>
        <table>
          <tr>
            <th scope="col">Type</th>
            <th scope="col">Value</th>
          </tr>
          <tr>
            <td>defaultNumberingSystem</td>
            <td>latn</td>
          </tr>
          <tr>
            <td>otherNumberingSystems/native</td>
            <td>hanidec</td>
          </tr>
          <tr>
            <td>otherNumberingSystems/traditional</td>
            <td>hans</td>
          </tr>
          <tr>
            <td>otherNumberingSystems/finance</td>
            <td>hansfin</td>
          </tr>
        </table>
        <p>When determining the character repertoire needed to support a
			language, a reasonable initial set would include at least the
			characters in the main and punctuation exemplar sets, along with the
			digits and common symbols associated with the numbering systems supported
			for the locale (see <i> <a
				href="tr35-numbers.html#Numbering_Systems">Numbering Systems</a></i>).
		</p>

		<p>
			The <em>index</em> characters are a set of characters for use as a UI
			"index", that is, a list of clickable characters (or character
			sequences) that allow the user to see a segment of a larger "target"
			list. For details see the <a
				href="tr35-collation.html#Collation_Indexes">Unicode LDML:
				Collation</a> document. The index set may only contain characters whose
			lowercase versions are in the main and auxiliary exemplar sets,
			though for cased languages the index exemplars are typically in
			uppercase. Characters from the auxiliary exemplar set may be
			necessary in the index set if it needs to properly handle items such
			as names, which may require characters not included in the main
			exemplar set.
		</p>

		<p>Here is a sample of the XML structure:</p>

		<pre>&lt;exemplarCharacters type="index"&gt;[A B C D E F G H I J K L M N O P Q R S T U V W X Y Z]&lt;/exemplarCharacters&gt;</pre>

		<p>The display of the index characters can be modified with the
			Index Labels elements (now deprecated); see Section 3.3 <a href="#IndexLabels">Index Labels</a>.</p>

		<h4>
			3.1.1 <a name="ExemplarSyntax" href="#ExemplarSyntax">Exemplar
				Syntax</a>
		</h4>


		<p>
			In all of the exemplar characters, the list of characters is in the <a
				href="tr35.html#Unicode_Sets">Unicode Set</a> format, which normally
			allows boolean combinations of sets of letters and Unicode
			properties.
		</p>

		<p>
			Sequences of characters that act like a single letter in the language
			(especially in collation) are included within braces, such as [a-z
			á é í ó ú ö ü ő ű {cs} {dz} {dzs} {gy} ...]. The characters should be
			in normalized form (NFC). Where combining marks are used
			generatively, and apply to a large number of base characters (such as
			in Indic scripts), the individual combining marks should be included.
			Where they are used with only a few base characters, the specific
			combinations should be included. Wherever there is no precomposed
			character (that is, a single code point) for a given combination,
			that combination must be included within braces. For example, to include
			sequences from the <a href="http://unicode.org/standard/where/">Where
				is my Character?</a> page on the Unicode site, one would write: [{ch}
			{tʰ} {x̣} {ƛ̓} {ą́} {i̇́} {ト゚}], but for French one would just write
			[a-z é è ù ...]. When in doubt, use braces, since it does no harm to
			include them around single code points: for example, [a-z {é} {è} {ù}
			...].
		</p>
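<p>Because exemplar sets use a restricted form of the Unicode Set syntax (single characters, ranges, braced sequences, and escapes, with no properties or boolean combinations), they can be parsed directly. The following Python sketch is not CLDR or ICU code: <code>parse_exemplars</code> is a hypothetical helper, it assumes ranges are written without internal spaces (as in <code>a-z</code>), and it supports only a backslash followed by <code>u</code> and four hex digits, or a backslash before a literal character.</p>

```python
def parse_exemplars(text: str) -> set[str]:
    """Parse a restricted exemplar set such as "[a-c {cs} é]" (hypothetical helper)."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("an exemplar set must be enclosed in [ ]")
    body = body[1:-1]

    def read_char(pos: int) -> tuple[str, int]:
        # Return one (possibly escaped) code point and the position after it.
        if body[pos] == "\\":
            if body[pos + 1] == "u":
                return chr(int(body[pos + 2:pos + 6], 16)), pos + 6
            return body[pos + 1], pos + 2
        return body[pos], pos + 1

    items: set[str] = set()
    last = None  # the most recent single code point, a possible range start
    i = 0
    while i < len(body):
        ch = body[i]
        if ch.isspace():
            i += 1
        elif ch == "{":
            # A multi-character sequence that acts like a single letter.
            end = body.index("}", i)
            items.add(body[i + 1:end])
            last, i = None, end + 1
        elif ch == "-" and last is not None and body[i + 1:i + 2].strip():
            # A range such as a-z: every code point from `last` to the end point.
            stop, i = read_char(i + 1)
            items.update(chr(cp) for cp in range(ord(last), ord(stop) + 1))
            last = None
        else:
            last, i = read_char(i)
            items.add(last)
    return items


print(sorted(parse_exemplars("[a-d {cs} {dzs} \\u00E9]")))
# ['a', 'b', 'c', 'cs', 'd', 'dzs', 'é']
```

<p>A hyphen that has no preceding single character, or that is followed by a space, is treated as a literal hyphen in this sketch.</p>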

		<p>If the letter 'z' were only ever used in the combination 'tz',
			then we might have [a-y {tz}] in the main set. (The language would
			probably have plain 'z' in the auxiliary set, for use in foreign
			words.) If combining characters can be used productively in
			combination with a large number of others (such as Indic matras),
			then they are not listed in all the possible combinations, but
			separately, such as:</p>

		<blockquote>[‌ ‍ ॐ ०-९ ऄ-ऋ ॠ ऌ ॡ ऍ-क क़ ख ख़ ग ग़ घ-ज ज़
			झ-ड ड़ ढ ढ़ ण-फ फ़ ब-य य़ र-ह ़ ँ-ः ॑-॔ ऽ ् ॽ ा-ॄ ॢ ॣ ॅ-ौ]</blockquote>

		<p>The exemplar character set for Han characters is composed
			somewhat differently. It is even harder to draw a clear line for Han
			characters, since usage is more like a frequency curve that slowly
			trails off to the right in terms of decreasing frequency. So for this
			case, the exemplar characters simply contain a set of reasonably
			frequent characters for the language.</p>

		<p>The ordering of the characters in the set is irrelevant, but
			for readability in the XML file the characters should be in sorted
			order according to the locale's conventions. The main and auxiliary
			sets should only contain lower case characters (except for the
			special case of Turkish and similar languages, where the dotted
			capital I should be included); the upper case letters are to be
			mechanically added when the set is used. For more information on
			casing, see the discussion of Special Casing in the Unicode Character
			Database.</p>
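<p>A minimal Python sketch of adding the uppercase letters mechanically, assuming the set has already been read into Python strings and that the default (locale-independent) full case mapping is used. Under that mapping both <code>i</code> and dotless <code>ı</code> uppercase to plain <code>I</code>, which illustrates why Turkish data needs to list the dotted capital <code>İ</code> explicitly. The helper name is hypothetical.</p>

```python
def add_uppercase(exemplars: set[str]) -> set[str]:
    """Return the exemplar set plus the default uppercase form of every item.

    Hypothetical helper. str.upper() applies Unicode's default full case
    mapping, so sequences such as "cs" become "CS" and "ß" becomes "SS".
    """
    return exemplars | {item.upper() for item in exemplars}


turkish_like = {"a", "i", "\u0131", "\u0130"}  # a, i, dotless ı, dotted capital İ
print(sorted(add_uppercase(turkish_like)))  # ['A', 'I', 'a', 'i', 'İ', 'ı']
```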

		<h4>
			3.1.2 <a name="Restrictions" href="#Restrictions">Restrictions</a>
		</h4>


		<ol>
			<li>The main, auxiliary and index sets are normally restricted
				to those letters with a specific <a
				href="http://unicode.org/Public/UNIDATA/Scripts.txt">Script</a> character
				property (that is, not the values Common or Inherited) or required <a
				href="http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt">Default_Ignorable_Code_Point</a>
				characters (such as a non-joiner), or combining marks, or the <a
				href="http://www.unicode.org/Public/UNIDATA/auxiliary/WordBreakProperty.txt">Word_Break</a>
				property values <a name="Katakana" href="#Katakana">Katakana</a>, <a
				name="ALetter" href="#ALetter">ALetter</a>, or <a name="MidLetter"
				href="#MidLetter">MidLetter</a>.
			</li>

			<li>The auxiliary set should not overlap with the main set.
				There is one exception to this: Hangul Syllables and CJK Ideographs
				can overlap between the sets.</li>

			<li>Any <a
				href="http://unicode.org/Public/UNIDATA/DerivedCoreProperties.txt">Default_Ignorable_Code_Point</a>s
				should be in the auxiliary set, or, if they are only needed for
				currency or other number formatting, in the numbers set. These can include
				characters such as U+200E LEFT-TO-RIGHT MARK and U+200F
				RIGHT-TO-LEFT MARK, which may be needed in bidirectional text in
				order for date, currency, or other formats to display correctly.
			</li>
			<li>For exemplar characters, the <a href="tr35.html#Unicode_Sets">Unicode
					Set</a> format is restricted so as to not use properties or boolean
				combinations.
			</li>
		</ol>
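<p>Restriction 2 lends itself to a mechanical check. The following Python sketch reports items that appear in both the main and auxiliary sets, exempting Hangul syllables and CJK ideographs. As an assumption, it recognizes those two groups by their Unicode character-name prefixes through the standard <code>unicodedata</code> module, which only approximates a property-based test; the function name is hypothetical.</p>

```python
import unicodedata

# Name prefixes used here (an approximation) to recognize the permitted overlap.
EXEMPT_PREFIXES = ("HANGUL SYLLABLE", "CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH")


def overlap_violations(main: set[str], auxiliary: set[str]) -> set[str]:
    """Return items in both sets that are not Hangul syllables or CJK ideographs.

    Hypothetical checker; not part of any CLDR tool.
    """
    def exempt(item: str) -> bool:
        return len(item) == 1 and unicodedata.name(item, "").startswith(EXEMPT_PREFIXES)

    return {item for item in main.intersection(auxiliary) if not exempt(item)}


main = {"a", "b", "\u4e00", "\uac00"}  # includes 一 and 가
auxiliary = {"b", "\u00e9", "\u4e00", "\uac00"}
print(overlap_violations(main, auxiliary))  # {'b'}
```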

		<h3>
			3.2 <a name="Character_Mapping" href="#Character_Mapping">Mapping</a>
		</h3>

		<p>
			<b>This element has been deprecated.</b> For information on its
			structure and how it was intended to specify locale-specific
			preferred encodings for various purposes (e-mail, web), see the <a
				href="http://www.unicode.org/reports/tr35/tr35-39/tr35-general.html#Character_Mapping">Mapping</a>
			section from the CLDR 27 version of the LDML Specification.
		</p>


		<h3>
			3.3 <a name="IndexLabels" href="#IndexLabels">Index Labels</a>
		</h3>

		<p>
			<b>This element and its subelements have been deprecated.</b> For
			information on its structure and how it was intended to provide data
			for a compressed display of index exemplar characters where space is
			limited, see the <a
				href="http://www.unicode.org/reports/tr35/tr35-39/tr35-general.html#IndexLabels">Index
				Labels</a> section from the CLDR 27 version of the LDML Specification.
		</p>

		<p class="dtd">&lt;!ELEMENT indexLabels (indexSeparator*,
			compressedIndexSeparator*, indexRangePattern*, indexLabelBefore*,
			indexLabelAfter*, indexLabel*) &gt;</p>


		<h3>
			3.4 <a name="Ellipsis" href="#Ellipsis">Ellipsis</a>
		</h3>

		<p class="dtd">
			&lt;!ELEMENT ellipsis ( #PCDATA ) &gt;<br> &lt;!ATTLIST ellipsis
			type ( initial | medial | final | word-initial | word-medial |
			word-final ) #IMPLIED &gt;
		</p>

		<p>The ellipsis element provides patterns for use when truncating
			strings. There are three versions: initial for removing an initial
			part of the string (leaving final characters); medial for removing
			from the center of the string (leaving initial and final characters);
			and final for removing a final part of the string (leaving initial
			characters). For example, the following uses the ellipsis character
			in all three cases (although some languages may have different
			characters for different positions).</p>

		<p>
			<code>
				&lt;ellipsis type="initial"&gt;…{0}&lt;/ellipsis&gt;<br>
				&lt;ellipsis type="medial"&gt;{0}…{1}&lt;/ellipsis&gt;<br>
				&lt;ellipsis type="final"&gt;{0}…&lt;/ellipsis&gt;
			</code>
		</p>
		<p>There are alternatives for cases where the breaks are on a word
			boundary, where some languages include a space. For example, such a
			case would be:</p>
		<p>
			<code>&lt;ellipsis type="word-initial"&gt;… {0}&lt;/ellipsis&gt;</code>
		</p>
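<p>The following Python sketch shows one way these patterns might be applied once a caller has decided how much text to keep. It assumes the English-style patterns from the example above (plus an assumed <code>word-final</code> pattern with a space), substitutes the retained text for <code>{0}</code> and <code>{1}</code> with <code>str.format</code> (so it assumes the patterns contain no other braces), and counts <code>keep</code> in code points rather than grapheme clusters. The function <code>truncate</code> and its parameters are hypothetical; it does not search for word boundaries, so for the word-* variants the caller is assumed to pick a cut point at a word boundary.</p>

```python
# English-style patterns from the example above; "word-final" is an assumed value.
ELLIPSIS_PATTERNS = {
    "initial": "…{0}",
    "medial": "{0}…{1}",
    "final": "{0}…",
    "word-final": "{0} …",
}


def truncate(text: str, keep: int, kind: str, patterns: dict[str, str] = ELLIPSIS_PATTERNS) -> str:
    """Shorten text to `keep` code points plus the ellipsis pattern for `kind` (hypothetical)."""
    if len(text) <= keep:
        return text
    pattern = patterns[kind]
    position = kind.removeprefix("word-")  # word-initial behaves like initial, and so on
    if position == "initial":
        # Remove the start of the string, leaving its final characters.
        return pattern.format(text[len(text) - keep:])
    if position == "final":
        # Remove the end of the string, leaving its initial characters.
        return pattern.format(text[:keep])
    # medial: remove the middle, leaving characters from both ends.
    head = (keep + 1) // 2
    return pattern.format(text[:head], text[len(text) - (keep - head):])


print(truncate("abcdefghij", 4, "medial"))  # ab…ij
```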

		<h3>
			3.5 <a name="Character_More_Info" href="#Character_More_Info">More
				Information</a>
		</h3>


		<p>The moreInformation string is one that can be displayed in an
			interface to indicate that more information is available. For
			example:</p>
		<p>&lt;moreInformation&gt;?&lt;/moreInformation&gt;</p>
		<h3> 3.6 <a name="Character_Parse_Lenient" href="#Character_Parse_Lenient">Parse Lenient</a> </h3>
		  <p  class='dtd'>&lt;!ELEMENT parseLenients ( alias | ( parseLenient*, special* ) ) &gt;<br>
		    &lt;!ATTLIST parseLenients scope (general | number | date) #REQUIRED &gt;<br>
		    &lt;!ATTLIST parseLenients level (lenient | stricter) #REQUIRED &gt;</p>
		  <p class='dtd'>&lt;!ELEMENT parseLenient ( #PCDATA ) &gt;<br>
		    &lt;!ATTLIST parseLenient sample CDATA #REQUIRED &gt;<br>
		    &lt;!ATTLIST parseLenient alt NMTOKENS #IMPLIED &gt;<br>
		    &lt;!ATTLIST parseLenient draft (approved | contributed | provisional | unconfirmed) #IMPLIED &gt;<br>
		  </p>
<p>Example:</p>
<pre>&lt;parseLenients scope=&quot;date&quot; level=&quot;lenient&quot;&gt;
    &lt;parseLenient sample=&quot;-&quot;&gt;[\-./]&lt;/parseLenient&gt;
    &lt;parseLenient sample=&quot;:&quot;&gt;[\:∶]&lt;/parseLenient&gt;
&lt;/parseLenients&gt;</pre>
<p>The parseLenient elements are used to indicate that characters within a particular UnicodeSet are normally to be treated as equivalent when doing a lenient parse. The <strong>scope</strong> attribute value defines where the lenient sets are intended for use. The <strong>level</strong> attribute value is included for future expansion; currently the only value is &quot;lenient&quot;.</p>
<p>The <strong>sample</strong> attribute value is a representative (paradigm) element of that UnicodeSet. It is pulled out separately only so that different classes of characters are kept apart, and so that inheritance can override individual sets. The first version of this data was populated with the data used for lenient parsing in ICU.</p>
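<p>One simple way a parser could use this data, sketched in Python under stated assumptions: each lenient set is held as a Python set keyed by its sample character (transcribed by hand from the date example above), and the input is normalized by mapping every member of a set to that set's sample before a strict parse. This is an illustration only, not necessarily how ICU applies the data; <code>normalize_lenient</code> is a hypothetical name.</p>

```python
# The date/lenient example above, transcribed by hand:
# sample character -> characters treated as equivalent to it.
DATE_LENIENT = {
    "-": {"-", ".", "/"},
    ":": {":", "\u2236"},  # U+2236 RATIO
}


def normalize_lenient(text: str, lenient_sets: dict[str, set[str]]) -> str:
    """Replace each character in a lenient set with that set's sample (hypothetical helper)."""
    lookup = {ch: sample for sample, members in lenient_sets.items() for ch in members}
    return "".join(lookup.get(ch, ch) for ch in text)


print(normalize_lenient("2024/05.17 10\u223630", DATE_LENIENT))  # 2024-05-17 10:30
```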

		<h2>
			4 <a name="Delimiter_Elements" href="#Delimiter_Elements">Delimiter
				Elements</a>
		</h2>


		<p class="dtd">&lt;!ELEMENT delimiters (alias | (quotationStart*,
			quotationEnd*, alternateQuotationStart*, alternateQuotationEnd*,
			special*)) &gt;</p>

		<p>The delimiters supply common delimiters for bracketing
			quotations. The quotation marks are used with simple quoted text,
			such as:</p>

		<blockquote>
			<p>He said, “Don’t be absurd!”</p>
		</blockquote>

		<p>When quotations are nested, the quotation marks and alternate
			marks are used in an alternating fashion:</p>

		<blockquote>
			<p>He said, “Remember what the Mad Hatter said: ‘Not the same
				thing a bit! Why you might just as well say that “I see what I eat”
				is the same thing as “I eat what I see”!’”</p>
		</blockquote>

		<p>
			<code>&lt;quotationStart&gt;</code>
			<span style="color: blue">“</span>
			<code>&lt;/quotationStart&gt;</code>
			<br>
			<code>&lt;quotationEnd&gt;</code>
			<span style="color: blue">”</span>
			<code>&lt;/quotationEnd&gt;</code>
			<br>
			<code>&lt;alternateQuotationStart&gt;</code>
			<span style="color: blue">‘</span>
			<code>&lt;/alternateQuotationStart&gt;</code>
			<br>
			<code>&lt;alternateQuotationEnd&gt;</code>
			<span style="color: blue">’</span>
			<code>&lt;/alternateQuotationEnd&gt;</code>
		</p>
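<p>As a sketch of the alternation described above, the following Python function picks the primary or alternate pair of marks by nesting depth. The dictionary holds the English values from the example, keyed by element name; the function name and the depth convention (0 for the outermost quotation) are assumptions.</p>

```python
# English values from the example above, keyed by element name.
ENGLISH_DELIMITERS = {
    "quotationStart": "\u201c",           # “
    "quotationEnd": "\u201d",             # ”
    "alternateQuotationStart": "\u2018",  # ‘
    "alternateQuotationEnd": "\u2019",    # ’
}


def quote(text: str, depth: int = 0, delimiters: dict[str, str] = ENGLISH_DELIMITERS) -> str:
    """Quote text, alternating primary and alternate marks by depth (hypothetical helper)."""
    if depth % 2 == 0:
        return delimiters["quotationStart"] + text + delimiters["quotationEnd"]
    return delimiters["alternateQuotationStart"] + text + delimiters["alternateQuotationEnd"]


inner = quote("I see what I eat", depth=2)
middle = quote("Why you might just as well say that " + inner, depth=1)
print("He said, " + quote("Remember: " + middle))
```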


		<h2>
			5 <a name="Measurement_System_Data" href="#Measurement_System_Data">Measurement
				System Data</a>
		</h2>


		<p class="dtd">
			&lt;!ELEMENT measurementData ( measurementSystem*, paperSize* ) &gt;<br>
			<br> &lt;!ELEMENT measurementSystem EMPTY &gt;<br>
			&lt;!ATTLIST measurementSystem type ( metric | US | UK ) #REQUIRED
			&gt;<br> &lt;!ATTLIST measurementSystem category ( temperature )
			#IMPLIED &gt;<br>&lt;!ATTLIST measurementSystem territories
			NMTOKENS #REQUIRED &gt;<br> <br> &lt;!ELEMENT paperSize
			EMPTY &gt;<br> &lt;!ATTLIST paperSize type ( A4 | US-Letter )
			#REQUIRED &gt;<br> &lt;!ATTLIST paperSize territories NMTOKENS
			#REQUIRED &gt;
		</p>

		<p>The measurement system is the normal measurement system in
			common everyday use (except for date/time). For example:</p>

		<pre>&lt;measurementData&gt;
 &lt;measurementSystem type=&quot;metric&quot;  territories=&quot;001&quot;/&gt;
 &lt;measurementSystem type=&quot;US&quot;  territories=&quot;LR MM US&quot;/&gt;
 &lt;measurementSystem type=&quot;metric&quot; category=&quot;temperature&quot; territories=&quot;LR MM&quot;/&gt;
 &lt;measurementSystem type=&quot;US&quot; category=&quot;temperature&quot; territories=&quot;BS BZ KY PR PW&quot;/&gt;
 &lt;measurementSystem type=&quot;UK&quot;  territories=&quot;GB&quot;/&gt;
 &lt;paperSize type=&quot;A4&quot;  territories=&quot;001&quot;/&gt;
 &lt;paperSize type=&quot;US-Letter&quot;  territories=&quot;BZ CA CL CO CR GT MX NI PA PH PR SV US VE&quot;/&gt;
&lt;/measurementData&gt;</pre>

		<p>The values are "metric", "US", or "UK"; others may be added
			over time.</p>
		<ul>
			<li>The "metric" value indicates the use of SI [<a
				href="tr35.html#ISO1000">ISO1000</a>] base or derived units, or
				non-SI units accepted for use with SI: for example, meters,
				kilograms, liters, and degrees Celsius.
			</li>
			<li>The "US" value indicates the customary system of measurement
				as used in the United States: feet, inches, pints, quarts, degrees
				Fahrenheit, and so on.</li>
			<li>The "UK" value indicates the mix of metric units and
				Imperial units (feet, inches, pints, quarts, and so on) used in the
				United Kingdom, in which Imperial volume units such
				as pint, quart, and gallon differ in size from those of the "US"
				customary system. For more detail about specific units
				for various usages, see <strong>Part 6: Supplemental:</strong> <em>Section 2.4.1
				<a href="tr35-info.html#Preferred_Units_For_Usage">Preferred Units for
				Specific Usages</a></em>.
			</li>
		</ul>
		<p>In some cases, it may be common to use different measurement
			systems for different categories of measurements. For example, the
			following indicates that for the category of temperature, in the
			regions LR and MM, it is more common to use metric units than US
			units.</p>

		<pre>&lt;measurementSystem type=&quot;metric&quot; category=&quot;temperature&quot; territories=&quot;LR MM&quot;/&gt;</pre>

		<p>The paperSize element gives the size (height and width) of paper
			used for normal business letters. The values are "A4" and
			"US-Letter".</p>

		<p>For both measurementSystem entries and paperSize entries, later
			entries for specific territories such as "US" will override the value
			assigned to that territory by earlier entries for more inclusive
			territories such as "001".</p>
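<p>The override rule can be sketched in Python as a scan over the entries in document order, where a later match replaces an earlier one and category-specific entries are consulted after the general ones. This sketch assumes the entries have already been read into tuples (transcribed by hand from the example above) and that only "001" and individual region codes occur, so it needs no territory-containment data; the function name is hypothetical. The same approach would apply to paperSize entries.</p>

```python
from typing import Optional

# (type, category, territories) in document order, from the example above.
MEASUREMENT_SYSTEMS = [
    ("metric", None, "001"),
    ("US", None, "LR MM US"),
    ("metric", "temperature", "LR MM"),
    ("US", "temperature", "BS BZ KY PR PW"),
    ("UK", None, "GB"),
]


def measurement_system(region: str, category: Optional[str] = None) -> Optional[str]:
    """Resolve the measurement system for a region code (hypothetical helper)."""
    result = None
    for system, entry_category, territories in MEASUREMENT_SYSTEMS:
        codes = territories.split()
        if entry_category is None and ("001" in codes or region in codes):
            result = system  # a later, more specific entry overrides "001"
    if category is not None:
        for system, entry_category, territories in MEASUREMENT_SYSTEMS:
            if entry_category == category and region in territories.split():
                result = system  # category-specific data overrides the general value
    return result


for region in ("FR", "US", "LR", "BS"):
    print(region, measurement_system(region), measurement_system(region, "temperature"))
# FR metric metric
# US US US
# LR US metric
# BS metric US
```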

		<p>The measurement information was formerly in the main LDML file,
			and had a somewhat different format.</p>

		<p>Again, for finer-grained detail about specific units
			for various usages, see <strong>Part 6: Supplemental:</strong> <em>Section 2.4.1
			<a href="tr35-info.html#Preferred_Units_For_Usage">Preferred Units for
			Specific Usages</a></em>.</p>

		<h3>
			5.1 <a name="Measurement_Elements" href="#Measurement_Elements">Measurement
				Elements (deprecated)</a>
		</h3>


		<p class="dtd">&lt;!ELEMENT measurement (alias |
			(measurementSystem?, paperSize?, special*)) &gt;</p>
		<p>The measurement element is deprecated in the main LDML files,
			because the data is more appropriately organized as connected to
			territories, not to linguistic data. Instead, the measurementData
			element in the supplemental data file should be used.</p>


		<h2>
			6 <a name="Unit_Elements" href="#Unit_Elements">Unit Elements</a>
		</h2>


		<p class="dtd">
			&lt;!ELEMENT units (alias | (unit*, unitLength*, durationUnit*,
			special*) ) &gt;<br> <br> &lt;!ELEMENT unitLength (alias |
			(compoundUnit*, unit*, coordinateUnit*, special*) ) &gt;<br>
			&lt;!ATTLIST unitLength type (long | short | narrow) #REQUIRED &gt; <br>
			<br> &lt;!ELEMENT compoundUnit (alias | (compoundUnitPattern*,
			special*) ) &gt;<br> &lt;!ATTLIST compoundUnit type NMTOKEN
			#REQUIRED &gt; <br> <br> &lt;!ELEMENT unit (alias |
			(displayName*, unitPattern*, perUnitPattern*, special*) ) &gt;<br>
			&lt;!ATTLIST unit type NMTOKEN #REQUIRED &gt; <br> <br>
			&lt;!ELEMENT durationUnit (alias | (durationUnitPattern*, special*) )
			&gt;<br> &lt;!ATTLIST durationUnit type NMTOKEN #REQUIRED &gt; <br>
			<br> &lt;!ELEMENT unitPattern ( #PCDATA ) &gt;<br>
			&lt;!ATTLIST unitPattern count (0 | 1 | zero | one | two | few | many
			| other) #REQUIRED &gt; <br> <br> &lt;!ELEMENT
			compoundUnitPattern ( #PCDATA ) &gt;<br> <br> &lt;!ELEMENT
			coordinateUnit ( alias | ( displayName*, coordinateUnitPattern*, special* ) ) &gt;<br>&lt;!ELEMENT
			coordinateUnitPattern ( #PCDATA ) &gt;<br> &lt;!ATTLIST
			coordinateUnitPattern type (north | east | south | west) #REQUIRED
			&gt; <br> <br> &lt;!ELEMENT durationUnitPattern ( #PCDATA )
			&gt;<br>
		</p>

		<p>These elements specify the localized way of formatting
			quantities of units such as years, months, days, hours, minutes and
			seconds; for example, in English, "1 day" or "3 days". The English
			rules that produce this example are as follows ({0} indicates the
			position of the formatted numeric value):</p>

		<pre>&lt;unit type="duration-day"&gt;
&nbsp;&nbsp;&lt;displayName&gt;days&lt;/displayName&gt;
&nbsp;&nbsp;&lt;unitPattern count="one"&gt;<span style="color: blue">{0} day</span>&lt;/unitPattern&gt;
&nbsp;&nbsp;&lt;unitPattern count="other"&gt;<span style="color: blue">{0} days</span>&lt;/unitPattern&gt;
&lt;/unit&gt;</pre>

		<p>In addition to supporting language-specific plural cases
			such as “one” and “other”, unitPatterns support the language-independent
			explicit cases “0” and “1” for special handling of numeric values that are
			exactly 0 or 1; see
			<a href="tr35-numbers.html#Explicit_0_1_rules">Explicit 0 and 1 rules</a>.</p>
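<p>A minimal Python sketch of pattern selection, assuming an integer quantity and a deliberately simplified English plural rule ("one" for exactly 1, otherwise "other"); real code would evaluate the locale's CLDR plural rules and format the number with the locale's number format. Explicit "0" and "1" entries, when present, are tried before the plural category. The function names and the "no days" pattern are hypothetical.</p>

```python
def english_plural_category(n: int) -> str:
    # Hypothetical, simplified stand-in for the English cardinal plural rule (integers only).
    return "one" if n == 1 else "other"


def format_unit(n: int, patterns: dict[str, str]) -> str:
    """Pick a unitPattern for n and substitute the number for {0} (hypothetical helper)."""
    explicit = str(n)
    if explicit in ("0", "1") and explicit in patterns:
        pattern = patterns[explicit]  # explicit 0/1 cases take precedence
    else:
        pattern = patterns.get(english_plural_category(n), patterns["other"])
    return pattern.replace("{0}", str(n))


day = {"one": "{0} day", "other": "{0} days"}  # English, from the example above
print(format_unit(1, day), "/", format_unit(3, day), "/", format_unit(0, day))  # 1 day / 3 days / 0 days
print(format_unit(0, {**day, "0": "no days"}))  # no days
```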
		<p>
			Units, like other values with a <strong>count</strong> attribute, use
			a special inheritance. See <strong>Part 1: Core:</strong> <em>Section
				4.1 <a href="tr35.html#Multiple_Inheritance">Multiple
					Inheritance</a>
			</em>.
		</p>
		<p>The displayName is used for labels, such as in a UI. It is
			typically lowercase and in as neutral a plural form as possible; the
			casing context is then used to produce the proper display. For example,
			in an English UI it would appear in titlecase:</p>
		<p>
			<strong>Duration:</strong>
		</p>
		<table style="margin-left: 5em">
			<tr>
				<td>Days</td>
				<td style="color: silver">enter the vacation length</td>
			</tr>
		</table>
		<p>&nbsp;</p>
		<p>The values of the type attribute are <em>unit identifiers</em>. Syntactically, they have the following structure:</p>
		<div class='syntax'>
		<p>unit_identifier := type &quot;-&quot; unit</p>
		<p>type := [a-z]+</p>
		<p>unit := [a-z]+([-][a-z]+)*</p>
		</div>
		<p>Example:</p>
		<p class="xmlExample">&lt;unit
			type=&quot;acceleration-g-force&quot;&gt;</p>
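<p>The grammar above translates directly into a regular expression. The following Python sketch uses the standard <code>re</code> module to split an identifier into its type and unit parts. The function name is hypothetical, and the pattern follows the grammar exactly as written, so it accepts only lowercase ASCII letters separated by hyphens.</p>

```python
import re

# type "-" unit, with type := [a-z]+ and unit := [a-z]+([-][a-z]+)*
UNIT_IDENTIFIER = re.compile(r"([a-z]+)-([a-z]+(?:-[a-z]+)*)")


def split_unit_identifier(identifier: str) -> tuple[str, str]:
    """Return (type, unit) for a unit identifier, or raise ValueError (hypothetical helper)."""
    match = UNIT_IDENTIFIER.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a unit identifier: {identifier!r}")
    return match.group(1), match.group(2)


print(split_unit_identifier("acceleration-g-force"))  # ('acceleration', 'g-force')
```

<p>Because the type has no hyphen of its own, the first hyphen always separates the type from the unit.</p>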
		<p>
			Examples include, but are not limited to, the following. The units in CLDR are not comprehensive; more
			are expected to be added over time. The complete list of supported units is in the
			validity data: see <strong>Part 1: Core:</strong> <em>Section <a href="tr35.html#Validity_Data">3.11
					Validity Data</a></em>.
		</p>
		<table>
			<tr>
				<th scope="col">Type</th>
				<th scope="col">Unit</th>
				<th scope="col">Sample Format</th>
				<th scope="col">Notes</th>
			</tr>
			<tr>
				<td><em>acceleration</em></td>
				<td>g-force</td>
				<td>{0} G</td>
			</tr>
			<tr>
				<td><em>acceleration</em></td>
				<td>meter-per-second-squared</td>
				<td>{0} m/s²</td>
			</tr>
			<tr>
				<td><em>angle</em></td>
				<td>revolution</td>
				<td>{0} rev</td>
			</tr>
			<tr>
				<td><em>angle</em></td>
				<td>radian</td>
				<td>{0} rad</td>
			</tr>
			<tr>
				<td><em>angle</em></td>
				<td>degree</td>
				<td>{0}°</td>
			</tr>
			<tr>
				<td><em>angle</em></td>
				<td>arc-minute</td>
				<td>{0}′</td>
			</tr>
			<tr>
				<td><em>angle</em></td>
				<td>arc-second</td>
				<td>{0}″</td>
			</tr>
			<tr>
				<td><em>area</em></td>
				<td>square-kilometer</td>
				<td>{0} km²</td>
			</tr>
			<tr>
				<td><em>area</em></td>
				<td>hectare</td>
				<td>{0} ha</td>
			</tr>
			<tr>
				<td><em>area</em></td>
				<td>square-meter</td>
				<td>{0} m²</td>
			</tr>
			<tr>
				<td><em>area</em></td>
				<td>square-centimeter</td>
				<td>{0} cm²</td>
			</tr>
			<tr>
				<td><em>area</em></td>
				<td>square-mile</td>
				<td>{0} mi²</td>
			</tr>
			<tr>
				<td><em>area</em></td>
				<td>acre</td>
				<td>{0} ac</td>
			</tr>
			<tr>
				<td><em>area</em></td>
				<td>square-yard</td>
				<td>{0} yd²</td>
			</tr>
			<tr>
				<td><em>area</em></td>
				<td>square-foot</td>
				<td>{0} ft²</td>
			</tr>
			<tr>
				<td><em>area</em></td>
				<td>square-inch</td>
				<td>{0} in²</td>
			</tr>
			<tr>
				<td><em>concentr</em></td>
				<td>karat</td>
				<td>{0} kt</td>
				<td>dimensionless</td>
			</tr>
			<tr>
				<td><em>concentr</em></td>
				<td>milligram-per-deciliter</td>
				<td>{0} mg/dL</td>
			</tr>
			<tr>
				<td><em>concentr</em></td>
				<td>millimole-per-liter</td>
				<td>{0} mmol/L</td>
			</tr>
			<tr>
				<td><em>concentr</em></td>
				<td>part-per-million</td>
				<td>{0} ppm</td>
				<td>dimensionless</td>
			</tr>
			<tr>
				<td><em>concentr</em></td>
				<td>percent</td>
				<td>{0}%</td>
				<td>dimensionless</td>
			</tr>
			<tr>
				<td><em>concentr</em></td>
				<td>permille</td>
				<td>{0}‰</td>
				<td>dimensionless</td>
			</tr>
			<tr>
				<td><em>consumption</em></td>
				<td>liter-per-kilometer</td>
				<td>{0} L/km</td>
			</tr>
			<tr>
				<td><em>consumption</em></td>
				<td>liter-per-100kilometers</td>
				<td>{0} L/100km</td>
			</tr>
			<tr>
				<td><em>consumption</em></td>
				<td>mile-per-gallon (US)</td>
				<td>{0} mpg</td>
			</tr>
			<tr>
				<td><em>consumption</em></td>
				<td>mile-per-gallon-imperial</td>
				<td>{0} mpg Imp.</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>petabyte</td>
				<td>{0} PB</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>terabyte</td>
				<td>{0} TB</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>terabit</td>
				<td>{0} Tb</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>gigabyte</td>
				<td>{0} GB</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>gigabit</td>
				<td>{0} Gb</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>megabyte</td>
				<td>{0} MB</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>megabit</td>
				<td>{0} Mb</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>kilobyte</td>
				<td>{0} kB</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>kilobit</td>
				<td>{0} kb</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>byte</td>
				<td>{0} byte</td>
			</tr>
			<tr>
				<td><em>digital</em></td>
				<td>bit</td>
				<td>{0} bit</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>century</td>
				<td>{0} c</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>year</td>
				<td>{0} y</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>year-person</td>
				<td>{0} y</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>month</td>
				<td>{0} m</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>month-person</td>
				<td>{0} m</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>week</td>
				<td>{0} w</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>week-person</td>
				<td>{0} w</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>day</td>
				<td>{0} d</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>day-person</td>
				<td>{0} d</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>hour</td>
				<td>{0} h</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>minute</td>
				<td>{0} min</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>second</td>
				<td>{0} s</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>millisecond</td>
				<td>{0} ms</td>
			</tr>
			<tr>
				<td><em>duration</em></td>
				<td>microsecond</td>
				<td>{0} μs</td>
1638			</tr>
1639			<tr>
1640				<td><em>duration</em></td>
1641				<td>nanosecond</td>
1642				<td>{0} ns</td>
1643			</tr>
1644			<tr>
1645				<td><em>electric</em></td>
1646				<td>ampere</td>
1647				<td>{0} A</td>
1648			</tr>
1649			<tr>
1650				<td><em>electric</em></td>
1651				<td>milliampere</td>
1652				<td>{0} mA</td>
1653			</tr>
1654			<tr>
1655				<td><em>electric</em></td>
1656				<td>ohm</td>
1657				<td>{0} Ω</td>
1658			</tr>
1659			<tr>
1660				<td><em>electric</em></td>
1661				<td>volt</td>
1662				<td>{0} V</td>
1663			</tr>
1664			<tr>
1665				<td><em>energy</em></td>
1666				<td>kilocalorie</td>
1667				<td>{0} kcal</td>
1668			</tr>
1669			<tr>
1670				<td><em>energy</em></td>
1671				<td>calorie</td>
1672				<td>{0} cal</td>
1673			</tr>
1674			<tr>
1675				<td><em>energy</em></td>
1676				<td>foodcalorie</td>
1677				<td>{0} Cal</td>
1678			</tr>
1679			<tr>
1680				<td><em>energy</em></td>
1681				<td>kilojoule</td>
1682				<td>{0} kJ</td>
1683			</tr>
1684			<tr>
1685				<td><em>energy</em></td>
1686				<td>joule</td>
1687				<td>{0} J</td>
1688			</tr>
1689			<tr>
1690				<td><em>energy</em></td>
1691				<td>kilowatt-hour</td>
1692				<td>{0} kWh</td>
1693			</tr>
1694			<tr>
1695				<td><em>frequency</em></td>
1696				<td>gigahertz</td>
1697				<td>{0} GHz</td>
1698			</tr>
1699			<tr>
1700				<td><em>frequency</em></td>
1701				<td>megahertz</td>
1702				<td>{0} MHz</td>
1703			</tr>
1704			<tr>
1705				<td><em>frequency</em></td>
1706				<td>kilohertz</td>
1707				<td>{0} kHz</td>
1708			</tr>
1709			<tr>
1710				<td><em>frequency</em></td>
1711				<td>hertz</td>
1712				<td>{0} Hz</td>
1713			</tr>
1714			<tr>
1715				<td><em>length</em></td>
1716				<td>kilometer</td>
1717				<td>{0} km</td>
1718			</tr>
1719			<tr>
1720				<td><em>length</em></td>
1721				<td>meter</td>
1722				<td>{0} m</td>
1723			</tr>
1724			<tr>
1725				<td><em>length</em></td>
1726				<td>decimeter</td>
1727				<td>{0} dm</td>
1728			</tr>
1729			<tr>
1730				<td><em>length</em></td>
1731				<td>centimeter</td>
1732				<td>{0} cm</td>
1733			</tr>
1734			<tr>
1735				<td><em>length</em></td>
1736				<td>millimeter</td>
1737				<td>{0} mm</td>
1738			</tr>
1739			<tr>
1740				<td><em>length</em></td>
1741				<td>micrometer</td>
1742				<td>{0} µm</td>
1743			</tr>
1744			<tr>
1745				<td><em>length</em></td>
1746				<td>nanometer</td>
1747				<td>{0} nm</td>
1748			</tr>
1749			<tr>
1750				<td><em>length</em></td>
1751				<td>picometer</td>
1752				<td>{0} pm</td>
1753			</tr>
1754			<tr>
1755				<td><em>length</em></td>
1756				<td>mile</td>
1757				<td>{0} mi</td>
1758			</tr>
1759			<tr>
1760				<td><em>length</em></td>
1761				<td>yard</td>
1762				<td>{0} yd</td>
1763			</tr>
1764			<tr>
1765				<td><em>length</em></td>
1766				<td>foot</td>
1767				<td>{0} ft</td>
1768			</tr>
1769			<tr>
1770				<td><em>length</em></td>
1771				<td>inch</td>
1772				<td>{0} in</td>
1773			</tr>
1774			<tr>
1775				<td><em>length</em></td>
1776				<td>parsec</td>
1777				<td>{0} pc</td>
1778			</tr>
1779			<tr>
1780				<td><em>length</em></td>
1781				<td>light-year</td>
1782				<td>{0} ly</td>
1783			</tr>
1784			<tr>
1785				<td><em>length</em></td>
1786				<td>astronomical-unit</td>
1787				<td>{0} au</td>
1788			</tr>
1789			<tr>
1790				<td><em>length</em></td>
1791				<td>furlong</td>
1792				<td>{0} fur</td>
1793			</tr>
1794			<tr>
1795				<td><em>length</em></td>
1796				<td>fathom</td>
1797				<td>{0} fm</td>
1798			</tr>
1799			<tr>
1800				<td><em>length</em></td>
1801				<td>nautical-mile</td>
1802				<td>{0} nmi</td>
1803			</tr>
1804			<tr>
1805				<td><em>length</em></td>
1806				<td>mile-scandinavian</td>
1807				<td>{0} smi</td>
1808			</tr>
1809			<tr>
1810				<td><em>length</em></td>
1811				<td>point</td>
1812				<td>{0} pt</td>
1813				<td> typographic point, 1/72 inch</td>
1814			</tr>
1815			<tr>
1816				<td><em>light</em></td>
1817				<td>lux</td>
1818				<td>{0} lx</td>
1819			</tr>
1820			<tr>
1821				<td><em>mass</em></td>
1822				<td>metric-ton</td>
1823				<td>{0} t</td>
1824			</tr>
1825			<tr>
1826				<td><em>mass</em></td>
1827				<td>kilogram</td>
1828				<td>{0} kg</td>
1829			</tr>
1830			<tr>
1831				<td><em>mass</em></td>
1832				<td>gram</td>
1833				<td>{0} g</td>
1834			</tr>
1835			<tr>
1836				<td><em>mass</em></td>
1837				<td>milligram</td>
1838				<td>{0} mg</td>
1839			</tr>
1840			<tr>
1841				<td><em>mass</em></td>
1842				<td>microgram</td>
1843				<td>{0} µg</td>
1844			</tr>
1845			<tr>
1846				<td><em>mass</em></td>
1847				<td>ton</td>
1848				<td>{0} tn</td>
1849			</tr>
1850			<tr>
1851				<td><em>mass</em></td>
1852				<td>stone</td>
1853				<td>{0} st</td>
1854			</tr>
1855			<tr>
1856				<td><em>mass</em></td>
1857				<td>pound</td>
1858				<td>{0} lb</td>
1859			</tr>
1860			<tr>
1861				<td><em>mass</em></td>
1862				<td>ounce</td>
1863				<td>{0} oz</td>
1864			</tr>
1865			<tr>
1866				<td><em>mass</em></td>
1867				<td>ounce-troy</td>
1868				<td>{0} oz t</td>
1869			</tr>
1870			<tr>
1871				<td><em>mass</em></td>
1872				<td>carat</td>
1873				<td>{0} CD</td>
1874			</tr>
1875			<tr>
1876				<td><em>power</em></td>
1877				<td>gigawatt</td>
1878				<td>{0} GW</td>
1879			</tr>
1880			<tr>
1881				<td><em>power</em></td>
1882				<td>megawatt</td>
1883				<td>{0} MW</td>
1884			</tr>
1885			<tr>
1886				<td><em>power</em></td>
1887				<td>kilowatt</td>
1888				<td>{0} kW</td>
1889			</tr>
1890			<tr>
1891				<td><em>power</em></td>
1892				<td>watt</td>
1893				<td>{0} W</td>
1894			</tr>
1895			<tr>
1896				<td><em>power</em></td>
1897				<td>milliwatt</td>
1898				<td>{0} mW</td>
1899			</tr>
1900			<tr>
1901				<td><em>power</em></td>
1902				<td>horsepower</td>
1903				<td>{0} hp</td>
1904			</tr>
1905			<tr>
1906				<td><em>pressure</em></td>
1907				<td>hectopascal</td>
1908				<td>{0} hPa</td>
1909			</tr>
1910			<tr>
1911				<td><em>pressure</em></td>
1912				<td>millimeter-of-mercury</td>
1913				<td>{0} mm Hg</td>
1914			</tr>
1915			<tr>
1916				<td><em>pressure</em></td>
1917				<td>pound-per-square-inch</td>
1918				<td>{0} psi</td>
1919			</tr>
1920			<tr>
1921				<td><em>pressure</em></td>
1922				<td>inch-hg</td>
1923				<td>{0} inHg</td>
1924			</tr>
1925			<tr>
1926				<td><em>pressure</em></td>
1927				<td>millibar</td>
1928				<td>{0} mbar</td>
1929			</tr>
1930			<tr>
1931				<td><em>pressure</em></td>
1932				<td>atmosphere</td>
1933				<td>{0} atm</td>
1934			</tr>
1935			<tr>
1936				<td><em>speed</em></td>
1937				<td>kilometer-per-hour</td>
1938				<td>{0} km/h</td>
1939			</tr>
1940			<tr>
1941				<td><em>speed</em></td>
1942				<td>meter-per-second</td>
1943				<td>{0} m/s</td>
1944			</tr>
1945			<tr>
1946				<td><em>speed</em></td>
1947				<td>mile-per-hour</td>
1948				<td>{0} mi/h</td>
1949			</tr>
1950			<tr>
1951				<td><em>speed</em></td>
1952				<td>knot</td>
1953				<td>{0} kn</td>
1954			</tr>
1955			<tr>
1956				<td><em>temperature</em></td>
1957				<td>generic</td>
1958				<td>{0}°</td>
1959			</tr>
1960			<tr>
1961				<td><em>temperature</em></td>
1962				<td>celsius</td>
1963				<td>{0}°C</td>
1964			</tr>
1965			<tr>
1966				<td><em>temperature</em></td>
1967				<td>fahrenheit</td>
1968				<td>{0}°F</td>
1969			</tr>
1970			<tr>
1971				<td><em>temperature</em></td>
1972				<td>kelvin</td>
1973				<td>{0} K</td>
1974			</tr>
1975			<tr>
1976				<td><em>volume</em></td>
1977				<td>cubic-kilometer</td>
1978				<td>{0} km³</td>
1979			</tr>
1980			<tr>
1981				<td><em>volume</em></td>
1982				<td>cubic-meter</td>
1983				<td>{0} m³</td>
1984			</tr>
1985			<tr>
1986				<td><em>volume</em></td>
1987				<td>cubic-centimeter</td>
1988				<td>{0} cm³</td>
1989			</tr>
1990			<tr>
1991				<td><em>volume</em></td>
1992				<td>cubic-mile</td>
1993				<td>{0} mi³</td>
1994			</tr>
1995			<tr>
1996				<td><em>volume</em></td>
1997				<td>cubic-yard</td>
1998				<td>{0} yd³</td>
1999			</tr>
2000			<tr>
2001				<td><em>volume</em></td>
2002				<td>cubic-foot</td>
2003				<td>{0} ft³</td>
2004			</tr>
2005			<tr>
2006				<td><em>volume</em></td>
2007				<td>cubic-inch</td>
2008				<td>{0} in³</td>
2009			</tr>
2010			<tr>
2011				<td><em>volume</em></td>
2012				<td>megaliter</td>
2013				<td>{0} ML</td>
2014			</tr>
2015			<tr>
2016				<td><em>volume</em></td>
2017				<td>hectoliter</td>
2018				<td>{0} hL</td>
2019			</tr>
2020			<tr>
2021				<td><em>volume</em></td>
2022				<td>liter</td>
2023				<td>{0} L</td>
2024			</tr>
2025			<tr>
2026				<td><em>volume</em></td>
2027				<td>deciliter</td>
2028				<td>{0} dL</td>
2029			</tr>
2030			<tr>
2031				<td><em>volume</em></td>
2032				<td>centiliter</td>
2033				<td>{0} cL</td>
2034			</tr>
2035			<tr>
2036				<td><em>volume</em></td>
2037				<td>milliliter</td>
2038				<td>{0} mL</td>
2039			</tr>
2040			<tr>
2041				<td><em>volume</em></td>
2042				<td>pint-metric</td>
2043				<td>{0} mpt</td>
2044			</tr>
2045			<tr>
2046				<td><em>volume</em></td>
2047				<td>cup-metric</td>
2048				<td>{0} mc</td>
2049			</tr>
2050			<tr>
2051				<td><em>volume</em></td>
2052				<td>acre-foot</td>
2053				<td>{0} ac ft</td>
2054			</tr>
2055			<tr>
2056				<td><em>volume</em></td>
2057				<td>bushel</td>
2058				<td>{0} bu</td>
2059			</tr>
2060			<tr>
2061				<td><em>volume</em></td>
2062				<td>gallon (US)</td>
2063				<td>{0} gal</td>
2064			</tr>
2065			<tr>
2066				<td><em>volume</em></td>
2067				<td>gallon-imperial</td>
2068				<td>{0} gal Imp.</td>
2069			</tr>
2070			<tr>
2071				<td><em>volume</em></td>
2072				<td>quart</td>
2073				<td>{0} qt</td>
2074			</tr>
2075			<tr>
2076				<td><em>volume</em></td>
2077				<td>pint</td>
2078				<td>{0} pt</td>
2079			</tr>
2080			<tr>
2081				<td><em>volume</em></td>
2082				<td>cup</td>
2083				<td>{0} c</td>
2084			</tr>
2085			<tr>
2086				<td><em>volume</em></td>
2087				<td>fluid-ounce</td>
2088				<td>{0} fl oz</td>
2089			</tr>
2090			<tr>
2091				<td><em>volume</em></td>
2092				<td>tablespoon</td>
2093				<td>{0} tbsp</td>
2094			</tr>
2095			<tr>
2096				<td><em>volume</em></td>
2097				<td>teaspoon</td>
2098				<td>{0} tsp</td>
2099			</tr>
2100		</table>
2101		<p>
2102			There are three widths: <strong>long</strong>, <strong>short</strong>,
2103			and <strong>narrow</strong>. As usual, the narrow forms may not be
2104			unique: in English, 1′ could mean 1 minute of arc, or 1 foot. Thus
2105			narrow forms should only be used where the context makes the meaning
2106			clear.
2107		</p>
2108		<p>
			Where the unit of measurement belongs to the <a
				href="http://physics.nist.gov/cuu/Units/units.html">International
				System of Units (SI)</a>, the short and narrow forms will typically use
			the international symbols, such as “mm” for millimeter. They may,
			however, differ where that is customary for the language or
2114			locale. For example, in Russian it may be more typical to see the
2115			Cyrillic characters “мм”.
2116		</p>
		<p>Units are included for translation even where they are not
			typically used in a particular locale, such as kilometers in the US
			or inches in Germany. This accounts for use by travelers and in
			specialized domains, such as the German “Fernseher von 32 bis 55
			Zoll (80 bis 140 cm)” (“televisions from 32 to 55 inches (80 to 140 cm)”), which gives TV screen sizes in both inches and centimeters.</p>
		<p>For temperature, there is a special unit &lt;unit
			type=&quot;temperature-generic&quot;&gt;, which is used when it is
			clear from context whether Celsius or Fahrenheit is implied.</p>
		<p>For duration, there are special units such as &lt;unit
			type=&quot;duration-year-person&quot;&gt; and &lt;unit
			type=&quot;duration-week-person&quot;&gt; for indicating the age of a
			person, which requires special forms in some languages. For example,
			in Chinese (“zh”), references to a person being 3 days old or 30 years old
			would use the forms “他3天大” and “他30岁” respectively (literally
			“he is 3 days old” and “he is 30 years old”).</p>
2131		<h3>
2132			6.1 <a name="perUnitPatterns" href="#perUnitPatterns">per Unit
2133				patterns</a><a name="compoundUnitPattern" href="#compoundUnitPattern"></a>
2134		</h3>
2135		<p>
2136			A common combination of units is X per Y, such as <em>miles per
2137				hour</em> or <em>liters per second</em>. Some units already have
			precomputed forms, such as <strong>kilometer-per-hour</strong>;
			where such units exist, they should be used in preference to a
			composed form. Otherwise, there are two patterns that can be used to
			compose unit symbols or names.
2141		</p>
2142		<p>
			<strong>compoundUnit</strong> — This is used to construct a pattern
			from two unit names. For example, a form such as &quot;{0} per
			{1}&quot; or &quot;{0}/{1}&quot; can be used to construct cases such
			as &quot;2 feet<strong> per </strong>second&quot; or &quot;ft<strong>/</strong>s&quot;.
2147		</p>
2148		<p>
2149			<strong>perUnitPattern</strong> — This is used as the denominator
2150			with another unit name. For example, a form such as &quot;{0} per
2151			second&quot; can be used to form &quot;2 feet<strong> per
2152				second</strong>&quot;.
2153		</p>
		<p>The difference between these is that in some inflected
			languages, the compoundUnit cannot be used to form grammatical
			phrases, typically because the words for &quot;per&quot; and
			&quot;second&quot; combine in a non-trivial way. For such languages,
			the compoundUnit should only be used as a fallback, when there is no
			other recourse.</p>
		<p>When constructing a pattern for value=V, numeratorUnit=N,
			denominatorUnit=D, the following process is used:</p>
2162		<ol>
2163			<li>If there is a compound form for N/D already available, use
2164				it.</li>
2165			<li>Otherwise, format the N pattern with the number using plural
2166				categories.
2167				<ul>
2168					<li>→ &quot;3 kilograms&quot;</li>
2169				</ul>
2170			</li>
2171			<li>See if there is a <strong>perUnitPattern</strong> for D.
2172
2173				<ol>
2174					<li>If so, then substitute the formatted numerator into the <strong>perUnitPattern</strong>
2175						<ul>
2176							<li>&quot;3 kilograms&quot; + &quot;{0} per second&quot; →
2177								&quot;3 kilograms per second&quot;</li>
2178						</ul>
2179					</li>
2180					<li>If not, get the <strong>compoundUnit</strong> pattern, and
2181						substitute the formatted numerator for {0} and the singular form
2182						of the denominator for {1}, after stripping the {0} and trimming
2183						spaces.
2184						<ul>
2185							<li>&quot;3 kilograms&quot; + &quot;{0} per {1}&quot; +
2186								&quot;{0} second&quot; →</li>
2187							<li>&quot;3 kilograms&quot; + &quot;{0} per {1}&quot; +
2188								&quot;second&quot; →</li>
2189							<li>&quot;3 kilograms per second&quot;</li>
2190						</ul></li>
2191				</ol>
2192			</li>
2193		</ol>
2194		<p>The patterns can have different unit lengths, so the
2195			appropriate unit length should be used (with fallbacks if necessary).</p>
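<p>The Python sketch below walks through the steps above, including a simple width fallback. It is illustrative only: the sample unit data, its dictionary layout, the English-only plural rule, and the helper names <code>lookup</code> and <code>format_per</code> are all hypothetical and are not taken from CLDR files or from any library API.</p>

```python
# Hypothetical sample data in a made-up layout (not the CLDR file format).
PRECOMPUTED = {("length-kilometer", "duration-hour"): "speed-kilometer-per-hour"}

UNIT_PATTERNS = {
    "long": {
        "mass-kilogram": {"one": "{0} kilogram", "other": "{0} kilograms"},
        "duration-second": {"one": "{0} second", "other": "{0} seconds"},
        "duration-minute": {"one": "{0} minute", "other": "{0} minutes"},
        "speed-kilometer-per-hour": {
            "one": "{0} kilometer per hour",
            "other": "{0} kilometers per hour",
        },
    },
    "short": {
        "mass-kilogram": {"one": "{0} kg", "other": "{0} kg"},
        "duration-second": {"one": "{0} s", "other": "{0} s"},
    },
}
PER_UNIT_PATTERNS = {
    "long": {"duration-second": "{0} per second"},
    "short": {"duration-second": "{0}/s"},
}
COMPOUND_UNIT = {"long": "{0} per {1}", "short": "{0}/{1}"}

# Try the requested width first, then progressively longer widths.
WIDTH_FALLBACK = {
    "narrow": ["narrow", "short", "long"],
    "short": ["short", "long"],
    "long": ["long"],
}


def lookup(table, width, key=None):
    """Return the first entry found along the width fallback chain, or None."""
    for w in WIDTH_FALLBACK[width]:
        entry = table.get(w)
        if entry is None:
            continue
        if key is None:
            return entry
        if key in entry:
            return entry[key]
    return None


def plural_category(value):
    # English-only simplification: 1 is "one", everything else is "other".
    return "one" if value == 1 else "other"


def format_per(value, numerator, denominator, width="long"):
    category = plural_category(value)
    number = str(value)

    # Step 1: prefer a precomputed compound unit such as kilometer-per-hour.
    compound_id = PRECOMPUTED.get((numerator, denominator))
    if compound_id is not None:
        patterns = lookup(UNIT_PATTERNS, width, compound_id)
        if patterns is not None:
            return patterns[category].replace("{0}", number)

    # Step 2: format the numerator using the plural category of the value.
    formatted = lookup(UNIT_PATTERNS, width, numerator)[category].replace("{0}", number)

    # Step 3a: if the denominator has a perUnitPattern, substitute into it.
    per_unit = lookup(PER_UNIT_PATTERNS, width, denominator)
    if per_unit is not None:
        return per_unit.replace("{0}", formatted)

    # Step 3b: otherwise use compoundUnit with the singular denominator name,
    # after stripping its {0} placeholder and trimming spaces.
    name = lookup(UNIT_PATTERNS, width, denominator)["one"].replace("{0}", "").strip()
    return lookup(COMPOUND_UNIT, width).replace("{0}", formatted).replace("{1}", name)


print(format_per(3, "mass-kilogram", "duration-second"))            # 3 kilograms per second
print(format_per(1, "mass-kilogram", "duration-minute"))            # 1 kilogram per minute
print(format_per(3, "mass-kilogram", "duration-second", "narrow"))  # 3 kg/s
print(format_per(5, "length-kilometer", "duration-hour"))           # 5 kilometers per hour
```

<p>In this sketch the narrow request falls back to the short data because no narrow entries are defined; a real implementation would consult the locale's actual data and plural rules.</p>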
2196		<h3>
2197			6.2 <a name="Unit_Sequences" href="#Unit_Sequences">Unit
2198				Sequences</a>
2199		</h3>
2200		<p>
2201			Units may be used in composed sequences, such as <strong>5°
				30′</strong> for 5 degrees 30 minutes, or <strong>3 ft 2 in.</strong> For that
2203			purpose, the appropriate width of the unit listPattern can be used to
2204			compose the units in a sequence.
2205		</p>
2206		<pre>&lt;listPattern type=&quot;unit&quot;&gt; (for the long form)
2207&lt;listPattern type=&quot;unit-narrow&quot;&gt;
2208&lt;listPattern type=&quot;unit-short&quot;&gt;
2209</pre>
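<p>As a rough illustration, the following Python sketch composes a unit sequence from the parts of a listPattern (types "2", "start", "middle", and "end"), building the result from the right. The part values are made-up examples rather than CLDR data for any locale, and <code>format_list</code> is a hypothetical name.</p>

```python
def format_list(items, parts):
    """Join items with listPattern parts keyed "2", "start", "middle", "end".

    Each part is a two-placeholder pattern such as "{0} {1}".
    """
    def apply(pattern, first, second):
        return pattern.replace("{0}", first).replace("{1}", second)

    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return apply(parts["2"], items[0], items[1])
    # Build from the right: "end" joins the last two items, "middle" wraps
    # each interior item around the result, and "start" adds the first item.
    result = apply(parts["end"], items[-2], items[-1])
    for item in reversed(items[1:-2]):
        result = apply(parts["middle"], item, result)
    return apply(parts["start"], items[0], result)


# Hypothetical part values for two widths.
UNIT_NARROW = {"2": "{0} {1}", "start": "{0} {1}", "middle": "{0} {1}", "end": "{0} {1}"}
UNIT_LONG = {"2": "{0}, {1}", "start": "{0}, {1}", "middle": "{0}, {1}", "end": "{0}, {1}"}

print(format_list(["5°", "30′"], UNIT_NARROW))                        # 5° 30′
print(format_list(["3 ft", "2 in"], UNIT_NARROW))                     # 3 ft 2 in
print(format_list(["1 hour", "2 minutes", "3 seconds"], UNIT_LONG))  # 1 hour, 2 minutes, 3 seconds
```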
2210		<h3>
2211			6.3 <a name="durationUnit" href="#durationUnit">durationUnit</a>
2212		</h3>
2213		<p>The durationUnit is a special type of unit used for composed
2214			time unit durations.</p>
2215		<pre>&lt;durationUnit type=&quot;hms&quot;&gt;
2216  &lt;durationUnitPattern&gt;h:mm:ss&lt;/durationUnitPattern&gt; &lt;!-- 33:04:59 --&gt;
2217&lt;/durationUnit&gt;   </pre>
2218		<p>The type contains a skeleton, where 'h' stands for hours, 'm'
			for minutes, and 's' for seconds. These are the same symbols used in
2220			availableFormats, except that there is no need to distinguish
2221			different forms of the hour.</p>
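<p>A minimal Python sketch of applying such a pattern to a duration given in whole seconds follows. It assumes the pattern contains only the letters h, m, and s plus literal separators (no quoted text or fractional seconds), and it treats the largest field present as unbounded, so 33 hours stays "33". The function name and that design choice are assumptions, not part of this specification.</p>

```python
import itertools


def format_duration(total_seconds, pattern):
    """Format a non-negative whole number of seconds with an h/m/s pattern."""
    # Split the total across the fields present, largest first; the largest
    # field absorbs everything above it (no wrapping into days or hours).
    # Seconds left over when 's' is absent are simply dropped.
    remaining = total_seconds
    values = {}
    for field, size in (("h", 3600), ("m", 60), ("s", 1)):
        if field in pattern:
            values[field], remaining = divmod(remaining, size)

    # Each run of a field letter is zero-padded to the run's length;
    # any other character is copied through as a literal.
    out = []
    for char, run in itertools.groupby(pattern):
        width = len(list(run))
        if char in values:
            out.append(str(values[char]).zfill(width))
        else:
            out.append(char * width)
    return "".join(out)


print(format_duration(33 * 3600 + 4 * 60 + 59, "h:mm:ss"))  # 33:04:59
print(format_duration(125, "m:ss"))                          # 2:05
```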
2222
2223		<h3>
2224			6.4 <a name="coordinateUnit" href="#coordinateUnit">coordinateUnit</a>
2225		</h3>
2226		<p>
			The <strong>coordinateUnitPattern</strong> is a special type of
			pattern used for composing degrees of latitude and longitude, with an
			indicator of the direction. There are exactly four type values
			(east, north, south, and west), plus a displayName for the items in this category. An angle
			is composed using the appropriate combination of the <strong>angle-degree</strong>,
			<strong>angle-arc-minute</strong>, and <strong>angle-arc-second</strong>
			values. It is then substituted for the placeholder field {0} in the
			appropriate <strong>coordinateUnitPattern</strong>.
2235		</p>
2236		<p class="xmlExample">
2237			&lt;displayName&gt;direction&lt;/displayName&gt;<br>
2238			&lt;coordinateUnitPattern
2239			type=&quot;east&quot;&gt;{0}E&lt;/coordinateUnitPattern&gt;<br>
2240			&lt;coordinateUnitPattern
2241			type=&quot;north&quot;&gt;{0}N&lt;/coordinateUnitPattern&gt;<br>
2242			&lt;coordinateUnitPattern
2243			type=&quot;south&quot;&gt;{0}S&lt;/coordinateUnitPattern&gt;<br>
2244			&lt;coordinateUnitPattern
2245			type=&quot;west&quot;&gt;{0}W&lt;/coordinateUnitPattern&gt;
2246		</p>
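<p>The composition described above might be sketched in Python as follows. The angle and direction patterns are hypothetical English-like values, the three angle components are assumed to be joined with a single space, and the value is rounded to the nearest arc-second; a real implementation would take all of these from locale data.</p>

```python
# Hypothetical English-like patterns; real values come from locale data.
ANGLE_PATTERNS = {"degree": "{0}°", "arc-minute": "{0}′", "arc-second": "{0}″"}
COORDINATE_PATTERNS = {"north": "{0}N", "south": "{0}S", "east": "{0}E", "west": "{0}W"}


def format_coordinate(value, is_latitude):
    """Format signed decimal degrees as degrees, minutes, seconds plus direction."""
    # Work in whole arc-seconds to avoid floating-point drift in the parts.
    total = round(abs(value) * 3600)
    degrees, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    angle = " ".join([
        ANGLE_PATTERNS["degree"].replace("{0}", str(degrees)),
        ANGLE_PATTERNS["arc-minute"].replace("{0}", str(minutes)),
        ANGLE_PATTERNS["arc-second"].replace("{0}", str(seconds)),
    ])
    if is_latitude:
        direction = "north" if value >= 0 else "south"
    else:
        direction = "east" if value >= 0 else "west"
    return COORDINATE_PATTERNS[direction].replace("{0}", angle)


print(format_coordinate(-33.865, True))    # 33° 51′ 54″S
print(format_coordinate(151.2094, False))  # 151° 12′ 34″E
```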
2247
2248		<h3>
2249			6.5 <a name="Territory_Based_Unit_Preferences"
2250				href="#Territory_Based_Unit_Preferences">Territory-Based Unit
2251				Preferences</a>
2252		</h3>
2253		<p>Different locales have different preferences
2254			for which unit or combination of units is used for a particular
2255			usage, such as measuring a person’s height. This is more fine-grained
2256			than merely a preference for metric versus US or UK measurement
2257			systems. For example, one locale may use meters alone, while another
2258			may use centimeters alone or a combination of meters and centimeters;
2259			a third may use inches alone, or (informally) a combination of feet
2260			and inches.</p>
2261		<p>
2262			The &lt;unitPreferenceData&gt; element, described in <a
2263				href="tr35-info.html#Preferred_Units_For_Usage">Preferred Units
2264				for Specific Usages</a>, provides information on which unit or
2265			combination of units is used for various purposes in different
2266			locales, with options for the level of formality and the scale of the
			measurement (e.g., measuring the height of an adult versus that of an
2268			infant).
2269		</p>
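<p>To illustrate how such preferences could drive formatting, here is a Python sketch that splits a measurement into a preferred combination of units. The <code>PREFERENCES</code> table is a hypothetical stand-in for &lt;unitPreferenceData&gt;, not its actual structure or content, and the sketch assumes each larger unit is a whole multiple of the smallest one (as with feet and inches).</p>

```python
# Hypothetical stand-in for <unitPreferenceData>: (territory, usage) -> units,
# largest first. These entries are illustrative, not copied from CLDR.
PREFERENCES = {
    ("US", "person-height"): ["foot", "inch"],
    ("DE", "person-height"): ["centimeter"],
}
METERS_PER_UNIT = {"foot": 0.3048, "inch": 0.0254, "centimeter": 0.01}
SYMBOLS = {"foot": "ft", "inch": "in", "centimeter": "cm"}


def format_preferred(meters, territory, usage):
    units = PREFERENCES[(territory, usage)]
    smallest = units[-1]
    # Round once in the smallest unit, so a result like "5 ft 12 in" cannot occur.
    remaining = round(meters / METERS_PER_UNIT[smallest])
    parts = []
    for unit in units[:-1]:
        # Assumes each larger unit is a whole multiple of the smallest one.
        ratio = round(METERS_PER_UNIT[unit] / METERS_PER_UNIT[smallest])
        count, remaining = divmod(remaining, ratio)
        parts.append(f"{count} {SYMBOLS[unit]}")
    parts.append(f"{remaining} {SYMBOLS[smallest]}")
    return " ".join(parts)


print(format_preferred(1.8, "US", "person-height"))  # 5 ft 11 in
print(format_preferred(1.8, "DE", "person-height"))  # 180 cm
```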
2270
2271		<h2>
2272			7 <a name="POSIX_Elements" href="#POSIX_Elements">POSIX Elements</a>
2273		</h2>
2274
2275
2276		<p class="dtd">
2277			&lt;!ELEMENT posix (alias | (messages*, special*)) &gt;<br>
2278			&lt;!ELEMENT messages (alias | ( yesstr*, nostr*)) &gt;
2279		</p>
2280
2281		<p>The following are included for compatibility with POSIX.</p>
2282
2283		<p>
2284			&lt;posix&gt;<br> &nbsp;&nbsp;&nbsp;&nbsp;&lt;posix:messages&gt;<br>
2285			&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;posix:yesstr&gt;<span
2286				style="color: #0000FF">ja</span>&lt;/posix:yesstr&gt;<br>
2287			&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&lt;posix:nostr&gt;<span
2288				style="color: #0000FF">nein</span>&lt;/posix:nostr&gt;<br>
2289			&nbsp;&nbsp;&nbsp;&nbsp;&lt;/posix:messages&gt;<br>
			&lt;/posix&gt;
2291		</p>
2292
2293		<ol>
2294			<li>The values for yesstr and nostr contain a colon-separated
2295				list of strings that would normally be recognized as "yes" and "no"
				responses. For cased languages, this shall include only the lowercase
				version. POSIX locale generation tools must generate the uppercase
				equivalents and the abbreviated versions, and add the English
				words wherever they do not conflict. Examples:
2300				<ul>
2301					<li>ja → ja:Ja:j:J:yes:Yes:y:Y</li>
2302					<li>ja → ja:Ja:j:J:yes:Yes // exclude y:Y if it conflicts with
2303						the native "no".</li>
2304				</ul>
2305			</li>
2306
2307			<li>The older elements yesexpr and noexpr are deprecated. They
2308				should instead be generated from yesstr and nostr so that they match
2309				all the responses.</li>
2310		</ol>
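<p>The expansion described in the first item can be sketched in Python as below. The ordering of the generated forms and the definition of a conflict (a case-insensitive match with any expanded form of the opposite native answer) are assumptions chosen to reproduce the examples above; the function names are hypothetical and do not come from any POSIX tool.</p>

```python
def expand(words):
    """Each word, its capitalized form, then first-letter abbreviations."""
    out = []
    for w in words:
        out += [w, w.capitalize()]
    for w in words:
        out += [w[0], w[0].upper()]
    return list(dict.fromkeys(out))  # drop duplicates, keep first occurrence


def posix_strings(native, english, opposite_native):
    """Expand a lowercase yesstr/nostr value and add non-conflicting English forms."""
    own = expand(native.split(":"))
    # Assumption: an English form conflicts if it matches (ignoring case)
    # any expanded form of the opposite native answer.
    taken = {s.lower() for s in expand(opposite_native.split(":"))}
    extra = [s for s in expand(english.split(":")) if s.lower() not in taken]
    return ":".join(dict.fromkeys(own + extra))


print(posix_strings("ja", "yes", "nein"))  # ja:Ja:j:J:yes:Yes:y:Y
print(posix_strings("nein", "no", "ja"))   # nein:Nein:n:N:no:No
# "yo" is a made-up native "no", used only to trigger the y:Y conflict.
print(posix_strings("ja", "yes", "yo"))    # ja:Ja:j:J:yes:Yes
```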
2311
2312		<p>So for English, the appropriate strings and expressions would
2313			be as follows:</p>
2314
2315		<p>
2316			yesstr "yes:y"<br> nostr "no:n"
2317		</p>
2318
2319		<p>The generated yesexpr and noexpr would be:</p>
2320
2321		<p>
			<code>yesexpr "^([yY]([eE][sS])?)"</code>
			<br> This would match y, Y, yes, yeS, yEs, yES, Yes, YeS, YEs, and YES.<br> <br>
			<code>noexpr "^([nN][oO]?)"</code>
			<br> This would match n, N, no, nO, No, and NO.
2328		</p>
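<p>A generator for such expressions could look like the following Python sketch. It builds an alternation such as <code>^([yY][eE][sS]|[yY])</code>, which accepts the same responses as the yesexpr shown above without being textually identical to it; <code>posix_expr</code> is a hypothetical name.</p>

```python
import re


def posix_expr(strings):
    """Build a case-insensitive, start-anchored regex from a yesstr/nostr value."""
    def char_class(c):
        lower, upper = c.lower(), c.upper()
        # Use a two-letter class only for simple one-to-one case pairs.
        if len(lower) == 1 and len(upper) == 1 and lower != upper:
            return f"[{lower}{upper}]"
        return re.escape(c)

    # Longest alternatives first, so a full word is preferred over its prefix.
    words = sorted(strings.split(":"), key=len, reverse=True)
    return "^(" + "|".join("".join(char_class(c) for c in w) for w in words) + ")"


yesexpr = posix_expr("yes:y")
print(yesexpr)  # ^([yY][eE][sS]|[yY])
for answer in ["y", "Y", "yes", "yEs", "YES"]:
    assert re.match(yesexpr, answer)
assert not re.match(yesexpr, "no")
```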
2329
2330
2331		<h2>
2332			8 <a name="Reference_Elements" href="#Reference_Elements">Reference
2333				Element</a>
2334		</h2>
2335
2336
2337		<p>(Use only in supplemental data; deprecated for ldml.dtd and
2338			locale data)</p>
2339		<p class="dtd">
2340			&lt;!ELEMENT references ( reference* ) &gt;<br> &lt;!ELEMENT
2341			reference ( #PCDATA ) &gt;<br> &lt;!ATTLIST reference type
		<p>The references section supplies a central location for
			specifying references and standards. The uri should be supplied if at
			all possible. If the reference is not available online, an ISBN should
			be supplied instead, as in the following example:</p>
2346
2347		<p>The references section supplies a central location for
2348			specifying references and standards. The uri should be supplied if at
2349			all possible. If not online, then a ISBN number should be supplied,
2350			such as in the following example:</p>
2351
2352		<p class="example">
2353			&lt;reference type="R2"
2354			uri="http://www.ur.se/nyhetsjournalistik/3lan.html"&gt;Landskoder på
2355			Internet&lt;/reference&gt;<br> &lt;reference type="R3"
2356			uri="URN:ISBN:91-47-04974-X"&gt;Svenska skrivregler&lt;/reference&gt;
2357		</p>
2358
2359
2360		<h2>
2361			9 <a name="Segmentations" href="#Segmentations">Segmentations</a>
2362		</h2>
2363
2364		<p class="dtd">&lt;!ELEMENT segmentations ( alias | segmentation*)
2365			&gt;</p>
2366		<p class="dtd">
2367			&lt;!ELEMENT segmentation ( alias | (variables?, segmentRules? ,
2368			exceptions?, suppressions?) | special*) &gt; <br> &lt;!ATTLIST
2369			segmentation type NMTOKEN #REQUIRED &gt;
2370		</p>
2371		<p class="dtd">&lt;!ELEMENT variables ( alias | variable*) &gt;</p>
2372		<p class="dtd">
2373			&lt;!ELEMENT variable ( #PCDATA ) &gt;<br> &lt;!ATTLIST variable
2374			id CDATA #REQUIRED &gt;
2375		</p>
2376		<p class="dtd">&lt;!ELEMENT segmentRules ( alias | rule*) &gt;</p>
2377		<p class="dtd">
2378			&lt;!ELEMENT rule ( #PCDATA ) &gt;<br> &lt;!ATTLIST rule id
2379			NMTOKEN #REQUIRED &gt;
2380		</p>
2381		<p class="dtd">&lt;!ELEMENT suppressions ( suppression* ) &gt;</p>
2382		<p class="dtd">&lt;!ATTLIST suppressions type NMTOKEN "standard"
2383			&gt;</p>
2384		<p class="dtd">&lt;!ATTLIST suppressions draft ( approved |
2385			contributed | provisional | unconfirmed ) #IMPLIED &gt;</p>
2386		<p class="dtd">&lt;!ELEMENT suppression ( #PCDATA ) &gt;</p>
2387
2388		<p>
2389			The segmentations element provides for segmentation of text into
2390			words, lines, or other segments. The structure is based on [<a
2391				href="http://www.unicode.org/reports/tr41/#UAX29">UAX29</a>]
2392			notation, but adapted to be machine-readable. It uses a list of
			variables (representing character classes) and a list of rules. Each
			variable and each rule must have an id attribute.
2395		</p>
2396
2397		<p>
2398			The rules in <i>root</i> implement the segmentations found in [<a
2399				href="http://www.unicode.org/reports/tr41/#UAX29">UAX29</a>] and [<a
2400				href="http://www.unicode.org/reports/tr41/#UAX14">UAX14</a>], for
2401			grapheme clusters, words, sentences, and lines. They can be
2402			overridden by rules in child locales.
2403		</p>
2404
2405		<p>Here is an example:</p>
2406
2407		<pre>&lt;segmentations&gt;
2408  &lt;segmentation type="GraphemeClusterBreak"&gt;
2409    &lt;variables&gt;
2410      &lt;variable id="$CR"&gt;\p{Grapheme_Cluster_Break=CR}&lt;/variable&gt;
2411      &lt;variable id="$LF"&gt;\p{Grapheme_Cluster_Break=LF}&lt;/variable&gt;
2412      &lt;variable id="$Control"&gt;\p{Grapheme_Cluster_Break=Control}&lt;/variable&gt;
2413      &lt;variable id="$Extend"&gt;\p{Grapheme_Cluster_Break=Extend}&lt;/variable&gt;
2414      &lt;variable id="$L"&gt;\p{Grapheme_Cluster_Break=L}&lt;/variable&gt;
2415      &lt;variable id="$V"&gt;\p{Grapheme_Cluster_Break=V}&lt;/variable&gt;
2416      &lt;variable id="$T"&gt;\p{Grapheme_Cluster_Break=T}&lt;/variable&gt;
2417      &lt;variable id="$LV"&gt;\p{Grapheme_Cluster_Break=LV}&lt;/variable&gt;
2418      &lt;variable id="$LVT"&gt;\p{Grapheme_Cluster_Break=LVT}&lt;/variable&gt;
2419    &lt;/variables&gt;
2420    &lt;segmentRules&gt;
2421      &lt;rule id="3"&gt; $CR × $LF &lt;/rule&gt;
2422      &lt;rule id="4"&gt; ( $Control | $CR | $LF ) ÷ &lt;/rule&gt;
2423      &lt;rule id="5"&gt; ÷ ( $Control | $CR | $LF ) &lt;/rule&gt;
2424      &lt;rule id="6"&gt; $L × ( $L | $V | $LV | $LVT ) &lt;/rule&gt;
2425      &lt;rule id="7"&gt; ( $LV | $V ) × ( $V | $T ) &lt;/rule&gt;
2426      &lt;rule id="8"&gt; ( $LVT | $T) × $T &lt;/rule&gt;
2427      &lt;rule id="9"&gt; × $Extend &lt;/rule&gt;
2428    &lt;/segmentRules&gt;
2429  &lt;/segmentation&gt;
2430...</pre>
2431
2432		<p>
2433			<b>Variables:</b> All variable ids must start with a $, and otherwise
2434			be valid identifiers according to the Unicode definitions in [<a
2435				href="http://www.unicode.org/reports/tr41/#UAX31">UAX31</a>]. The
			content of a variable is a regular expression using variables and <a
2437				href="tr35.html#Unicode_Sets">UnicodeSet</a>s. The ordering of
2438			variables is important; they are evaluated in order from first to
2439			last (see <i><a href="#Segmentation_Inheritance">Section 9.1
2440					Segmentation Inheritance</a></i>). It is an error to use a variable before
2441			it is defined.
2442		</p>
2443
2444		<p>
			<b>Rules:</b> The content of a rule uses the syntax of [<a
				href="http://www.unicode.org/reports/tr41/#UAX29">UAX29</a>]. The
			rules are evaluated in numeric id order (which may not be the order
			in which they appear in the file). The first rule that matches
			determines the status of a boundary position, that is, whether it
			breaks or not: ÷ means a break is allowed; × means a break is
			forbidden. It is an error if a rule does not contain exactly one of
			these characters (except where the rule has no content at all), or if
			the rule uses a variable that has not been defined.
2454		</p>
2455
2456		<p>There are some implicit rules:</p>
2457
2458		<ul>
2459			<li>The implicit initial rules are always "start-of-text ÷" and
2460				"÷ end-of-text"; these are not to be included explicitly.</li>
2461			<li>The implicit final rule is always "Any ÷ Any". This is not
2462				to be included explicitly.</li>
2463		</ul>
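<p>The evaluation model can be illustrated with a toy Python segmenter. This is a sketch under simplifying assumptions: each rule is hand-translated into a pair of Python regular expressions (for the text before and after the candidate position) rather than parsed from the rule syntax, and the Control and Extend classes are rough approximations of the real property values. It does show numeric id ordering, first-match-wins, and the implicit rules.</p>

```python
import re
from decimal import Decimal

# Simplified character classes; real data uses \p{Grapheme_Cluster_Break=...}.
CONTROL = r"[\x00-\x09\x0b\x0c\x0e-\x1f\x7f]"
EXTEND = r"[\u0300-\u036f]"

# (id, pattern before the position, operator, pattern after the position).
# Listed out of order on purpose: evaluation order comes from the numeric id.
RULES = [
    ("9", "", "×", EXTEND),
    ("3", r"\r", "×", r"\n"),
    ("4", rf"(?:{CONTROL}|\r|\n)", "÷", ""),
    ("5", "", "÷", rf"(?:{CONTROL}|\r|\n)"),
]


def is_break(text, pos, rules):
    if pos == 0 or pos == len(text):
        return True  # implicit "start-of-text ÷" and "÷ end-of-text"
    for _, before, op, after in sorted(rules, key=lambda r: Decimal(r[0])):
        if re.search(f"(?:{before})\\Z", text[:pos]) and re.match(after, text[pos:]):
            return op == "÷"  # the first matching rule decides
    return True  # implicit final rule "Any ÷ Any"


def segments(text, rules):
    cuts = [i for i in range(len(text) + 1) if is_break(text, i, rules)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]


# Three segments: "e" + U+0301, then CR LF, then "x".
print(segments("e\u0301\r\nx", RULES))
```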
2464
2465		<blockquote>
2466			<p>
2467				<b>Note:</b> A rule like X Format* -&gt; X in [<a
2468					href="http://www.unicode.org/reports/tr41/#UAX29">UAX29</a>] and [<a
2469					href="http://www.unicode.org/reports/tr41/#UAX14">UAX14</a>] is not
2470				supported. Instead, this needs to be expressed as normal regular
2471				expressions. The normal way to support this is to modify the
2472				variables, such as in the following example:
2473			</p>
2474
2475			<pre id="line870">&lt;variable id="$Format"&gt;\p{Word_Break=Format}&lt;/variable&gt;
2476&lt;variable id="$Katakana"&gt;\p{Word_Break=Katakana}&lt;/variable&gt;
2477...
2478&lt;!-- In place of rule 3, add format and extend to everything --&gt;
2479&lt;variable id="$X"&gt;[$Format $Extend]*&lt;/variable&gt;
2480&lt;variable id="$Katakana"&gt;($Katakana $X)&lt;/variable&gt;
2481&lt;variable id="$ALetter"&gt;($ALetter $X)&lt;/variable&gt;
2482...</pre>
2483		</blockquote>
2484
2485		<h3>
2486			9.1 <a name="Segmentation_Inheritance"
2487				href="#Segmentation_Inheritance">Segmentation Inheritance</a>
2488		</h3>
2489
2490
2491		<p>Variables and rules both inherit from the parent.</p>
2492
2493		<p>
2494			<b>Variables:</b> The child&#39;s variable list is logically appended
2495			to the parent&#39;s, and evaluated in that order. For example:
2496		</p>
2497
2498		<p>
2499			<font color="#0000FF"><code>// in parent</code></font>
2500			<code>
2501				<br> &lt;variable id="$AL"&gt;[:linebreak=AL:]&lt;/variable&gt;<br>
2502				&lt;variable id="$YY"&gt;[[:linebreak=XX:]$AL]&lt;/variable&gt;
2503			</code>
2504			<font color="#0000FF"><code>// adds $AL</code></font>
2505		</p>
2506
2507		<p>
2508			<font color="#0000FF"><code>// in child</code></font>
2509			<code>
2510				<br> &lt;variable id="$AL"&gt;[$AL &amp;&amp;
2511				[^a-z]]&lt;/variable&gt; <font color="#0000FF">// changes
2512					$AL, does not affect $YY</font><br> &lt;variable
2513				id="$ABC"&gt;[abc]&lt;/variable&gt;
2514			</code>
			<font color="#0000FF"><code>// adds new variable</code></font>
2516		</p>
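<p>The effect of this ordering can be reproduced with a short Python sketch that expands each variable by textual substitution at the point where it is defined, and raises an error for a variable used before its definition. The function name and the <code>\$\w+</code> approximation of variable syntax are assumptions; a real implementation would parse UnicodeSet syntax.</p>

```python
import re

VAR_REF = re.compile(r"\$\w+")


def expand_variables(definitions):
    """Evaluate (id, value) pairs in order and return id -> expanded value.

    Each value is expanded when it is defined, so redefining a variable
    later does not change variables that were already evaluated.
    """
    env = {}
    for var_id, value in definitions:
        def substitute(match):
            name = match.group(0)
            if name not in env:
                raise ValueError(f"{var_id} uses {name} before it is defined")
            return env[name]
        env[var_id] = VAR_REF.sub(substitute, value)
    return env


parent = [("$AL", "[:linebreak=AL:]"), ("$YY", "[[:linebreak=XX:]$AL]")]
child = [("$AL", "[$AL && [^a-z]]"), ("$ABC", "[abc]")]

env = expand_variables(parent + child)  # child list appended to the parent's
print(env["$AL"])  # [[:linebreak=AL:] && [^a-z]]
print(env["$YY"])  # [[:linebreak=XX:][:linebreak=AL:]]
```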
2517
2518		<p>
2519			<b>Rules:</b> The rules are also logically appended to the
2520			parent&#39;s. Because rules are evaluated in numeric id order, to
2521			insert a rule in between others just requires using an intermediate
2522			number. For example, to insert a rule after id="10.1" and before
2523			id="10.2", just use id="10.15". To delete a rule, use empty contents,
2524			such as:
2525		</p>
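<p>A Python sketch of rule inheritance follows, assuming that a child rule with the same id replaces the parent's rule and that empty content deletes it. The function name and the rule bodies are hypothetical placeholders.</p>

```python
from decimal import Decimal


def merge_rules(parent, child):
    """Combine parent and child {id: body} rules into numeric id order.

    Assumes a child rule with the same id replaces the parent's rule, and
    that a rule with empty content deletes the rule with that id.
    """
    merged = dict(parent)
    for rule_id, body in child.items():
        if body.strip():
            merged[rule_id] = body
        else:
            merged.pop(rule_id, None)
    # Decimal gives exact numeric ordering: 9, then 10.1, 10.15, 10.2.
    return sorted(merged.items(), key=lambda item: Decimal(item[0]))


parent = {"3": "$CR × $LF", "9": "× $Extend", "10.1": "$A × $B", "10.2": "$B ÷ $C"}
child = {"10.15": "$B × $D", "3": ""}
print([rule_id for rule_id, _ in merge_rules(parent, child)])  # ['9', '10.1', '10.15', '10.2']
```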
2526
2527		<p>
2528			<code>&lt;rule id="3"/&gt;</code>
2529			<font color="#0000FF"><code> // deletes rule 3</code></font>
2530		</p>
2531
2532
2533		<h3>
2534			9.2 <a name="Segmentation_Exceptions" href="#Segmentation_Exceptions">Segmentation
2535				Suppressions </a>
2536		</h3>
2537
2538		<p>
2539			<b>Note:</b> As of CLDR 26, the
2540			<code>&lt;suppressions&gt;</code>
2541			data is to be considered a technology preview. Data currently in CLDR
2542			was extracted from the Unicode Localization Interoperability project,
			The segmentation <b>suppressions</b> list provides a set of cases
			that, though otherwise identified as segment boundaries by the rules,
			should be skipped (suppressed) during segmentation.
2546
2547		<p>
			The segmentation <b>suppressions</b> list provides a set of cases
			in which a break that the segmentation rules would otherwise produce
			should be skipped (suppressed).
2551		</p>
2552
2553		<p>For example, in the English phrase "Mr. Smith", CLDR
			segmentation rules would normally find a Sentence Break between "Mr."
			and "Smith". However, "Mr." is typically an abbreviation for
			"Mister", and not actually the end of a sentence.</p>
2557
2558		<p>
2559			Each suppression has a separate
2560			<code>&lt;suppression&gt;</code>
			element, whose contents are the text (such as "Mr.") after which a break is to be skipped.
2562		</p>
2563
2564		<p>Example:</p>
2565
2566		<pre>
2567    &lt;segmentation type="SentenceBreak"&gt;
2568      &lt;suppressions type="standard" draft="provisional"&gt;
2569        &lt;suppression&gt;Maj.&lt;/suppression&gt;
2570        &lt;suppression&gt;Mr.&lt;/suppression&gt;
2571        &lt;suppression&gt;Lt.Cdr.&lt;/suppression&gt;
2572	. . .
2573      &lt;/suppressions&gt;
2574    &lt;/segmentation&gt;
2575                </pre>
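<p>As an illustration only, suppression can be thought of as filtering the breaks that the rules propose. In this Python sketch, <code>candidate_breaks</code> is a hypothetical and deliberately naive stand-in for the real SentenceBreak rules: it proposes a break after every period followed by a space. <code>apply_suppressions</code>, also hypothetical, drops any break whose preceding word appears in the suppression list. Real implementations may work quite differently.</p>

```python
SUPPRESSIONS = {"Maj.", "Mr.", "Lt.Cdr."}

def candidate_breaks(text):
    """Naive stand-in for real SentenceBreak rules: propose a break after
    every period that is followed by a space (offset of the next sentence)."""
    return [i + 2 for i in range(len(text) - 1)
            if text[i] == "." and text[i + 1] == " "]

def apply_suppressions(text, breaks, suppressions=SUPPRESSIONS):
    """Drop any break whose preceding word is listed as a suppression."""
    kept = []
    for b in breaks:
        last_word = text[:b].rstrip().split(" ")[-1]
        if last_word not in suppressions:
            kept.append(b)
    return kept

text = "Mr. Smith arrived. He sat down."
print(apply_suppressions(text, candidate_breaks(text)))  # [19]
```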
2576
2577		<p>
2578			<b>Note:</b> These elements were called
2579			<code>&lt;exceptions&gt;</code>
2580			and
2581			<code>&lt;exception&gt;</code>
2582			prior to CLDR 26, but those names are now deprecated.
2583		</p>
2584
2585		<h2>
2586			10 <a name="Transforms" href="#Transforms">Transforms</a>
2587		</h2>
2588
2589
2590		<p>
			Transforms provide a set of context-sensitive matching rules for
			transforming text. They are commonly used for transliterations or
			transcriptions. They are also used for other transformations, such as
			full-width to half-width (for <i>katakana</i> characters). The rules
			can be simple one-to-one relationships between characters, or
			involve more complicated mappings. Here is an example:
2597		</p>
2598
2599		<pre>&lt;transform source="Greek" target="Latin" variant="UNGEGN" direction="both"&gt;
2600...
2601  &lt;comment&gt;Useful variables&lt;/comment&gt;
2602  &lt;tRule&gt;$gammaLike = [ΓΚΞΧγκξχϰ] ;&lt;/tRule&gt;
2603  &lt;tRule&gt;$egammaLike = [GKXCgkxc] ;&lt;/tRule&gt;
2604...
2605  &lt;comment&gt;Rules are predicated on running NFD first, and NFC afterwards&lt;/comment&gt;
2606  &lt;tRule&gt;::NFD (NFC) ;&lt;/tRule&gt;
2607...
2608  &lt;tRule&gt;λ ↔ l ;&lt;/tRule&gt;
2609  &lt;tRule&gt;Λ ↔ L ;&lt;/tRule&gt;
2610...
2611  &lt;tRule&gt;γ } $gammaLike ↔ n } $egammaLike ;&lt;/tRule&gt;
2612  &lt;tRule&gt;γ ↔ g ;&lt;/tRule&gt;
2613...
2614  &lt;tRule&gt;::NFC (NFD) ;&lt;/tRule&gt;
2615...
2616&lt;/transform&gt;</pre>
2617
		<p>The source and target values are valid locale identifiers,
			where &#39;und&#39; means an unspecified language, with the
			following additional extensions:</p>
2621
2622		<ul>
2623			<li>The long names of a script according to [<a
2624				href="http://www.unicode.org/reports/tr41/#UAX24">UAX24</a>] may be
2625				used instead of the short script codes. The script identifier may
2626				also omit und; that is, "und_Latn" may be written as just "Latn".
2627			</li>
2628
			<li>The English names of languages may also be used
				instead of the language codes.</li>
2631
2632			<li>The term "Any" may be used instead of a solitary "und".</li>
2633
2634			<li>Other identifiers may be used for special purposes. In CLDR,
2635				these include: Accents, Digit, Fullwidth, Halfwidth, Jamo,
2636				NumericPinyin, Pinyin, Publishing, Tone. (Other than these values,
2637				valid private use locale identifiers should be used, such as
2638				"x-Special".)</li>
2639
			<li>When presenting localized transform names, the "und_" is
				normally omitted. Thus for a transliterator with the ID
				"und_Latn-und_Grek" (or the equivalent "Latin-Greek"), the
				name translated into Greek would be Λατινικό-Ελληνικό.</li>
2644		</ul>
2645		<p>In version 29.0, BCP47 identifiers were added
2646			as aliases (while retaining the old identifiers). The following table
2647			shows the relationship between the old identifiers and the BCP47
2648			format identifiers.</p>
2649		<table class='simple'>
2650			<tbody>
2651				<tr>
2652					<th>Old ID</th>
2653					<th>BCP47 ID</th>
2654					<th>Comments</th>
2655				</tr>
2656				<tr>
2657					<td><strong>es_FONIPA</strong>-es_419_FONIPA</td>
2658					<td>es-419-fonipa-t-<strong>es-fonipa</strong></td>
					<td rowspan="2">The order reverses with -t-: the subtags before -t-
						identify the result (target), and the subtags after it identify
						the source.</td>
2661				</tr>
2662				<tr>
2663					<td><strong>hy_AREVMDA</strong>-hy_AREVMDA_FONIPA</td>
2664					<td>hy-arevmda-fonipa-t-<strong>hy-arevmda</strong></td>
2665				</tr>
2666				<tr>
2667					<td><strong>Devanagari</strong>-Latin</td>
2668					<td>und-Latn-t-<strong>und-deva</strong></td>
2669					<td rowspan="2">Scripts add <strong>und-</strong></td>
2670				</tr>
2671				<tr>
2672					<td><strong>Latin</strong>-Devanagari</td>
2673					<td>und-Deva-t-<strong>und-latn</strong></td>
2674				</tr>
2675				<tr>
2676					<td>Greek-Latin/UNGEGN</td>
2677					<td>und-Latn-t-und-grek-<strong>m0-ungegn</strong></td>
2678					<td>Variants use the <strong>-m0-</strong> key.
2679					</td>
2680				</tr>
2681				<tr>
2682					<td>Russian-Latin/BGN</td>
2683					<td>ru<strong>-Latn</strong>-t-ru-m0-bgn
2684					</td>
2685					<td>Languages will have a script when it isn’t the default.</td>
2686				</tr>
2687				<tr>
2688					<td>Any-Hex/xml</td>
2689					<td>und-t-<strong>d0-hex</strong>-m0-xml
2690					</td>
2691					<td rowspan="2"><strong>Any</strong> becomes <strong>und</strong>,
2692						and keys <strong>d0</strong> (destination) and <strong>s0</strong>
2693						(source) are used for non-locales.</td>
2694				</tr>
2695				<tr>
2696					<td>Hex-Any/xml</td>
2697					<td>und-t-<strong>s0-hex</strong>-m0-xml
2698					</td>
2699				</tr>
2700				<tr>
2701					<td>Any-<strong>Publishing</strong></td>
2702					<td>und-t-d0-<strong>publish</strong></td>
					<td rowspan="2">Non-locale identifiers are normally the lowercased
						old ID, but may be changed (for example, shortened from publishing
						to publish) because of BCP47 length restrictions.</td>
2705				</tr>
2706				<tr>
2707					<td><strong>Publishing</strong>-Any</td>
2708					<td>und-t-s0-<strong>publish</strong></td>
2709				</tr>
2710			</tbody>
2711		</table>
		<p>Note that script and region codes keep their usual casing (such as
			<code>Latn</code>) only in the main part of the tag. In extensions, they are lowercase (such as <code>latn</code>).</p>
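<p>The structure of the BCP47 IDs in the table can be read with a short Python sketch. It assumes a well-formed ID with exactly one <code>-t-</code> extension and no other extensions or private-use subtags. <code>parse_transform_id</code> is a hypothetical helper, and it does not attempt to map BCP47 IDs back to the old IDs.</p>

```python
def parse_transform_id(bcp47_id):
    """Split a BCP47 transform ID into (result, source, fields).

    result: the subtags before -t- (the language tag that results)
    source: the subtags after -t- up to the first field key such as m0
    fields: each field key (m0, s0, d0, ...) mapped to its value subtags
    """
    subtags = bcp47_id.split("-")
    lowered = [s.lower() for s in subtags]
    t = lowered.index("t")            # raises ValueError if there is no -t-
    result = "-".join(subtags[:t])
    source, fields, key = [], {}, None
    for s in lowered[t + 1:]:
        if len(s) == 2 and s[0].isalpha() and s[1].isdigit():
            key = s                   # a field key: a letter followed by a digit
            fields[key] = []
        elif key is None:
            source.append(s)
        else:
            fields[key].append(s)
    return result, "-".join(source), {k: "-".join(v) for k, v in fields.items()}

print(parse_transform_id("und-Latn-t-und-grek-m0-ungegn"))
# ('und-Latn', 'und-grek', {'m0': 'ungegn'})
```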
2714		<h3>
2715			10.1 <a name="Inheritance" href="#Inheritance">Inheritance</a>
2716		</h3>
2717
2718		<p>The CLDR transforms are built using the following locale
2719			inheritance. While this inheritance is not required of LDML
2720			implementations, the transforms supplied with CLDR may not otherwise
2721			behave as expected without some changes.</p>
2722
		<p>For either the source or the target, the fallback starts from
			the maximized locale ID (using the likely-subtags data). A locale ID
			consisting of the language and country (without the script) is also
			tried before the base language is reached. Root is never accessed;
			instead, the script(s) associated with the language are used. Where
			there are multiple scripts, the maximized script is tried first,
			followed by the other scripts associated with the language (from
			supplemental data).</p>
2730
2731		<p>
2732			For example, see the bolded items below in the fallback chain for <strong>az_IR</strong>.
2733		</p>
2734
2735		<table>
2736			<tr>
2737				<th>&nbsp;</th>
2738				<th>Locale ID</th>
2739				<th>Comments</th>
2740			</tr>
2741			<tr>
2742				<td>1</td>
2743				<td><strong>az_Arab_IR</strong></td>
2744				<td>The maximized locale for az_IR</td>
2745			</tr>
2746			<tr>
2747				<td>2</td>
2748				<td>az_Arab</td>
2749				<td>Normal fallback</td>
2750			</tr>
2751			<tr>
2752				<td>3</td>
2753				<td><strong>az_IR</strong></td>
2754				<td>Inserted country locale</td>
2755			</tr>
2756			<tr>
2757				<td>4</td>
2758				<td>az</td>
2759				<td>Normal fallback</td>
2760			</tr>
2761			<tr>
2762				<td>5</td>
2763				<td><strong>Arab</strong></td>
2764				<td>Maximized script</td>
2765			</tr>
2766			<tr>
2767				<td>6</td>
2768				<td><strong>Cyrl</strong></td>
2769				<td>Other associated script</td>
2770			</tr>
2771		</table>
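<p>The fallback rules above can be sketched in Python as follows. The data tables <code>LIKELY_SUBTAGS</code> and <code>LANGUAGE_SCRIPTS</code> are hypothetical excerpts chosen to reproduce the az_IR example. Real CLDR supplemental data is much larger and may list more scripts. The special handling of Japanese and Korean combined script codes is not covered.</p>

```python
# Hypothetical data excerpts, chosen only to reproduce the az_IR table above.
LIKELY_SUBTAGS = {"az_IR": "az_Arab_IR"}
LANGUAGE_SCRIPTS = {"az": ["Arab", "Cyrl"]}

def split_locale(locale_id):
    """Split lang[_Script][_REGION] into its parts (None when absent)."""
    parts = locale_id.split("_")
    lang, script, region = parts[0], None, None
    for part in parts[1:]:
        if len(part) == 4:
            script = part
        else:
            region = part
    return lang, script, region

def transform_fallback_chain(locale_id):
    lang, script, region = split_locale(LIKELY_SUBTAGS.get(locale_id, locale_id))
    chain = []
    if script and region:
        chain.append(f"{lang}_{script}_{region}")  # maximized locale
    if script:
        chain.append(f"{lang}_{script}")           # normal fallback
    if region:
        chain.append(f"{lang}_{region}")           # inserted country locale
    chain.append(lang)                              # base language; root is skipped
    scripts = [script] if script else []
    scripts += [s for s in LANGUAGE_SCRIPTS.get(lang, []) if s != script]
    chain.extend(scripts)                           # maximized script, then others
    return list(dict.fromkeys(chain))               # drop duplicates, keep order

print(transform_fallback_chain("az_IR"))
# ['az_Arab_IR', 'az_Arab', 'az_IR', 'az', 'Arab', 'Cyrl']
```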
2772
		<p>The source, target, and variant use "laddered" fallback. The
			source changes most quickly (using the above rules), then the
			target (using the above rules). Finally, the variant, if any, is
			discarded. That is, in pseudocode:</p>
2777
2778		<ul>
2779			<li>for variant in {variant, ""}
2780				<ul>
2781					<li>for target in target-chain
2782						<ul>
2783							<li>for source in source-chain
2784								<ul>
2785									<li>transform = lookup source-target/variant</li>
2786									<li>if transform != null return transform</li>
2787								</ul>
2788							</li>
2789						</ul>
2790					</li>
2791				</ul>
2792			</li>
2793		</ul>
2794
2795		<p>
2796			For example, here is the fallback chain for <strong>ru_RU-el_GR/BGN</strong>.
2797		</p>
2798		<div align="center">
2799			<table>
2800				<tr>
2801					<th>source</th>
2802					<th>&nbsp;</th>
2803					<th>target</th>
2804					<th>variant</th>
2805				</tr>
2806				<tr>
2807					<td>ru_RU</td>
2808					<td>-</td>
2809					<td>el_GR</td>
2810					<td>/BGN</td>
2811				</tr>
2812				<tr>
2813					<td>ru</td>
2814					<td>-</td>
2815					<td>el_GR</td>
2816					<td>/BGN</td>
2817				</tr>
2818				<tr>
2819					<td>Cyrl</td>
2820					<td>-</td>
2821					<td>el_GR</td>
2822					<td>/BGN</td>
2823				</tr>
2824				<tr>
2825					<td>ru_RU</td>
2826					<td>-</td>
2827					<td>el</td>
2828					<td>/BGN</td>
2829				</tr>
2830				<tr>
2831					<td>ru</td>
2832					<td>-</td>
2833					<td>el</td>
2834					<td>/BGN</td>
2835				</tr>
2836				<tr>
2837					<td>Cyrl</td>
2838					<td>-</td>
2839					<td>el</td>
2840					<td>/BGN</td>
2841				</tr>
2842				<tr>
2843					<td>ru_RU</td>
2844					<td>-</td>
2845					<td>Grek</td>
2846					<td>/BGN</td>
2847				</tr>
2848				<tr>
2849					<td>ru</td>
2850					<td>-</td>
2851					<td>Grek</td>
2852					<td>/BGN</td>
2853				</tr>
2854				<tr>
2855					<td>Cyrl</td>
2856					<td>-</td>
2857					<td>Grek</td>
2858					<td>/BGN</td>
2859				</tr>
2860				<tr>
2861					<td>ru_RU</td>
2862					<td>-</td>
2863					<td>el_GR</td>
2864					<td></td>
2865				</tr>
2866				<tr>
2867					<td>ru</td>
2868					<td>-</td>
2869					<td>el_GR</td>
2870					<td></td>
2871				</tr>
2872				<tr>
2873					<td>Cyrl</td>
2874					<td>-</td>
2875					<td>el_GR</td>
2876					<td></td>
2877				</tr>
2878				<tr>
2879					<td>ru_RU</td>
2880					<td>-</td>
2881					<td>el</td>
2882					<td></td>
2883				</tr>
2884				<tr>
2885					<td>ru</td>
2886					<td>-</td>
2887					<td>el</td>
2888					<td></td>
2889				</tr>
2890				<tr>
2891					<td>Cyrl</td>
2892					<td>-</td>
2893					<td>el</td>
2894					<td></td>
2895				</tr>
2896				<tr>
2897					<td>ru_RU</td>
2898					<td>-</td>
2899					<td>Grek</td>
2900					<td></td>
2901				</tr>
2902				<tr>
2903					<td>ru</td>
2904					<td>-</td>
2905					<td>Grek</td>
2906					<td></td>
2907				</tr>
2908				<tr>
2909					<td>Cyrl</td>
2910					<td>-</td>
2911					<td>Grek</td>
2912					<td></td>
2913				</tr>
2914			</table>
2915		</div>
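<p>The laddered lookup in the pseudocode above can be written as a small Python sketch. The source and target chains are passed in explicitly; here they are the chains used in the ru_RU-el_GR/BGN table. The registry is a hypothetical dictionary from transform ID to transform. Neither function is a CLDR or ICU API.</p>

```python
def laddered_candidates(source_chain, target_chain, variant=""):
    """Yield transform IDs in laddered order: the source varies fastest,
    then the target, and finally the variant is dropped."""
    variants = [variant, ""] if variant else [""]
    for v in variants:
        for target in target_chain:
            for source in source_chain:
                yield f"{source}-{target}" + (f"/{v}" if v else "")

def lookup_transform(registry, source_chain, target_chain, variant=""):
    """Return the first registered transform, or None if nothing matches."""
    for transform_id in laddered_candidates(source_chain, target_chain, variant):
        if transform_id in registry:
            return registry[transform_id]
    return None

source_chain = ["ru_RU", "ru", "Cyrl"]
target_chain = ["el_GR", "el", "Grek"]
candidates = list(laddered_candidates(source_chain, target_chain, "BGN"))
print(candidates[:4])
# ['ru_RU-el_GR/BGN', 'ru-el_GR/BGN', 'Cyrl-el_GR/BGN', 'ru_RU-el/BGN']
```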
2916		<p>Japanese and Korean are special, since they can
2917			be represented by combined script codes, such as ja_Jpan, ja_Hrkt,
2918			ja_Hira, or ja_Kana. These need to be considered in the above
2919			fallback chain as well.</p>
2920		<h4>
2921			10.1.1 <a name="Pivots" href="#Pivots">Pivots</a>
2922		</h4>
2923		<p>
2924			Transforms can also use <i>pivots</i>. These are used when there is
2925			no direct transform between a source and target, but there are
			transforms X-Y and Y-Z. In such a case, the transforms can be
			internally chained to get X-Z = X-Y;Y-Z. This is done explicitly with
2928			the Indic script transforms: to get Devanagari-Latin, internally it
2929			is done by transforming first from Devanagari to Interindic (an
2930			internal superset encoding for Indic scripts), then from Interindic
2931			to Latin. This allows there to be only N sets of transform rules for
2932			the Indic scripts: each one to and from Interindic. These pivots are
2933			explicitly represented in the CLDR transforms.</p>
2934		<p>Note that the characters currently used by Interindic are private use characters. To prevent those from “leaking” out into text, transforms converting from Interindic must ensure that they convert all the possible values used in Interindic.</p>
2935		<p>
2936			The pivots can also be produced automatically (implicitly), as a
2937			fallback. A particularly useful pivot is IPA, since that tends to
2938			preserve pronunciation. For example, <em>Czech to IPA</em> can be
2939			chained with <em>IPA to Katakana</em> to get <em>Czech to
2940				Katakana</em>.
2941		</p>
2942		<p>CLDR often has special forms of IPA: not just
2943			&quot;und-FONIPA&quot; but &quot;cs-FONIPA&quot;: specifically IPA
2944			that has come from Czech. These variants typically preserve some
2945			features of the source language — such as double consonants — that
2946			are indistinguishable from single consonants in that language, but
2947			that are often preserved in traditional transliterations. Thus when
2948			matching prospective pivots, FONIPA is treated specially. If there is
2949			an exact match, that match is used (such as cs-cs_FONIPA +
2950			cs_FONIPA-ko). Otherwise, the language is ignored, as for example in
2951			cs-cs_FONIPA + ru_FONIPA-ko.</p>
2952		<p>The interaction of implicit pivots and
2953			inheritance may result in a longer inheritance chain lookup than
2954			desired, so implementers may consider having some sort of caching
2955			mechanism to increase performance.</p>
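<p>A single-step implicit pivot search might be sketched in Python as below. It includes the FONIPA matching rule and a cache. The sketch makes several simplifying assumptions:</p>
<ul>
	<li>The set of available IDs is hypothetical.</li>
	<li>Only one intermediate step is tried.</li>
	<li>Each ID is assumed to contain exactly one hyphen between source and target.</li>
	<li><code>functools.lru_cache</code> stands in for whatever caching an implementation actually uses.</li>
</ul>

```python
from functools import lru_cache

# Hypothetical set of available transform IDs of the form source-target.
AVAILABLE = frozenset({"cs-cs_FONIPA", "ru_FONIPA-ko"})

def pivots_match(first_target, second_source, exact):
    """Exact pivot match, or (second pass) any two FONIPA IDs, ignoring language."""
    if exact:
        return first_target == second_source
    return first_target.endswith("_FONIPA") and second_source.endswith("_FONIPA")

@lru_cache(maxsize=None)  # cache results, since the same lookups may repeat often
def find_chain(source, target, available=AVAILABLE):
    """Return a direct transform, or a two-step pivot chain X-Y;Y-Z, else None."""
    if f"{source}-{target}" in available:
        return (f"{source}-{target}",)
    firsts = sorted(t for t in available if t.startswith(source + "-"))
    seconds = sorted(t for t in available if t.endswith("-" + target))
    for exact in (True, False):  # prefer exact matches over FONIPA-only matches
        for first in firsts:
            for second in seconds:
                if pivots_match(first.split("-", 1)[1], second.rsplit("-", 1)[0], exact):
                    return (first, second)
    return None

print(find_chain("cs", "ko"))  # ('cs-cs_FONIPA', 'ru_FONIPA-ko')
```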
2956		<h3>
2957			10.2 <a name="Variants" href="#Variants">Variants</a>
2958		</h3>
2959
2960		<p>
			Variants used in CLDR include UNGEGN and BGN, both indicating sources
			for transliterations. There is an additional attribute
			<code>visibility="internal"</code>
			which is used to indicate that the transform is meant for internal
			use, and should not be displayed as a separate choice in a UI.
2966		</p>
2967
		<p>There are many different systems of transliteration. The goals
			for the "unqualified" script transliterations are:</p>
2970
2971		<ol>
2972			<li>to be lossless when going to Latin and back</li>
2973			<li>to be as lossless as possible when going to other scripts</li>
2974			<li>to abide by a common standard as much as possible (possibly
2975				supplemented to meet goals 1 and 2).</li>
2976		</ol>
2977
2978		<p>Language-to-language transliterations, and variant
2979			script-to-script transliterations are generally transcriptions, and
2980			not expected to be lossless.</p>
2981
2982		<p>Additional transliterations may also be defined, such as
2983			customized language-specific transliterations (such as between
2984			Russian and French), or those that match a particular transliteration
2985			standard, such as the following:</p>
2986
2987		<ul>
2988			<li>UNGEGN - United Nations Group of Experts on Geographical
2989				Names</li>
2990			<li>BGN - United States Board on Geographic Names</li>
2991			<li>ISO9 - ISO/IEC 9</li>
2992			<li>ISO15915 - ISO/IEC 15915</li>
2993			<li>ISCII91 - ISCII 91</li>
2994			<li>KMOCT - South Korean Ministry of Culture &amp; Tourism</li>
2995			<li>USLC - US Library of Congress</li>
2996			<li>UKPCGN - Permanent Committee on Geographical Names for
2997				British Official Use</li>
2998			<li>RUGOST - Russian Main Administration of Geodesy and
2999				Cartography</li>
3000		</ul>
3001
3002		<p>
3003			The rules for transforms are described in Section 10.3 <a
3004				href="#Transform_Rules_Syntax">Transform Rules Syntax</a>. For more
3005			information on Transliteration, see <a
3006				href="http://cldr.unicode.org/index/cldr-spec/transliteration-guidelines">Transliteration
3007				Guidelines</a>.
3008		</p>
3009
3010		<h3>
3011			10.3 <a name="Transform_Rules_Syntax" href="#Transform_Rules_Syntax">Transform
3012				Rules Syntax</a>
3013		</h3>
3014
3015
3016		<p class="dtd">
3017			&lt;!ELEMENT transforms ( transform*) &gt;<br> &lt;!ELEMENT
3018			transform ((comment | tRule)*) &gt;<br> &lt;!ATTLIST transform
3019			source CDATA #IMPLIED &gt;<br> &lt;!ATTLIST transform target
3020			CDATA #IMPLIED &gt;<br> &lt;!ATTLIST transform variant CDATA
3021			#IMPLIED &gt;<br> &lt;!ATTLIST transform direction ( forward |
3022			backward | both ) "both" &gt;<br> &lt;!ATTLIST
3023				transform alias CDATA #IMPLIED &gt; <br>   &lt;!--@VALUE--&gt;
3024				<br> &lt;!ATTLIST transform backwardAlias CDATA #IMPLIED &gt; <br>
3025				  &lt;!--@VALUE--&gt;
3026			<br> &lt;!ATTLIST transform visibility ( internal | external )
3027			"external" &gt;<br> &lt;!ELEMENT comment (#PCDATA) &gt;<br>
3028			&lt;!ELEMENT tRule (#PCDATA) &gt;
3029		</p>
3030		<p>
			The transform attributes indicate the <strong>source</strong>, <strong>target</strong>,
			<strong>variant</strong>, <strong>direction</strong>, and <strong>alias</strong>es. For
3033			example:
3034		</p>
3035		<p class='example'>
3036			&lt;transform<br>   source=&quot;ja_Hrkt&quot;<br>  
3037			target=&quot;ja_Latn&quot;<br>   variant=&quot;BGN&quot;<br>
3038			  direction=&quot;forward&quot;<br>  
3039			draft=&quot;provisional&quot;<br>  
3040			alias=&quot;Katakana-Latin/BGN ja-Latn-t-ja-hrkt-m0-bgn&quot;&gt;
3041		</p>
3042		<p>
3043			The direction is either <strong>forward</strong> or <strong>both</strong>
3044			(<strong>backward</strong> is possible in theory, but not used). This
3045			indicates which directions the rules support.
3046		</p>
3047		<p>
			If the direction is <strong>forward</strong>, then an ID is composed
			from <strong>source + &quot;-&quot; + target + &quot;/&quot;
				+ variant</strong>. If the direction is <strong>both</strong>, then the
			inverse ID is also valid: <strong>target + &quot;-&quot; +
				source + &quot;/&quot; + variant</strong>. The <strong>alias</strong>
			attribute contains a space-delimited list of alternate forward IDs.
			The <strong>backwardAlias</strong> attribute contains a space-delimited
			list of alternate backward IDs. The BCP47 versions of the IDs are
			included in the <strong>alias</strong> and/or <strong>backwardAlias</strong>
			attributes.
3058		</p>
3059		<p>
3060			The <strong>visibility</strong> attribute indicates whether the IDs
3061			should be externally visible, or whether they are only used
3062			internally.
3063		</p>
3064		<p>In previous versions, the rules were expressed
3065			as fine-grained XML. That was discarded in CLDR version 29, in favor
3066			of a simpler format where the separate rules are simply terminated
3067			with &quot;;&quot;.</p>
3068		<p>
3069			The transform rules are similar to regular-expression substitutions,
3070			but adapted to the specific domain of text transformations. The rules
3071			and comments in this discussion will be intermixed, with # marking
3072			the comments. The simplest rule is a
3073			conversion rule, which replaces one string of characters with
3074			another. The conversion rule takes the following form:
3075		</p>
3076
3077		<table cellspacing="0" cellpadding="8" border="1">
3078			<tr>
3079				<td valign="top" bgcolor="#eeeeee"><code>xy → z ;</code></td>
3080			</tr>
3081		</table>
3082
3083		<p>This converts any substring "xy" into "z". Rules are executed
3084			in order; consider the following rules:</p>
3085
3086		<table cellspacing="0" cellpadding="8" border="1">
3087			<tr>
3088				<td valign="top" bgcolor="#eeeeee"><code>
3089						sch → sh ;<br> ss → z ;
3090					</code></td>
3091			</tr>
3092		</table>
3093
		<p>These rules transform "bass school" into "baz shool".
3095			The transform walks through the string from start to finish. Thus
3096			given the rules above "bassch" will convert to "bazch", because the
3097			"ss" rule is found before the "sch" rule in the string (later, we'll
3098			see a way to override this behavior). If two rules can both apply at
3099			a given point in the string, then the transform applies the first
3100			rule in the list.</p>
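<p>The left-to-right, first-rule-wins behavior can be sketched in Python for plain string rules. This is an illustration, not a complete rule engine. Contexts, variables, UnicodeSets, and cursor positioning are not supported, and <code>apply_rules</code> is a hypothetical name.</p>

```python
# Conversion rules in list order: "sch → sh ;" then "ss → z ;"
RULES = [("sch", "sh"), ("ss", "z")]

def apply_rules(text, rules):
    """Walk the text from start to finish. At each position, apply the first
    rule in the list whose source matches; otherwise copy one character."""
    out, pos = [], 0
    while pos < len(text):
        for source, replacement in rules:
            if source and text.startswith(source, pos):
                out.append(replacement)
                pos += len(source)
                break
        else:
            out.append(text[pos])  # no rule matched here: copy it unchanged
            pos += 1
    return "".join(out)

print(apply_rules("bass school", RULES))  # baz shool
print(apply_rules("bassch", RULES))       # bazch
```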
3101
3102		<p>All of the ASCII characters except numbers and letters are
3103			reserved for use in the rule syntax, as are the characters →, ←, ↔.
			Normally, these characters do not need to be converted. However, to
			convert them, use either a pair of single quotes or a backslash. The pair
			of single quotes can be used to surround a whole string of text. The
			backslash affects only the character immediately after it. For example,
3108			to convert from a U+2190 ( ← ) LEFTWARDS ARROW to the string "arrow
3109			sign" (with a space), use one of the following rules:</p>
3110
3111		<table cellspacing="0" cellpadding="8" border="1">
3112			<tr>
3113				<td valign="top" bgcolor="#eeeeee"><code>
3114						\←&nbsp;&nbsp; →&nbsp; arrow\ sign ;<br> '←'&nbsp;&nbsp;
3115						→&nbsp;&nbsp; 'arrow sign' ;<br> '←'&nbsp;&nbsp;
3116						→&nbsp;&nbsp; arrow' 'sign ;
3117					</code></td>
3118			</tr>
3119		</table>
3120
3121		<p>Spaces may be inserted anywhere without any effect on the
3122			rules. Use extra space to separate items out for clarity without
			worrying about the effects. This feature is particularly useful with
			combining marks. It is handy to put some spaces around them to separate
			them from the surrounding text. The following is an example:</p>
3126
3127		<table cellspacing="0" cellpadding="8" border="1">
3128			<tr>
				<td valign="top" bgcolor="#eeeeee"><code>&nbsp;ͅ&nbsp;→ i ; #
3130						an iota-subscript diacritic turns into an i.</code></td>
3131			</tr>
3132		</table>
3133
3134		<p>For a real space in the rules, place quotes around it. For a
3135			real backslash, either double it \\, or quote it '\'. For a real
3136			single quote, double it '', or place a backslash before it \'.</p>
3137
		<p>Any text from a hash mark (#) to the end of a line is a
			comment. Comments help document how the rules work. The following
3140			shows a comment in a rule:</p>
3141
3142		<table cellspacing="0" cellpadding="8" border="1">
3143			<tr>
3144				<td valign="top" bgcolor="#eeeeee"><code>x → ks ; #
3145						change every x into ks</code></td>
3146			</tr>
3147		</table>
3148
		<p>The “\u” and “\x” hex notations can be used instead of any
			character. For instance, instead of using the Greek π, one could write
3151			either of the following:</p>
3152
3153		<table cellspacing="0" cellpadding="8" border="1">
3154			<tr>
3155				<td valign="top" bgcolor="#eeeeee"><code>
3156						\u03C0 → p ;<br> \x{3C0} → p ;
3157					</code></td>
3158			</tr>
3159		</table>
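<p>The two hex notations can be decoded with a regular expression, as in the following Python sketch. The sketch makes several simplifying assumptions:</p>
<ul>
	<li>It handles only the <code>\uhhhh</code> and <code>\x{h...}</code> forms shown above.</li>
	<li>It does not treat a doubled backslash as a literal backslash.</li>
	<li><code>unescape_hex</code> is a hypothetical helper.</li>
</ul>

```python
import re

# \uhhhh (exactly four hex digits) or \x{h...} (one to six hex digits)
_HEX_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\x\{([0-9A-Fa-f]{1,6})\}")

def unescape_hex(rule_text):
    """Replace \\u and \\x{...} escapes with the characters they name.
    Values above 10FFFF make chr() raise ValueError."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), rule_text)

print(unescape_hex(r"\u03C0 → p ;"))   # π → p ;
print(unescape_hex(r"\x{3C0} → p ;"))  # π → p ;
```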
3160
3161		<p>One can also define and use variables, such as:</p>
3162
3163		<table cellspacing="0" cellpadding="8" border="1">
3164			<tr>
3165				<td valign="top" bgcolor="#eeeeee"><code>
3166						$pi = \u03C0 ;<br> $pi → p ;
3167					</code></td>
3168			</tr>
3169		</table>
3170
3171		<h4>
3172			10.3.1 <a name="Dual_Rules" href="#Dual_Rules">Dual Rules</a>
3173		</h4>
		<p>Rules can also specify what happens when an inverse transform
			is formed. To do this, use the "←" sign instead of "→". Thus
			the above example becomes:</p>

		<table cellspacing="0" cellpadding="8" border="1">
			<tr>
				<td valign="top" bgcolor="#eeeeee"><code>$pi ← p ;</code></td>
			</tr>
		</table>

		<p>With the inverse transform, "p" will convert to the Greek π.
			These two directions can be combined into a dual conversion
			rule by using the "↔" operator, yielding:</p>
3187
3188		<table cellspacing="0" cellpadding="8" border="1">
3189			<tr>
3190				<td valign="top" bgcolor="#eeeeee"><code>$pi ↔ p ;</code></td>
3191			</tr>
3192		</table>
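<p>A single rule list can yield both a forward and an inverse transform, as this Python sketch shows. <code>parse_conversion_rule</code> and <code>directional_rules</code> are hypothetical helpers. They handle only unquoted rules of the form <code>left op right ;</code>, without contexts, comments, or quoted operator characters.</p>

```python
OPERATORS = ("↔", "→", "←")

def parse_conversion_rule(rule):
    """Split 'left OP right ;' into (left, op, right). Quoting and escaping of
    the operator characters are not handled in this sketch."""
    body = rule.strip().rstrip(";").strip()
    for op in OPERATORS:
        if op in body:
            left, right = body.split(op, 1)
            return left.strip(), op, right.strip()
    raise ValueError(f"no conversion operator in {rule!r}")

def directional_rules(rules):
    """Return (forward, inverse) lists of (match, replacement) pairs."""
    forward, inverse = [], []
    for left, op, right in map(parse_conversion_rule, rules):
        if op in ("→", "↔"):
            forward.append((left, right))
        if op in ("←", "↔"):
            inverse.append((right, left))
    return forward, inverse

forward, inverse = directional_rules(["π ↔ p ;", "x → ks ;", "ph ← f ;"])
print(forward)  # [('π', 'p'), ('x', 'ks')]
print(inverse)  # [('p', 'π'), ('f', 'ph')]
```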
3193
3194		<h4>
3195			10.3.2 <a name="Context" href="#Context">Context</a>
3196		</h4>
3197
3198		<p>Context can be used to have the results of a transformation be
3199			different depending on the characters before or after. The following
3200			rule removes hyphens, but only when they follow lowercase characters:
3201		</p>
3202
3203		<table cellspacing="0" cellpadding="8" border="1">
3204			<tr>
3205				<td valign="top" bgcolor="#eeeeee"><code> [:Lowercase:]
3206						{ '-' → ; </code></td>
3207			</tr>
3208		</table>
3209
3210		<p>Contexts can be before or after or both, such as in a rule to
3211			remove hyphens between lowercase and uppercase letters:</p>
3212		<table cellspacing="0" cellpadding="8" border="1">
3213			<tr>
3214				<td valign="top" bgcolor="#eeeeee"><code>[:Lowercase:] {
3215						'-' } [:Uppercase:] → ;</code></td>
3216			</tr>
3217		</table>
3218		<p>Each context is optional and may be empty; the following two
3219			rules are equivalent:</p>
3220		<table cellspacing="0" cellpadding="8" border="1">
3221			<tr>
3222				<td valign="top" bgcolor="#eeeeee"><code>
3223						$pi ↔ p ;<br> {$pi} ↔ {p} ;
3224					</code></td>
3225			</tr>
3226		</table>
3227		<p>
			The context itself (<code>[:Lowercase:]</code>) is unaffected by the
			replacement; only the text within braces is changed.
3232		</p>
3233		<p>
3234			Character classes (UnicodeSets) in the contexts can contain the
3235			special symbol $, which means “off either end of the string”. It is
3236			roughly similar to $ and ^ in regex. Unlike normal regex, however, it
3237			can occur in character classes. Thus the following rule removes
3238			hyphens that are after lowercase characters, <em>or</em> are at the
3239			start of a string.
3240		</p>
3241		<table cellspacing="0" cellpadding="8" border="1">
3242			<tr>
3243				<td valign="top" bgcolor="#eeeeee"><code>[[:Lowercase:]$]
3244						{'-' → ;</code></td>
3245			</tr>
3246		</table>
3247
3248		<p>
			Thus the negation of a UnicodeSet will normally also match before the
			start or after the end of a string. The following will remove hyphens that are
3251			not after lowercase characters<em>, including hyphens at the
3252				start of a string</em>.
3253		</p>
3254		<table cellspacing="0" cellpadding="8" border="1">
3255			<tr>
3256				<td valign="top" bgcolor="#eeeeee"><code>[^[:Lowercase:]]
3257						{'-' → ;</code></td>
3258			</tr>
3259		</table>
3260		<p>It will thus convert “-B A-B a-b” to “B AB a-b”.</p>
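<p>The effect of the two hyphen rules above can be reproduced with a Python sketch that checks only the preceding context. The preceding character is read from the text produced so far, and "off the start of the string" is represented as <code>None</code>. <code>str.islower()</code> is used as an approximation of <code>[:Lowercase:]</code>. The function names are hypothetical.</p>

```python
def remove_hyphens(text, context_matches):
    """Delete each '-' whose preceding character satisfies context_matches.
    The preceding character is taken from the text produced so far, and is
    None at the start of the string (the UnicodeSet '$' case)."""
    out = []
    for ch in text:
        before = out[-1] if out else None
        if ch == "-" and context_matches(before):
            continue  # the hyphen is replaced by nothing
        out.append(ch)
    return "".join(out)

def lowercase_or_start(c):
    """[[:Lowercase:]$]: a lowercase letter, or off the start of the string."""
    return c is None or c.islower()

def not_lowercase(c):
    """[^[:Lowercase:]]: anything but lowercase, including off the start."""
    return c is None or not c.islower()

print(remove_hyphens("-B A-B a-b", not_lowercase))       # B AB a-b
print(remove_hyphens("-B A-B a-b", lowercase_or_start))  # B A-B ab
```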
3261		<h4>
3262			10.3.3 <a name="Revisiting" href="#Revisiting">Revisiting</a>
3263		</h4>
3264
		<p>If the replacement text contains a vertical bar "|", then
			processing resumes from that point, so the transform revisits part
			of the resulting text. Thus the | marks a "cursor" position. For
			example, given the following rules, the string "xa" will convert to
			"yw".</p>
3270
3271		<table cellspacing="0" cellpadding="8" border="1">
3272			<tr>
3273				<td valign="top" bgcolor="#eeeeee"><code>
3274						x → y | z ;<br> z a → w;
3275					</code></td>
3276			</tr>
3277		</table>
3278
3279		<p>First, "xa" is converted to "yza". Then the processing will
3280			continue from after the character "y", pick up the "za", and convert
3281			it. Had we not had the "|", the result would have been simply "yza".
			The '@' character can be used as a filler character to place the
3283			revisiting point off the start or end of the string. Thus the
3284			following causes x to be replaced, and the cursor to be backed up by
3285			two characters.</p>
3286
3287		<table cellspacing="0" cellpadding="8" border="1">
3288			<tr>
3289				<td valign="top" bgcolor="#eeeeee"><code>x → |@@y;</code></td>
3290			</tr>
3291		</table>
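<p>Cursor handling can be sketched in Python by rewriting the string in place and resuming at the <code>|</code> position. This sketch does not support the <code>@</code> filler, contexts, or UnicodeSets. A rule that leaves the cursor in front of text it matches again would loop forever. <code>transliterate</code> is a hypothetical name.</p>

```python
def transliterate(text, rules):
    """Apply (match, replacement) rules left to right. A '|' in a replacement
    marks where processing resumes; without one, processing resumes after the
    inserted text. The '@' filler is not supported in this sketch."""
    pos = 0
    while pos < len(text):
        for match, replacement in rules:
            if match and text.startswith(match, pos):
                cursor = replacement.find("|")
                output = replacement.replace("|", "")
                text = text[:pos] + output + text[pos + len(match):]
                pos += cursor if cursor != -1 else len(output)
                break
        else:
            pos += 1  # no rule matched here: leave the character alone
    return text

print(transliterate("xa", [("x", "y|z"), ("za", "w")]))  # yw
print(transliterate("xa", [("x", "yz"), ("za", "w")]))   # yza
```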
3292
3293		<h4>
3294			10.3.4 <a name="Example" href="#Example">Example</a>
3295		</h4>
3296
		<p>The following shows how these features are combined in
			the transform "Any-Publishing". This transform converts the
			ASCII typewriter conventions into text more suitable for desktop
			publishing (in English). It turns straight quotation marks or UNIX-style
			quotation marks into curly quotation marks, fixes multiple
			spaces, and converts double-hyphens into a dash.</p>
3303
3304		<table cellspacing="0" cellpadding="8" border="1">
3305			<tr>
			<td valign="top" bgcolor="#eeeeee"><code>
					# Variables<br> <br> $single = \' ;<br> $space = ' ' ;<br>
					$double = \" ;<br> $back = \` ;<br> $tab = '\u0009' ;<br>
					<br> # the following is for spaces, line ends, (, [, {, ...<br>
					$makeRight = [[:separator:][:start punctuation:][:initial punctuation:]] ;<br>
					<br> # fix UNIX quotes<br> <br> $back $back → “ ; # generate left
					(opening) d.q.m. (double quotation mark)<br> $back → ‘ ;<br> <br>
					# fix typewriter quotes, by context<br> <br> $makeRight
					{ $double ↔ “ ; # convert a double to a left (opening) d.q.m. after
					certain chars<br> ^ { $double → “ ; # convert a double at the start
					of the line.<br> $double ↔ ” ; # otherwise convert to a right
					(closing) d.q.m.<br> <br> $makeRight {$single} ↔ ‘ ; # do the same
					for s.q.m.s<br> ^ {$single} → ‘ ;<br> $single ↔ ’;<br>
					<br> # fix multiple spaces and hyphens<br> <br>
					$space {$space} → ; # collapse multiple spaces<br> '--' ↔ — ;
					# convert fake dash into real one
				</code></td>
3324			</tr>
3325		</table>
3326		<p>There is an online demo where the rules can be tested, at:</p>
3327		<p>
3328			<a target="demo" href="http://unicode.org/cldr/utility/transform.jsp">http://unicode.org/cldr/utility/transform.jsp</a>
3329		</p>
3330		<h4>
3331			10.3.5 <a name="Rule_Syntax" href="#Rule_Syntax">Rule Syntax</a>
3332		</h4>
3333
3334		<p>The following describes the full format of the list of rules
3335			used to create a transform. Each rule in the list is terminated by a
3336			semicolon. The list consists of the following:</p>
3337
3338		<ul>
3339			<li>an optional filter rule</li>
3340			<li>zero or more transform rules</li>
3341			<li>zero or more variable-definition rules</li>
3342			<li>zero or more conversion rules</li>
3343			<li>an optional inverse filter rule</li>
3344		</ul>
3345
3346		<p>The filter rule, if present, must appear at the beginning of
3347			the list, before any of the other rules.&nbsp; The inverse filter
3348			rule, if present, must appear at the end of the list, after all of
3349			the other rules.&nbsp; The other rules may occur in any order and be
3350			freely intermixed.</p>
3351
3352		<p>The rule list can also generate the inverse of the transform.
3353			In that case, the inverse of each of the rules is used, as described
3354			below.</p>
3355
3356		<h4>
3357			10.3.6 <a name="Transform_Rules" href="#Transform_Rules">Transform
3358				Rules</a>
3359		</h4>
3360
3361		<p>Each transform rule consists of two colons followed by a
3362			transform name, which is of the form source-target. For example:</p>
3363
3364		<table cellspacing="0" cellpadding="8" border="1">
3365			<tr>
3366				<td valign="top" bgcolor="#eeeeee"><code>
					:: NFD ;<br> :: und_Latn-und_Grek ;<br> :: Latin-Greek;
3368						# alternate form
3369					</code></td>
3370			</tr>
3371		</table>
3372
3373		<p>If either the source or target is 'und', it can be omitted,
3374			thus 'und_NFC' is equivalent to 'NFC'. For compatibility, the English
3375			names for scripts can be used instead of the und_Latn locale name,
3376			and "Any" can be used instead of "und". Case is not significant.</p>
3377
3378		<p>The following transforms are defined not by rules, but by the
3379			operations in the Unicode Standard, and may be used in building any
3380			other transform:</p>
3381
3382		<blockquote>
3383			<b>Any-NFC, Any-NFD, Any-NFKD, Any-NFKC</b> - the normalization forms
3384			defined by [<a href="http://www.unicode.org/reports/tr41/#UAX15">UAX15</a>].<br>
3385			<p>
3386				<b>Any-Lower, Any-Upper, Any-Title</b> - full case transformations,
3387				defined by [<a href="tr35.html#Unicode">Unicode</a>] Chapter 3.
3388			</p>
3389		</blockquote>
3390
3391		<p>In addition, the following special cases are defined:</p>
3392
3393		<blockquote>
			<b>Any-Null</b> - has no effect; that is, each character is left
			alone.<br> <b>Any-Remove</b> - maps each character to the empty
			string; thus, it removes each character.
3397		</blockquote>
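		<p>As an illustration only, the built-in transforms above can be approximated with Python's standard library. This is a minimal sketch, not an LDML or ICU API: <code>BUILTIN_TRANSFORMS</code> and <code>apply_builtin</code> are hypothetical names, and <code>str.title()</code> uses Python's own word-boundary rule, so it only approximates Any-Title.</p>

```python
import unicodedata

# Hypothetical table mapping built-in transform names to the standard library.
# str.lower()/str.upper() apply Unicode full case mappings; str.title() uses
# Python's own word-boundary rule, so it only approximates Any-Title.
BUILTIN_TRANSFORMS = {
    "any-nfc": lambda s: unicodedata.normalize("NFC", s),
    "any-nfd": lambda s: unicodedata.normalize("NFD", s),
    "any-nfkc": lambda s: unicodedata.normalize("NFKC", s),
    "any-nfkd": lambda s: unicodedata.normalize("NFKD", s),
    "any-lower": str.lower,
    "any-upper": str.upper,
    "any-title": str.title,
    "any-null": lambda s: s,     # each character is left alone
    "any-remove": lambda s: "",  # each character maps to the empty string
}


def apply_builtin(name, text):
    """Apply a built-in transform by name, e.g. "Any-NFD" or just "NFD"."""
    key = name.lower()           # case is not significant
    if "-" not in key:
        key = "any-" + key       # an omitted source is treated as Any
    return BUILTIN_TRANSFORMS[key](text)
```

		<p>For example, <code>apply_builtin("Any-Upper", "straße")</code> returns <code>"STRASSE"</code>, because Python applies the full uppercase mapping of ß.</p>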
3398
3399		<p>The inverse of a transform rule uses parentheses to indicate
3400			what should be done when the inverse transform is used. For example:</p>
3401
3402		<table cellspacing="0" cellpadding="8" border="1">
3403			<tr>
3404				<td valign="top" bgcolor="#eeeeee"><code>
3405						:: lower () ; # only executed for the normal<br> :: (lower) ;
3406						# only executed for the inverse<br> :: lower ; # executed for
3407						both the normal and the inverse
3408					</code></td>
3409			</tr>
3410		</table>
3411
3412		<h4>
3413			10.3.7 <a name="Variable_Definition_Rules"
3414				href="#Variable_Definition_Rules">Variable Definition Rules</a>
3415		</h4>
3416
3417		<p>Each variable definition is of the following form:</p>
3418
3419		<table cellspacing="0" cellpadding="8" border="1">
3420			<tr>
3421				<td valign="top" bgcolor="#eeeeee"><code>$variableName =
3422						contents ;</code></td>
3423			</tr>
3424		</table>
3425
3426		<p>
3427			The variable name can contain letters and digits, but must start with
3428			a letter. More precisely, the variable names use Unicode identifiers
3429			as defined by [<a href="http://www.unicode.org/reports/tr41/#UAX31">UAX31</a>].
3430			The identifier properties allow for the use of foreign letters and
3431			numbers.
3432		</p>
3433
		<p>The contents of a variable definition are any sequence of
			UnicodeSets and characters. For example:</p>
3436
3437		<table cellspacing="0" cellpadding="8" border="1">
3438			<tr>
3439				<td valign="top" bgcolor="#eeeeee"><code>$mac = M [aA]
3440						[cC] ;</code></td>
3441			</tr>
3442		</table>
3443
		<p>Variables are replaced only within other variable definition
			rules and within conversion rules. They have no effect on
			transform rules.</p>
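		<p>A minimal sketch of variable substitution in Python, assuming one rule per list entry and ignoring quoting. The names <code>expand_variables</code>, <code>VAR_REF</code>, and <code>VAR_DEF</code> are hypothetical, and Python's <code>\w</code> only approximates the UAX #31 identifier properties. Definitions are consumed and may refer to earlier variables; rules beginning with <code>::</code> are passed through unchanged, on the assumption that these are the rules exempted by the paragraph above.</p>

```python
import re

# Hypothetical patterns: "$" plus a run of word characters names a variable.
# Python's \w approximates, but is not identical to, UAX #31 identifiers.
VAR_REF = re.compile(r"\$(\w+)")
VAR_DEF = re.compile(r"\s*\$(\w+)\s*=\s*(.*?)\s*;\s*$", re.DOTALL)


def expand_variables(rules):
    """Expand $variables in a list of rules (one rule string per entry).

    Variable definitions are consumed; '::' rules pass through unchanged;
    every other rule is treated as a conversion rule and expanded.
    Quoting is not handled, and an undefined variable raises KeyError.
    """
    variables = {}
    output = []

    def substitute(text):
        return VAR_REF.sub(lambda m: variables[m.group(1)], text)

    for rule in rules:
        if rule.lstrip().startswith("::"):
            output.append(rule)
            continue
        definition = VAR_DEF.match(rule)
        if definition:
            name, contents = definition.groups()
            variables[name] = substitute(contents)  # may use earlier variables
        else:
            output.append(substitute(rule))
    return output
```

		<p>For example, expanding <code>$space = [:Separator:]* ;</code> followed by <code>high $space school → 'H.S.' ;</code> yields the single conversion rule <code>high [:Separator:]* school → 'H.S.' ;</code>.</p>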
3447
3448		<h4>
3449			10.3.8 <a name="Filter_Rules" href="#Filter_Rules">Filter Rules</a>
3450		</h4>
3451
3452		<p>A filter rule consists of two colons followed by a UnicodeSet.
3453			This filter is global in that only the characters matching the filter
3454			will be affected by any transform rules or conversion rules. The
3455			inverse filter rule consists of two colons followed by a UnicodeSet
3456			in parentheses. This filter is also global for the inverse transform.</p>
3457
3458		<p>For example, the Hiragana-Latin transform can be implemented by
3459			"pivoting" through the Katakana converter, as follows:</p>
3460
3461		<table cellspacing="0" cellpadding="8" border="1">
3462			<tr>
3463				<td valign="top" bgcolor="#eeeeee"><code>
3464						:: [:^Katakana:] ; # do not touch any katakana that was in the
3465						text!<br> :: Hiragana-Katakana;<br> :: Katakana-Latin;<br>
3466						:: ([:^Katakana:]) ; # do not touch any katakana that was in the
3467						text<br>
3468						&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
3469						# for the inverse either!
3470					</code></td>
3471			</tr>
3472		</table>
3473
3474		<p>The filters keep the transform from mistakenly converting any
3475			of the "pivot" characters. Note that this is a case where a rule list
3476			contains no conversion rules at all, just transform rules and
3477			filters.</p>
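		<p>The effect of a global filter can be modeled as follows: the input is split into maximal runs of characters that match the filter, the whole chain of transforms is applied to each matching run, and non-matching characters pass through untouched. This is a simplified Python sketch; <code>is_katakana</code> approximates <code>[:Katakana:]</code> by the Katakana block only, and <code>TOY_KATAKANA_LATIN</code> is a hypothetical three-entry table standing in for the real Katakana-Latin transform.</p>

```python
from itertools import groupby


def is_katakana(ch):
    # Approximation of [:Katakana:]: the Katakana block U+30A0..U+30FF only.
    return "\u30a0" <= ch <= "\u30ff"


def hiragana_to_katakana(text):
    # Hiragana U+3041..U+3096 map to Katakana U+30A1..U+30F6 by adding 0x60.
    return "".join(chr(ord(c) + 0x60) if "\u3041" <= c <= "\u3096" else c
                   for c in text)


# Hypothetical three-entry stand-in for the real Katakana-Latin transform.
TOY_KATAKANA_LATIN = {"\u30ab": "ka", "\u30b5": "sa", "\u30ca": "na"}


def katakana_to_latin(text):
    return "".join(TOY_KATAKANA_LATIN.get(c, c) for c in text)


def filtered(transforms, keep):
    """Apply transforms in order to each maximal run of input characters
    for which keep(ch) is true; other characters pass through unchanged."""
    def run(text):
        pieces = []
        for matches, chars in groupby(text, key=keep):
            segment = "".join(chars)
            if matches:
                for transform in transforms:
                    segment = transform(segment)
            pieces.append(segment)
        return "".join(pieces)
    return run


# Models ':: [:^Katakana:] ; :: Hiragana-Katakana ; :: Katakana-Latin ;'
hiragana_latin = filtered([hiragana_to_katakana, katakana_to_latin],
                          keep=lambda ch: not is_katakana(ch))
```

		<p>With this model, <code>hiragana_latin("かなカナ")</code> returns <code>"kanaカナ"</code>: the Hiragana is pivoted through Katakana to Latin, while the Katakana that was already in the input is left alone.</p>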
3478
3479		<h4>
3480			10.3.9 <a name="Conversion_Rules" href="#Conversion_Rules">Conversion
3481				Rules</a>
3482		</h4>
3483
		<p>Conversion rules can be forward, backward, or dual. The
			complete syntax of each kind is described below:</p>
3486
3487		<p>
3488			<b>Forward</b>
3489		</p>
3490
3491		<blockquote>
3492			<p>A forward conversion rule is of the following form:</p>
3493
3494			<blockquote>
3495				<pre>before_context { text_to_replace } after_context → completed_result | result_to_revisit ;</pre>
3496			</blockquote>
3497
3498			<p>If there is no before_context, then the "{" can be omitted. If
3499				there is no after_context, then the "}" can be omitted. If there is
3500				no result_to_revisit, then the "|" can be omitted. A forward
3501				conversion rule is only executed for the normal transform and is
3502				ignored when generating the inverse transform.</p>
3503		</blockquote>
3504
3505		<p>
3506			<b>Backward</b>
3507		</p>
3508
3509		<blockquote>
3510			<p>A backward conversion rule is of the following form:</p>
3511
3512			<blockquote>
3513				<pre>completed_result | result_to_revisit ← before_context { text_to_replace } after_context ;</pre>
3514			</blockquote>
3515
3516			<p>The same omission rules apply as in the case of forward
3517				conversion rules. A backward conversion rule is only executed for
3518				the inverse transform and is ignored when generating the normal
3519				transform.</p>
3520		</blockquote>
3521
3522		<p>
3523			<b>Dual</b>
3524		</p>
3525		<blockquote>
3526			<p>A dual conversion rule combines a forward conversion rule and
3527				a backward conversion rule into one, as discussed above. It is of
3528				the form:</p>
3529
3530			<table cellspacing="0" cellpadding="8" border="1">
3531				<tr>
3532					<td valign="top" bgcolor="#eeeeee"><code>a { b | c } d
3533							↔ e { f | g } h ;</code></td>
3534				</tr>
3535			</table>
3536
3537			<p>When generating the normal transform and the inverse, the
3538				revisit mark "|" and the before and after contexts are ignored on
3539				the sides where they do not belong. Thus, the above is exactly
3540				equivalent to the sequence of the following two rules:</p>
3541
3542			<table cellspacing="0" cellpadding="8" border="1">
3543				<tr>
3544					<td valign="top" bgcolor="#eeeeee"><code>
3545							a { b c } d&nbsp; →&nbsp; f | g&nbsp; ;<br> b | c&nbsp;
3546							←&nbsp; e { f g } h ;&nbsp;
3547						</code></td>
3548				</tr>
3549			</table>
3550		</blockquote>
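		<p>The rewriting of a dual rule into a forward and a backward rule can be sketched in Python. This sketch assumes that <code>{</code>, <code>}</code>, and <code>|</code> only ever act as delimiters (no quoting, no strings inside UnicodeSets), and the function names are hypothetical.</p>

```python
def parse_side(side):
    """Split 'before { text | revisit } after' into its four parts.

    Simplification: '{', '}' and '|' are treated as plain delimiters, so
    quoting and strings inside UnicodeSets are not supported.
    """
    before, after, revisit = "", "", ""
    if "{" in side:
        before, side = side.split("{", 1)
    if "}" in side:
        side, after = side.split("}", 1)
    if "|" in side:
        side, revisit = side.split("|", 1)
    return before.strip(), side.strip(), revisit.strip(), after.strip()


def as_match(before, text, revisit, after):
    # Matching side: keep the contexts, drop the revisit mark.
    parts = [before, "{"] if before else []
    parts += [p for p in (text, revisit) if p]
    parts += ["}", after] if after else []
    return " ".join(parts)


def as_result(before, text, revisit, after):
    # Result side: drop the contexts, keep the revisit mark.
    return f"{text} | {revisit}" if revisit else text


def split_dual(rule):
    """Rewrite 'L ↔ R ;' as the equivalent forward and backward rules."""
    left, right = (parse_side(s) for s in rule.rstrip().rstrip(";").split("↔"))
    forward = f"{as_match(*left)} → {as_result(*right)} ;"
    backward = f"{as_result(*left)} ← {as_match(*right)} ;"
    return forward, backward
```

		<p>Applied to the dual rule above, <code>split_dual("a { b | c } d ↔ e { f | g } h ;")</code> returns the pair <code>"a { b c } d → f | g ;"</code> and <code>"b | c ← e { f g } h ;"</code>.</p>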
3551
3552		<h4>
3553			10.3.10 <a name="Intermixing_Transform_Rules_and_Conversion_Rules"
3554				href="#Intermixing_Transform_Rules_and_Conversion_Rules">
3555				Intermixing Transform Rules and Conversion Rules</a>
3556		</h4>
3557
3558		<p>Transform rules and conversion rules may be freely intermixed.
3559			Inserting a transform rule into the middle of a set of conversion
3560			rules has an important side effect.</p>
3561
3562		<p>Normally, conversion rules are considered together as a
3563			group.&nbsp; The only time their order in the rule set is important
3564			is when more than one rule matches at the same point in the
3565			string.&nbsp; In that case, the one that occurs earlier in the rule
3566			set wins.&nbsp; In all other situations, when multiple rules match
3567			overlapping parts of the string, the one that matches earlier wins.</p>
3568
3569		<p>Transform rules apply to the whole string.&nbsp; If you have
3570			several transform rules in a row, the first one is applied to the
3571			whole string, then the second one is applied to the whole string, and
3572			so on.&nbsp; To reconcile this behavior with the behavior of
3573			conversion rules, transform rules have the side effect of breaking a
3574			surrounding set of conversion rules into two groups: First all of the
3575			conversion rules before the transform rule are applied as a group to
3576			the whole string in the usual way, then the transform rule is applied
3577			to the whole string, and then the conversion rules after the
3578			transform rule are applied as a group to the whole string.&nbsp; For
3579			example, consider the following rules:</p>
3580
3581		<table cellspacing="0" cellpadding="8" border="1">
3582			<tr>
3583				<td valign="top" bgcolor="#eeeeee"><code>
3584						abc → xyz;<br> xyz → def;<br> ::Upper;
3585					</code></td>
3586			</tr>
3587		</table>
3588
3589		<p>If you apply these rules to “abcxyz”, you get “XYZDEF”.&nbsp;
3590			If you move the “::Upper;” to the middle of the rule set and change
3591			the cases accordingly, then applying this to “abcxyz” produces
3592			“DEFDEF”.</p>
3593
3594		<table cellspacing="0" cellpadding="8" border="1">
3595			<tr>
3596				<td valign="top" bgcolor="#eeeeee"><code>
3597						abc → xyz;<br> ::Upper;<br> XYZ → DEF;
3598					</code></td>
3599			</tr>
3600		</table>
3601
3602		<p>This is because “::Upper;” causes the transliterator to reset
3603			to the beginning of the string. The first rule turns the string into
3604			“xyzxyz”, the second rule upper cases the whole thing to “XYZXYZ”,
3605			and the third rule turns this into “DEFDEF”.</p>
3606
3607		<p>This can be useful when a transform naturally occurs in
3608			multiple “passes.”&nbsp; Consider this rule set:</p>
3609
3610		<table cellspacing="0" cellpadding="8" border="1">
3611			<tr>
3612				<td valign="top" bgcolor="#eeeeee"><code>
3613						[:Separator:]* → ' ';<br> 'high school' → 'H.S.';<br>
3614						'middle school' → 'M.S.';<br> 'elementary school' → 'E.S.';
3615					</code></td>
3616			</tr>
3617		</table>
3618
3619		<p>If you apply this rule to “high school”, you get “H.S.”, but if
3620			you apply it to “high&nbsp; school” (with two spaces), you just get
3621			“high school” (with one space). To have “high&nbsp; school” (with two
3622			spaces) turn into “H.S.”, you'd either have to have the first rule
3623			back up some arbitrary distance (far enough to see “elementary”, if
3624			you want all the rules to work), or you have to include the whole
3625			left-hand side of the first rule in the other rules, which can make
3626			them hard to read and maintain:</p>
3627
3628		<table cellspacing="0" cellpadding="8" border="1">
3629			<tr>
3630				<td valign="top" bgcolor="#eeeeee"><code>
3631						$space = [:Separator:]*;<br> high $space school → 'H.S.';<br>
3632						middle $space school → 'M.S.';<br> elementary $space school →
3633						'E.S.';
3634					</code></td>
3635			</tr>
3636		</table>
3637
3638		<p>
3639			Instead, you can simply insert “
3640			<code>::Null;</code>
3641			” in order to get things to work right:
3642		</p>
3643
3644		<table cellspacing="0" cellpadding="8" border="1">
3645			<tr>
3646				<td valign="top" bgcolor="#eeeeee"><code>
3647						[:Separator:]* → ' ';<br> ::Null;<br> 'high school' →
3648						'H.S.';<br> 'middle school' → 'M.S.';<br> 'elementary
3649						school' → 'E.S.';
3650					</code></td>
3651			</tr>
3652		</table>
3653
3654		<p>The “::Null;” has no effect of its own (the null transform, by
3655			definition, does not do anything), but it splits the other rules into
3656			two “passes”: The first rule is applied to the whole string,
3657			normalizing all runs of white space into single spaces, and then we
3658			start over at the beginning of the string to look for the phrases.
3659			“high&nbsp;&nbsp;&nbsp; school” (with four spaces) gets correctly
3660			converted to “H.S.”.</p>
3661
3662		<p>This can also sometimes be useful with rules that have
3663			overlapping domains.&nbsp; Consider this rule set from before:</p>
3664
3665		<table cellspacing="0" cellpadding="8" border="1">
3666			<tr>
3667				<td valign="top" bgcolor="#eeeeee"><code>
3668						sch → sh ;<br> ss → z ;
3669					</code></td>
3670			</tr>
3671		</table>
3672
3673		<p>Apply this rule to “bassch” results in “bazch” because “ss”
3674			matches earlier in the string than “sch”. If you really wanted
3675			“bassh”—that is, if you wanted the first rule to win even when the
3676			second rule matches earlier in the string, you'd either have to add
3677			another rule for this special case...</p>
3678
3679		<table cellspacing="0" cellpadding="8" border="1">
3680			<tr>
3681				<td valign="top" bgcolor="#eeeeee"><code>
3682						sch → sh ;<br> ssch → ssh;<br> ss → z ;
3683					</code></td>
3684			</tr>
3685		</table>
3686
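		<p>...or you could use a transform rule to apply the conversions
			in two passes, so that “sch” is converted before “ss” is considered. Note, however, that the second pass still converts the “ss” left over from the first pass, so “bassch” becomes “bazh” rather than “bassh”:</p>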
3689
3690		<table cellspacing="0" cellpadding="8" border="1">
3691			<tr>
3692				<td valign="top" bgcolor="#eeeeee"><code>
3693						sch → sh ;<br> ::Null;<br> ss → z ;
3694					</code></td>
3695			</tr>
3696		</table>
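		<p>The grouping behavior described in this section can be modeled with a small Python sketch. It is deliberately simplified: conversion rules are literal <code>(source, target)</code> pairs without contexts, revisit marks, or UnicodeSets, and any Python callable stands in for a <code>::</code> transform rule. Within a group, the earliest match in the string wins and, at the same position, the earlier rule wins; each callable splits the surrounding conversion rules into separate passes over the whole string. All names are hypothetical.</p>

```python
def apply_group(text, rules):
    """One left-to-right pass of literal (source, target) conversion rules.

    At each position, the first rule in rule order whose source matches there
    wins; output is not rescanned, so each replacement is final.
    """
    out, pos = [], 0
    while pos < len(text):
        for source, target in rules:
            if source and text.startswith(source, pos):
                out.append(target)
                pos += len(source)
                break
        else:                      # no rule matched: copy one character
            out.append(text[pos])
            pos += 1
    return "".join(out)


def null_transform(text):
    """Stand-in for '::Null;': changes nothing."""
    return text


def run_rules(text, rule_list):
    """Run a rule list in which callables stand in for '::' transform rules.

    Each callable closes the current group of conversion rules: that group
    is applied to the whole string, then the callable is applied to the
    whole string, and a new group starts.
    """
    group = []
    for item in rule_list:
        if callable(item):
            text = item(apply_group(text, group))
            group = []
        else:
            group.append(item)
    return apply_group(text, group)
```

		<p>Under this model, <code>run_rules("abcxyz", [("abc", "xyz"), str.upper, ("XYZ", "DEF")])</code> returns <code>"DEFDEF"</code>, and <code>run_rules("bassch", [("sch", "sh"), ("ssch", "ssh"), ("ss", "z")])</code> returns <code>"bassh"</code>. Running the two-pass list <code>[("sch", "sh"), null_transform, ("ss", "z")]</code> on “bassch” returns “bazh”: the first pass produces “bassh”, and the second pass then converts its remaining “ss”.</p>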
3697
3698		<h4>
3699			10.3.11 <a name="Inverse_Summary" href="#Inverse_Summary">Inverse
3700				Summary</a>
3701		</h4>
3702
3703		<p>The following table shows how the same rule list generates two
3704			different transforms, where the inverse is restated in terms of
3705			forward rules (this is a contrived example, simply to show the
3706			reordering):</p>
3707
3708		<table>
3709			<tr bgcolor="#99ccff">
3710				<th bgcolor="#cccccc">Original Rules</th>
3711				<th bgcolor="#cccccc">Forward</th>
3712				<th bgcolor="#cccccc">Inverse</th>
3713			</tr>
3714			<tr bgcolor="#99ccff">
3715				<td bgcolor="#eeeeee"><code>
3716						:: [:Uppercase Letter:] ;<br> :: latin-greek ;<br> ::
3717						greek-japanese ;<br> x ↔ y ;<br> z → w ;<br> r ← m
3718						; <br> :: upper;<br> a → b ;<br> c ↔ d ;<br>
3719						:: any-publishing ;<br> :: ([:Number:]) ;
3720					</code></td>
3721				<td bgcolor="#eeeeee"><code>
3722						:: [:Uppercase Letter:] ;<br> :: latin-greek ;<br> ::
3723						greek-japanese ;<br> x → y ;<br> z → w ;<br> ::
3724						upper ;<br> a → b ;<br> c → d ;<br> ::
3725						any-publishing ;<br>
3726					</code></td>
3727				<td bgcolor="#eeeeee"><code>
3728						:: [:Number:] ;<br> :: publishing-any ;<br> d → c ;<br>
3729						:: lower ;<br> y → x ;<br> m → r ;<br> ::
3730						japanese-greek ;<br> :: greek-latin ;<br>
3731					</code></td>
3732			</tr>
3733		</table>
3734
3735		<p>Note how the irrelevant rules (the inverse filter rule and the
3736			rules containing ←) are omitted (ignored, actually) in the forward
3737			direction, and notice how things are reversed: the transform rules
3738			are inverted and happen in the opposite order, and the groups of
3739			conversion rules are also executed in the opposite relative order
3740			(although the rules within each group are executed in the same
3741			order).</p>
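		<p>The reordering can be sketched in Python under simplifying assumptions: conversion rules are plain <code>lhs arrow rhs</code> with no contexts or revisit marks, <code>::</code> bodies are a filter <code>[...]</code>, a name, <code>name ()</code>, or <code>(name)</code>, and the inverse of a single-word transform comes from a hypothetical <code>SPECIAL_INVERSES</code> table that follows the example above (upper and lower). Transform rules and whole groups of conversion rules are collected as units and emitted in reverse order for the inverse, while the rules inside a group keep their order.</p>

```python
# Hypothetical inverses for single-word transform names, following the
# example table above (upper <-> lower); not an authoritative list.
SPECIAL_INVERSES = {"upper": "lower", "lower": "upper", "null": "null"}


def invert_id(name):
    """'source-target' -> 'target-source'; single names use SPECIAL_INVERSES."""
    if "-" in name:
        source, target = name.split("-", 1)
        return f"{target}-{source}"
    return SPECIAL_INVERSES.get(name.lower(), name)


def forward_and_inverse(rules):
    """Derive the forward and inverse rule lists from one rule list."""
    forward = []
    units = []   # inverse pieces in original order; reversed at the end
    group = []   # inverse conversion rules of the current group
    for raw in rules:
        rule = raw.strip().rstrip(";").strip()
        if rule.startswith("::"):
            if group:                         # a transform rule ends the group
                units.append(group)
                group = []
            body = rule[2:].strip()
            if body.startswith("[") or body.endswith("()"):
                # Global filter or ':: name ()': forward direction only.
                forward.append(f":: {body.removesuffix('()').strip()} ;")
            elif body.startswith("(") and body.endswith(")"):
                # Inverse filter or ':: (name)': inverse direction only.
                units.append([f":: {body[1:-1].strip()} ;"])
            else:
                forward.append(f":: {body} ;")
                units.append([f":: {invert_id(body)} ;"])
            continue
        for arrow in ("↔", "→", "←"):
            if arrow in rule:
                lhs, rhs = (side.strip() for side in rule.split(arrow, 1))
                break
        else:
            raise ValueError(f"not a conversion rule: {raw!r}")
        if arrow != "←":
            forward.append(f"{lhs} → {rhs} ;")
        if arrow != "→":
            group.append(f"{rhs} → {lhs} ;")   # restated as a forward rule
    if group:
        units.append(group)
    inverse = [rule for unit in reversed(units) for rule in unit]
    return forward, inverse
```

		<p>Running <code>forward_and_inverse</code> on the "Original Rules" column of the table above reproduces its "Forward" and "Inverse" columns. The sketch uses <code>str.removesuffix</code>, so it needs Python 3.9 or later.</p>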
3742
3743		<h2>
3744			11 <a name="ListPatterns" href="#ListPatterns">List Patterns</a>
3745		</h2>
3746
3747
3748		<p class="dtd">&lt;!ELEMENT listPatterns (alias | (listPattern*,
3749			special*)) &gt;</p>
3750
3751		<p class="dtd">
3752			&lt;!ELEMENT listPattern (alias | (listPatternPart*, special*)) &gt;<br>
			&lt;!ATTLIST listPattern type NMTOKEN #IMPLIED &gt;
3754		</p>
3755
3756		<p class="dtd">
3757			&lt;!ELEMENT listPatternPart ( #PCDATA ) &gt;<br> &lt;!ATTLIST
3758			listPatternPart type (start | middle | end | 2 | 3) #REQUIRED &gt;
3759		</p>
3760
		<p>List patterns can be used to format variable-length lists of
			things in a locale-sensitive manner, such as "Monday, Tuesday,
			Friday, and Saturday" (in English) versus "lundi, mardi, vendredi et
			samedi" (in French). Consider the following example:</p>
3765
3766		<pre class="example">&lt;listPatterns&gt;
3767 &lt;listPattern&gt;
3768  &lt;listPatternPart type="2"&gt;{0} and {1}&lt;/listPatternPart&gt;
3769  &lt;listPatternPart type="start"&gt;{0}, {1}&lt;/listPatternPart&gt;
3770  &lt;listPatternPart type="middle"&gt;{0}, {1}&lt;/listPatternPart&gt;
3771  &lt;listPatternPart type="end"&gt;{0}, and {1}&lt;/listPatternPart&gt;
3772 &lt;/listPattern&gt;
3773&lt;/listPatterns&gt;</pre>
3774
		<p>The data is used as follows: If there is a type that matches
			exactly the number of elements in the desired list (such as "2" in
			the above example), then use that pattern. Otherwise,</p>
3778
3779		<ol>
3780			<li>Format the last two elements with the "end" format.</li>
			<li>Then use the "middle" format to add on the preceding elements,
				working towards the front, for all but the very first element. That is, {1} is
				what you've already done, and {0} is the previous element.</li>
			<li>Then use the "start" format to add the front element, again with {1} as
				what you've done so far, and {0} as the first element.</li>
3786		</ol>
3787		<p>Thus a list (a,b,c,...m, n) is formatted as:
3788			start(a,middle(b,middle(c,middle(...end(m, n))...)))</p>
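		<p>The procedure above can be sketched in Python. The dictionary <code>patterns</code> maps <code>listPatternPart</code> types to pattern strings; <code>format_list</code> is a hypothetical name, exact-count patterns are filled with <code>str.format</code>, and the handling of one-element lists and of two-element lists without a "2" pattern is an assumption not stated in the text.</p>

```python
# English data from the listPattern example above.
EN = {"2": "{0} and {1}", "start": "{0}, {1}",
      "middle": "{0}, {1}", "end": "{0}, and {1}"}
# Illustrative French patterns (assumed, not copied from CLDR data).
FR = {"2": "{0} et {1}", "start": "{0}, {1}",
      "middle": "{0}, {1}", "end": "{0} et {1}"}


def format_list(items, patterns):
    """Format items as start(a, middle(b, ... end(m, n)))."""
    n = len(items)
    if n == 0:
        return ""
    if n == 1:
        return items[0]                 # assumption: a single item stands alone
    exact = patterns.get(str(n))
    if exact is not None:               # a pattern for exactly n elements wins
        return exact.format(*items)
    # 1. Format the last two elements with "end".
    result = patterns["end"].format(items[-2], items[-1])
    if n == 2:
        return result                   # assumption: no "2" pattern, use "end"
    # 2. Work towards the front with "middle", skipping the first element.
    for item in reversed(items[1:-2]):
        result = patterns["middle"].format(item, result)
    # 3. Add the first element with "start".
    return patterns["start"].format(items[0], result)
```

		<p>With the English data, <code>format_list(["Monday", "Tuesday", "Friday", "Saturday"], EN)</code> returns "Monday, Tuesday, Friday, and Saturday"; the illustrative French patterns produce "lundi, mardi, vendredi et samedi".</p>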
3789
3790
3791		<p>The following type attributes are in use:</p>
3792		<table border="1" cellpadding="2" cellspacing="0" class='simple'>
3793		  <tr>
3794		    <th>type attribute value</th>
3795		    <th>Description</th>
3796		    <th>Examples</th>
3797	      </tr>
3798		  <tr>
3799		    <td nowrap>standard (or no <strong>type</strong>)</td>
3800		    <td>A typical 'and' list for arbitrary placeholders</td>
3801		    <td nowrap><em>January, February, and March</em></td>
3802	      </tr>
3803			  <tr>
3804		    <td>standard-short</td>
	    <td>A short version of an 'and' list, suitable for use with short or abbreviated placeholder values</td>
3806		    <td><em>Jan., Feb., and Mar.</em></td>
3807	      </tr>
3808	  <tr>
3809		    <td>or</td>
3810		    <td>A typical 'or' list for arbitrary placeholders</td>
3811		    <td><em>January, February, or March</em></td>
3812	      </tr>
3813	  <tr>
3814	    <td>or-short</td>
3815	    <td>A short version of an 'or' list</td>
3816	    <td><em>Jan., Feb., or Mar.</em></td>
3817	    </tr>
3818	  <tr>
3819	    <td>unit</td>
3820	    <td>A list suitable for wide units</td>
3821	    <td><em>3 feet, 7 inches</em></td>
3822	    </tr>
3823	  <tr>
3824	    <td>unit-short</td>
3825	    <td>A list suitable for short units</td>
3826	    <td><em>3 ft, 7 in</em></td>
3827	    </tr>
3828	  <tr>
3829	    <td>unit-narrow</td>
3830	    <td>A list suitable for narrow units, where space on the screen is very limited.</td>
3831	    <td><em>3′ 7″</em></td>
3832	    </tr>
3833      </table>
	<p>In many languages there may not be a difference among many of these lists. In others, the spacing, the length or presence of a conjunction, and the separators may change.</p>
3835
3836		<h3>
3837			11.1 <a name="List_Gender" href="#List_Gender">Gender of Lists</a>
3838		</h3>
3839
3840
3841		<p class="dtd">
3842			&lt;!-- Gender List support --&gt;<br> &lt;!ELEMENT gender (
3843			personList+ ) &gt;<br> &lt;!ELEMENT personList EMPTY &gt;<br>
3844			&lt;!ATTLIST personList type ( neutral | mixedNeutral | maleTaints )
3845			#REQUIRED &gt;<br> &lt;!ATTLIST personList locales NMTOKENS
3846			#REQUIRED &gt;<br>
3847		</p>
3848
3849		<p>This can be used to determine the gender of a list of 2 or more
3850			persons, such as "Tom and Mary", for use with gender-selection
3851			messages. For example,</p>
3852
3853		<pre class="example">
3854  &lt;supplementalData&gt;
3855    &lt;gender&gt;
3856      &lt;!-- neutral: gender(list) = other --&gt;
3857      &lt;personList type="neutral" locales="af da en..."/&gt;
3858
3859      &lt;!-- mixedNeutral: gender(all male) = male, gender(all female) = female, otherwise gender(list) = other --&gt;
3860      &lt;personList type="mixedNeutral" locales="el"/&gt;
3861
3862      &lt;!-- maleTaints: gender(all female) = female, otherwise gender(list) = male --&gt;
3863      &lt;personList type="maleTaints" locales="ar ca..."/&gt;
3864    &lt;/gender&gt;
3865  &lt;/supplementalData&gt;</pre>
3866
		<p>There are three ways the gender of a list can be determined:</p>
3868
3869		<ol>
3870			<li><b>neutral:</b> A gender-independent "other" form will be
3871				used for the list.</li>
3872
3873			<li><b>mixedNeutral:</b> If the elements of the list are all
3874				male, "male" form is used for the list. If all the elements of the
3875				lists are female, "female" form is used. If the list has a mix of
3876				male, female and neutral names, the "other" form is used.</li>
3877
3878			<li><b>maleTaints:</b> If all the elements of the lists are
3879				female, "female" form is used, otherwise the "male" form is used.</li>
3880		</ol>
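		<p>The three rules can be expressed as a small Python function. This is a sketch: <code>list_gender</code> is a hypothetical name, and each person's gender is assumed to be given as one of the labels "male", "female", or "other".</p>

```python
def list_gender(person_genders, rule):
    """Gender of a list of persons under one of the personList types.

    person_genders holds "male", "female", or "other" for each person
    (hypothetical labels); the result is "male", "female", or "other".
    """
    genders = set(person_genders)
    if rule == "neutral":
        return "other"
    if rule == "mixedNeutral":
        if genders == {"male"}:
            return "male"
        if genders == {"female"}:
            return "female"
        return "other"
    if rule == "maleTaints":
        return "female" if genders == {"female"} else "male"
    raise ValueError(f"unknown personList type: {rule!r}")
```

		<p>For example, under maleTaints a list of two female persons is "female", but adding one person of any other gender makes the list "male".</p>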
3881
3882
3883		<h2>
3884			12 <a name="Context_Transform_Elements"
3885				href="#Context_Transform_Elements">ContextTransform Elements</a>
3886		</h2>
3887
3888
3889		<p class="dtd">
3890			&lt;!ELEMENT contextTransforms ( alias | (contextTransformUsage*,
3891			special*)) &gt;<br> &lt;!ELEMENT contextTransformUsage ( alias |
3892			(contextTransform*, special*)) &gt;<br> &lt;!ATTLIST
3893			contextTransformUsage type CDATA #REQUIRED &gt;<br> &lt;!ELEMENT
3894			contextTransform ( #PCDATA ) &gt;<br> &lt;!ATTLIST
3895			contextTransform type ( uiListOrMenu | stand-alone ) #REQUIRED &gt;
3896		</p>
3897
3898		<p>CLDR locale elements provide data for display names or symbols
3899			in many categories. The default capitalization for these elements is
3900			intended to be the form used in the middle of running text. In many
3901			languages, other capitalization may be required in other contexts,
3902			depending on the type of name or symbol.</p>
3903
3904		<p>
3905			Each &lt;contextTransformUsage&gt; element’s type attribute specifies
3906			a category of data from the table below; the element includes one or
3907			more &lt;contextTransform&gt; elements that specify how to perform
3908			capitalization of this category of data in different contexts. The
3909			&lt;contextTransform&gt; elements are needed primarily for cases in
3910			which the capitalization is other than the default form used in the
3911			middle of running text. However, it is also useful to mark cases in
3912			which it is <em>known</em> that no transformation from this default
3913			form is needed; this may be necessary, for example, to override the
3914			transformation specified by a parent locale. The following values are
3915			currently defined for the &lt;contextTransform&gt; element:
3916		</p>
3917
3918		<ul>
3919			<li>"titlecase-firstword" designates the case in which raw CLDR
3920				text that is in middle-of-sentence form, typically lowercase, needs
3921				to have its first word titlecased.</li>
3922			<li>"no-change" designates the case in which it is known that no
3923				change from the raw CLDR text (middle-of-sentence form) is needed.</li>
3924		</ul>
3925
3926		<p>Four contexts for capitalization behavior are currently
3927			identified. Two need no data, and hence have no corresponding
3928			&lt;contextTransform&gt; elements:</p>
3929
3930		<ul>
3931			<li>In the middle of running text: This is the default form, so
3932				no additional data is required.</li>
3933			<li>At the beginning of a complete sentence: The initial word is
3934				titlecased, no additional data is required to indicate this.</li>
3935		</ul>
3936
3937		<p>Two other contexts require &lt;contextTransform&gt; elements if
3938			their capitalization behavior is other than the default for running
3939			text. The context is identified by the type attribute, as follows:</p>
3940
3941		<ul>
3942			<li>uiListOrMenu: Capitalization appropriate to a user-interface
3943				list or menu.</li>
3944			<li>stand-alone: Capitalization appropriate to an isolated
3945				user-interface element (e.g. an isolated name on a calendar page)</li>
3946		</ul>
3947
3948		<p>Example:</p>
3949
3950		<pre>    &lt;contextTransforms&gt;
3951        &lt;contextTransformUsage type="languages"&gt;
3952             &lt;contextTransform type="uiListOrMenu"&gt;titlecase-firstword&lt;/contextTransform&gt;
3953             &lt;contextTransform type="stand-alone"&gt;titlecase-firstword&lt;/contextTransform&gt;
3954        &lt;/contextTransformUsage&gt;
3955        &lt;contextTransformUsage type="month-format-except-narrow"&gt;
3956             &lt;contextTransform type="uiListOrMenu"&gt;titlecase-firstword&lt;/contextTransform&gt;
3957        &lt;/contextTransformUsage&gt;
3958        &lt;contextTransformUsage type="month-standalone-except-narrow"&gt;
3959             &lt;contextTransform type="uiListOrMenu"&gt;titlecase-firstword&lt;/contextTransform&gt;
3960        &lt;/contextTransformUsage&gt;
3961    &lt;/contextTransforms&gt;</pre>
3962
3963		<table cellspacing="0" cellpadding="2" border="1" class='simple'>
3964			<caption>
3965				<a name="contextTransformUsage_type_attribute_values"
3966					href="#contextTransformUsage_type_attribute_values">Element
3967					contextTransformUsage type attribute values</a>
3968			</caption>
3969			<tr>
3970				<th>type attribute value</th>
3971				<th>Description</th>
3972			</tr>
3973			<tr>
3974				<td>all</td>
3975				<td>Special value, indicates that the specified transformation
3976					applies to all of the categories below</td>
3977			</tr>
3978			<tr>
3979				<td>language</td>
3980				<td>localeDisplayNames language names</td>
3981			</tr>
3982			<tr>
3983				<td>script</td>
3984				<td>localeDisplayNames script names</td>
3985			</tr>
3986			<tr>
3987				<td>territory</td>
3988				<td>localeDisplayNames territory names</td>
3989			</tr>
3990			<tr>
3991				<td>variant</td>
3992				<td>localeDisplayNames variant names</td>
3993			</tr>
3994			<tr>
3995				<td>key</td>
3996				<td>localeDisplayNames key names</td>
3997			</tr>
3998			<tr>
3999				<td>keyValue</td>
4000				<td>localeDisplayNames key value type names</td>
4001			</tr>
4002			<tr>
4003				<td>month-format-except-narrow</td>
4004				<td>dates/calendars/calendar[type=*]/months format wide and
4005					abbreviated month names</td>
4006			</tr>
4007			<tr>
4008				<td>month-standalone-except-narrow</td>
4009				<td>dates/calendars/calendar[type=*]/months stand-alone wide
4010					and abbreviated month names</td>
4011			</tr>
4012			<tr>
4013				<td>month-narrow</td>
4014				<td>dates/calendars/calendar[type=*]/months format and
4015					stand-alone narrow month names</td>
4016			</tr>
4017			<tr>
4018				<td>day-format-except-narrow</td>
4019				<td>dates/calendars/calendar[type=*]/days format wide and
4020					abbreviated day names</td>
4021			</tr>
4022			<tr>
4023				<td>day-standalone-except-narrow</td>
4024				<td>dates/calendars/calendar[type=*]/days stand-alone wide and
4025					abbreviated day names</td>
4026			</tr>
4027			<tr>
4028				<td>day-narrow</td>
4029				<td>dates/calendars/calendar[type=*]/days format and
4030					stand-alone narrow day names</td>
4031			</tr>
4032			<tr>
4033				<td>era-name</td>
4034				<td>dates/calendars/calendar[type=*]/eras (wide) era names</td>
4035			</tr>
4036			<tr>
4037				<td>era-abbr</td>
4038				<td>dates/calendars/calendar[type=*]/eras abbreviated era names</td>
4039			</tr>
4040			<tr>
4041				<td>era-narrow</td>
4042				<td>dates/calendars/calendar[type=*]/eras narrow era names</td>
4043			</tr>
4044			<tr>
4045				<td>quarter-format-wide</td>
4046				<td>dates/calendars/calendar[type=*]/quarters format wide
4047					quarter names</td>
4048			</tr>
4049			<tr>
4050				<td>quarter-standalone-wide</td>
4051				<td>dates/calendars/calendar[type=*]/quarters stand-alone wide
4052					quarter names</td>
4053			</tr>
4054			<tr>
4055				<td>quarter-abbreviated</td>
4056				<td>dates/calendars/calendar[type=*]/quarters format and
4057					stand-alone abbreviated quarter names</td>
4058			</tr>
4059			<tr>
4060				<td>quarter-narrow</td>
4061				<td>dates/calendars/calendar[type=*]/quarters format and
4062					stand-alone narrow quarter names</td>
4063			</tr>
4064			<tr>
4065				<td>calendar-field</td>
4066				<td>dates/fields/field[type=*]/displayName field names<br>(for
4067					relative forms see type "tense" below)
4068				</td>
4069			</tr>
4070			<tr>
4071				<td>zone-exemplarCity</td>
4072				<td>dates/timeZoneNames/zone[type=*]/exemplarCity city names</td>
4073			</tr>
4074			<tr>
4075				<td>zone-long</td>
4076				<td>dates/timeZoneNames/zone[type=*]/long zone names</td>
4077			</tr>
4078			<tr>
4079				<td>zone-short</td>
4080				<td>dates/timeZoneNames/zone[type=*]/short zone names</td>
4081			</tr>
4082			<tr>
4083				<td>metazone-long</td>
4084				<td>dates/timeZoneNames/metazone[type=*]/long metazone names</td>
4085			</tr>
4086			<tr>
4087				<td>metazone-short</td>
4088				<td>dates/timeZoneNames/metazone[type=*]/short metazone names</td>
4089			</tr>
4090			<tr>
4091				<td>symbol</td>
4092				<td>numbers/currencies/currency[type=*]/symbol symbol names</td>
4093			</tr>
4094			<tr>
4095				<td>currencyName</td>
4096				<td>numbers/currencies/currency[type=*]/displayName currency
4097					names</td>
4098			</tr>
4099			<tr>
4100				<td>currencyName-count</td>
4101				<td>numbers/currencies/currency[type=*]/displayName[count=*]
4102					currency names for use with count</td>
4103			</tr>
4104			<tr>
4105				<td>relative</td>
4106				<td>dates/fields/field[type=*]/relative and
4107					dates/fields/field[type=*]/relativeTime relative field names</td>
4108			</tr>
4109			<tr>
4110				<td>unit-pattern</td>
4111				<td>units/unitLength[type=*]/unit[type=*]/unitPattern[count=*]
4112					unit names</td>
4113			</tr>
4114			<tr>
4115				<td>number-spellout</td>
4116				<td>rbnf/rulesetGrouping[type=*]/ruleset[type=*]/rbnfrule
4117					number spellout rules</td>
4118			</tr>
4119		</table>
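		<p>A minimal Python sketch of how an implementation might consult this data. The dictionary mirrors the XML example above; the context names <code>"middle-of-sentence"</code> and <code>"beginning-of-sentence"</code> are hypothetical labels for the two contexts that need no data, the assumption that a specific usage type overrides <code>all</code> is not stated in the text, and titlecasing is simplified to the first character.</p>

```python
# Data mirroring the contextTransforms example above.
CONTEXT_TRANSFORMS = {
    "languages": {"uiListOrMenu": "titlecase-firstword",
                  "stand-alone": "titlecase-firstword"},
    "month-format-except-narrow": {"uiListOrMenu": "titlecase-firstword"},
    "month-standalone-except-narrow": {"uiListOrMenu": "titlecase-firstword"},
}


def titlecase_first_word(text):
    # Simplification: titlecase only the first character of the string.
    return text[:1].title() + text[1:]


def capitalize_for_context(text, usage, context, data=CONTEXT_TRANSFORMS):
    """Capitalize raw (middle-of-sentence) CLDR text for a given context."""
    if context == "middle-of-sentence":
        return text                          # default form, no data needed
    if context == "beginning-of-sentence":
        return titlecase_first_word(text)    # always titlecased, no data needed
    # uiListOrMenu or stand-alone: look up the usage type, then "all".
    value = data.get(usage, {}).get(context)
    if value is None:
        value = data.get("all", {}).get(context)
    if value == "titlecase-firstword":
        return titlecase_first_word(text)
    return text                              # "no-change" or no data
```

		<p>With the example data, a language name such as "français" becomes "Français" in a user-interface menu, while a format month name used stand-alone keeps its middle-of-sentence form because no stand-alone data is given for it.</p>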
4120
4121		<h2>
4122			13 <a name="Choice_Patterns" href="#Choice_Patterns">Choice
4123				Patterns</a>
4124		</h2>
4125
4126
4127		<p>A choice pattern is a string that chooses among a number of
4128			strings, based on numeric value. It has the following form:</p>
4129
		<p>
			&lt;choice_pattern&gt; = &lt;choice&gt; ( '|' &lt;choice&gt; )*<br>
			&lt;choice&gt; = &lt;number&gt;&lt;relation&gt;&lt;string&gt;<br>
			&lt;number&gt; = ('+' | '-')? ('∞' | [0-9]+ ('.' [0-9]+)?)<br>
			&lt;relation&gt; = '&lt;' | '≤'
		</p>
4137
4138		<p>The interpretation of a choice pattern is that given a number
4139			N, the pattern is scanned from right to left, for each choice
4140			evaluating &lt;number&gt; &lt;relation&gt; N. The first choice that
4141			matches results in the corresponding string. If no match is found,
4142			then the first string is used. For example:</p>
4143
4144		<table border="1" cellpadding="0" cellspacing="0">
4145			<tr>
4146				<td width="33%">Pattern</td>
4147				<td width="33%">N</td>
4148				<td width="34%">Result</td>
4149			</tr>
4150			<tr>
4151				<td width="33%" rowspan="4">0≤Rf|1≤Ru|1&lt;Re</td>
				<td width="33%">-∞, -3, -1, -0.000001</td>
4154				<td width="34%">Rf (defaulted to first string)</td>
4155			</tr>
4156			<tr>
4157				<td width="33%">0, 0.01, 0.9999</td>
4158				<td width="34%">Rf</td>
4159			</tr>
4160			<tr>
4161				<td width="33%">1</td>
4162				<td width="34%">Ru</td>
4163			</tr>
4164			<tr>
				<td width="33%">1.00001, 5, 99, ∞</td>
4166				<td width="34%">Re</td>
4167			</tr>
4168		</table>
4169		<p>Quoting is done using ' characters, as in date or number
4170			formats.</p>
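		<p>The evaluation rule can be sketched in Python. This sketch splits the pattern naively on <code>|</code> and does not implement the quoting just described; <code>parse_choice_pattern</code> and <code>choose</code> are hypothetical names.</p>

```python
import math
import re

# <number><relation><string>, following the grammar above.
CHOICE = re.compile(r"([+-]?(?:∞|[0-9]+(?:\.[0-9]+)?))(<|≤)(.*)", re.DOTALL)


def parse_choice_pattern(pattern):
    """Return a list of (limit, relation, string) triples."""
    choices = []
    for part in pattern.split("|"):          # naive: ignores quoting
        m = CHOICE.fullmatch(part)
        if m is None:
            raise ValueError(f"malformed choice: {part!r}")
        number, relation, text = m.groups()
        magnitude = number.lstrip("+-")
        limit = math.inf if magnitude == "∞" else float(magnitude)
        if number.startswith("-"):
            limit = -limit
        choices.append((limit, relation, text))
    return choices


def choose(pattern, n):
    """Scan right to left; the first choice with limit <relation> n wins."""
    choices = parse_choice_pattern(pattern)
    for limit, relation, text in reversed(choices):
        if (limit < n) if relation == "<" else (limit <= n):
            return text
    return choices[0][2]                      # no match: use the first string
```

		<p>For the pattern in the table above, <code>choose("0≤Rf|1≤Ru|1&lt;Re", 1)</code> returns "Ru", and any negative N falls back to the first string, "Rf".</p>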
4171		<h2>
4172			14 <a name="Annotations" href="#Annotations">Annotations and Labels</a>
4173		</h2>
4174		<p>Annotations provide information about characters, typically
4175			used in input. For example, on a mobile keyboard they can be used to
4176			do completion. They are typically used for symbols, especially emoji
4177			characters.  </p>
		<p>For more information, see version 5.0 or later of <a href="http://unicode.org/reports/tr51/">UTR #51, Unicode Emoji</a>. (Note that during the period between the publication of CLDR v31 and that of Emoji 5.0, the “Latest Proposed Update” link should be used to get to the draft specification for Emoji 5.0.)<br>
4179		</p>
4180
4181		<p class="dtd">&lt;!ELEMENT annotations ( annotation* ) &gt;</p>
4182		<p class="dtd">&lt;!ELEMENT annotation ( #PCDATA ) &gt;</p>
4183		<p class="dtd">&lt;!ATTLIST annotation cp CDATA #REQUIRED &gt;</p>
4184		<p class="dtd">&lt;!ATTLIST annotation type (tts) #IMPLIED &gt;</p>
4185
		<p>There are two kinds of annotations: <strong>short names</strong> and <strong>keywords</strong>.</p>
      <p>With the attribute <strong>type="tts"</strong>, the value is a <strong>short name</strong>, such as one that can be used for text-to-speech. For other purposes, it should also be treated as one of the keyword values.</p>
        <p>When there is no <strong>type</strong> attribute, the value is a set of <strong>keywords</strong>, delimited by |. Spaces around each keyword are trimmed. The <strong>keywords</strong> are words associated with the character(s) that might be used in searching for the character, or in predictive typing on keyboards. The short name itself can also be used as a keyword.</p>
4190        <p>Here is an example from German:</p>
4191
4192		<pre class="example">
&lt;annotation cp="👎"&gt;schlecht | Hand | Daumen | nach unten&lt;/annotation&gt;
&lt;annotation cp="👎" type="tts"&gt;Daumen runter&lt;/annotation&gt;
4195</pre>
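<p>As a rough sketch of how a client might read these two kinds of values, the Python below collects the short name and the keyword list for each cp, using the standard <code>xml.etree.ElementTree</code> module. The function names and the returned dictionary shape are assumptions for illustration, and only single-string cp values are handled, not the UnicodeSet form.</p>

```python
import xml.etree.ElementTree as ET


def read_annotations(root):
    """Map each cp string to {'tts': short name or None, 'keywords': [...]}."""
    result = {}
    for ann in root.iter("annotation"):
        entry = result.setdefault(ann.get("cp"), {"tts": None, "keywords": []})
        text = ann.text or ""
        if ann.get("type") == "tts":
            entry["tts"] = text.strip()
        else:
            # Keywords are delimited by '|'; spaces around each keyword are trimmed.
            entry["keywords"] = [k.strip() for k in text.split("|") if k.strip()]
    return result


def search_terms(entry):
    """Keywords plus the short name, since the short name can also serve as a keyword."""
    terms = list(entry["keywords"])
    if entry["tts"] and entry["tts"] not in terms:
        terms.append(entry["tts"])
    return terms


# Rebuild the German example in memory (U+1F44E THUMBS DOWN SIGN).
root = ET.Element("annotations")
ET.SubElement(root, "annotation", cp="\U0001F44E").text = "schlecht | Hand | Daumen | nach unten"
ET.SubElement(root, "annotation", cp="\U0001F44E", type="tts").text = "Daumen runter"

german = read_annotations(root)
print(search_terms(german["\U0001F44E"]))
# prints ['schlecht', 'Hand', 'Daumen', 'nach unten', 'Daumen runter']
```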
4196
		<p>The cp attribute value has two formats: either a single string or, if enclosed in […], a UnicodeSet. The latter format can contain
			multiple code points or strings. A code point or string can occur in multiple annotation
4199			element <strong>cp</strong> values, such as the following, which also contains the
4200			&quot;thumbs down&quot; character.</p>
4201		<pre class="example"><span >&lt;annotation cp='[☝✊-✍��-����-����������������������]'&gt;hand&lt;/annotation&gt;</span></pre>
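<p>To index annotations whose cp is a UnicodeSet, a client has to expand the set into the strings it covers. The Python below is a deliberately small sketch: it understands only literal code points and X-Y ranges inside [...]. Real UnicodeSet syntax (multi-character strings in {...}, escapes, property expressions) needs a full parser, such as ICU's UnicodeSet. The function names and the sample keyword data are hypothetical.</p>

```python
def expand_cp(value):
    """Return the set of strings covered by an annotation cp value (subset of syntax only)."""
    if not (value.startswith("[") and value.endswith("]")):
        return {value}  # the single-string format
    body, result, i = value[1:-1], set(), 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # A range such as ✊-✍ covers every code point from start to end inclusive.
            start, end = ord(body[i]), ord(body[i + 2])
            result.update(chr(cp) for cp in range(start, end + 1))
            i += 3
        else:
            result.add(body[i])
            i += 1
    return result


def index_annotations(pairs):
    """Build {string: [keyword, ...]} from (cp, keyword) pairs; a string may gain several."""
    index = {}
    for cp, keyword in pairs:
        for s in expand_cp(cp):
            index.setdefault(s, []).append(keyword)
    return index


# Hypothetical data: ✌ (U+270C) is covered by the set and also has its own entry.
index = index_annotations([("[\u261D\u270A-\u270D]", "hand"), ("\u270C", "victory")])
print(index["\u270C"])
```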
4202		<p>Both for short names and keywords, values do not have to match between different languages. They should be the most common values that people using <em>that</em> language
			would associate with those characters. For example, a &quot;black heart&quot; might
4204			have the association of &quot;wicked&quot; in English, but not in some other languages.</p>
		<p>The cp value may contain sequences, but does not contain any text or emoji variation selectors (VS15 for text presentation, VS16 for emoji presentation). All such characters should be removed before looking up any short names and keywords.</p>
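<p>A minimal Python sketch of that normalization step, assuming annotations have already been loaded into a dictionary keyed by cp string (a hypothetical structure). VS15 (U+FE0E) and VS16 (U+FE0F) are removed from the lookup key, while other characters in a sequence, such as ZWJ, are kept.</p>

```python
VS15 = "\uFE0E"  # VARIATION SELECTOR-15 (text presentation)
VS16 = "\uFE0F"  # VARIATION SELECTOR-16 (emoji presentation)


def normalize_cp(s):
    """Remove text/emoji variation selectors before looking up names and keywords."""
    return s.replace(VS15, "").replace(VS16, "")


def lookup(annotations, s):
    """Look up a string in a {cp: value} dictionary after normalization."""
    return annotations.get(normalize_cp(s))


# Hypothetical data: U+2764 U+FE0F finds the entry stored under plain U+2764.
heart_data = {"\u2764": "red heart"}
print(lookup(heart_data, "\u2764\uFE0F"))
```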
4207		<h3>
4208			14.1 <a name="SynthesizingNames" href="#SynthesizingNames">Synthesizing Sequence Names</a>
4209		</h3>
4210		<p>Many emoji are represented by sequences of characters. When there are no annotation
4211			elements for that string, the short name can be synthesized as follows.
4212			<strong>Note:</strong> The process details may change after the release of this
4213			specification, and may further change in the future if other sequences are added.
4214			Please see the <a href='https://sites.google.com/site/cldr/index/downloads/cldr-30#TOC-Known-Issues'>Known
4215			Issues</a> section of the CLDR download page for any updates.</p>
4216		<ol>
4217		  <li>If  <strong>sequence</strong> is an <strong>emoji flag sequence</strong>, look up the territory name in CLDR for the
4218		  		corresponding ASCII characters and return as the short name. For example, the regional
4219		  		indicator symbols P+F would map to “Französisch-Polynesien” in German.</li>
4220		  <li>If <strong>sequence</strong> is an <strong>emoji tag sequence</strong>, look up the subdivision name in CLDR for the
4221		  		corresponding ASCII characters and return as the short name. For example, the TAG characters gbsct would map to “Schottland” in German.</li>
		  <li>If <strong>sequence</strong> is a keycap sequence or 🔟, use the characterLabel for &quot;keycap&quot;
		  		as the <strong>prefixName</strong> and set the <strong>suffix</strong> to be the sequence (or &quot;10&quot; in the case of 🔟), then go to step 8.</li>
4224		  <li>Let<strong> suffix</strong> and <strong>prefixName</strong> be &quot;&quot;.</li>
4225		  <li>If  <strong>sequence</strong> contains any emoji modifiers, move them (in order) into <strong>suffix</strong>, removing them from  <strong>sequence</strong>.		  </li>
4226		  <li>If  <strong>sequence</strong> is a &quot;KISS&quot;, &quot;HEART&quot; or &quot;FAMILY&quot; emoji
		  		ZWJ sequence, move the characters in <strong>sequence</strong> to the front of <strong>suffix</strong>, and set the <strong>sequence</strong> to be &quot;💏&quot;, &quot;💑&quot;, or &quot;👪&quot;
4228		  		respectively, and go to step 7.
4229		        <ol>
		      <li>A KISS sequence contains ZWJ, &quot;💋&quot;, and &quot;❤&quot;, which are skipped in moving to <strong>suffix</strong>.</li>
4231		      <li>A HEART sequence contains ZWJ and &quot;❤&quot;, which are skipped in moving to <strong>suffix</strong>.</li>
4232		      <li>A FAMILY sequence contains only characters from the set {��, ��, ��, ��, ��, ��, ��}.
4233		      		Nothing is skipped in  moving to <strong>suffix</strong>, except ZWJ.</li>
4234	        </ol>
4235		  </li>
		  <li>If <strong>sequence</strong> ends with ♂ or ♀, and does not have a name, remove the ♂ or ♀ and move the name for &quot;👨&quot; or
	      &quot;👩&quot; respectively to the start of <strong>prefixName</strong>.</li>
4238		  <li>Transform   <strong>sequence</strong> and append to <strong>prefixName</strong>, by successively getting  names for the longest subsequences, skipping any singleton ZWJ characters. If there is more than one name,  use the listPattern for unit-short, type=2 to link them.</li>
4239		  <li>Transform <strong>suffix</strong> into <strong>suffixName</strong> in the same manner.</li>
4240		  <li>If both the <strong>prefixName</strong> and <strong>suffixName</strong> are non-empty, form the name by joining them with the  &quot;category-list&quot; characterLabelPattern and return it. Otherwise return whichever of them is non-empty.</li>
4241	    </ol>
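<p>The following Python sketch covers steps 1, 3, 4, 5, 7, 8, 9, and 10 for a few sequences from the table below. It is illustrative only: the tiny NAMES and TERRITORIES tables stand in for real CLDR annotation and territory data, the English category-list pattern "{0}: {1}" and a ", " list separator are assumed, and step 2 (tag sequences) and step 6 (KISS, HEART, FAMILY) are omitted. For keycaps, the sketch uses the keycap's base character directly as the suffix name, which reproduces the "keycap: #" row of the table. The gender name from step 7 is joined to the rest of the prefix with a space.</p>

```python
# Hypothetical English data standing in for CLDR annotations and territory names.
NAMES = {
    "\U0001F466": "boy",
    "\U0001F468": "man",
    "\U0001F469": "woman",
    "\U0001F46E": "police officer",
    "\U0001F3FB": "light skin tone",
    "\U0001F3FC": "medium-light skin tone",
    "\U0001F3FF": "dark skin tone",
}
TERRITORIES = {"EU": "European Union", "PF": "French Polynesia"}
KEYCAP_LABEL = "keycap"     # characterLabel type="keycap"
CATEGORY_LIST = "{0}: {1}"  # characterLabelPattern type="category-list"

ZWJ, KEYCAP, KEYCAP_TEN = "\u200D", "\u20E3", "\U0001F51F"
VARIATION_SELECTORS = {"\uFE0E", "\uFE0F"}
MODIFIERS = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}  # emoji modifiers (skin tones)
GENDER_BASE = {"\u2642": "\U0001F468", "\u2640": "\U0001F469"}  # male sign to man, female sign to woman


def is_flag(seq):
    """An emoji flag sequence is a pair of regional indicator symbols."""
    return len(seq) == 2 and all(0x1F1E6 <= ord(c) <= 0x1F1FF for c in seq)


def name_run(seq):
    """Step 8: name the longest known subsequences, skipping singleton ZWJ characters."""
    names, i = [], 0
    while i < len(seq):
        if seq[i] == ZWJ:
            i += 1
            continue
        for j in range(len(seq), i, -1):  # try the longest candidate first
            if seq[i:j] in NAMES:
                names.append(NAMES[seq[i:j]])
                i = j
                break
        else:
            raise KeyError(f"no name for {seq[i]!r}")
    return ", ".join(names)  # assumed English unit-short list pattern


def synthesize_short_name(sequence):
    seq = "".join(c for c in sequence if c not in VARIATION_SELECTORS)
    if is_flag(seq):  # step 1: territory name for the two ASCII letters
        return TERRITORIES["".join(chr(ord(c) - 0x1F1E6 + ord("A")) for c in seq)]
    if seq.endswith(KEYCAP) or seq == KEYCAP_TEN:  # step 3
        base = "10" if seq == KEYCAP_TEN else seq[:-1]
        return CATEGORY_LIST.format(KEYCAP_LABEL, base)
    prefix_name = ""  # step 4
    suffix = "".join(c for c in seq if c in MODIFIERS)  # step 5
    seq = "".join(c for c in seq if c not in MODIFIERS)
    if seq and seq[-1] in GENDER_BASE and seq not in NAMES:  # step 7
        prefix_name = NAMES[GENDER_BASE[seq[-1]]]
        seq = seq[:-1]
    prefix_name = " ".join(p for p in (prefix_name, name_run(seq)) if p)  # step 8
    suffix_name = name_run(suffix)  # step 9
    if prefix_name and suffix_name:  # step 10
        return CATEGORY_LIST.format(prefix_name, suffix_name)
    return prefix_name or suffix_name


print(synthesize_short_name("\U0001F46E\U0001F3FC\u200D\u2642\uFE0F"))
```

<p>With this data, the police officer sequence with a medium-light skin tone and a male sign yields "man police officer: medium-light skin tone", matching the corresponding row of the table.</p>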
4242		<p>The synthesized keywords can follow a similar process.</p>
4243		<ol>
4244		  <li>For an <strong>emoji flag sequence</strong> or <strong>emoji tag sequence</strong> representing a subdivision, use &quot;flag&quot;.</li>
4245		  <li>For keycap sequences, use &quot;keycap&quot;.</li>
4246		  <li>For other sequences, add the keywords for the subsequences used to get the short names for <strong>prefixName</strong>, and the short names used for <strong>suffixName</strong>.</li>
4247	    </ol>
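<p>The keyword rules can be sketched the same way. This self-contained Python fragment again uses hypothetical stand-in data (KEYWORDS, SKIN_TONE_NAMES). It returns keywords in first-seen order with duplicates removed, so the table below may list the same keywords in a different order. As an assumption of this sketch, an emoji tag sequence is recognized only by its U+1F3F4 base and its U+E007F CANCEL TAG terminator.</p>

```python
# Hypothetical stand-in data: keywords per base emoji, short names for modifiers.
KEYWORDS = {
    "\U0001F468": ["man"],
    "\U0001F469": ["woman"],
    "\U0001F46E": ["police", "cop", "officer"],
    "\u2696": ["scales", "justice"],
}
SKIN_TONE_NAMES = {
    "\U0001F3FB": "light skin tone",
    "\U0001F3FC": "medium-light skin tone",
    "\U0001F3FD": "medium skin tone",
    "\U0001F3FE": "medium-dark skin tone",
    "\U0001F3FF": "dark skin tone",
}

ZWJ, KEYCAP, KEYCAP_TEN = "\u200D", "\u20E3", "\U0001F51F"
VARIATION_SELECTORS = {"\uFE0E", "\uFE0F"}
GENDER_BASE = {"\u2642": "\U0001F468", "\u2640": "\U0001F469"}


def is_flag_or_tag(seq):
    regional = len(seq) == 2 and all(0x1F1E6 <= ord(c) <= 0x1F1FF for c in seq)
    tagged = seq.startswith("\U0001F3F4") and seq.endswith("\U000E007F")
    return regional or tagged


def segments(seq):
    """Split seq into the longest keys of KEYWORDS, skipping singleton ZWJ characters."""
    parts, i = [], 0
    while i < len(seq):
        if seq[i] == ZWJ:
            i += 1
            continue
        j = next((j for j in range(len(seq), i, -1) if seq[i:j] in KEYWORDS), None)
        if j is None:
            raise KeyError(f"no keywords for {seq[i]!r}")
        parts.append(seq[i:j])
        i = j
    return parts


def synthesize_keywords(sequence):
    seq = "".join(c for c in sequence if c not in VARIATION_SELECTORS)
    if is_flag_or_tag(seq):
        return ["flag"]
    if seq.endswith(KEYCAP) or seq == KEYCAP_TEN:
        return ["keycap"]
    suffix = [c for c in seq if c in SKIN_TONE_NAMES]
    seq = "".join(c for c in seq if c not in SKIN_TONE_NAMES)
    parts = []
    if seq and seq[-1] in GENDER_BASE and seq not in KEYWORDS:
        parts.append(GENDER_BASE[seq[-1]])  # the gender sign contributes man/woman keywords
        seq = seq[:-1]
    parts += segments(seq)
    words = [w for part in parts for w in KEYWORDS[part]]
    words += [SKIN_TONE_NAMES[c] for c in suffix]  # short names used for suffixName
    return list(dict.fromkeys(words))  # drop duplicates, keep first-seen order
```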
4248		<p>Some examples for   English data (v30) are given in the following table.</p>
4249	  <table cellspacing="0" cellpadding="2" border="1">
4250        <caption>Synthesized Emoji Sequence Names</caption>
4251		  <tbody>
4252		    <tr>
4253		      <th>Sequence</th>
4254		      <th>Short Name</th>
4255		      <th>Keywords</th>
4256	        </tr>
4257		    <tr>
		      <td>🇪🇺</td>
4259		      <td>European Union</td>
4260		      <td>flag</td>
4261	        </tr>
4262		    <tr>
4263		      <td>#️⃣</td>
4264		      <td>keycap: #</td>
4265		      <td>keycap</td>
4266	        </tr>
4267		    <tr>
4268		      <td>9️⃣</td>
4269		      <td>keycap: 9</td>
4270		      <td>keycap</td>
4271	        </tr>
4272		    <tr>
		      <td>💏</td>
4274		      <td>kiss</td>
4275		      <td>couple</td>
4276	        </tr>
4277		    <tr>
		      <td>👩‍❤️‍💋‍👩</td>
4279		      <td>kiss: woman, woman</td>
4280		      <td>couple, woman</td>
4281	        </tr>
4282		    <tr>
		      <td>💑</td>
4284		      <td>couple with heart</td>
4285		      <td>love, couple</td>
4286	        </tr>
4287		    <tr>
		      <td>👩‍❤️‍👩</td>
4289		      <td>couple with heart: woman, woman</td>
4290		      <td>love, couple, woman</td>
4291	        </tr>
4292		    <tr>
		      <td>👪</td>
4294		      <td>family</td>
4295		      <td>family</td>
4296	        </tr>
4297		    <tr>
		      <td>👩‍👩‍👧</td>
4299		      <td>family: woman, woman, girl</td>
4300		      <td>woman, family, girl</td>
4301	        </tr>
4302		    <tr>
		      <td>👦🏻</td>
4304		      <td>boy: light skin tone</td>
4305		      <td>young, light skin tone, boy</td>
4306	        </tr>
4307		    <tr>
		      <td>👩🏿</td>
4309		      <td>woman: dark skin tone</td>
4310		      <td>woman, dark skin tone</td>
4311	        </tr>
4312		    <tr>
		      <td>👨‍⚖</td>
4314		      <td>man judge</td>
4315		      <td>scales, justice, man</td>
4316	        </tr>
4317		    <tr>
		      <td>👨🏿‍⚖</td>
4319		      <td>man judge: dark skin tone</td>
4320		      <td>scales, justice, dark skin tone, man</td>
4321	        </tr>
4322		    <tr>
		      <td>👩‍⚖</td>
4324		      <td>woman judge</td>
4325		      <td>woman, scales, judge</td>
4326	        </tr>
4327		    <tr>
		      <td>👩🏼‍⚖</td>
4329		      <td>woman judge: medium-light skin tone</td>
4330		      <td>woman, scales, medium-light skin tone, judge</td>
4331	        </tr>
4332		    <tr>
		      <td>👮</td>
4334		      <td>police officer</td>
4335		      <td>police, cop, officer</td>
4336	        </tr>
4337		    <tr>
		      <td>👮🏿</td>
4339		      <td>police officer: dark skin tone</td>
4340		      <td>police, cop, officer, dark skin tone</td>
4341	        </tr>
4342		    <tr>
		      <td>👮‍♂️</td>
4344		      <td>man police officer</td>
4345		      <td>police, cop, officer, man</td>
4346	        </tr>
4347		    <tr>
		      <td>👮🏼‍♂️</td>
4349		      <td>man police officer: medium-light skin tone</td>
4350		      <td>police, cop, officer, medium-light skin tone, man</td>
4351	        </tr>
4352		    <tr>
		      <td>👮‍♀️</td>
4354		      <td>woman police officer</td>
4355		      <td>police, woman, cop, officer</td>
4356	        </tr>
4357		    <tr>
		      <td>👮🏿‍♀️</td>
4359		      <td>woman police officer: dark skin tone</td>
4360		      <td>police, woman, cop, officer, dark skin tone</td>
4361	        </tr>
4362		    <tr>
		      <td>🚴</td>
4364		      <td>person biking</td>
4365		      <td>cyclist, bicycle, biking</td>
4366	        </tr>
4367		    <tr>
		      <td>🚴🏿</td>
4369		      <td>person biking: dark skin tone</td>
4370		      <td>cyclist, bicycle, biking, dark skin tone</td>
4371	        </tr>
4372		    <tr>
		      <td>🚴‍♂️</td>
4374		      <td>man biking</td>
4375		      <td>cyclist, bicycle, biking, man</td>
4376	        </tr>
4377		    <tr>
		      <td>🚴🏿‍♂️</td>
4379		      <td>man biking: dark skin tone</td>
4380		      <td>cyclist, bicycle, biking, dark skin tone, man</td>
4381	        </tr>
4382		    <tr>
		      <td>🚴‍♀️</td>
4384		      <td>woman biking</td>
4385		      <td>cyclist, woman, bicycle, biking</td>
4386	        </tr>
4387		    <tr>
		      <td>🚴🏿‍♀️</td>
4389		      <td>woman biking: dark skin tone</td>
4390		      <td>cyclist, woman, bicycle, biking, dark skin tone</td>
4391	        </tr>
4392	      </tbody>
4393	  </table>
4394
4395
4396	  <p>
4397			For more information, see <a href='http://unicode.org/reports/tr51'>Unicode
4398				Emoji</a>.
4399		</p>
4400	  		<h3>
4401			14.2 <a name="Character_Labels" href="#Character_Labels">Annotations Character Labels</a>
4402		</h3>
4403	  		<p class="dtd">&lt;!ELEMENT characterLabels ( alias | ( characterLabelPattern*, characterLabel*, special* ) ) &gt; </p>
4404	  		<p class="dtd">&lt;!ELEMENT characterLabelPattern ( #PCDATA ) &gt; </p>
4405	  		<p class="dtd">&lt;!ATTLIST characterLabelPattern type NMTOKEN #REQUIRED &gt;</p>
  		<p class="dtd">&lt;!ATTLIST characterLabelPattern count (0 | 1 | zero | one | two | few | many | other) #IMPLIED &gt;     &lt;!-- count only used for certain patterns --&gt;</p>
4407	  		<p class="dtd">&lt;!ELEMENT characterLabel ( #PCDATA ) &gt; </p>
4408	  		<p class="dtd">&lt;!ATTLIST characterLabel type NMTOKEN #REQUIRED &gt;</p>
4409            <p>The character labels can be used for categories or groups of characters in a character picker or keyboard palette. They have the above structure. Items with special meanings are explained below. Many of the categories are based on terms used in Unicode. Consult the <a href='http://www.unicode.org/glossary/'>Unicode Glossary</a> where the meaning is not clear.</p>
4410<p>The following are special patterns used in composing labels.</p>
4411<table>
4412<caption>characterLabelPattern</caption>
4413<tr>
4414  <th>Type</th>
4415  <th>English</th>
4416  <th>Description of the group specified.</th>
4417</tr>
4418<tr><th>all</th><td>{0} — all</td>
4419<td>Used where the title {0} is just a subset. For example, {0} might be &quot;Latin&quot;, and contain the most common Latin characters. Then &quot;Latin — all&quot; would be all of them.</td></tr>
4420<tr><th>category-list</th><td>{0}: {1}</td>
<td>Used for a name, where {0} is the main item like &quot;Family&quot;, and {1} is a list of one or more components or subcategories. The list is formatted using a list pattern.</td></tr>
4422<tr><th>compatibility</th><td>{0} — compatibility</td>
4423<td>For grouping Unicode compatibility characters separately, such as &quot;Arabic — compatibility&quot;.</td></tr>
4424<tr><th>enclosed</th><td>{0} — enclosed</td>
4425<td>For indicating enclosed forms, such as &quot;digits — enclosed&quot;</td></tr>
4426<tr><th>extended</th><td>{0} — extended</td>
4427<td>For indicating a group of &quot;extended&quot; characters (special use, technical, etc.)</td></tr>
4428<tr><th>historic</th><td>{0} — historic</td>
4429  <td>For indicating a group of &quot;historic&quot; characters (no longer in common use).</td></tr>
4430<tr><th>miscellaneous</th><td>{0} — miscellaneous</td>
  <td>For indicating a group of &quot;miscellaneous&quot; characters (typically those that don't fall into a broader class).</td></tr>
4432<tr><th>other</th><td>{0} — other</td>
4433  <td>Used where the title {0} is just a subset. For example, {0} might be &quot;Latin&quot;, and contain the most common Latin characters. Then &quot;Latin — other&quot; would be the rest of them.</td></tr>
4434<tr><th>scripts</th><td>scripts — {0}</td>
4435<td>For indicating a group of &quot;scripts&quot; characters matching {0}. The value for {0} may be a geographic indicator, like &quot;Africa&quot; (although there are specific combinations listed below), or some other designation, like &quot;other&quot; (from below).</td></tr>
4436<tr>
4437  <th>strokes</th><td>{0} strokes</td>
4438  <td>Used as an index title for CJK characters. It takes a &quot;count&quot; value, which allows the right plural form to be specified for the language.</td></tr>
4439</table>
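<p>A minimal Python sketch of filling these patterns, using some of the English values from the table above. The dictionary layout, the function name, and the simplified English plural rule (one for the integer 1, other otherwise) are assumptions. When a count-specific pattern is missing, this sketch falls back to the <em>other</em> form.</p>

```python
# Some English characterLabelPattern values from the table above, keyed by (type, count).
PATTERNS = {
    ("all", None): "{0} \u2014 all",
    ("category-list", None): "{0}: {1}",
    ("scripts", None): "scripts \u2014 {0}",
    ("strokes", "other"): "{0} strokes",
}


def english_plural(n: int) -> str:
    """Simplified English cardinal plural category for integers."""
    return "one" if n == 1 else "other"


def format_label(label_type, *args, count=None):
    """Substitute args into the pattern for label_type, choosing a plural form if counted."""
    if count is None:
        pattern = PATTERNS[(label_type, None)]
    else:
        category = english_plural(count)
        pattern = PATTERNS.get((label_type, category)) or PATTERNS[(label_type, "other")]
    return pattern.format(*args)


print(format_label("category-list", "family", ", ".join(["woman", "woman", "girl"])))
print(format_label("strokes", 12, count=12))
```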
4440<p>The following are character labels. Where the meaning of the label is fairly clear (like "animal") or is in the Unicode glossary, it is omitted.</p>
4441<table>
4442<caption>characterLabel</caption>
4443<tr><th>activities</th><td>activity</td>
4444<td>Human activities, such as running.</td></tr>
4445<tr><th>african_scripts</th><td>African script</td>
4446<td>Scripts associated with the continent of Africa.</td></tr>
4447<tr><th>american_scripts</th><td>American script</td>
4448<td>Scripts associated with the continents of North and South America.</td></tr>
4449<tr><th>animals_nature</th><td>animal or nature</td>
  <td>A broad category used for animals and nature.</td></tr>
4451<tr><th>arrows</th><td>arrow</td>
4452<td>Arrow symbols</td></tr>
4453<tr><th>body</th><td>body</td>
4454<td>Symbols for body parts, such as an arm.</td></tr>
4455<tr><th>box_drawing</th><td>box drawing</td>
4456<td>Unicode box-drawing characters (geometric shapes)</td></tr>
4457<tr><th>bullets_stars</th><td>bullet or star</td>
4458<td>Unicode bullets (such as • or ‣ or ⁍) or stars (★✩✪✵...)</td></tr>
4459<tr><th>consonantal_jamo</th><td>consonantal jamo</td>
4460  <td>Korean Jamo consonants.</td></tr>
4461<tr><th>currency_symbols</th><td>currency symbol</td>
4462  <td>Symbols such as $, ¥, £</td></tr>
4463<tr><th>dash_connector</th><td>dash or connector</td>
4464  <td>Characters like _ or ⁓</td></tr>
4465<tr><th>dingbats</th><td>dingbat</td>
4466<td>Font dingbat characters, such as ❿ or ♜.</td></tr>
4467<tr><th>downwards_upwards_arrows</th><td>downwards upwards arrow</td>
  <td>Arrows pointing both downwards and upwards, such as ⇕.</td></tr>
4469<tr><th>female</th><td>female</td>
4470<td>Indicates that a character is female or feminine in appearance.</td></tr>
4471<tr><th>format</th><td>format</td>
4472<td>A Unicode format character.</td></tr>
4473<tr><th>format_whitespace</th><td>format &amp; whitespace</td>
4474  <td>A Unicode format character or whitespace.</td></tr>
4475<tr><th>full_width_form_variant</th><td>full-width variant</td>
4476  <td>Full width variant, such as a wide A.</td></tr>
4477<tr><th>half_width_form_variant</th><td>half-width variant</td>
4478<td>Narrow width variant, such as a half-width katakana character.</td></tr>
4479<tr><th>han_characters</th><td>Han character</td>
4480  <td>Han (aka CJK: Chinese, Japanese, or Korean) ideograph</td></tr>
4481<tr><th>han_radicals</th><td>Han radical</td>
4482  <td>Radical (component) used in Han characters.</td></tr>
4483<tr><th>hanja</th><td>hanja</td>
  <td>Korean name for Han characters.</td></tr>
4485<tr><th>hanzi_simplified</th><td>Hanzi (simplified)</td>
4486  <td>Simplified Chinese ideograph</td></tr>
4487<tr><th>hanzi_traditional</th><td>Hanzi (traditional)</td>
4488  <td>Traditional Chinese ideograph</td></tr>
4489<tr><th>historic_scripts</th><td>historic script</td>
4490  <td>Script no longer in common modern usage, such as Runes or Hieroglyphs.</td></tr>
4491<tr><th>ideographic_desc_characters</th><td>ideographic desc. character</td>
4492  <td>Special Unicode characters (see the glossary).</td></tr>
4493<tr><th>kanji</th><td>kanji</td>
4494  <td>Japanese Han ideograph</td></tr>
4495<tr><th>keycap</th><td>keycap</td>
4496  <td>A key on a computer keyboard or phone. For example, the &quot;3&quot; key on a phone or laptop would be &quot;keycap: 3&quot;</td></tr>
4497<tr><th>limited_use</th><td>limited-use</td>
4498<td>Not in common modern use.</td></tr>
4499<tr><th>male</th><td>male</td>
4500  <td>Indicates that a character is male or masculine in appearance.</td></tr>
4501<tr><th>modifier</th><td>modifier</td>
4502<td>A Unicode modifier letter or symbol.</td></tr>
4503<tr><th>nonspacing</th><td>nonspacing</td>
  <td>Used for characters that occupy no width by themselves, such as the ¨ over the a in ä.</td></tr>
4505</table>
4506		  		<h3>
4507			14.3 <a name="Typographic_Names" href="#Typographic_Names">Typographic Names</a>
4508		</h3>
4509
4510		<p class='dtd'>&lt;!ELEMENT typographicNames ( alias | ( axisName*, styleName*, featureName*, special* ) ) &gt;</p>
4511		<p class='dtd'>&lt;!ELEMENT axisName ( #PCDATA ) &gt;<br>
4512		  &lt;!ATTLIST axisName type (ital | opsz | slnt | wdth | wght) #REQUIRED &gt;<br>
4513	  &lt;!ATTLIST axisName alt NMTOKENS #IMPLIED &gt;</p>
4514		<p class='dtd'>&lt;!ELEMENT styleName ( #PCDATA ) &gt;<br>
4515		  &lt;!ATTLIST styleName type (ital | opsz | slnt | wdth | wght) #REQUIRED &gt;<br>
4516		  &lt;!ATTLIST styleName subtype NMTOKEN #REQUIRED &gt;<br>
4517	  &lt;!ATTLIST styleName alt NMTOKENS #IMPLIED &gt;</p>
4518		<p class='dtd'>&lt;!ELEMENT featureName ( #PCDATA ) &gt;<br>
4519		  &lt;!ATTLIST featureName type (afrc | cpsp | dlig | frac | lnum | onum | ordn | pnum | smcp | tnum | zero) #REQUIRED &gt;<br>
4520	  &lt;!ATTLIST featureName alt NMTOKENS #IMPLIED &gt;</p>
	<p>The typographic names provide names of font features for use in a UI. This is useful for apps that show the names of font styles and design axes in the user’s languages. It is also useful for system-level libraries.</p>
	<p>The identifiers (types) use the tags from the OpenType Feature Tag Registry. Given their large number, only the names of frequently used OpenType features are available in CLDR. (Many features are not user-visible settings, but instead serve as a data channel for software to pass information to the font.)
	The example below shows an approach for using the CLDR data. Of course, applications are free to implement their own algorithms depending on their specific needs.</p>
4524<p>To find a localized subfamily name such as &ldquo;Extraleicht Schmal&rdquo; for a font called &ldquo;Extralight Condensed&rdquo;, a system or application library might do the following: </p>
4525        <ol>
4526          <li>
            <p>Determine the set of languages in which the subfamily name can potentially be returned. This is the union of the languages for which the font contains &lsquo;name&rsquo; table entries with ID 2 or 17, plus the languages for which CLDR supplies typographic names. </p>
4528          </li>
4529          <li>
            <p>Use a language matching algorithm, such as the one in ICU, to find the best available language given the user preferences. The resulting subfamily name will be localized to this language. </p>
4531          </li>
4532          <li>
            <p>If the font&rsquo;s &lsquo;name&rsquo; table contains a typographic subfamily name (ID 17) in this language and all font variation axes are set to their defaults, return this name. </p>
4534          </li>
4535          <li>
            <p>If the font&rsquo;s &lsquo;name&rsquo; table contains a font subfamily name (&lsquo;name&rsquo; ID 2) in this language and all font variation axes are set to their defaults, return this name. </p>
4537          </li>
4538          <li>
            <p>If the font has a style attributes (STAT) table, look up the design axis tags and their ordering. If the font has no STAT table, assume [Width, Weight, Slant] as axis ordering, and infer the font&rsquo;s style attributes from other available data in the font (e.g., the OS/2 table). </p>
4540          </li>
4541          <li>For each design axis, find a localized style name for its value.
4542             <ol>
            <li>If the font&rsquo;s style attributes point to a &lsquo;name&rsquo; table entry that is available in the result language, use this name.</li>
            <li>Otherwise, generate a fallback name from CLDR styleName data.
4545               <ol>
                <li>The type key is the OpenType axis tag (such as &lsquo;wght&rsquo;). The subtype and alt keys are taken from the entry in English CLDR where the string is equal to the English name in the font. For example, when the font uses a weight whose English style name is &ldquo;Extralight&rdquo;, this will lead to subtype = &ldquo;200&rdquo; and alt = &ldquo;variant&rdquo;. If there is no match, take the axis value (&ldquo;200&rdquo;) for subtype and the empty string for alt. </li>
              <li>Look up (type, subtype) in a data table derived from CLDR&rsquo;s style names. If CLDR supplies multiple alternate names for this (type, subtype), use the one whose &ldquo;alt&rdquo; key matches; otherwise, use the default alternate (which has no &ldquo;alt&rdquo; attribute in CLDR).</li>
4548            </ol>
4549          </li>
4550        </ol>
4551        </li>
4552        <li>Concatenate the strings, with a separator between them.</li>
4553        </ol>
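<p>The CLDR fallback in step 6 and the concatenation in step 7 can be sketched in Python as follows. Everything here is hypothetical: the STYLE_NAMES excerpt (keyed by type, subtype, and alt, with "" for the default alternate) stands in for real CLDR styleName data, the font's axes are passed in as already-ordered tuples, a space is used as the separator, and steps 1 to 5 (name-table lookups, language matching, STAT parsing) are not shown.</p>

```python
# Hypothetical excerpt of CLDR styleName data: (type, subtype, alt) to name; alt "" is the default.
STYLE_NAMES = {
    "en": {("wght", "200", "variant"): "Extralight", ("wdth", "75", ""): "Condensed"},
    "de": {("wght", "200", "variant"): "Extraleicht", ("wdth", "75", ""): "Schmal"},
}


def cldr_keys(axis_tag, english_name, axis_value):
    """Find subtype and alt from the English entry whose string equals the font's name."""
    for (tag, subtype, alt), name in STYLE_NAMES["en"].items():
        if tag == axis_tag and name == english_name:
            return subtype, alt
    return axis_value, ""  # no match: use the axis value and an empty alt


def fallback_style_name(lang, axis_tag, english_name, axis_value):
    """Prefer the entry with a matching alt, else the default alternate."""
    subtype, alt = cldr_keys(axis_tag, english_name, axis_value)
    table = STYLE_NAMES.get(lang, {})
    return table.get((axis_tag, subtype, alt)) or table.get((axis_tag, subtype, ""))


def localized_subfamily(lang, axes, separator=" "):
    """Step 7: axes is an ordered list of (axis tag, English style name, axis value)."""
    names = (fallback_style_name(lang, *axis) for axis in axes)
    return separator.join(name for name in names if name)  # skip axes with no data


print(localized_subfamily("de", [("wght", "Extralight", "200"), ("wdth", "Condensed", "75")]))
# prints Extraleicht Schmal
```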
4554
4555	  <hr>
4556		<p class="copyright">
4557			Copyright © 2001–2018 Unicode, Inc. All
4558			Rights Reserved. The Unicode Consortium makes no expressed or implied
4559			warranty of any kind, and assumes no liability for errors or
4560			omissions. No liability is assumed for incidental and consequential
4561			damages in connection with or arising out of the use of the
4562			information or programs contained or accompanying this technical
4563			report. The Unicode <a href="http://unicode.org/copyright.html">Terms
4564				of Use</a> apply.
4565		</p>
4566		<p class="copyright">Unicode and the Unicode logo are trademarks
4567			of Unicode, Inc., and are registered in some jurisdictions.</p>
4568	</div>
4569
4570</body>
4571
4572</html>
4573