---
title: XMB
---

# XMB

### Introduction

Adds tools to CLDR to convert to and from the [XMB message format](http://unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/xmb.dtd). The XMB format is basically a key-value pair list, with no deeper structure. It does have a mechanism for named placeholders, with descriptions and examples. The messages for any given other language must correspond 1:1 with those of English.

The goal is to allow for bulk translation of CLDR files via existing translation tooling.

Examples:

**ENGLISH**

\<msg id='615EB568A2478EAF' desc='The name of the country or region with BCP47 code = UZ. Before translating, please read [cldr.org/translation](http://cldr.org/translation).'

\>**Uzbekistan**\</msg>

\<!-- English: MMMM d, y -->

\<msg id='5D6EA98708B9B43B' desc='Long date format. Before translating, please read [cldr.org/translation](http://cldr.org/translation).'

\>**\<ph name='MONTH\_LONG'>\<ex>September\</ex>MMMM\</ph> \<ph name='DAY\_1\_DIGIT'>\<ex>9\</ex>d\</ph>, \<ph name='YEAR'>\<ex>2010\</ex>y\</ph>**\</msg>

**FRENCH**

\<!-- English: Uzbekistan -->

\<msg id='615EB568A2478EAF'

\>**Ouzbékistan**\</msg>

\<!-- English: MMMM d, y -->

\<msg id='5D6EA98708B9B43B'

\>**\<ph name='DAY\_1\_DIGIT'>\<ex>9\</ex>d\</ph> \<ph name='MONTH\_LONG'>\<ex>September\</ex>MMMM\</ph> \<ph name='YEAR'>\<ex>2010\</ex>y\</ph>**\</msg>

The id is common across the different languages. The description, the placeholder names and the placeholder examples (\<ex>) are visible to the translator, as is the text between placeholders, of course. The translator can change the order of the placeholders, but they cannot be removed (or added).

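The placeholder constraint above (reordering allowed, adding or removing not allowed) can be checked mechanically. The following is a minimal sketch, not part of the CLDR tooling: the function names `placeholder_names` and `same_placeholders` are hypothetical, and the messages are simplified copies of the date example with the `desc` attribute omitted.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def placeholder_names(msg_xml: str) -> Counter:
    """Count the <ph name='...'> placeholders inside one <msg> element."""
    msg = ET.fromstring(msg_xml)
    return Counter(ph.get("name") for ph in msg.iter("ph"))


def same_placeholders(english_msg: str, translated_msg: str) -> bool:
    """True if both messages use the same placeholders, in any order."""
    return placeholder_names(english_msg) == placeholder_names(translated_msg)


english = ("<msg id='5D6EA98708B9B43B'>"
           "<ph name='MONTH_LONG'><ex>September</ex>MMMM</ph> "
           "<ph name='DAY_1_DIGIT'><ex>9</ex>d</ph>, "
           "<ph name='YEAR'><ex>2010</ex>y</ph></msg>")
french = ("<msg id='5D6EA98708B9B43B'>"
          "<ph name='DAY_1_DIGIT'><ex>9</ex>d</ph> "
          "<ph name='MONTH_LONG'><ex>September</ex>MMMM</ph> "
          "<ph name='YEAR'><ex>2010</ex>y</ph></msg>")
# A hypothetical bad translation that dropped two placeholders.
broken = "<msg id='5D6EA98708B9B43B'><ph name='YEAR'><ex>2010</ex>y</ph></msg>"

print(same_placeholders(english, french))
print(same_placeholders(english, broken))
```

Comparing multisets rather than lists is what permits reordering while still catching a dropped or duplicated placeholder.
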
The main tool for converting CLDR to this format is [GenerateXMB.java](http://unicode.org/cldr/trac/browser/trunk/tools/java/org/unicode/cldr/tool/GenerateXMB.java). It reads the en.xml file and builds an EnglishInfo object that maps paths to descriptions and placeholders. It also generates the English XMB file for translation. Next, each of the other CLDR locale files is read, and its data is used to populate a separate XTB file for translation memory.

Files:

| File | Description |
|---|---|
| [xmb-en.xml](http://unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/xmb-en.xml) | The base English file, for translation into other languages |
| [xtb-fr.xml](http://unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/xtb-fr.xml) | Sample file (fr) for translation memory. Missing entries would be translated. |

Others are at [xmb/](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/).

The documentation files are at http://cldr.org/translation.

### Log Files

The tool generates log files during processing, targeted at development and debugging.

Examples:

| Directory | File | Description |
|---|---|---|
| log/ | [en-missingDescriptions.txt](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/log/en-missingDescriptions.txt) | The paths that don't yet have descriptions in them, which need to be added to xmbHandling.txt. |
| log/ | [en-paths.txt](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/log/en-paths.txt) | The paths used for the base English file. |
| filtered/ | [xmb-en.xml](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/filtered/xmb-en.xml) | A filtered xmb-en.xml file that contains exactly one item per "starred" path (where a starred path is one with attribute values removed). Useful for reviewing descriptions. |
| filtered/ | [xtb-fr.xml](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/filtered/xtb-fr.xml) | A filtered sample (fr) xml file. |
| skipped/ | [xmb-en.txt](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/skipped/xmb-en.txt) | The paths that are skipped out of the base English file. |
| skipped/ | [xtb-fr.txt](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/skipped/xtb-fr.txt) | The paths that are skipped out of the sample (fr) file. |

### Placeholders

The tool replaces the placeholders ("{0}", "MMM", etc.) in patterns with variable names and examples. This is data-driven, using the file [xmbPlaceholders.txt](http://unicode.org/cldr/trac/browser/trunk/tools/java/org/unicode/cldr/tool/xmbPlaceholders.txt).

Format:

&emsp;path\_regex ; variable=name example

The name cannot contain spaces.

Example:

&emsp;^//ldml/dates/.\*(pattern|available|intervalFormatItem) ; EEEE=**DAY\_OF\_WEEK\_LONG** Tuesday

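To make the rule format concrete, here is a minimal sketch of parsing one such line and applying it to a pattern. It is an illustration only and does not reproduce how GenerateXMB.java works: `PlaceholderRule`, `parse_rule`, and `apply_rule` are hypothetical names, the sample path is an assumed CLDR date-format path, and the replacement is a naive substring substitution (a short variable such as "d" would also match inside literal text, which a real implementation would have to avoid).

```python
import re
from typing import NamedTuple
from xml.sax.saxutils import escape


class PlaceholderRule(NamedTuple):
    path_regex: re.Pattern
    variable: str   # the text in the pattern, e.g. "EEEE"
    name: str       # the placeholder name shown to translators
    example: str    # the example shown inside <ex>


def parse_rule(line: str) -> PlaceholderRule:
    """Parse 'path_regex ; variable=name example'."""
    path_part, _, rest = line.partition(" ; ")
    variable, _, name_and_example = rest.strip().partition("=")
    # The name cannot contain spaces, so the first space ends it.
    name, _, example = name_and_example.partition(" ")
    return PlaceholderRule(re.compile(path_part.strip()), variable, name, example.strip())


def apply_rule(rule: PlaceholderRule, path: str, pattern: str) -> str:
    """Wrap each occurrence of the variable in a <ph> element if the path matches."""
    if not rule.path_regex.search(path):
        return escape(pattern)
    ph = "<ph name='" + rule.name + "'><ex>" + escape(rule.example) + "</ex>" + escape(rule.variable) + "</ph>"
    return ph.join(escape(part) for part in pattern.split(rule.variable))


rule = parse_rule(r"^//ldml/dates/.*(pattern|available|intervalFormatItem) ; EEEE=DAY_OF_WEEK_LONG Tuesday")
# Assumed example path; any path matching the regex would do.
path = '//ldml/dates/calendars/calendar[@type="gregorian"]/dateFormats/dateFormatLength[@type="full"]/dateFormat/pattern'
print(apply_rule(rule, path, "EEEE, MMMM d, y"))
```
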
### Filtering and descriptions

Filtering and descriptions are data-driven, using the file [xmbHandling.txt](http://unicode.org/cldr/trac/browser/trunk/tools/java/org/unicode/cldr/tool/xmbHandling.txt).

Format:

&emsp;path\_regex ; description

&emsp;path\_regex ; SKIP

&emsp;path\_regex ; ROOT type\_value; description

1. If the value is SKIP, then the path is skipped.
2. The description can have {0}-style variables in it. If so, then the values of the (...) capture groups in the path\_regex are substituted for them.
3. If the value starts with ROOT, then the path is skipped if the type\_value is not in ROOT, where the type\_value is from the first capture group. This is used to make sure that the type\_value is in the major coverage requirements for: language, script, territory, currency, timezone, and metazone. The description can have placeholders, as in case 2.

Example:

&emsp;^//ldml/dates/timeZoneNames/metazone\\[@type=".\*"]/commonlyUsed ; SKIP

&emsp;^//ldml/dates/timeZoneNames/zone\\[@type=".\*"]/exemplarCity ; The name of a city in: {0}. See cldr.org/xxxx.

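The three cases above can be sketched as a single lookup function. This is a hedged illustration, not the logic of GenerateXMB.java. The function `describe` is hypothetical. The sketch assumes that the token after ROOT names a kind of value (such as "territory") and that the caller supplies the set of values present in root for each kind. The rules below use "(.\*)" capture groups so that {0} has something to substitute.

```python
import re
from typing import Dict, Optional, Set


def describe(rule_line: str, path: str, root_values: Dict[str, Set[str]]) -> Optional[str]:
    """Return the description for path, or None if the rule does not match or the path is skipped."""
    regex, _, value = rule_line.partition(" ; ")
    match = re.search(regex.strip(), path)
    value = value.strip()
    if match is None or value == "SKIP":
        return None  # case 1, or the rule does not apply to this path
    groups = match.groups()
    if value.startswith("ROOT "):
        # Case 3: skip unless the first capture group is a known root value.
        kind, _, value = value[len("ROOT "):].partition(";")
        if not groups or groups[0] not in root_values.get(kind.strip(), set()):
            return None
        value = value.strip()

    def substitute(m: "re.Match") -> str:
        index = int(m.group(1))
        # Leave the variable untouched if there is no such capture group.
        return (groups[index] or "") if index < len(groups) else m.group(0)

    # Case 2: replace {0}, {1}, ... with the capture group values.
    return re.sub(r"\{(\d+)\}", substitute, value)


skip_rule = r'^//ldml/dates/timeZoneNames/metazone\[@type=".*"]/commonlyUsed ; SKIP'
city_rule = r'^//ldml/dates/timeZoneNames/zone\[@type="(.*)"]/exemplarCity ; The name of a city in: {0}.'
root_rule = (r'^//ldml/localeDisplayNames/territories/territory\[@type="(.*)"] ; '
             r'ROOT territory; The name of the country or region with BCP47 code = {0}.')
roots = {"territory": {"UZ", "FR"}}  # hypothetical root data

print(describe(city_rule, '//ldml/dates/timeZoneNames/zone[@type="Europe/Paris"]/exemplarCity', roots))
print(describe(root_rule, '//ldml/localeDisplayNames/territories/territory[@type="UZ"]', roots))
```

Returning `None` both for a non-matching rule and for a skipped path keeps the sketch short; a caller that needs to tell the two apart would have to return a richer result.
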
### Plurals

Plurals are represented with ICU Syntax, such as:

<!--
  {% raw %}

  Disable liquid parsing on this codeblock to prevent errors reading '{{'
  See: https://talk.jekyllrb.com/t/code-block-is-improperly-handled-and-generates-liquid-syntax-error/7599/2
-->

```xml
<msg id='4AC13E2DA211C113' desc='[ICU Syntax] The pattern used to compose plural for week, including abbreviated forms. These forms are special! Before translating, see cldr.org/translation/plurals.'>

{LENGTH, select,

abbreviated {{NUMBER_OF_WEEKS, plural,

=0 {0 wks}

=1 {1 wk}

zero {# wks}

one {# wk}

two {# wks}

few {# wks}

many {# wks}

other {# wks}}}

other {{NUMBER_OF_WEEKS, plural,

=0 {0 weeks}

=1 {1 week}

zero {# weeks}

one {# week}

two {# weeks}

few {# weeks}

many {# weeks}

other {# weeks}}}}</msg>
```

<!-- {% endraw %} -->

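As an illustration of how such a nested select/plural message is put together, the sketch below assembles the same structure from a mapping of length to plural forms. It is not taken from the CLDR tooling: `plural_block` and `select_message` are hypothetical names, only a few of the plural keys are filled in, and the output uses single line breaks rather than the blank-line spacing shown above.

```python
from typing import Dict

# Order in which plural keys are emitted, matching the sample message.
PLURAL_KEYS = ["=0", "=1", "zero", "one", "two", "few", "many", "other"]


def plural_block(variable: str, forms: Dict[str, str]) -> str:
    """Build the plural argument for one variable from a key-to-text mapping."""
    cases = [key + " {" + forms[key] + "}" for key in PLURAL_KEYS if key in forms]
    return "{" + variable + ", plural,\n" + "\n".join(cases) + "}"


def select_message(selector: str, variable: str, by_length: Dict[str, Dict[str, str]]) -> str:
    """Wrap one plural block per length inside a select argument."""
    branches = [
        length + " {" + plural_block(variable, forms) + "}"
        for length, forms in by_length.items()
    ]
    return "{" + selector + ", select,\n" + "\n".join(branches) + "}"


week_forms = {
    "abbreviated": {"=0": "0 wks", "=1": "1 wk", "one": "# wk", "other": "# wks"},
    "other": {"=0": "0 weeks", "=1": "1 week", "one": "# week", "other": "# weeks"},
}
message = select_message("LENGTH", "NUMBER_OF_WEEKS", week_forms)
print(message)
```
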
### TODO

- Add missing descriptions
- Add missing site pages with detailed descriptions, and links from the descriptions
- Add a limited number of currency plurals (major currencies only).
- Add a limited number of extra language codes.
- Rewire items that are in undistinguished attributes
- Test each xml file for validity
- Do the conversion from xtb into cldr format to make sure we roundtrip.
- Figure out how to do the differences between HH and hh, etc.
    - Current thoughts: don't let the translator choose, but make it part of the xtb-cldr processing.