---
title: XMB
---

# XMB

### Introduction

Adds tools to CLDR to convert to and from the [XMB message format](http://unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/xmb.dtd). The XMB format is basically a key-value pair list, with no deeper structure. It does have a mechanism for named placeholders, with descriptions and examples. The messages for any given other language must correspond 1:1 with those of English.

The goal is to allow for bulk translation of CLDR files via existing translation tooling.

Examples:

**ENGLISH**

\<msg id='615EB568A2478EAF' desc='The name of the country or region with BCP47 code = UZ. Before translating, please read [cldr.org/translation](http://cldr.org/translation).'

\>**Uzbekistan**\</msg>

\<!-- English: MMMM d, y -->

\<msg id='5D6EA98708B9B43B' desc='Long date format. Before translating, please read [cldr.org/translation](http://cldr.org/translation).'

\>**\<ph name='MONTH\_LONG'>\<ex>September\</ex>MMMM\</ph> \<ph name='DAY\_1\_DIGIT'>\<ex>9\</ex>d\</ph>, \<ph name='YEAR'>\<ex>2010\</ex>y\</ph>**\</msg>

**FRENCH**

\<!-- English: Uzbekistan -->

\<msg id='615EB568A2478EAF'

\>**Ouzbékistan**\</msg>

\<!-- English: MMMM d, y -->

\<msg id='5D6EA98708B9B43B'

\>**\<ph name='DAY\_1\_DIGIT'>\<ex>9\</ex>d\</ph> \<ph name='MONTH\_LONG'>\<ex>September\</ex>MMMM\</ph> \<ph name='YEAR'>\<ex>2010\</ex>y\</ph>**\</msg>

The id is common across the different languages. The description, the placeholder names and the placeholder examples (\<ex>) are visible to the translator, as is the text between placeholders, of course. The translator can change the order of the placeholders, but they cannot be removed (or added).

The main tool for converting CLDR to this format is at [GenerateXMB.java](http://unicode.org/cldr/trac/browser/trunk/tools/java/org/unicode/cldr/tool/GenerateXMB.java).
It reads the en.xml file and puts together an EnglishInfo object that has a mapping from paths to descriptions and placeholders. It also generates the English XMB file for translation. Next, each of the other CLDR locale files is read and its data is used to populate a separate XTB file for translation memory.

Files:

| File | Description |
|---|---|
| [xmb-en.xml](http://unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/xmb-en.xml) | The base English file, for translation into other languages. |
| [xtb-fr.xml](http://unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/xtb-fr.xml) | Sample file (fr) for translation memory. Missing entries would be translated. |

Others are at [xmb/](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/).

The documentation files are at http://cldr.org/translation.

### Log Files

The tool generates log files during processing, targeted at development and debugging.

Examples:

| Directory | File | Description |
|---|---|---|
| log/ | [en-missingDescriptions.txt](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/log/en-missingDescriptions.txt) | The paths that don't yet have descriptions, which need to be added to xmbHandling.txt. |
| log/ | [en-paths.txt](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/log/en-paths.txt) | The paths used for the base English file. |
| filtered/ | [xmb-en.xml](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/filtered/xmb-en.xml) | A filtered xmb-en.xml file that contains exactly one item per "starred" path (where a starred path is one with attribute values removed). Useful for reviewing descriptions. |
| filtered/ | [xtb-fr.xml](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/filtered/xtb-fr.xml) | A filtered sample (fr) xml file. |
| skipped/ | [xmb-en.txt](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/skipped/xmb-en.txt) | The paths that are skipped from the base English file. |
| skipped/ | [xtb-fr.txt](http://www.unicode.org/repos/cldr-tmp/trunk/dropbox/xmb/skipped/xtb-fr.txt) | The paths that are skipped from the sample (fr) file. |

### Placeholders

The tool replaces the placeholders ("{0}", "MMM", etc.) in patterns with variable names that carry examples. This is data-driven, using the file at [xmbPlaceholders.txt](http://unicode.org/cldr/trac/browser/trunk/tools/java/org/unicode/cldr/tool/xmbPlaceholders.txt).

Format:

 path\_regex ; variable=name example

The name cannot contain spaces.

Example:

 ^//ldml/dates/.\*(pattern|available|intervalFormatItem) ; EEEE=**DAY\_OF\_WEEK\_LONG** Tuesday

### Filtering and descriptions

Filtering and descriptions are data-driven, using the file [xmbHandling.txt](http://unicode.org/cldr/trac/browser/trunk/tools/java/org/unicode/cldr/tool/xmbHandling.txt).

Format:

 path\_regex ; description

 path\_regex ; SKIP

 path\_regex ; ROOT type\_value; description

1. If the value is SKIP, then the path is skipped.
2. The description can have {0}-style variables in it. If so, then the (...) values in the path\_regex are substituted for them.
3. If the value starts with ROOT, then the path is skipped if the type\_value is not in ROOT, where the type\_value is from the first capture group. This is used to make sure that the type\_value is in the major coverage requirements for: language, script, territory, currency, timezone, and metazone. The description can have placeholders, as in case 2.

Example:

^//ldml/dates/timeZoneNames/metazone\\[@type=".\*"]/commonlyUsed ; SKIP

^//ldml/dates/timeZoneNames/zone\\[@type=".\*"]/exemplarCity ; The name of a city in: {0}. See cldr.org/xxxx.
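
As a rough illustration of how rules in this format could be applied, here is a minimal Python sketch. It is not taken from GenerateXMB.java: the names `parse_rules`, `describe`, and `root_values` are hypothetical, the "first matching rule wins" order is an assumption, and the territory rule is invented for the example. The exemplarCity regex is written with an explicit `(.*)` capture group so that {0} has a value to substitute.

```python
import re

SKIP = object()  # sentinel meaning "leave this path out of the XMB file"

# Sample rules in the 'path_regex ; value' format (the third rule is hypothetical).
RULES = r'''
^//ldml/dates/timeZoneNames/metazone\[@type=".*"]/commonlyUsed ; SKIP
^//ldml/dates/timeZoneNames/zone\[@type="(.*)"]/exemplarCity ; The name of a city in: {0}.
^//ldml/localeDisplayNames/territories/territory\[@type="(.*)"] ; ROOT territory; The name of the country or region with BCP47 code = {0}.
'''


def parse_rules(text):
    """Parse 'path_regex ; value' lines into (compiled_regex, value) pairs."""
    rules = []
    for line in text.splitlines():
        regex, sep, value = line.strip().partition(" ; ")
        if sep:  # ignore blank or malformed lines
            rules.append((re.compile(regex), value.strip()))
    return rules


def describe(path, rules, root_values=frozenset()):
    """Return a description, SKIP if the path is filtered, or None if no rule matches."""
    for regex, value in rules:
        match = regex.search(path)
        if not match:
            continue
        if value == "SKIP":
            return SKIP
        if value.startswith("ROOT "):
            # Skip the path unless the first capture group is a known root value.
            if match.group(1) not in root_values:
                return SKIP
            value = value.partition(";")[2].strip()
        # Substitute {0}, {1}, ... with the regex capture groups.
        for i, group in enumerate(match.groups()):
            value = value.replace("{" + str(i) + "}", group or "")
        return value
    return None
```

For example, with `rules = parse_rules(RULES)`, calling `describe('//ldml/dates/timeZoneNames/zone[@type="Asia/Tashkent"]/exemplarCity', rules)` yields the description with `Asia/Tashkent` substituted for {0}, while any metazone `commonlyUsed` path yields `SKIP`.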

### Plurals

Plurals are represented with ICU Syntax, such as:

<!--
 {% raw %}

 Disable liquid parsing on this codeblock to prevent errors reading '{{'
 See: https://talk.jekyllrb.com/t/code-block-is-improperly-handled-and-generates-liquid-syntax-error/7599/2
-->

```xml
<msg id='4AC13E2DA211C113' desc='[ICU Syntax] The pattern used to compose plural for week, including abbreviated forms. These forms are special! Before translating, see cldr.org/translation/plurals.'>

{LENGTH, select,

abbreviated {{NUMBER_OF_WEEKS, plural,

=0 {0 wks}

=1 {1 wk}

zero {# wks}

one {# wk}

two {# wks}

few {# wks}

many {# wks}

other {# wks}}}

other {{NUMBER_OF_WEEKS, plural,

=0 {0 weeks}

=1 {1 week}

zero {# weeks}

one {# week}

two {# weeks}

few {# weeks}

many {# weeks}

other {# weeks}}}}</msg>
```

<!-- {% endraw %} -->

### TODO

- Add missing descriptions.
- Add missing site pages with detailed descriptions, and links from the descriptions.
- Add a limited number of currency plurals (major currencies only).
- Add a limited number of extra language codes.
- Rewire items that are in undistinguished attributes.
- Test each xml file for validity.
- Do the conversion from xtb into cldr format to make sure we roundtrip.
- Figure out how to handle the differences between HH and hh, etc.
	- Current thoughts: don't let the translator choose, but make it part of the xtb-cldr processing.