---
layout: default
title: Formatting Messages
nav_order: 3
parent: Formatting
has_children: true
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Formatting Messages
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

Messages are user-visible strings, often with variable elements like names,
numbers and dates. Message strings are typically translated into the different
languages of a UI, and translators move around the variable elements according
to the grammar of the target language.

For this to work in many languages, a message has to be written and translated
as a single unit, typically a string with placeholder syntax for the variable
elements. If the user-visible string were concatenated directly from fragments
and formatted elements, then translators would not be able to rearrange the
pieces, and they would have a hard time translating each of the string
fragments.

## `MessageFormat`

The ICU **`MessageFormat`** class uses message `"pattern"` strings with
variable-element placeholders (called "arguments" in the API docs) enclosed in
{curly braces}. The argument syntax can include formatting details; otherwise a
default format is used. For details about the pattern syntax and the formatting
behavior see the `MessageFormat` API docs
([Java](https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/MessageFormat.html),
[C++](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classMessageFormat.html#_details),
[C](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/umsg_8h.html#_details)).

### Complex Argument Types

Certain types of arguments select among several choices which are nested
`MessageFormat` pattern strings.
Keeping these choices together in one message
pattern string facilitates translation in context, by one single translator.
(Commercial translation systems often distribute different messages to different
translators.)

* Use a `"plural"` argument to select sub-messages based on a numeric value,
  together with the plural rules for the specified language.
* Use a `"select"` argument to select sub-messages via a fixed set of keywords.
* Use of the old `"choice"` argument type is discouraged. It cannot handle
  plural rules for many languages, and is clumsy for simple selection.

It is tempting to cover only a minimal part of a message string with a complex
argument (e.g., plural). However, this is difficult for translators for two
reasons: 1. They might have trouble understanding how the sentence fragments in
the argument sub-messages interact with the rest of the sentence, and 2. They
will not know whether and how they can shrink or grow the extent of the part of
the sentence that is inside the argument to make the whole message work for
their language.

**Recommendation:** If possible, use complex arguments as the outermost
structure of a message, and write **full sentences** in their sub-messages. If
you have nested select and plural arguments, place the **select** arguments
(with their fixed sets of choices) on the **outside** and nest the plural
arguments (hopefully at most one) inside.

For example:

{% raw %}

```text
"{gender_of_host, select, "
  "female {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to her party.}"
      "=2 {{host} invites {guest} and one other person to her party.}"
      "other {{host} invites {guest} and # other people to her party.}}}"
  "male {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to his party.}"
      "=2 {{host} invites {guest} and one other person to his party.}"
      "other {{host} invites {guest} and # other people to his party.}}}"
  "other {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to their party.}"
      "=2 {{host} invites {guest} and one other person to their party.}"
      "other {{host} invites {guest} and # other people to their party.}}}}"
```

{% endraw %}

**Note:** In a plural argument like in the example above, if the English message
has both `=0` and `=1` (up to `=offset`+1) then it does not need a "`one`"
variant because that would never be selected. It does always need an "`other`"
variant.

**Note:** *The translation system and the translator together need to add
["`one`", "`few`" etc. if and as necessary per target
language](http://cldr.unicode.org/index/cldr-spec/plural-rules).*

### Quoting/Escaping

If syntax characters occur in the text portions, then they need to be quoted by
enclosing them in a pair of ASCII apostrophes. Two adjacent ASCII apostrophes
always represent one literal ASCII apostrophe, similar to `%%` in `printf`
representing one `%`, and this rule applies inside quoted text as well.
("`This '{isn''t}' obvious`" → "`This {isn't} obvious`")

* Before ICU 4.8, ASCII apostrophes always started quoted text and had
  inconsistent behavior in nested sub-messages, which was a source of problems
  with authoring and translating message pattern strings.
* Starting with ICU 4.8, an ASCII apostrophe only starts quoted text if it
  immediately precedes a character that requires quoting (that is, "only where
  needed"), and works the same in nested messages as on the top level of the
  pattern. The new behavior is otherwise compatible; for details see the
  MessageFormat and MessagePattern (new in ICU 4.8) API docs.
* Recommendation: Use the real apostrophe (single quote) character `’` (U+2019)
  for human-readable text, and use the ASCII apostrophe `'` (U+0027) only in
  program syntax, like quoting in MessageFormat. See the annotations for
  U+0027 Apostrophe in The Unicode Standard.

### Argument formatting

Arguments are formatted according to their type, using the default ICU
formatters for those types, unless otherwise specified. For unknown types the
Java `MessageFormat` will call `toString()`.

There are also several ways to control the formatting.

#### Predefined styles (recommended)

You can specify the `argStyle` to be one of the predefined values `short`, `medium`,
`long`, `full` (to get one of the standard forms for dates / times) and `integer`,
`currency`, `percent` (for number formatting).

#### Skeletons (recommended)

Numbers, dates, and times can use a skeleton in `argStyle`, prefixed with `::` to
distinguish them from patterns. These are locale-independent ways to specify the
format, and this is the recommended mechanism if the predefined styles are not
appropriate.

##### Date skeletons:

- **ICU4J:** <https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/SimpleDateFormat.html>

- **ICU4C:** <https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classSimpleDateFormat.html>

##### Number formatter skeletons:

- **ICU4J:** <https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/number/NumberFormatter.html>

- **ICU4C:** <https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1NumberFormat.html>

#### Format the parameters separately (recommended)

You can format the parameter as you need **before** calling `MessageFormat`, and
then pass the resulting string as a parameter to `MessageFormat`.

This offers maximum control, and is preferred to using custom format objects
(see below).

#### String patterns (discouraged)

These can be used for numbers, dates, and times, but they are locale-sensitive.
They would therefore need to be localized by your translators, which adds
complexity to the localization, and placeholder details are often not accessible
to translators. If such a pattern is not localized, then users see confusing
formatting. Consider using skeletons instead of patterns in your message
strings.

Allowing translators to localize date patterns is error-prone, as translators
might make mistakes (resulting in invalid ICU date formatter syntax). Also, CLDR
provides curated patterns for many locales, and using your own pattern means
that you don't benefit from that CLDR data and the results will likely be
inconsistent with the rest of the patterns that ICU uses.

It is also a bad internationalization practice, because most companies only
translate into "generic" versions of the languages (French, or Spanish, or
Arabic). So the translated patterns get used in tens of countries.
On the other
hand, skeletons are localized according to the MessageFormat locale, which
should include regional variants (e.g., “fr-CA”).

#### Custom Format Objects (discouraged)

The `MessageFormat` class allows setting custom Format objects to format
arguments, overriding the arguments' pattern specification. This is discouraged:
for custom formatting of some values it should normally suffice to format them
externally and to provide the formatted strings to the `MessageFormat.format()`
methods.

Only the top-level arguments are accessible and settable via `setFormat()`,
`getFormat()` etc. Arguments inside nested sub-messages, inside
choice/plural/select arguments, are "invisible" via these API methods.

Some of these methods (the ones corresponding to the original JDK `MessageFormat`
API) address the top-level arguments in their order of appearance in the pattern
string, which is usually not useful because it varies with translations. Newer
methods address arguments by argument number ("index") or name.

### Examples

The following code fragment produces output such as: "At 4:34 PM on March 23,
there was a disturbance in the Force on planet 7."

```cpp
    // Requires <unicode/msgfmt.h> and <unicode/calendar.h>;
    // the ICU classes used here live in namespace icu.
    UErrorCode err = U_ZERO_ERROR;
    Formattable arguments[] = {
        (int32_t)7,                                             // {0}: planet number
        Formattable(Calendar::getNow(), Formattable::kIsDate),  // {1}: current date and time
        "a disturbance in the Force"                            // {2}: event description
    };

    UnicodeString result;
    // The static format() appends to `result` and uses the default locale.
    MessageFormat::format(
        "At {1,time,::jmm} on {1,date,::dMMMM}, there was {2} on planet {0,number,integer}.",
        arguments,
        3,
        result,
        err);
```

There are several more usage examples for the `MessageFormat` and `ChoiceFormat`
classes in [C, C++ and Java](examples.md).