---
layout: default
title: Formatting Messages
nav_order: 3
parent: Formatting
has_children: true
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Formatting Messages
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

Messages are user-visible strings, often with variable elements like names,
numbers, and dates. Message strings are typically translated into the different
languages of a UI, and translators move the variable elements around according
to the grammar of the target language.

For this to work in many languages, a message has to be written and translated
as a single unit, typically a string with placeholder syntax for the variable
elements. If the user-visible string were concatenated directly from fragments
and formatted elements, then translators would not be able to rearrange the
pieces, and they would have a hard time translating each of the string
fragments in isolation.

## `MessageFormat`

The ICU **`MessageFormat`** class uses message `"pattern"` strings with
variable-element placeholders (called "arguments" in the API docs) enclosed in
{curly braces}. The argument syntax can include formatting details; otherwise, a
default format is used. For details about the pattern syntax and the formatting
behavior, see the `MessageFormat` API docs
([Java](https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/MessageFormat.html),
[C++](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classMessageFormat.html#_details),
[C](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/umsg_8h.html#_details)).

### Complex Argument Types

Certain types of arguments select among several choices, which are themselves
nested `MessageFormat` pattern strings. Keeping these choices together in one
message pattern string facilitates translation in context by a single translator.
(Commercial translation systems often distribute different messages to different
translators.)

*   Use a `"plural"` argument to select sub-messages based on a numeric value,
    together with the plural rules for the specified language.
*   Use a `"select"` argument to select sub-messages via a fixed set of keywords.
*   Use of the old `"choice"` argument type is discouraged. It cannot handle
    plural rules for many languages, and is clumsy for simple selection.

It is tempting to cover only a minimal part of a message string with a complex
argument (e.g., plural). However, this is difficult for translators for two
reasons: (1) they might have trouble understanding how the sentence fragments in
the argument sub-messages interact with the rest of the sentence, and (2) they
will not know whether and how they can shrink or grow the part of the sentence
that is inside the argument to make the whole message work in their language.

**Recommendation:** If possible, use complex arguments as the outermost
structure of a message, and write **full sentences** in their sub-messages. If
you have nested select and plural arguments, place the **select** arguments
(with their fixed sets of choices) on the **outside** and nest the plural
arguments (ideally at most one) inside.

For example:

{% raw %}

```text
"{gender_of_host, select, "
  "female {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to her party.}"
      "=2 {{host} invites {guest} and one other person to her party.}"
      "other {{host} invites {guest} and # other people to her party.}}}"
  "male {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to his party.}"
      "=2 {{host} invites {guest} and one other person to his party.}"
      "other {{host} invites {guest} and # other people to his party.}}}"
  "other {"
    "{num_guests, plural, offset:1 "
      "=0 {{host} does not give a party.}"
      "=1 {{host} invites {guest} to their party.}"
      "=2 {{host} invites {guest} and one other person to their party.}"
      "other {{host} invites {guest} and # other people to their party.}}}}"
```

{% endraw %}

**Note:** In a plural argument like the one in the example above, if the English
message has explicit variants for every value from `=0` up to `=offset+1` (here
`=0`, `=1`, and `=2`), then it does not need a "`one`" variant, because that
variant would never be selected. It always needs an "`other`" variant.

**Note:** *The translation system and the translator together need to add
variants such as ["`one`" and "`few`" as required by the plural rules of each
target language](http://cldr.unicode.org/index/cldr-spec/plural-rules).*
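
To make the `offset` and `=N` rules from the first note concrete, here is a
simplified, standalone model of how a plural argument picks a variant. It is not
ICU's implementation: `selectPluralVariant` is a hypothetical function that
hard-codes the English integer rule ("`one`" only for 1), matches explicit `=N`
variants against the original number, and uses the number minus the offset both
for keyword selection and for `#`.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified model of plural selection with an offset.
// English integer rule only: keyword "one" for exactly 1, "other" otherwise.
std::string selectPluralVariant(long n, long offset,
                                const std::map<long, std::string>& explicitValues,
                                const std::map<std::string, std::string>& keywords) {
    std::string text;
    auto exact = explicitValues.find(n);  // "=N" variants compare against n itself
    if (exact != explicitValues.end()) {
        text = exact->second;
    } else {
        // Keyword selection uses n - offset, not n.
        const std::string keyword = (n - offset == 1) ? "one" : "other";
        auto it = keywords.find(keyword);
        // Fall back to "other", which every plural argument must have.
        text = (it != keywords.end()) ? it->second : keywords.at("other");
    }
    // '#' stands for n - offset (this sketch ignores nested arguments and quoting).
    const std::string shown = std::to_string(n - offset);
    std::string out;
    for (char c : text) {
        if (c == '#') {
            out += shown;
        } else {
            out += c;
        }
    }
    return out;
}
```

With `offset:1`, the keyword "`one`" is reached only when `n - 1 == 1`, that is,
`n == 2`, and an explicit `=2` variant always wins for that value. This is why
the English message in the example above never needs a "`one`" variant.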

### Quoting/Escaping

If syntax characters occur in the text portions, then they need to be quoted by
enclosing them in ASCII apostrophes. A doubled ASCII apostrophe (`''`) always
represents one ASCII apostrophe, similar to `%%` in `printf` representing one
`%`, and this rule applies inside quoted text as well.
("`This '{isn''t}' obvious`" → "`This {isn't} obvious`")

*   Before ICU 4.8, ASCII apostrophes always started quoted text and had
    inconsistent behavior in nested sub-messages, which was a source of problems
    with authoring and translating message pattern strings.
*   Starting with ICU 4.8, an ASCII apostrophe only starts quoted text if it
    immediately precedes a character that requires quoting (that is, "only where
    needed"), and works the same in nested messages as on the top level of the
    pattern. The new behavior is otherwise compatible; for details, see the
    `MessageFormat` and `MessagePattern` (new in ICU 4.8) API docs.
*   Recommendation: Use the real apostrophe (single quote) character `’` (U+2019)
    for human-readable text, and use the ASCII apostrophe `'` (U+0027) only in
    program syntax, like quoting in `MessageFormat`. See the annotations for
    U+0027 Apostrophe in The Unicode Standard.
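
The ICU 4.8+ rules for literal text can be modeled in a few lines. The following
is a simplified sketch, not ICU's parser: the hypothetical `unquoteLiteralText`
handles only literal text (no arguments), treats `{` and `}` as syntax
characters, and treats `#` as one only when told it is inside a plural
sub-message. The `|` of choice arguments is ignored.

```cpp
#include <string>

// Simplified sketch of the ICU 4.8+ apostrophe rules for the literal text of a
// message. Assumes the text contains no arguments; not ICU's own parser.
std::string unquoteLiteralText(const std::string& text, bool inPluralSubmessage = false) {
    std::string out;
    bool quoted = false;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c != '\'') {
            out += c;
            continue;
        }
        const char next = (i + 1 < text.size()) ? text[i + 1] : '\0';
        if (next == '\'') {
            // A doubled apostrophe is always one literal apostrophe, even inside quoted text.
            out += '\'';
            ++i;
        } else if (quoted) {
            quoted = false;  // a single apostrophe ends quoted text
        } else if (next == '{' || next == '}' || (inPluralSubmessage && next == '#')) {
            quoted = true;   // an apostrophe directly before a syntax character starts quoting
        } else {
            out += '\'';     // otherwise it is an ordinary apostrophe
        }
    }
    return out;  // in this sketch, an unterminated quote simply runs to the end
}
```

For example, `unquoteLiteralText("This '{isn''t}' obvious")` returns
`This {isn't} obvious`, while the lone apostrophe in `don't` is kept as is
because it does not precede a syntax character.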

### Argument formatting

Arguments are formatted according to their type, using the default ICU
formatters for those types, unless otherwise specified. For argument types it
does not recognize, the Java `MessageFormat` calls `toString()`.

There are also several ways to control the formatting.

#### Predefined styles (recommended)

You can set `argStyle` to one of the predefined values `short`, `medium`,
`long`, `full` (to get one of the standard forms for dates and times) or
`integer`, `currency`, `percent` (for number formatting).

#### Skeletons (recommended)

Numbers, dates, and times can use a skeleton in `argStyle`, prefixed with `::` to
distinguish them from patterns. These are locale-independent ways to specify the
format, and this is the recommended mechanism if the predefined styles are not
appropriate.

##### Date skeletons

- **ICU4J:** <https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/SimpleDateFormat.html>

- **ICU4C:** <https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classSimpleDateFormat.html>

##### Number formatter skeletons

- **ICU4J:** <https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/number/NumberFormatter.html>

- **ICU4C:** <https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classicu_1_1NumberFormat.html>

#### Format the parameters separately (recommended)

You can format the parameter as you need **before** calling `MessageFormat`, and
then pass the resulting string as a parameter to `MessageFormat`.

This offers maximum control and is preferred over using custom format objects
(see below).

#### String patterns (discouraged)

These can be used for numbers, dates, and times, but they are locale-sensitive.
They would therefore need to be localized by your translators, which adds
complexity to the localization, and the placeholder details are often not
accessible to translators. If such a pattern is not localized, then users see
confusing formatting. Consider using skeletons instead of patterns in your
message strings.

Allowing translators to localize date patterns is error-prone, as translators
might make mistakes (resulting in invalid ICU date formatter syntax). Also, CLDR
provides curated patterns for many locales, and using your own pattern means
that you don't benefit from that CLDR data and the results will likely be
inconsistent with the rest of the patterns that ICU uses.

It is also a bad internationalization practice, because most companies only
translate into "generic" versions of the languages (French, or Spanish, or
Arabic), so each translated pattern gets used in dozens of countries. Skeletons,
by contrast, are localized according to the `MessageFormat` locale, which should
include regional variants (e.g., “fr-CA”).

#### Custom Format Objects (discouraged)

The `MessageFormat` class allows setting custom Format objects to format
arguments, overriding the arguments' pattern specification. This is discouraged:
for custom formatting of some values, it should normally suffice to format them
externally and to pass the formatted strings to the `MessageFormat.format()`
methods.

Only the top-level arguments are accessible and settable via `setFormat()`,
`getFormat()`, etc. Arguments inside nested sub-messages (inside
choice/plural/select arguments) are "invisible" to these API methods.

Some of these methods (the ones corresponding to the original JDK `MessageFormat`
API) address the top-level arguments in their order of appearance in the pattern
string, which is usually not useful because that order varies between
translations. Newer methods address arguments by argument number ("index") or
name.

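### Examples

The following program prints output like "At 4:34 PM on March 23, there was a
disturbance in the Force on planet 7." The exact text depends on the current date
and time and on the default locale (the sample output is for English).

```cpp
#include <iostream>
#include <string>
#include <unicode/calendar.h>
#include <unicode/fmtable.h>
#include <unicode/msgfmt.h>

using namespace icu;

int main() {
    UErrorCode err = U_ZERO_ERROR;
    Formattable arguments[] = {
        (int32_t)7,                                             // {0}: planet number
        Formattable(Calendar::getNow(), Formattable::kIsDate),  // {1}: current date/time
        "a disturbance in the Force"                            // {2}: event description
    };

    UnicodeString result;
    MessageFormat::format(  // static convenience method; uses the default locale
        "At {1,time,::jmm} on {1,date,::dMMMM}, there was {2} on planet {0,number,integer}.",
        arguments, 3, result, err);
    if (U_FAILURE(err)) {
        std::cerr << "formatting failed: " << u_errorName(err) << '\n';
        return 1;
    }
    std::string utf8;
    std::cout << result.toUTF8String(utf8) << '\n';
    return 0;
}
```
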
There are several more usage examples for the `MessageFormat` and `ChoiceFormat`
classes in [C, C++, and Java](examples.md).
