---
layout: default
title: Formatting
nav_order: 7
has_children: true
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Formatting and Parsing
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

Formatters translate between binary data and human-readable textual
representations of that data. For example, you cannot display the computer's
internal representation of the number 103. You can only display the numeral 103
as a textual representation (using three text characters). The result from a
formatter is a string that the user will recognize as representing the internal
value. A formatter can also parse a string, converting a textual representation
of a value back into its internal representation. For example, it reads the
characters 1, 0 and 3 followed by something other than a digit, and produces
the value 103 as an internal binary representation.

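The round trip described above can be sketched as follows. This is a minimal illustration that uses the JDK's `java.text` classes as a stand-in so that it runs without ICU4J on the classpath; that substitution and the `RoundTripDemo` name are assumptions made for the example. ICU4J provides similarly named classes in `com.ibm.icu.text`.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

final class RoundTripDemo {
    /** Formats an internal binary value as text the user can read. */
    static String toText(long value) {
        return NumberFormat.getIntegerInstance(Locale.US).format(value);
    }

    /** Parses leading digits; pos records where parsing stopped. */
    static Number fromText(String text, ParsePosition pos) {
        return NumberFormat.getIntegerInstance(Locale.US).parse(text, pos);
    }

    public static void main(String[] args) {
        System.out.println(toText(103)); // 103

        ParsePosition pos = new ParsePosition(0);
        Number parsed = fromText("103 apples", pos);
        // Parsing stops at the first character that is not part of the number.
        System.out.println(parsed + ", stopped at index " + pos.getIndex());
    }
}
```
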
The formatting classes encapsulate information about the display of localized
times, days, numbers, currencies, and messages. They do both formatting and
parsing, and they allow the data that the end user sees to be separated from the
code. Separating the program code from the data makes a program easier to
localize. Formatting is converting a date, time, number, message or other object
from its internal representation into a string. Parsing is the reverse
operation: converting a string to an internal representation of the date, time,
number, message or other object.

Using the formatting classes is an important step in internationalizing your
software because the `format()` and `parse()` methods in each of the classes make
your software language-neutral by replacing implicit conversions with explicit
formatting calls.

## Internationalization Formatting Tips

This section discusses some of the ways you can format and parse numbers,
currencies, dates, times and text messages in your program so that the data is
separate from the code and can be easily localized. This is the information your
users see on their computer screens, so it needs to be in a language and format
that conforms to their local conventions.

Keep the following in mind as you write your code:

*   Keep your code and your data separate

*   Format the data in a locale-sensitive manner

*   Keep your code locale-independent

*   Avoid writing special routines to handle specific locales

*   String objects formatted by `format()` are parseable by the `parse()` method\*

> :point_right: **Note**: Although parsing is supported in several legacy ICU APIs,
it is generally considered bad practice to parse localized strings.
For more information, read [Why You Should Not Parse
Localized Strings](https://blog.sffc.xyz/post/190943794505/why-you-should-not-parse-localized-strings).

### Numbers and Currencies

Programs store and operate on numbers using a locale-independent binary
representation. When a number is displayed or printed, it is converted to a
locale-specific string. For example, the number 12345.67 is "12,345.67" in the
US, "12 345,67" in France and "12.345,67" in Germany.

By invoking the methods provided by the `NumberFormat` class, you can format
numbers, currencies, and percentages according to the specified or default
locale. `NumberFormat` is locale-sensitive, so you need to create a new
`NumberFormat` for each locale. `NumberFormat` methods take primitive-type
numbers, such as `double`, and output the number as a locale-specific string.

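As a rough sketch of the one-formatter-per-locale idea, the example below formats the same `double` for three locales. It uses the JDK's `java.text.NumberFormat` as a stand-in for ICU4J's `com.ibm.icu.text.NumberFormat` so that it runs without extra libraries; that substitution and the `LocaleNumbersDemo` name are assumptions. The exact space character used for French grouping depends on the locale data in use.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class LocaleNumbersDemo {
    /** Creates a formatter for the given locale and formats the value. */
    static String format(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        double value = 12345.67; // one locale-independent binary value
        System.out.println(format(value, Locale.US));      // 12,345.67
        System.out.println(format(value, Locale.GERMANY)); // 12.345,67
        System.out.println(format(value, Locale.FRANCE));
    }
}
```
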
For currencies, call `getCurrencyInstance` to create a formatter that returns a
string with the formatted number and the appropriate currency sign. The
`NumberFormat` class is unaware of exchange rates, so the digits in the output
are the same regardless of the specified currency. This means that the same
number has different monetary values depending on the currency locale. If the
number is 9988776.65, the results will be:

*   9 988 776,65 € in France

*   9.988.776,65 € in Germany

*   $9,988,776.65 in the United States

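The currency results listed above could be produced along these lines. This sketch assumes the JDK's `java.text.NumberFormat` in place of the ICU4J class of the same name, and `CurrencyDemo` is a hypothetical class name. Note that no conversion takes place: only the symbol and separators change.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class CurrencyDemo {
    /** Formats the amount with the currency conventions of the locale. */
    static String money(double amount, Locale locale) {
        return NumberFormat.getCurrencyInstance(locale).format(amount);
    }

    public static void main(String[] args) {
        double amount = 9988776.65;
        // The digits stay the same; no exchange rate is applied.
        System.out.println(money(amount, Locale.US)); // $9,988,776.65
        System.out.println(money(amount, Locale.GERMANY));
        System.out.println(money(amount, Locale.FRANCE));
    }
}
```
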
To format percentages, call the `getPercentInstance` method to create a
locale-specific formatter. With this formatter, a decimal fraction such as 0.75
is displayed as 75%.

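A minimal percent-formatting sketch follows, again assuming the JDK's `java.text.NumberFormat` as a stand-in for the ICU4J class; `PercentDemo` is a hypothetical name.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class PercentDemo {
    /** Formats a decimal fraction as a percentage for the locale. */
    static String percent(double fraction, Locale locale) {
        return NumberFormat.getPercentInstance(locale).format(fraction);
    }

    public static void main(String[] args) {
        System.out.println(percent(0.75, Locale.US)); // 75%
    }
}
```
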
#### Customizing Number Formats

If you need to customize a number format, you can use the `DecimalFormat` and
`DecimalFormatSymbols` classes described in the [Formatting
Numbers](numbers/index#formatting-numbers) chapter. This is not usually necessary
and it makes your code much more complex, but it is available for those rare
instances where you need it. In general, you would do this by explicitly
specifying the number format pattern.

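The sketch below shows the two customization routes mentioned above: an explicit pattern, and modified symbols. It assumes the JDK's `java.text.DecimalFormat` and `java.text.DecimalFormatSymbols` as stand-ins for the ICU4J classes of the same names; `CustomPatternDemo` and its method names are hypothetical.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class CustomPatternDemo {
    /** Applies an explicit pattern using the locale's own symbols. */
    static String withPattern(double value, String pattern, Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        return new DecimalFormat(pattern, symbols).format(value);
    }

    /** Overrides one symbol: group digits with an apostrophe. */
    static String withApostropheGrouping(double value) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(Locale.US);
        symbols.setGroupingSeparator('\'');
        return new DecimalFormat("#,##0.00", symbols).format(value);
    }

    public static void main(String[] args) {
        System.out.println(withPattern(1234.5, "#,##0.00", Locale.GERMANY)); // 1.234,50
        System.out.println(withPattern(42, "000000", Locale.US));            // 000042
        System.out.println(withApostropheGrouping(1234.5));                  // 1'234.50
    }
}
```
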
If you need to format or parse spelled-out numbers, you can use the
`RuleBasedNumberFormat` class (see the [Formatting Numbers](numbers/index#formatting-numbers) chapter).
You can instantiate a default formatter for a locale or, by using the
`RuleBasedNumberFormat` rule syntax, specify your own.

Using `NumberFormat` class methods (see the [Formatting Numbers](numbers/index#formatting-numbers) chapter)
with a predefined locale is the easiest and most accurate way to format numbers and currencies.

> :point_right: **Note**: *See [Properties and ICU Rule Syntax](../strings/properties) for
information regarding syntax characters.*

### Date and Times

You display or print a Date by first converting it to a locale-specific string
that conforms to the conventions of the end user's Locale. For example, Germans
recognize 20.4.98 as a valid date, and Americans recognize 4/20/98.

> :point_right: **Note**: *The appropriate Calendar support is required for different locales. For
example, the Buddhist calendar is the official calendar in Thailand, so you
should not assume that the Gregorian calendar is in use. ICU will pick
the appropriate Calendar based on the locale you supply when opening a `Calendar`
or `DateFormat`.*

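As an illustration of locale-specific date strings, the sketch below formats one date in short style for two locales. It assumes the JDK's `java.text.DateFormat` as a stand-in for ICU4J's `com.ibm.icu.text.DateFormat`; `DateLocaleDemo` is a hypothetical name. The exact digits and padding depend on the locale data available, so no specific output is claimed here.

```java
import java.text.DateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

final class DateLocaleDemo {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    /** Builds 20 April 1998, noon UTC, as a locale-independent Date. */
    static Date april20th1998() {
        Calendar cal = new GregorianCalendar(UTC);
        cal.clear();
        cal.set(1998, Calendar.APRIL, 20, 12, 0, 0);
        return cal.getTime();
    }

    /** Formats the date in the short style of the given locale. */
    static String shortDate(Date date, Locale locale) {
        DateFormat df = DateFormat.getDateInstance(DateFormat.SHORT, locale);
        df.setTimeZone(UTC); // keep the calendar day stable across machines
        return df.format(date);
    }

    public static void main(String[] args) {
        Date date = april20th1998();
        System.out.println(shortDate(date, Locale.GERMANY)); // day.month.year order
        System.out.println(shortDate(date, Locale.US));      // month/day/year order
    }
}
```
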
### Messages

Message format helps make the order of display elements localizable. It helps
address problems of grammatical differences in languages. For example, consider
the sentence, "I go to work by car every day." In Japanese, the grammatical
equivalent can be "Every day, I to work by car go." Another example is plurals
in text, such as "no space for rent", "one room for rent" and "many rooms for
rent", where "for rent" is the only constant text among the three.

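Both ideas, reordering and number-dependent text, can be sketched with message patterns. The example assumes the JDK's `java.text.MessageFormat` as a stand-in for ICU4J's `com.ibm.icu.text.MessageFormat`; the `MessageDemo` class, the `fill` helper and the reordered English-like pattern are hypothetical and only imitate what a translated pattern might do.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class MessageDemo {
    static final String ROOMS =
        "{0,choice,0#no space|1#one room|1<many rooms} for rent";

    /** Fills a pattern; the argument order is fixed by the code, not the text. */
    static String fill(String pattern, Object... args) {
        return new MessageFormat(pattern, Locale.US).format(args);
    }

    public static void main(String[] args) {
        // The same arguments, placed differently by each pattern.
        System.out.println(fill("I go to {0} by {1} every day.", "work", "car"));
        System.out.println(fill("Every day, by {1} to {0} I go.", "work", "car"));

        // Only the variable part changes with the number.
        System.out.println(fill(ROOMS, 0)); // no space for rent
        System.out.println(fill(ROOMS, 1)); // one room for rent
        System.out.println(fill(ROOMS, 5)); // many rooms for rent
    }
}
```

Because the placeholders carry indices, a translator can move `{0}` and `{1}` within the pattern without any change to the calling code.
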
## Formatting and Parsing Classes

ICU groups its classes for formatting numbers, dates and messages into four
major areas:

### General Formatting

*   `Format`:

    The abstract superclass of all format classes. It provides the basic methods
    for formatting and parsing numbers, dates, strings and other objects.

*   `FieldPosition`:

    A concrete class for holding the field constant and the begin and end
    indices for number and date fields.

*   `ParsePosition`:

    A concrete class for holding the parse position in a string during parsing.

*   `Formattable`:

    `Formattable` objects can be passed to the `Format` class or its subclasses for
    formatting. It encapsulates a polymorphic piece of data to be formatted and
    is used with `MessageFormat`. `Formattable` is used by some formatting
    operations to provide a single "type" that encompasses all formattable
    values (e.g., it can hold a number, a date, or a string, and so on).

*   `UParseError`:

    `UParseError` is used to return detailed information about parsing errors.
    It is used by the ICU parsing engines that parse long rules, patterns, or
    programs. This is helpful when the text being parsed is long enough that
    more information than a `UErrorCode` is needed to pinpoint the error.

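To illustrate how `FieldPosition` reports begin and end indices, here is a small sketch. It assumes the JDK's `java.text.FieldPosition` and `java.text.NumberFormat` as stand-ins for the ICU classes of the same names; `FieldPositionDemo` is a hypothetical name. `Formattable` and `UParseError` belong to the C/C++ API and are not covered by this sketch.

```java
import java.text.FieldPosition;
import java.text.NumberFormat;
import java.util.Locale;

final class FieldPositionDemo {
    /** Returns {begin, end} of the requested field in the formatted text. */
    static int[] fieldBounds(double value, int field) {
        NumberFormat nf = NumberFormat.getInstance(Locale.US);
        FieldPosition pos = new FieldPosition(field);
        nf.format(value, new StringBuffer(), pos); // fills in pos as a side effect
        return new int[] { pos.getBeginIndex(), pos.getEndIndex() };
    }

    public static void main(String[] args) {
        // 12345.67 is formatted as "12,345.67" for Locale.US.
        int[] integer = fieldBounds(12345.67, NumberFormat.INTEGER_FIELD);
        int[] fraction = fieldBounds(12345.67, NumberFormat.FRACTION_FIELD);
        System.out.println("integer:  " + integer[0] + ".." + integer[1]);   // 0..6
        System.out.println("fraction: " + fraction[0] + ".." + fraction[1]); // 7..9
    }
}
```
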
**Formatting Numbers**

*   [`NumberFormat`](numbers/legacy-numberformat#numberformat)

    The abstract superclass that provides the basic fields and methods for
    formatting `Number` objects and number primitives to localized strings and
    parsing localized strings to `Number` objects.

*   [`DecimalFormat`](numbers/legacy-numberformat#decimalformat)

    A concrete class for formatting `Number` objects and number primitives to
    localized strings and parsing localized strings to `Number` objects, in base 10.

*   [`RuleBasedNumberFormat`](numbers/rbnf)

    A concrete class for formatting `Number` objects and number primitives to
    localized text, especially spelled-out format such as found in check writing
    (e.g. "two hundred and thirty-four"), and parsing text into `Number` objects.

*   [`DecimalFormatSymbols`](numbers/legacy-numberformat#decimalformatsymbols)

    A concrete class for accessing localized number strings, such as the
    grouping separators, decimal separator, and percent sign. Used by
    `DecimalFormat`.

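A brief sketch of reading localized symbols directly follows. It assumes the JDK's `java.text.DecimalFormatSymbols` as a stand-in for the ICU4J class of the same name; `SymbolsDemo` and `describe` are hypothetical names.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

final class SymbolsDemo {
    /** Summarizes the separators and percent sign used by a locale. */
    static String describe(Locale locale) {
        DecimalFormatSymbols s = DecimalFormatSymbols.getInstance(locale);
        return "decimal='" + s.getDecimalSeparator()
            + "' grouping='" + s.getGroupingSeparator()
            + "' percent='" + s.getPercent() + "'";
    }

    public static void main(String[] args) {
        System.out.println(describe(Locale.US));      // decimal='.' grouping=',' percent='%'
        System.out.println(describe(Locale.GERMANY)); // decimal=',' grouping='.' ...
    }
}
```
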
**Formatting Dates and Times**

*   [`DateFormat`](datetime/index#dateformat)

    The abstract superclass that provides the basic fields and methods for
    formatting `Date` objects to localized strings and parsing date and time
    strings to `Date` objects.

*   [`SimpleDateFormat`](datetime/index#simpledateformat)

    A concrete class for formatting `Date` objects to localized strings and
    parsing date and time strings to `Date` objects, using a `GregorianCalendar`.

*   [`DateFormatSymbols`](datetime/index#dateformatsymbols)

    A concrete class for accessing localized date-time formatting strings, such
    as names of the months, days of the week and the time zone.

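The following sketch combines pattern-based formatting and parsing with a lookup of localized month names. It assumes the JDK's `java.text.SimpleDateFormat` and `java.text.DateFormatSymbols` as stand-ins for the ICU4J classes of the same names; `DatePatternDemo` and its methods are hypothetical. The input is a fixed machine-readable form, not a localized string.

```java
import java.text.DateFormatSymbols;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class DatePatternDemo {
    private static SimpleDateFormat utc(String pattern, Locale locale) {
        SimpleDateFormat f = new SimpleDateFormat(pattern, locale);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    }

    /** Parses a yyyy-MM-dd day and re-formats it in a long, localized form. */
    static String longForm(String isoDay, Locale locale) {
        try {
            Date date = utc("yyyy-MM-dd", Locale.ROOT).parse(isoDay);
            return utc("EEEE, d MMMM yyyy", locale).format(date);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a yyyy-MM-dd date: " + isoDay, e);
        }
    }

    /** Looks up a month name; calendarMonth is a Calendar constant. */
    static String monthName(int calendarMonth, Locale locale) {
        return DateFormatSymbols.getInstance(locale).getMonths()[calendarMonth];
    }

    public static void main(String[] args) {
        System.out.println(longForm("1998-04-20", Locale.US)); // Monday, 20 April 1998
        System.out.println(monthName(Calendar.JANUARY, Locale.GERMANY)); // Januar
    }
}
```
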
**Formatting Messages**

*   [`MessageFormat`](messages/index#messageformat)

    A concrete class for producing a language-specific user message that
    contains numbers, currency, percentages, date, time and string variables.

*   [`ChoiceFormat`](messages/examples#choiceformat-class)

    A concrete class for mapping strings to ranges of numbers and for handling
    plurals and name series in user messages.

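The mapping of number ranges to strings can be sketched as below. It assumes the JDK's `java.text.ChoiceFormat` as a stand-in for the ICU class of the same name; `ChoiceDemo`, the limits and the strings are hypothetical. Each limit is the lower bound of its range, and the last string applies to every value at or above the last limit.

```java
import java.text.ChoiceFormat;

final class ChoiceDemo {
    /** Maps a room count to text: [0,1) -> no space, [1,2) -> one room, 2+ -> many rooms. */
    static String describe(int rooms) {
        ChoiceFormat choice = new ChoiceFormat(
            new double[] { 0, 1, 2 },
            new String[] { "no space", "one room", "many rooms" });
        return choice.format(rooms) + " for rent";
    }

    public static void main(String[] args) {
        System.out.println(describe(0)); // no space for rent
        System.out.println(describe(1)); // one room for rent
        System.out.println(describe(7)); // many rooms for rent
    }
}
```
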