Version: 34
Editors: John Emmons (emmo@us.ibm.com) and other CLDR committee members
For the full header, summary, and status, see Part 1: Core.
This document describes parts of an XML format (vocabulary) for the exchange of structured locale data. This format is used in the Unicode Common Locale Data Repository.
This is a partial document, describing only those parts of the LDML that are relevant for number and currency formatting. For the other parts of the LDML see the main LDML document and the links above.
This document has been reviewed by Unicode members and other interested parties, and has been approved for publication by the Unicode Consortium. This is a stable document and may be used as reference material or cited as a normative reference by other specifications.
A Unicode Technical Standard (UTS) is an independent specification. Conformance to the Unicode Standard does not imply conformance to any UTS.
Please submit corrigenda and other comments with the CLDR bug reporting form [Bugs]. Related information that is useful in understanding this document is found in the References. For the latest version of the Unicode Standard see [Unicode]. For a list of current Unicode Technical Reports see [Reports]. For more information about versions of the Unicode Standard, see [Versions].
The LDML specification is divided into the following parts:
<!ELEMENT numberingSystems ( numberingSystem* ) >
<!ELEMENT numberingSystem EMPTY >
<!ATTLIST numberingSystem id NMTOKEN #REQUIRED >
<!ATTLIST numberingSystem type ( numeric | algorithmic ) #REQUIRED >
<!ATTLIST numberingSystem radix NMTOKEN #IMPLIED >
<!ATTLIST numberingSystem digits CDATA #IMPLIED >
<!ATTLIST numberingSystem rules CDATA #IMPLIED >
Numbering system information is used to define different representations of numeric values for an end user. Numbering systems are defined in CLDR as one of two types: algorithmic and numeric. Numeric systems are simply decimal-based systems that use a predefined set of digits to represent numbers. Examples are Western (ASCII) digits, Thai digits, and Devanagari digits. Algorithmic systems are more complex in nature, since the proper formatting and presentation of a numeric quantity is based on some algorithm or set of rules. Examples are Chinese numerals, Hebrew numerals, and Roman numerals. In CLDR, the rules for presentation of numbers in an algorithmic system are defined using the RBNF syntax described in Section 6: Rule-Based Number Formatting.
Attributes for the <numberingSystem> element are as follows:
id - Specifies the name of the numbering system that can be used to designate its use in formatting.
type - Specifies whether the numbering system is algorithmic or numeric.
digits - For numeric systems, specifies the digits used to represent numbers, in order, starting from zero.
rules - Specifies the RBNF ruleset to be used for formatting numbers from this numbering system. The rules specifier can contain simply a ruleset name, in which case the ruleset is assumed to be found in the rule set grouping "NumberingSystemRules". Alternatively, the specifier can denote a specific locale, ruleset grouping, and ruleset name, separated by slashes.
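The two forms of the rules specifier can be told apart by counting slash-separated parts. The following Python sketch is illustrative only: parse_rules_specifier is a hypothetical helper, returning None as the locale of a bare ruleset name is an assumption (the description above names only the ruleset grouping for that case), and any other number of parts is simply rejected.

```python
from typing import Optional, Tuple

# Grouping used for a bare ruleset name, per the description of "rules" above.
DEFAULT_GROUPING = "NumberingSystemRules"


def parse_rules_specifier(spec: str) -> Tuple[Optional[str], str, str]:
    """Split a 'rules' value into (locale, ruleset grouping, ruleset name).

    Hypothetical helper; the None locale for a bare name is an assumption.
    """
    parts = spec.split("/")
    if len(parts) == 1:
        # Bare ruleset name: no locale given, grouping is NumberingSystemRules.
        return None, DEFAULT_GROUPING, parts[0]
    if len(parts) == 3:
        locale, grouping, name = parts
        return locale, grouping, name
    raise ValueError(f"unexpected rules specifier: {spec!r}")


print(parse_rules_specifier("georgian"))
# (None, 'NumberingSystemRules', 'georgian')
print(parse_rules_specifier("zh_Hant/SpelloutRules/spellout-cardinal"))
# ('zh_Hant', 'SpelloutRules', 'spellout-cardinal')
```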
Examples:
<numberingSystem id="latn" type="numeric" digits="0123456789"/> <!-- ASCII digits - A numeric system -->
<numberingSystem id="thai" type="numeric" digits="๐๑๒๓๔๕๖๗๘๙"/> <!-- A numeric system using Thai digits -->
<numberingSystem id="geor" type="algorithmic" rules="georgian"/> <!-- An algorithmic system - Georgian numerals, rules found in NumberingSystemRules -->
<numberingSystem id="hant" type="algorithmic" rules="zh_Hant/SpelloutRules/spellout-cardinal"/> <!-- An algorithmic system. Traditional Chinese Numerals -->
For general information about the numbering system data, including the BCP47 identifiers, see the main document Section Q.1.1 Numbering System Data.
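For a numeric system, formatting can be viewed as formatting with ASCII digits and then replacing each digit by the character at the same position in the digits attribute. A minimal Python sketch assuming that model (to_native_digits and THAI_DIGITS are hypothetical names; the digit string is copied from the "thai" example above):

```python
def to_native_digits(ascii_number: str, digits: str) -> str:
    """Replace ASCII digits 0-9 with the digits of a numeric numbering system.

    `digits` is the value of the numberingSystem 'digits' attribute, in order
    starting from zero. Separators and signs are left untouched; localizing
    those is the job of the <symbols> data for that numbering system.
    """
    if len(digits) != 10:
        raise ValueError("a numeric numbering system needs exactly ten digits")
    return ascii_number.translate(str.maketrans("0123456789", digits))


THAI_DIGITS = "๐๑๒๓๔๕๖๗๘๙"  # from the "thai" example above
print(to_native_digits("1,234.5", THAI_DIGITS))  # ๑,๒๓๔.๕
```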
<!ELEMENT numbers ( alias | ( defaultNumberingSystem*, otherNumberingSystems*, minimumGroupingDigits*, symbols*, decimalFormats*, scientificFormats*, percentFormats*, currencyFormats*, currencies?, miscPatterns*, minimalPairs*, special* ) ) >
The numbers element supplies information for formatting and parsing numbers and currencies. It has the following sub-elements: <defaultNumberingSystem>, <otherNumberingSystems>, <minimumGroupingDigits>, <symbols>, <decimalFormats>, <scientificFormats>, <percentFormats>, <currencyFormats>, <currencies>, <miscPatterns>, and <minimalPairs>. The currency IDs are from [ISO4217] (plus some additional common-use codes). For more information, including the pattern structure, see Section 3: Number Format Patterns.
<!ELEMENT defaultNumberingSystem ( #PCDATA )>
This element indicates which numbering system should be used for presentation of numeric quantities in the given locale.
<!ELEMENT otherNumberingSystems ( alias | ( native*, traditional*, finance*)) >
This element defines general categories of numbering systems that are sometimes used in the given locale for formatting numeric quantities. These additional numbering systems are often used in very specific contexts, such as in calendars or for financial purposes. There are currently three defined categories, as follows:
The categories defined for other numbering systems can be used in a Unicode locale identifier to select the proper numbering system without having to know the specific numbering system by name. For example:
For more information on numbering systems and their definitions, see Section 1: Numbering Systems.
<!ELEMENT symbols (alias | (decimal*, group*, list*, percentSign*, nativeZeroDigit*, patternDigit*, plusSign*, minusSign*, exponential*, superscriptingExponent*, perMille*, infinity*, nan*, currencyDecimal*, currencyGroup*, timeSeparator*, special*)) >
Number symbols define the localized symbols that are commonly used when formatting numbers in a given locale. These symbols can be referenced using a number formatting pattern as defined in Section 3: Number Format Patterns.
The available number symbols are as follows:
Note: In CLDR 26 the timeSeparator pattern character was specified to be COLON. This was withdrawn in CLDR 28 due to backward compatibility issues, and no timeSeparator pattern character is currently defined. No CLDR locales are known to have a need to specify timeSeparator symbols that depend on number system; if this changes in the future a different timeSeparator pattern character will be defined.
Example:
<symbols> <decimal>.</decimal> <group>,</group> <list>;</list> <percentSign>%</percentSign> <patternDigit>#</patternDigit> <plusSign>+</plusSign> <minusSign>-</minusSign> <exponential>E</exponential> <superscriptingExponent>×</superscriptingExponent> <perMille>‰</perMille> <infinity>∞</infinity> <nan>☹</nan> <timeSeparator>:</timeSeparator> </symbols>
<!ATTLIST symbols numberSystem CDATA #IMPLIED >
The numberSystem attribute is used to specify that the given number symbols are to be used when the given numbering system is active. Number symbols can only be defined for numbering systems of the "numeric" type, since any special symbols required for an algorithmic numbering system should be specified by the RBNF formatting rules used for that numbering system. By default, number symbols without a specific numberSystem attribute are assumed to be used for the "latn" numbering system, which is western (ASCII) digits. Locales that specify a numbering system other than "latn" as the default should also specify number formatting symbols that are appropriate for use within the context of the given numbering system. For example, a locale that uses the Arabic-Indic digits as its default would likely use an Arabic comma for the grouping separator rather than the ASCII comma.
For more information on numbering systems and their definitions, see Section 1: Numbering Systems.
<!ELEMENT decimalFormats (alias | (default*, decimalFormatLength*, special*))>
<!ELEMENT decimalFormatLength (alias | (default*, decimalFormat*, special*))>
<!ATTLIST decimalFormatLength type ( full | long | medium | short ) #IMPLIED >
<!ELEMENT decimalFormat (alias | (pattern*, special*)) >
(scientificFormats, percentFormats have the same structure)
Number formats are used to define the rules for formatting numeric quantities using the pattern syntax described in Section 3: Number Format Patterns.
Different formats are provided for different contexts, as follows:
Example:
<decimalFormats> <decimalFormatLength type="long"> <decimalFormat> <pattern>#,##0.###</pattern> </decimalFormat> </decimalFormatLength> </decimalFormats>
<scientificFormats> <default type="long"/> <scientificFormatLength type="long"> <scientificFormat> <pattern>0.000###E+00</pattern> </scientificFormat> </scientificFormatLength> <scientificFormatLength type="medium"> <scientificFormat> <pattern>0.00##E+00</pattern> </scientificFormat> </scientificFormatLength> </scientificFormats>
<percentFormats> <percentFormatLength type="long"> <percentFormat> <pattern>#,##0%</pattern> </percentFormat> </percentFormatLength> </percentFormats>
<!ATTLIST decimalFormats numberSystem CDATA #IMPLIED >
The numberSystem attribute is used to specify that the given number formatting pattern(s) are to be used when the given numbering system is active. By default, number formatting patterns without a specific numberSystem attribute are assumed to be used for the "latn" numbering system, which is western (ASCII) digits. Locales that specify a numbering system other than "latn" as the default should also specify number formatting patterns that are appropriate for use within the context of the given numbering system.
For more information on numbering systems and their definitions, see Section 1: Numbering Systems.
<decimalFormatLength type="long">
<decimalFormat>
<pattern type="1000" count="one">0 millier</pattern>
<pattern type="1000" count="other">0 milliers</pattern>
<pattern type="10000" count="one">00 mille</pattern>
<pattern type="10000" count="other">00 mille</pattern>
<pattern type="100000" count="one">000 mille</pattern>
<pattern type="100000" count="other">000 mille</pattern>
<pattern type="1000000" count="one">0 million</pattern>
<pattern type="1000000" count="other">0 millions</pattern>
…
</decimalFormat>
</decimalFormatLength>
<decimalFormatLength type="short">
<decimalFormat>
<pattern type="1000" count="one">0 K</pattern>
<pattern type="1000" count="other">0 K</pattern>
<pattern type="10000" count="one">00 K</pattern>
<pattern type="10000" count="other">00 K</pattern>
<pattern type="100000" count="one">000 K</pattern>
<pattern type="100000" count="other">000 K</pattern>
<pattern type="1000000" count="one">0 M</pattern>
<pattern type="1000000" count="other">0 M</pattern>
…
</decimalFormat>
…
<currencyFormatLength type="short">
<currencyFormat type="standard">
<pattern type="1000" count="one">0 K ¤</pattern>
<pattern type="1000" count="other">0 K ¤</pattern>
<pattern type="10000" count="one">00 K ¤</pattern>
<pattern type="10000" count="other">00 K ¤</pattern>
<pattern type="100000" count="one">000 K ¤</pattern>
<pattern type="100000" count="other">000 K ¤</pattern>
<pattern type="1000000" count="one">0 M ¤</pattern>
<pattern type="1000000" count="other">0 M ¤</pattern>
Formats can be supplied for numbers (as above) or for currencies or other units. They can also be used with ranges of numbers, resulting in formatting strings like “$10K” or “$3–7M”.
To format a number N, the greatest type less than or equal to N is used, with the appropriate plural category. N is then divided by the type after first removing from the type as many zeros as there are in the pattern, minus 1. APIs supporting this format should provide control over the number of significant or fraction digits.
The default pattern for any type that is not supplied is the special value “0”, as in the following. The value “0” must be used when a child locale overrides a parent locale to drop the compact pattern for that type and use the default pattern.
<pattern type="1" count="one">0</pattern>
If the value is precisely “0”, either explicit or defaulted, then the normal number format pattern for that sort of object is supplied (either <decimalFormat> or <currencyFormat type="standard">) with the normal formatting for the locale (such as the grouping separators). However, for the “0” case by default the significant digits are adjusted for consistency, typically to 2 or 3 digits, and the maximum fraction digits are set to 0 (for both currencies and plain decimal). Thus the output would be $12, not $12.01. APIs may, however, allow these default behaviors to be overridden.
With the data above, N=12345 matches <pattern type="10000" count="other">00 K</pattern>. N is divided by 1000 (obtained from 10000 after removing "00" and restoring one "0"). The result is formatted according to the normal decimal pattern. With no fractional digits, that yields "12 K".
Formatting 1200 in USD would result in “1.2 K $”, while 990 implicitly maps to the special value “0”, which maps to <currencyFormat type="standard"><pattern>#,##0.00 ¤</pattern>, and would result in simply “990 $”.
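The selection and scaling steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: compact_format and SHORT_PATTERNS are hypothetical names, plural selection is omitted because both counts share a pattern in this data, values below the smallest type fall back to plain digits, the default of 2 significant digits (never dropping integer digits) is an assumption chosen to reproduce the “12 K” and “1.2 K” results above, rounding is half-even, and a carry such as 999.6 K rolling over to the next type is not handled.

```python
import re
from decimal import Decimal, ROUND_HALF_EVEN

# Hypothetical subset of the French "short" decimal data above, keyed by type.
# Both plural counts share one pattern here, so plural selection is omitted.
SHORT_PATTERNS = {1000: "0 K", 10000: "00 K", 100000: "000 K", 1000000: "0 M"}


def compact_format(n: int, patterns: dict, sig_digits: int = 2) -> str:
    """Format a non-negative integer with compact patterns (sketch)."""
    candidates = [t for t in patterns if t <= n]
    if not candidates:
        return str(n)  # stand-in for the special "0" (normal) pattern
    type_ = max(candidates)  # greatest type less than or equal to n
    pattern = patterns[type_]
    zero_run = re.search(r"0+", pattern)  # the run of zeros in the pattern
    divisor = type_ // 10 ** (len(zero_run.group()) - 1)
    scaled = Decimal(n) / Decimal(divisor)
    # Keep every integer digit; add fraction digits only up to sig_digits total.
    int_digits = len(str(int(scaled)))
    frac_digits = max(0, sig_digits - int_digits)
    scaled = scaled.quantize(Decimal(1).scaleb(-frac_digits), rounding=ROUND_HALF_EVEN)
    text = f"{scaled:f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")  # drop trailing fraction zeros
    return pattern[: zero_run.start()] + text + pattern[zero_run.end():]


print(compact_format(12345, SHORT_PATTERNS))  # 12 K
print(compact_format(1200, SHORT_PATTERNS))   # 1.2 K
```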
The short format is designed for UI environments where space is at a premium, and should ideally result in a formatted string no more than about 6 em wide (with no fractional digits).
Pattern for use with currency formatting. This format contains a few additional structural options that allow proper placement of the currency symbol relative to the numeric quantity. Refer to Section 4 - Currencies for additional information on the use of these options.
<!ELEMENT currencyFormats (alias | (default*, currencySpacing*, currencyFormatLength*, unitPattern*, special*)) >
<!ELEMENT currencySpacing (alias | (beforeCurrency*, afterCurrency*, special*)) >
<!ELEMENT beforeCurrency (alias | (currencyMatch*, surroundingMatch*, insertBetween*)) >
<!ELEMENT afterCurrency (alias | (currencyMatch*, surroundingMatch*, insertBetween*)) >
<!ELEMENT currencyMatch ( #PCDATA ) >
<!ELEMENT surroundingMatch ( #PCDATA ) >
<!ELEMENT insertBetween ( #PCDATA ) >
<!ELEMENT currencyFormatLength (alias | (default*, currencyFormat*, special*)) >
<!ATTLIST currencyFormatLength type ( full | long | medium | short ) #IMPLIED >
<!ELEMENT currencyFormat (alias | (pattern*, special*)) >
In addition to a standard currency format, in which negative currency amounts might typically be displayed as something like “-$3.27”, locales may provide an "accounting" form, in which for "en_US" the same example would appear as “($3.27)”.
<currencyFormats> <currencyFormatLength> <currencyFormat type="standard"> <pattern>¤#,##0.00</pattern> </currencyFormat> <currencyFormat type="accounting"> <pattern>¤#,##0.00;(¤#,##0.00)</pattern> </currencyFormat> </currencyFormatLength> </currencyFormats>
<!ELEMENT miscPatterns (alias | (default*, pattern*, special*)) >
<!ATTLIST miscPatterns numberSystem CDATA #IMPLIED >
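The miscPatterns supply additional patterns for special purposes. The currently defined values are approximately, atLeast, atMost, and range; the atMost pattern, for instance, is used for a string such as “≤99” to indicate that there are 99 items or fewer. For example: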
<miscPatterns numberSystem="…">
<pattern type="approximately">~{0}</pattern>
<pattern type="atLeast">≥{0}</pattern>
<pattern type="atMost">≤{0}</pattern>
<pattern type="range">{0}–{1}</pattern>
</miscPatterns>
<!ELEMENT minimalPairs ( alias | ( pluralMinimalPairs*, ordinalMinimalPairs*, special* ) ) >
<!ATTLIST minimalPairs alt NMTOKENS #IMPLIED >
<!ATTLIST minimalPairs draft (approved | contributed | provisional | unconfirmed) #IMPLIED >
<!ELEMENT pluralMinimalPairs ( #PCDATA ) >
<!ATTLIST pluralMinimalPairs count NMTOKEN #IMPLIED >
<!ATTLIST pluralMinimalPairs alt NMTOKENS #IMPLIED >
<!ATTLIST pluralMinimalPairs draft (approved | contributed | provisional | unconfirmed) #IMPLIED >
<!ELEMENT ordinalMinimalPairs ( #PCDATA ) >
<!ATTLIST ordinalMinimalPairs ordinal NMTOKEN #IMPLIED >
<!ATTLIST ordinalMinimalPairs alt NMTOKENS #IMPLIED >
<!ATTLIST ordinalMinimalPairs draft (approved | contributed | provisional | unconfirmed) #IMPLIED >
Minimal pairs provide examples that justify why multiple plural or ordinal categories exist. For more information, see Plural Rules.
Number patterns affect how numbers are interpreted in a localized context. Here are some examples, based on the French locale. The "." shows where the decimal point should go. The "," shows where the thousands separator should go. A "0" indicates zero-padding: if the number is too short, a zero (in the locale's numeric set) will go there. A "#" indicates no padding: if the number is too short, nothing goes there. A "¤" shows where the currency sign will go. The following illustrates the effects of different patterns for the French locale, with the number "1234.567". Notice how the pattern characters ',' and '.' are replaced by the characters appropriate for the locale.
Number Pattern Examples

Pattern | Currency | Text |
---|---|---|
#,##0.## | n/a | 1 234,57 |
#,##0.### | n/a | 1 234,567 |
###0.##### | n/a | 1234,567 |
###0.0000# | n/a | 1234,5670 |
00000.0000 | n/a | 01234,5670 |
#,##0.00 ¤ | EUR | 1 234,57 € |
#,##0.00 ¤ | JPY | 1 235 ¥JP |
The number of # placeholder characters before the decimal does not matter, since no limit is placed on the maximum number of digits. There should, however, be at least one zero somewhere in the pattern. In currency formats, the number of digits after the decimal also does not matter, since the information in the supplemental data (see Supplemental Currency Data) is used to override the number of decimal places, and the rounding, according to the currency that is being formatted. That can be seen in the above chart, with the difference between Yen and Euro formatting.
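The effect of '#' versus '0' placeholders, and of currency data overriding the pattern's fraction digits, can be reproduced with a small Python sketch. It is hypothetical and deliberately limited: parse_simple_pattern and format_decimal are illustrative names, only '#', '0', ',' and '.' are understood (no prefix, suffix, or '¤'), values are non-negative decimal strings, rounding is half-even, and the caller passes the currency's digit count (2 for EUR, 0 for JPY) in place of a lookup in the supplemental data. An ordinary space stands in for the French grouping separator, which in actual locale data may be a no-break space.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Optional, Tuple


def parse_simple_pattern(pattern: str) -> Tuple[int, int, int, int]:
    """Return (min_int, min_frac, max_frac, grouping) for a pattern like '#,##0.##'."""
    int_part, _, frac_part = pattern.partition(".")
    min_int = int_part.count("0")
    grouping = len(int_part) - int_part.rfind(",") - 1 if "," in int_part else 0
    return min_int, frac_part.count("0"), len(frac_part), grouping


def format_decimal(value: str, pattern: str, decimal_sep: str = ".",
                   group_sep: str = ",", currency_digits: Optional[int] = None) -> str:
    """Format a non-negative decimal string; currency_digits overrides the pattern."""
    min_int, min_frac, max_frac, grouping = parse_simple_pattern(pattern)
    if currency_digits is not None:
        min_frac = max_frac = currency_digits  # currency data wins over the pattern
    rounded = Decimal(value).quantize(Decimal(1).scaleb(-max_frac), rounding=ROUND_HALF_EVEN)
    int_str, _, frac_str = f"{rounded:f}".partition(".")
    int_str = int_str.zfill(min_int)  # '0' placeholders force leading zeros
    frac_str = frac_str.rstrip("0").ljust(min_frac, "0")  # '#' digits may vanish
    if grouping:
        groups = []
        while len(int_str) > grouping:
            groups.insert(0, int_str[-grouping:])
            int_str = int_str[:-grouping]
        groups.insert(0, int_str)
        int_str = group_sep.join(groups)
    return int_str + (decimal_sep + frac_str if frac_str else "")


print(format_decimal("1234.567", "00000.0000", ",", " "))  # 01234,5670
```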
To ensure correct layout, especially in currency patterns in which a variety of symbols may be used, number patterns may contain (invisible) bidirectional text format characters such as LRM, RLM, and ALM.
When parsing using a pattern, a lenient parse should be used; see Lenient Parsing. As noted there, lenient parsing should ignore bidi format characters.
Many characters in a pattern are taken literally; they are matched during parsing and output unchanged during formatting. Special characters, on the other hand, stand for other characters, strings, or classes of characters. For example, the '#' character is replaced by a localized digit for the chosen numberSystem. Often the replacement character is the same as the pattern character; in the U.S. locale, the ',' grouping character is replaced by ','. However, the replacement is still happening, and if the symbols are modified, the grouping character changes. Some special characters affect the behavior of the formatter by their presence; for example, if the percent character is seen, then the value is multiplied by 100 before being displayed.
To insert a special character in a pattern as a literal, that is, without any special meaning, the character must be quoted. There are some exceptions to this which are noted below. The Localized Replacement column shows the replacement from Section 2.3 Number Symbols or the numberSystem's digits: italic indicates a special function.
Invalid sequences of special characters (such as “¤¤¤¤¤” in current CLDR) should be handled for formatting and parsing as described in Handling Invalid Patterns.
Number Pattern Character Definitions

| Symbol | Location | Localized Replacement | Meaning |
|---|---|---|---|
| 0 | Number | digit | Digit |
| 1-9 | Number | digit | '1' through '9' indicate rounding. |
| @ | Number | digit | Significant digit |
| # | Number | digit, nothing | Digit, omitting leading/trailing zeros |
| . | Number | decimal, currencyDecimal | Decimal separator or monetary decimal separator |
| - | Number | minusSign | Minus sign. Warning: the pattern '-'0.0 is not the same as the pattern -0.0. In the former case, the minus sign is a literal. In the latter case, it is a special symbol, which is replaced by the minusSign symbol, and can also be replaced by the plusSign symbol for a format like +12% as in Section 3.2.1 Explicit Plus Signs. |
| , | Number | group, currencyGroup | Grouping separator. May occur in both the integer part and the fractional part. The position determines the grouping. |
| E | Number | exponential, superscriptingExponent | Separates mantissa and exponent in scientific notation. Need not be quoted in prefix or suffix. |
| + | Exponent or Number (for explicit plus) | plusSign | Prefix positive exponents with localized plus sign. Used for explicit plus for numbers as well, as described in Section 3.2.1 Explicit Plus Signs. Need not be quoted in prefix or suffix. |
| % | Prefix or suffix | percentSign | Multiply by 100 and show as percentage |
| ‰ (U+2030) | Prefix or suffix | perMille | Multiply by 1000 and show as per mille |
| ; | Subpattern boundary | syntax | Separates positive and negative subpatterns. When there is no explicit negative subpattern, an implicit negative subpattern is formed from the positive pattern with a prefixed - (ASCII U+002D HYPHEN-MINUS). |
| ¤ (U+00A4) | Prefix or suffix | currency symbol/name from currency specified in API | Any sequence is replaced by the localized currency symbol for the currency being formatted, as listed in the currency symbol sequence table that follows this table. If present in a pattern, the monetary decimal separator and grouping separators (if available) are used instead of the numeric ones. If data is unavailable for a given sequence in a given locale, the display may fall back to ¤ or ¤¤. See also the formatting for currency display names, steps 2 and 4 in Currencies. |
| * | Prefix or suffix boundary | padding character specified in API | Pad escape, precedes pad character |
| ' | Prefix or suffix | syntax-only | Used to quote special characters in a prefix or suffix, for example, "'#'#" formats 123 to "#123". To create a single quote itself, use two in a row: "# o''clock". |

Currency symbol sequences (¤)

| No. | Replacement / Example |
|---|---|
| ¤ | Standard currency symbol: C$12.00 |
| ¤¤ | ISO currency symbol (constant): CAD 12.00 |
| ¤¤¤ | Appropriate currency display name for the currency, based on the plural rules in effect for the locale: 5.00 Canadian dollars |
| ¤¤¤¤¤ | Narrow currency symbol. The same symbols may be used for multiple currencies. Thus the symbol may be ambiguous, and should only be used where the context is clear: $12.00 |
| others | Invalid in current CLDR. Reserved for future specification |
A pattern contains a positive subpattern and may contain a negative subpattern, for example, "#,##0.00;(#,##0.00)". Each subpattern has a prefix, a numeric part, and a suffix. If there is no explicit negative subpattern, the implicit negative subpattern is the ASCII minus sign (-) prefixed to the positive subpattern. That is, "0.00" alone is equivalent to "0.00;-0.00". (The data in CLDR is normalized to remove an explicit negative subpattern where it would be identical to the implicit form.)
Note that if a negative subpattern is present, it is used as-is: a minus sign is not added, e.g., "0.00;0.00" ≠ "0.00;-0.00". Trailing semicolons are ignored, e.g., "0.00;" = "0.00". Whitespace is not ignored, including whitespace around semicolons, so "0.00; -0.00" ≠ "0.00;-0.00".
If there is an explicit negative subpattern, it serves only to specify the negative prefix and suffix; the number of digits, minimal digits, and other characteristics are ignored in the negative subpattern. That means that "#,##0.0#;(#)" has precisely the same result as "#,##0.0#;(#,##0.0#)". However, in the CLDR data, the format is normalized so that the other characteristics are preserved, just for readability.
Note: The thousands separator and decimal separator in patterns are always ASCII ',' and '.'. They are substituted by the code with the correct local values according to other fields in CLDR. The same is true of the - (ASCII minus sign) and other special characters listed above.
Below is a sample of patterns, special characters, and results:
All three patterns are formatted with decimalSign “,”, minusSign ∸, and plusSign ∔.

explicit pattern | number | formatted |
---|---|---|
0.00;-0.00 | 3.1415 | 3,14 |
0.00;-0.00 | -3.1415 | ∸3,14 |
0.00;0.00- | 3.1415 | 3,14 |
0.00;0.00- | -3.1415 | 3,14∸ |
0.00+;0.00- | 3.1415 | 3,14∔ |
0.00+;0.00- | -3.1415 | 3,14∸ |
In the above table, ∸ = U+2238 DOT MINUS and ∔ = U+2214 DOT PLUS are used for illustration.
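The subpattern rules above (implicit negative subpattern, explicit negative subpattern used as-is, trailing semicolons ignored, special '-' and '+' replaced by the localized symbols) can be sketched in Python and checked against the table. This is a hypothetical helper, not a CLDR or ICU API: format_signed assumes a numeric part with a fixed number of fraction digits such as "0.00", rounds half-even, does not apply grouping, and ignores quoting, so every '-' or '+' in an affix is treated as special.

```python
import re
from decimal import Decimal, ROUND_HALF_EVEN

# Numeric part of a simple subpattern: digits, '#', grouping and decimal marks.
NUMERIC_PART = re.compile(r"[#0-9,.]+")


def split_subpattern(sub: str):
    """Return (prefix, numeric part, suffix) of one subpattern."""
    m = NUMERIC_PART.search(sub)
    return sub[: m.start()], m.group(), sub[m.end():]


def format_signed(value: str, pattern: str, decimal_sign: str = ".",
                  minus_sign: str = "-", plus_sign: str = "+") -> str:
    pattern = pattern.rstrip(";")  # trailing semicolons are ignored
    positive, has_negative, negative = pattern.partition(";")
    if not has_negative:
        negative = "-" + positive  # implicit negative subpattern
    pos_prefix, numeric, pos_suffix = split_subpattern(positive)
    neg_prefix, _, neg_suffix = split_subpattern(negative)  # only its affixes matter
    frac_digits = len(numeric.partition(".")[2])
    amount = Decimal(value)
    prefix, suffix = (neg_prefix, neg_suffix) if amount < 0 else (pos_prefix, pos_suffix)
    rounded = abs(amount).quantize(Decimal(1).scaleb(-frac_digits), rounding=ROUND_HALF_EVEN)
    body = f"{rounded:f}".replace(".", decimal_sign)

    def localize(affix: str) -> str:
        # Unquoted '-' and '+' in affixes become the locale's minus and plus signs.
        return affix.replace("-", minus_sign).replace("+", plus_sign)

    return localize(prefix) + body + localize(suffix)


print(format_signed("-3.1415", "0.00;0.00-", ",", "∸", "∔"))  # 3,14∸
```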
The prefixes, suffixes, and various symbols used for infinity, digits, thousands separators, decimal separators, and so on may be set to arbitrary values, and they will appear properly during formatting. However, care must be taken that the symbols and strings do not conflict, or parsing will be unreliable. For example, either the positive and negative prefixes or the suffixes must be distinct for any parser using this data to be able to distinguish positive from negative values. Another example is that the decimal separator and thousands separator should be distinct characters, or parsing will be impossible.
The grouping separator is a character that separates clusters of integer digits to make large numbers more legible. It is commonly used for thousands, but in some locales it separates ten-thousands. The grouping size is the number of digits between the grouping separators, such as 3 for "100,000,000" or 4 for "1 0000 0000". There are actually two different grouping sizes: One used for the least significant integer digits, the primary grouping size, and one used for all others, the secondary grouping size. In most locales these are the same, but sometimes they are different. For example, if the primary grouping interval is 3, and the secondary is 2, then this corresponds to the pattern "#,##,##0", and the number 123456789 is formatted as "12,34,56,789". If a pattern contains multiple grouping separators, the interval between the last one and the end of the integer defines the primary grouping size, and the interval between the last two defines the secondary grouping size. All others are ignored, so "#,##,###,####" == "###,###,####" == "##,#,###,####".
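Deriving the primary and secondary grouping sizes from a pattern, and applying them, can be sketched in Python as follows. grouping_sizes and group_digits are hypothetical names; the sketch handles only the integer part and treats a pattern with a single separator as having equal primary and secondary sizes.

```python
def grouping_sizes(pattern: str):
    """Return (primary, secondary) grouping sizes from a pattern's integer part."""
    int_part = pattern.split(".")[0]
    commas = [i for i, ch in enumerate(int_part) if ch == ","]
    if not commas:
        return 0, 0
    primary = len(int_part) - commas[-1] - 1  # last separator to end of integer
    if len(commas) == 1:
        return primary, primary  # a single separator: both sizes are equal
    secondary = commas[-1] - commas[-2] - 1  # between the last two separators
    return primary, secondary  # any earlier separators are ignored


def group_digits(digits: str, primary: int, secondary: int, sep: str = ",") -> str:
    """Insert grouping separators into a string of integer digits."""
    if primary <= 0 or len(digits) <= primary:
        return digits
    groups = [digits[-primary:]]
    rest = digits[:-primary]
    while len(rest) > secondary:
        groups.insert(0, rest[-secondary:])
        rest = rest[:-secondary]
    if rest:
        groups.insert(0, rest)
    return sep.join(groups)


primary, secondary = grouping_sizes("#,##,##0")  # (3, 2)
print(group_digits("123456789", primary, secondary))  # 12,34,56,789
```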
The grouping separator may also occur in the fractional part, such as in “#,##0.###,#”. This is most commonly done where the grouping separator character is a thin, non-breaking space (U+202F), such as “1.618 033 988 75”. See physics.nist.gov/cuu/Units/checklist.html.
For consistency in the CLDR data, the following conventions are observed:
minimumGroupingDigits | Pattern Grouping | Input Number | Formatted |
---|---|---|---|
1 | 3 | 1000 | 1,000 |
1 | 3 | 10000 | 10,000 |
2 | 3 | 1000 | 1000 |
2 | 3 | 10000 | 10,000 |
1 | 4 | 10000 | 1,0000 |
2 | 4 | 10000 | 10000 |
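The rule behind this table is not stated explicitly above; one reading consistent with every row is that grouping is applied only when the number has at least (grouping size + minimumGroupingDigits) integer digits. A Python sketch under that assumption, with uniform grouping only (group_with_minimum is a hypothetical name):

```python
def group_with_minimum(digits: str, grouping: int, minimum_grouping_digits: int,
                       sep: str = ",") -> str:
    """Apply uniform grouping only when enough digits precede the first separator."""
    # Assumed rule: group only if len(digits) >= grouping + minimum_grouping_digits.
    if grouping <= 0 or len(digits) < grouping + minimum_grouping_digits:
        return digits
    groups = []
    while len(digits) > grouping:
        groups.insert(0, digits[-grouping:])
        digits = digits[:-grouping]
    groups.insert(0, digits)
    return sep.join(groups)


print(group_with_minimum("1000", 3, 2))   # 1000
print(group_with_minimum("10000", 3, 2))  # 10,000
```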
An explicit "plus" format can be formed, so as to show a visible + sign when formatting a non-negative number. The displayed plus sign can be an ASCII plus or another character, such as + U+FF0B FULLWIDTH PLUS SIGN or ➕ U+2795 HEAVY PLUS SIGN; it is taken from whatever is set for plusSign in Section 2.3 Number Symbols.
For an example, see Sample Patterns and Results.
Formatting is guided by several parameters, all of which can be specified either using a pattern or using an external API designed for number formatting. The following description applies to formats that do not use scientific notation or significant digits.
Special Values
NaN is represented as a single character, typically U+FFFD REPLACEMENT CHARACTER. This character is determined by the localized number symbols. This is the only value for which the prefixes and suffixes are not used.
Infinity is represented as a single character, typically ∞ (U+221E), with the positive or negative prefixes and suffixes applied. The infinity character is determined by the localized number symbols.
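The difference between NaN (no affixes) and infinity (affixes applied) can be illustrated with a short Python sketch. format_special and its parameters are hypothetical; the default symbols are the typical characters named above, whereas real values would come from the locale's number symbols.

```python
import math


def format_special(x: float, positive_affixes=("", ""), negative_affixes=("-", ""),
                   nan_symbol: str = "\ufffd", infinity_symbol: str = "\u221e") -> str:
    """Format NaN or +/-infinity; finite values are out of scope for this sketch."""
    if math.isnan(x):
        return nan_symbol  # NaN never receives a prefix or suffix
    if math.isinf(x):
        prefix, suffix = negative_affixes if x < 0 else positive_affixes
        return prefix + infinity_symbol + suffix
    raise ValueError("not a special value")


print(format_special(float("-inf"), negative_affixes=("(", ")")))  # (∞)
```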
Numbers in scientific notation are expressed as the product of a mantissa and a power of ten; for example, 1234 can be expressed as 1.234 × 10³. The mantissa is typically in the half-open interval [1.0, 10.0) or sometimes [0.0, 1.0), but it need not be. In a pattern, the exponent character immediately followed by one or more digit characters indicates scientific notation. Example: "0.###E0" formats the number 1234 as "1.234E3".
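Formatting with a pattern such as "0.###E0" can be sketched in Python with the decimal module. format_scientific is a hypothetical helper that assumes exactly one integer digit (mantissa in [1, 10)), up to max_frac optional fraction digits, half-even rounding, and no '+' on positive exponents; it does not implement exponent grouping or significant-digit patterns.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def format_scientific(value: str, max_frac: int, min_exp_digits: int = 1,
                      exp_symbol: str = "E") -> str:
    """Sketch of '0.###E0'-style output: one integer digit, optional fraction digits."""
    d = Decimal(value)
    if d == 0:
        return "0" + exp_symbol + "0" * min_exp_digits
    exponent = d.adjusted()  # power of ten of the leading digit
    quantum = Decimal(1).scaleb(-max_frac)
    mantissa = d.scaleb(-exponent).quantize(quantum, rounding=ROUND_HALF_EVEN)
    if abs(mantissa) >= 10:  # rounding carried over, e.g. 9.9996 -> 10.000
        mantissa = (mantissa / 10).quantize(quantum, rounding=ROUND_HALF_EVEN)
        exponent += 1
    text = f"{mantissa:f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")  # '#' fraction digits are optional
    sign = "-" if exponent < 0 else ""
    return f"{text}{exp_symbol}{sign}{abs(exponent):0{min_exp_digits}d}"


print(format_scientific("1234", 3))  # 1.234E3
```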
When using scientific notation, the formatter controls the digit counts using logic for significant digits. The maximum number of significant digits comes from the mantissa portion of the pattern: the string of #, 0, and period (".") characters immediately preceding the E. To get the maximum number of significant digits, use the following algorithm:
Examples:
There are two ways of controlling how many digits are shown: (a) significant digits counts, or (b) integer and fraction digit counts. Integer and fraction digit counts are described above. When a formatter is using significant digits counts, it uses however many integer and fraction digits are required to display the specified number of significant digits. It may ignore min/max integer/fraction digits, or it may use them to the extent possible.
Significant Digits Examples

Pattern | Minimum significant digits | Maximum significant digits | Number | Output |
---|---|---|---|---|
@@@ | 3 | 3 | 12345 | 12300 |
@@@ | 3 | 3 | 0.12345 | 0.123 |
@@## | 2 | 4 | 3.14159 | 3.142 |
@@## | 2 | 4 | 1.23004 | 1.23 |
In order to enable significant digits formatting, use a pattern containing the '@' pattern character.
In order to disable significant digits formatting, use a pattern that does not contain the '@' pattern character.
Significant digit counts are indicated by the '@' and '#' characters. The minimum number of significant digits is the number of '@' characters. The maximum number of significant digits is the number of '@' characters plus the number of '#' characters following on the right. For example, the pattern "@@@" indicates exactly 3 significant digits. The pattern "@##" indicates from 1 to 3 significant digits. Trailing zero digits to the right of the decimal separator are suppressed after the minimum number of significant digits have been shown. For example, the pattern "@##" formats the number 0.1203 as "0.12".
If a pattern uses significant digits, it may not contain a decimal separator or the '0' pattern character. Patterns such as "@00" or "@.###" are disallowed.
Any number of '#' characters may be prepended to the left of the leftmost '@' character. These have no effect on the minimum and maximum significant digits counts, but may be used to position grouping separators. For example, "#,#@#" indicates a minimum of one significant digit, a maximum of two significant digits, and a grouping size of three.
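Significant digits may be used together with exponential notation. Such patterns are equivalent to a normal exponential pattern with a minimum and maximum integer digit count of one, a minimum fraction digit count of Minimum Significant Digits - 1, and a maximum fraction digit count of Maximum Significant Digits - 1. For example, the pattern "@@###E0" is equivalent to "0.0###E0".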
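The significant-digit rules above can be sketched in Python and checked against the examples table. significant_counts and format_significant are hypothetical names; the sketch assumes half-even rounding, handles only patterns built from '@', '#' and leading ',' (no exponent, no grouping in the output), and does not special-case zero.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def significant_counts(pattern: str):
    """(minimum, maximum) significant digits for patterns like '@@##' or '#,#@#'."""
    body = pattern.lstrip("#,")  # leading '#' and ',' only position grouping
    minimum = body.count("@")
    return minimum, minimum + body.count("#")


def format_significant(value: str, pattern: str) -> str:
    min_sig, max_sig = significant_counts(pattern)
    d = Decimal(value)
    if d == 0:
        return "0"  # zero is not handled specially in this sketch
    # Round to max_sig significant digits.
    q = d.quantize(Decimal(1).scaleb(d.adjusted() - max_sig + 1), rounding=ROUND_HALF_EVEN)
    # Re-fit after a carry such as 9.996 -> 10.00 so at most max_sig digits remain.
    q = q.quantize(Decimal(1).scaleb(q.adjusted() - max_sig + 1))
    int_part, _, frac = f"{abs(q):f}".partition(".")
    needed = max(0, min_sig - 1 - q.adjusted())  # fraction digits to reach min_sig
    frac = frac.rstrip("0").ljust(needed, "0")
    sign = "-" if q < 0 else ""
    return sign + int_part + ("." + frac if frac else "")


print(format_significant("0.1203", "@##"))  # 0.12
```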
Patterns support padding the result to a specific width. In a pattern, the pad escape character, followed by a single pad character, causes padding to be parsed and formatted. The pad escape character is '*'. For example, "$*x#,##0.00" formats 123 to "$xx123.00", and 1234 to "$1,234.00".
When padding is in effect, the width of the positive subpattern, including prefix and suffix, determines the format width. For example, in the pattern "* #0 o''clock", the format width is 10.
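Padding after the prefix, as in "$*x#,##0.00", can be sketched in Python. format_width and pad_after_prefix are hypothetical helpers; format_width treats the first '*' as the pad escape, excludes it and the pad character from the width, and only understands the doubled quote '' (other quoting is not handled), while the caller is assumed to have already split the result into prefix, number, and suffix.

```python
def format_width(pattern: str) -> int:
    """Width of a padded pattern: its length without the pad escape and pad
    character, counting a doubled quote '' as one character."""
    star = pattern.index("*")
    without_pad = pattern[:star] + pattern[star + 2:]
    return len(without_pad.replace("''", "'"))


def pad_after_prefix(prefix: str, body: str, suffix: str, width: int, pad_char: str) -> str:
    """Insert pad characters between the prefix and the number."""
    missing = width - len(prefix) - len(body) - len(suffix)
    return prefix + pad_char * max(0, missing) + body + suffix


width = format_width("$*x#,##0.00")  # 9
print(pad_after_prefix("$", "123.00", "", width, "x"))  # $xx123.00
```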
Patterns support rounding to a specific increment. For example, 1230 rounded to the nearest 50 is 1250. Mathematically, rounding to specific increments is performed by dividing by the increment, rounding to an integer, then multiplying by the increment. To take a more bizarre example, 1.234 rounded to the nearest 0.65 is 1.3, as follows:
Step | Value |
---|---|
Original | 1.234 |
Divide by increment (0.65) | 1.89846… |
Round | 2 |
Multiply by increment (0.65) | 1.3 |
To specify a rounding increment in a pattern, include the increment in the pattern itself. "#,#50" specifies a rounding increment of 50. "#,##0.05" specifies a rounding increment of 0.05.
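Both steps, reading the increment from the pattern and rounding to it, can be sketched in Python with the decimal module, which avoids binary floating-point error in values such as 0.65. increment_from_pattern and round_to_increment are hypothetical names; the sketch assumes half-even rounding for the "round to an integer" step and patterns made only of '#', ',', digits, and '.'.

```python
from decimal import Decimal, ROUND_HALF_EVEN
from typing import Optional


def increment_from_pattern(pattern: str) -> Optional[Decimal]:
    """Read a rounding increment: '#,#50' -> 50, '#,##0.05' -> 0.05, else None."""
    digits = pattern.replace("#", "").replace(",", "").rstrip(".")
    if not digits:
        return None
    increment = Decimal(digits)
    return increment if increment > 0 else None  # all zeros means no increment


def round_to_increment(value: str, increment: Decimal) -> Decimal:
    """Divide by the increment, round to an integer, multiply back."""
    quotient = Decimal(value) / increment
    return quotient.to_integral_value(rounding=ROUND_HALF_EVEN) * increment


print(round_to_increment("1230", Decimal("50")))    # 1250
print(round_to_increment("1.234", Decimal("0.65")))  # 1.30
```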
Single quotes (') enclose bits of the pattern that should be treated literally. Inside a quoted string, two single quotes ('') are replaced with a single one ('). For example, the pattern 'X '#' Q ' formats 1939 as "X 1939 Q ", where "X " and " Q " are literal strings.
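Splitting a pattern into quoted (literal) and unquoted runs can be sketched in Python. split_quoted is a hypothetical helper; it treats '' as a literal single quote both inside and outside quotes and does not report an unterminated quote.

```python
from typing import List, Tuple


def split_quoted(pattern: str) -> List[Tuple[str, str]]:
    """Split a pattern into ('quoted' | 'unquoted', text) runs."""
    runs: List[Tuple[str, str]] = []

    def emit(kind: str, text: str) -> None:
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + text)  # merge adjacent runs of one kind
        else:
            runs.append((kind, text))

    in_quote = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "'" and pattern[i + 1:i + 2] == "'":
            emit("quoted", "'")  # '' always stands for one literal quote
            i += 2
            continue
        if ch == "'":
            in_quote = not in_quote  # opening or closing quote, not output
        else:
            emit("quoted" if in_quote else "unquoted", ch)
        i += 1
    return runs


print(split_quoted("'X '#' Q '"))
# [('quoted', 'X '), ('unquoted', '#'), ('quoted', ' Q ')]
```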
<!ELEMENT currencies (alias | (default?, currency*, special*)) >
<!ELEMENT currency (alias | (((pattern+, displayName*, symbol*) | (displayName+, symbol*, pattern*) | (symbol+, pattern*))?, decimal*, group*, special*)) >
<!ELEMENT symbol ( #PCDATA ) >
<!ATTLIST symbol choice ( true | false ) #IMPLIED > <!-- deprecated -->
Note: The term "pattern" appears twice in the above. The first is for consistency with all other cases of pattern + displayName; the second is for backwards compatibility.
<currencies> <currency type="USD"> <displayName>Dollar</displayName> <symbol>$</symbol> </currency> <currency type ="JPY"> <displayName>Yen</displayName> <symbol>¥</symbol> </currency> <currency type="PTE"> <displayName>Escudo</displayName> <symbol>$</symbol> </currency> </currencies>
In formatting currencies, the currency number format is used with the appropriate symbol from <currencies>, according to the currency code. The <currencies> list can contain codes that are no longer in current use, such as PTE. The choice attribute has been deprecated.
The count attribute distinguishes the different plural forms, such as in the following:
<currencyFormats> <unitPattern count="other">{0} {1}</unitPattern> … <currencies>
<currency type="ZWD"> <displayName>Zimbabwe Dollar</displayName> <displayName count="one">Zimbabwe dollar</displayName> <displayName count="other">Zimbabwe dollars</displayName> <symbol>Z$</symbol> </currency>
To format a particular currency value "ZWD" for a particular numeric value n using the (long) display name:
While for English this may seem overly complex, for some other languages different plural forms are used for different unit types; the plural forms for certain unit types may not use all of the plural-form tags defined for the language.
For example, if the currency is ZWD and the number is 1234, then the latter maps to count="other" for English. The unit pattern for that is "{0} {1}", and the display name is "Zimbabwe dollars". The final formatted number is then "1,234 Zimbabwe dollars".
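The steps in this example can be sketched as follows. The data tables are hypothetical stand-ins for the `unitPattern` and `displayName` elements above, falling back to "other" when a count has no element is an assumption, and `english_cardinal_category` hard-codes a simplified English rule for integers only ("one" for 1, "other" otherwise) rather than evaluating real CLDR plural rules.

```python
# Hypothetical stand-ins for the ZWD data above.
UNIT_PATTERNS = {"other": "{0} {1}"}
DISPLAY_NAMES = {"one": "Zimbabwe dollar", "other": "Zimbabwe dollars"}

def english_cardinal_category(n: int) -> str:
    # Simplified English rule for integers; real code should use CLDR plural rules.
    return "one" if n == 1 else "other"

def format_long_currency(n: int) -> str:
    count = english_cardinal_category(n)
    # Assumed fallback: use the "other" form when no element exists for this count.
    pattern = UNIT_PATTERNS.get(count, UNIT_PATTERNS["other"])
    name = DISPLAY_NAMES.get(count, DISPLAY_NAMES["other"])
    # {0} is the formatted number, {1} the plural-specific display name.
    return pattern.replace("{0}", f"{n:,}").replace("{1}", name)

print(format_long_currency(1234))  # 1,234 Zimbabwe dollars
print(format_long_currency(1))     # 1 Zimbabwe dollar
```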
When the currency symbol is substituted into a pattern, there may be some further modifications, according to the following.
<currencySpacing>
  <beforeCurrency>
    <currencyMatch>[:letter:]</currencyMatch>
    <surroundingMatch>[:digit:]</surroundingMatch>
    <insertBetween>&#x00A0;</insertBetween>
  </beforeCurrency>
  <afterCurrency>
    <currencyMatch>[:letter:]</currencyMatch>
    <surroundingMatch>[:digit:]</surroundingMatch>
    <insertBetween>&#x00A0;</insertBetween>
  </afterCurrency>
</currencySpacing>
This element controls whether additional characters are inserted on the boundary between the symbol and the pattern. For example, with the above currencySpacing, inserting the symbol "US$" into the pattern "#,##0.00¤" results in an extra no-break space before the symbol, giving "#,##0.00 US$". The beforeCurrency element governs this case, since we are looking before the "¤" symbol. The currencyMatch is positive, since "US$" starts with the letter "U". The surroundingMatch is positive, since the character just before the "¤" will be a digit. Because both conditions are true, the insertion is made.
Conversely, consider the pattern "¤#,##0.00" with the symbol "US$". In this case there is no insertion; the result is simply "US$#,##0.00". The afterCurrency element governs this case, since we are looking after the "¤" symbol. The surroundingMatch is positive, since the character just after the "¤" will be a digit. However, the currencyMatch is not positive, since "US$" ends with "$", which is not a letter. So the insertion is not made.
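The two cases can be sketched in Python. This is a hypothetical helper, not an LDML or ICU API: it attaches a symbol to an already-formatted number and applies only the example currencySpacing data above. The Unicode sets `[:letter:]` and `[:digit:]` are approximated here with `unicodedata` general categories (`L*` and `Nd`).

```python
import unicodedata

NBSP = "\u00a0"  # the insertBetween value in the example data

def is_letter(ch: str) -> bool:
    # Approximates [:letter:] as any general category starting with "L".
    return unicodedata.category(ch).startswith("L")

def is_digit(ch: str) -> bool:
    # Approximates [:digit:] as general category Nd (decimal digit).
    return unicodedata.category(ch) == "Nd"

def attach_symbol(number_text: str, symbol: str, symbol_after: bool) -> str:
    """Attach a currency symbol, inserting NBSP per the example currencySpacing."""
    if symbol_after:
        # beforeCurrency: test the symbol's first character and the
        # character just before the symbol (the number's last character).
        if is_letter(symbol[0]) and is_digit(number_text[-1]):
            return number_text + NBSP + symbol
        return number_text + symbol
    # afterCurrency: test the symbol's last character and the character
    # just after the symbol (the number's first character).
    if is_letter(symbol[-1]) and is_digit(number_text[0]):
        return symbol + NBSP + number_text
    return symbol + number_text

print(repr(attach_symbol("1,234.00", "US$", symbol_after=True)))   # '1,234.00\xa0US$'
print(repr(attach_symbol("1,234.00", "US$", symbol_after=False)))  # 'US$1,234.00'
```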
For more information on the matching used in the currencyMatch and surroundingMatch elements, see the main document Appendix E: Unicode Sets.
Currencies can also contain optional grouping, decimal data, and pattern elements. This data is inherited from the <symbols> in the same locale data (if not present in the chain up to root), so only the differing data will be present. See the main document Section 4.1 Multiple Inheritance.
Note: Currency values should never be interchanged without a known currency code. You never want the number 3.5 interpreted as $3.50 by one user and €3.50 by another. Locale data contains localization information for currencies, not a currency value for a country. A currency amount logically consists of a numeric value, plus an accompanying currency code (or equivalent). The currency code may be implicit in a protocol, such as where USD is implicit. But if the raw numeric value is transmitted without any context, then it has no definitive interpretation.
Notice that the currency code is completely independent of the end-user's language or locale. For example, BGN is the code for Bulgarian Lev. A currency amount of <BGN, 1.23456×10³> would be localized for a Bulgarian user into "1 234,56 лв." (using Cyrillic letters). For an English user it would be localized into the string "BGN 1,234.56". The end-user's language is needed for doing this last localization step; but that language is completely orthogonal to the currency code needed in the data. After all, the same English user could be working with dozens of currencies. Notice also that the currency code is independent of whether currency values are inter-converted, which requires more interesting financial processing: the rate of conversion may depend on a variety of factors.
Thus, logically speaking, once a currency amount is entered into a system, it should be accompanied by a currency code in all processing. This currency code is independent of whatever the user's original locale was. Only in badly designed software is the currency code (or equivalent) not present, so that the software has to "guess" at the currency code based on the user's locale.
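One way to keep the code attached to the value is a small value type. This Python sketch is illustrative only; the class name `CurrencyAmount` and its behavior are assumptions, not part of LDML. Arithmetic between different currencies is refused rather than silently converted, since conversion needs separate financial data.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class CurrencyAmount:
    """A numeric value that always travels with its ISO 4217 currency code."""
    value: Decimal
    currency: str  # e.g. "BGN", "USD"

    def __add__(self, other: "CurrencyAmount") -> "CurrencyAmount":
        if not isinstance(other, CurrencyAmount):
            return NotImplemented
        if other.currency != self.currency:
            # Mixing currencies requires an explicit conversion step.
            raise ValueError(f"cannot add {other.currency} to {self.currency}")
        return CurrencyAmount(self.value + other.value, self.currency)

total = CurrencyAmount(Decimal("1.50"), "USD") + CurrencyAmount(Decimal("2.00"), "USD")
print(total)  # CurrencyAmount(value=Decimal('3.50'), currency='USD')
```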
Note: The number of decimal places and the rounding for each currency are not locale-specific data, and are not contained in the Locale Data Markup Language format. Those values override whatever is given in the currency numberFormat. For more information, see Supplemental Currency Data.
For background information on currency names, see [CurrencyInfo].
<!ELEMENT currencyData ( fractions*, region+ ) >
<!ELEMENT fractions ( info+ ) >
<!ELEMENT
info EMPTY >
<!ATTLIST info iso4217 NMTOKEN #REQUIRED
>
<!ATTLIST info digits NMTOKEN #IMPLIED >
<!ATTLIST info rounding NMTOKEN #IMPLIED >
<!ATTLIST info cashDigits NMTOKEN #IMPLIED >
<!ATTLIST info cashRounding NMTOKEN #IMPLIED >
<!ELEMENT region ( currency* ) >
<!ATTLIST region
iso3166 NMTOKEN #REQUIRED >
<!ELEMENT
currency ( alternate* ) >
<!ATTLIST currency from
NMTOKEN #IMPLIED >
<!ATTLIST currency to NMTOKEN
#IMPLIED >
<!ATTLIST currency iso4217 NMTOKEN
#REQUIRED >
<!ATTLIST currency tender ( true | false )
#IMPLIED >
Each currencyData element contains one fractions element followed by one or more region elements. Here is an example for illustration.
<supplementalData>
  <currencyData>
    <fractions>
      …
      <info iso4217="CHF" digits="2" rounding="5"/>
      …
      <info iso4217="ITL" digits="0"/>
      …
    </fractions>
    …
    <region iso3166="IT">
      <currency iso4217="EUR" from="1999-01-01"/>
      <currency iso4217="ITL" from="1862-8-24" to="2002-02-28"/>
    </region>
    …
    <region iso3166="CS">
      <currency iso4217="EUR" from="2003-02-04"/>
      <currency iso4217="CSD" from="2002-05-15"/>
      <currency iso4217="YUM" from="1994-01-24" to="2002-05-15"/>
    </region>
    …
  </currencyData>
  …
</supplementalData>
The fractions element contains any number of info elements, with the following attributes:
For example, the following line
<info iso4217="CZK" digits="2" rounding="0"/>
should cause the value 2.006 to be displayed as “2.01”, not “2.00”.
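The effect of `digits` and `rounding` can be sketched in Python. The helper name `round_currency` is hypothetical; it interprets `rounding` as an increment in units of the last displayed digit (so CHF's `rounding="5"` means 0.05, and `0` means no increment) and assumes half-even rounding.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_currency(value: Decimal, digits: int, rounding: int = 0) -> Decimal:
    """Round per <info digits=... rounding=...> (hypothetical helper)."""
    unit = Decimal(1).scaleb(-digits)          # 10**-digits, e.g. 0.01
    step = unit * rounding if rounding else unit
    # Divide by the step, round to an integer (half-even assumed), multiply back.
    rounded = (value / step).quantize(Decimal(1), rounding=ROUND_HALF_EVEN) * step
    # Always show exactly `digits` fraction digits.
    return rounded.quantize(unit)

print(round_currency(Decimal("2.006"), digits=2))              # 2.01 (CZK)
print(round_currency(Decimal("1.234"), digits=2, rounding=5))  # 1.25 (CHF)
print(round_currency(Decimal("1234.6"), digits=0))             # 1235 (ITL)
```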
Each region element contains one attribute:
A region element can also contain any number of currency elements; their order is significant.
<region iso3166="IT"> <!-- Italy --> <currency iso4217="EUR" from="2002-01-01"/> <currency iso4217="ITL" to="2001-12-31"/> </region>
tender: indicates whether or not the ISO currency code represents a currency that was or is legal tender in some country. The default is "true". Certain ISO codes represent things like financial instruments or precious metals, and do not represent normally interchanged currencies.
That is, each currency element will list an interval in which it was valid. The ordering of the elements in the list tells us which was the primary currency during any period in time. Here is an example of such an overlap:
<currency iso4217="CSD" to="2002-05-15"/> <currency iso4217="YUD" from="1994-01-24" to="2002-05-15"/> <currency iso4217="YUN" from="1994-01-01" to="1994-07-22"/>
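The precedence rule can be sketched as a lookup that returns the first currency, in document order, whose interval contains a given date. This is a hypothetical Python helper; treating a missing `from` or `to` as open-ended and both endpoints as inclusive are assumptions here. The data is the Italy region from the earlier supplementalData example.

```python
from datetime import date
from typing import Optional

# (iso4217, from, to) in document order; None means open-ended.
ITALY = [
    ("EUR", date(1999, 1, 1), None),
    ("ITL", date(1862, 8, 24), date(2002, 2, 28)),
]

def primary_currency(currencies, on: date) -> Optional[str]:
    """Return the first listed currency valid on the given date, if any."""
    for code, start, end in currencies:
        if (start is None or start <= on) and (end is None or on <= end):
            return code
    return None

print(primary_currency(ITALY, date(2000, 6, 1)))  # EUR (both valid; EUR is listed first)
print(primary_currency(ITALY, date(1990, 1, 1)))  # ITL
```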
The from attribute is limited by the fact that ISO 4217 does not go very far back in time, so there may be no ISO code for the previous currency.
Currencies change relatively frequently. There are different types of changes:
UN information is used to determine the dates of changes that are due to country changes.
When a code is no longer in use, it is terminated (see #1, #2, #4, #5).
Example:
- <currency iso4217="EUR" from="2003-02-04" to="2006-06-03"/>
When codes split, each of the new codes inherits (see #2, #3) the previous data. However, some modifications can be made if it is clear that currencies were only in use in one of the parts.
When codes merge, the data is copied from the most populous part.
Example: when CS split into RS and ME:
- RS & ME copy the former CS, except that the line for EUR is dropped from RS
- CS now terminates on Jun 3, 2006 (following the UN info)
<!ELEMENT plurals (pluralRules*, pluralRanges*) >
<!ATTLIST plurals type ( ordinal | cardinal ) #IMPLIED >
<!-- default is cardinal -->
<!ELEMENT
pluralRules (pluralRule*) >
<!ATTLIST pluralRules
locales NMTOKENS #REQUIRED >
<!ELEMENT
pluralRule ( #PCDATA ) >
<!ATTLIST pluralRule count
(zero | one | two | few | many | other) #REQUIRED >
The plural categories are used to format messages with numeric placeholders, expressed as decimal numbers. The fundamental rule for determining plural categories is the existence of minimal pairs: whenever two different numbers may require different versions of the same message, then the numbers have different plural categories.
This happens even if nouns are invariant: even if all English nouns were invariant (like “sheep”), English would still require 2 plural categories because of subject-verb agreement and pronoun agreement. For example:
For more information, see Determining-Plural-Categories.
English does not have a separate plural category for “zero”, because it does not require a different message for “0”. For example, the same message can be used below, with just the numeric placeholder changing.
You have 3 friends online.
You have 0 friends online.
However, across many languages it is commonly more natural to express "0" messages with a negative (“None of your friends are online.”) and "1" messages with an alternate form (“You have a friend online.”). Thus pluralized message APIs should also offer the ability to specify at least the 0 and 1 cases explicitly; developers can use that ability whenever these values might occur in a placeholder.
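A message API along these lines might check the explicit "0" and "1" cases before falling back to the plural category. The following Python sketch is hypothetical: the message keys, the `{n}` placeholder, and the simplified English category function (integers only) are assumptions for illustration.

```python
ENGLISH_MESSAGES = {
    "0": "None of your friends are online.",
    "1": "You have a friend online.",
    "one": "You have {n} friend online.",
    "other": "You have {n} friends online.",
}

def english_category(n: int) -> str:
    # Simplified English cardinal rule for integers.
    return "one" if n == 1 else "other"

def select_message(n: int, messages: dict, category_of) -> str:
    # Explicit "0" / "1" messages take priority over plural categories.
    if n in (0, 1) and str(n) in messages:
        template = messages[str(n)]
    else:
        template = messages.get(category_of(n), messages["other"])
    return template.replace("{n}", str(n))

for count in (0, 1, 3):
    print(select_message(count, ENGLISH_MESSAGES, english_category))
```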
The CLDR plural rules are not expected to cover all cases. For example, strictly speaking, there could be more plural and ordinal forms for English. Formally, we have a different plural form where a change in digits forces a change in the rest of the sentence. There is an edge case in English because of the behavior of "a/an".
For example, in changing from 3 to 8:
So numbers of the following forms could have a special plural category and special ordinal category: 8(X), 11(X), 18(X), 8x(X), where x is 0..9 and the optional X is 00, 000, 00000, and so on.
On the other hand, the above constructions are relatively rare in messages constructed using numeric placeholders, so the disruption for implementations currently using CLDR plural categories wouldn't be worth the small gain.
This section defines the types of plural forms that exist in a language—namely, the cardinal and ordinal plural forms. Cardinal plural forms express units such as time, currency or distance, used in conjunction with a number expressed in decimal digits (i.e. "2", not "two", and not an indefinite number such as "some" or "many"). Ordinal plural forms denote the order of items in a set and are always integers. For example, English has two forms for cardinals:
and four forms for ordinals:
Other languages may have additional forms or only one form for each type of plural. CLDR provides the following tags for designating the various plural forms of a language; for a given language, only the tags necessary for that language are defined, along with the specific numeric ranges covered by each tag (for example, the plural form "few" may be used for the numeric range 2–4 in one language and 3–9 in another):
In addition, an "other" tag is always implicitly defined to cover the forms not explicitly designated by the tags defined for a language. This "other" tag is also used for languages that only have a single form (in which case no plural-form tags are explicitly defined for the language). For a more complex example, consider the cardinal rules for Russian and certain other languages:
<pluralRules locales="hr ru sr uk">
  <pluralRule count="one">n mod 10 is 1 and n mod 100 is not 11</pluralRule>
  <pluralRule count="few">n mod 10 in 2..4 and n mod 100 not in 12..14</pluralRule>
</pluralRules>
These rules specify that Russian has a "one" form (for 1, 21, 31, 41, 51, …), a "few" form (for 2–4, 22–24, 32–34, …), and implicitly an "other" form (for everything else: 0, 5–20, 25–30, 35–40, …, decimals). Russian does not need additional separate forms for zero, two, or many, so these are not defined.
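A direct translation of these two rules into Python, as a sketch: the function name is hypothetical, it accepts integers only, and it applies the rules to the absolute value, following the treatment of negative numbers described in this section.

```python
def russian_cardinal_category(n: int) -> str:
    """Cardinal plural category for integer n under the rules above."""
    n = abs(n)  # categories for negative numbers use the absolute value
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "other"  # the implicit category

print([russian_cardinal_category(k) for k in (1, 2, 5, 11, 21, 22, 25)])
# ['one', 'few', 'other', 'other', 'one', 'few', 'other']
```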
The plural category for negative numbers is calculated according to the absolute value of the source. (This may change in the future, if we find languages that have different behavior.)
Plural categories may also differ according to the visible decimals. For example, here are some of the behaviors exhibited by different languages:
Behavior | Description | Example |
---|---|---|
Base | The fractions are ignored; the category is the same as the category of the integer. | 1.13 has the same plural category as 1. |
Separate | All fractions by value are in one category (typically ‘other’ = ‘plural’). | 1.01 gets the same category as 9; 1.00 gets the same category as 1. |
Visible | All visible fractions are in one category (typically ‘other’ = ‘plural’). | 1.00, 1.01, 3.5 all get the same category. |
Digits | The visible fraction determines the category. | 1.13 gets the same category as 13. |
There are also variants of the above: for example, short fractions may have the Digits behavior, but longer fractions may just look at the final digit of the fraction.
Some types of CLDR data (such as unitPatterns and currency displayNames) allow specification of plural rules for explicit cases “0” and “1”, in addition to the language-specific plural cases specified above: “zero”, “one”, “two” ... “other”. For the language-specific plural rules:
By contrast, for the explicit cases “0” and “1”:
Usage example: In English (which only defines language-specific rules for “one” and “other”) this can be used to have special behavior for 0:
The XML value for each pluralRule is a condition with a boolean result that specifies whether that rule (i.e. that plural form) applies to a given numeric value n, where n can be expressed as a decimal fraction. Clients of CLDR may express all the rules for a locale using the following syntax:
rules = rule (';' rule)*
rule = keyword ':' condition samples
| 'other' ':' samples
keyword = [a-z]+
In CLDR, the keyword is the attribute value of 'count'. Those values in CLDR are currently limited to just what is in the DTD, but clients may support other values.
The conditions themselves have the following syntax.
condition = and_condition ('or' and_condition)*
samples = ('@integer' sampleList)? ('@decimal' sampleList)?
and_condition = relation ('and' relation)*
relation = is_relation | in_relation | within_relation
is_relation = expr 'is' ('not')? value
in_relation = expr (('not')? 'in' | '=' | '!=') range_list
within_relation = expr ('not')? 'within' range_list
expr = operand (('mod' | '%') value)?
operand = 'n' | 'i' | 'f' | 't' | 'v' | 'w'
range_list = (range | value) (',' range_list)*
range = value'..'value
sampleList = sampleRange (',' sampleRange)* (',' ('…'|'...'))?
sampleRange = decimalValue ('~' decimalValue)?
value = digit+
decimalValue = value ('.' value)?
digit = 0|1|2|3|4|5|6|7|8|9
The operands have the following meaning:
Symbol | Value |
---|---|
n | absolute value of the source number (integer and decimals). |
i | integer digits of n. |
v | number of visible fraction digits in n, with trailing zeros. |
w | number of visible fraction digits in n, without trailing zeros. |
f | visible fractional digits in n, with trailing zeros. |
t | visible fractional digits in n, without trailing zeros. |
n | i | v | w | f | t |
---|---|---|---|---|---|
1 | 1 | 0 | 0 | 0 | 0 |
1.0 | 1 | 1 | 0 | 0 | 0 |
1.00 | 1 | 2 | 0 | 0 | 0 |
1.3 | 1 | 1 | 1 | 3 | 3 |
1.30 | 1 | 2 | 1 | 30 | 3 |
1.03 | 1 | 2 | 2 | 3 | 3 |
1.230 | 1 | 3 | 2 | 230 | 23 |
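The operands can be computed directly from the decimal string. This Python sketch (the name `plural_operands` is hypothetical) takes the number as a string, because trailing fraction zeros affect v and f and would be lost in a float. It handles only plain decimal notation such as "-1.230", not exponents.

```python
from decimal import Decimal

def plural_operands(source: str) -> dict:
    """Compute the n, i, v, w, f, t operands for a plain decimal string."""
    digits = source.lstrip("-")                 # operands use the absolute value
    int_part, _, frac = digits.partition(".")
    frac_trimmed = frac.rstrip("0")             # fraction without trailing zeros
    return {
        "n": Decimal(digits),
        "i": int(int_part),
        "v": len(frac),
        "w": len(frac_trimmed),
        "f": int(frac) if frac else 0,
        "t": int(frac_trimmed) if frac_trimmed else 0,
    }

print(plural_operands("1.230"))
# {'n': Decimal('1.230'), 'i': 1, 'v': 3, 'w': 2, 'f': 230, 't': 23}
```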
The positive relations are of the format x = y and x mod z = y. The y value can be a comma-separated list, such as n = 3, 5, 7..15, and is treated as if each relation were expanded into an OR statement. The range value a..b is equivalent to listing all the integers between a and b, inclusive. When != is used, it means the entire relation is negated.
Expression | Meaning |
---|---|
x = 2..4, 15 | x = 2 OR x = 3 OR x = 4 OR x = 15 |
x != 2..4, 15 | NOT (x = 2 OR x = 3 OR x = 4 OR x = 15) |
Expression | Value |
---|---|
3.5 = 2..4, 15 | false |
3.5 != 2..4, 15 | true |
3 = 2..4, 15 | true |
3 != 2..4, 15 | false |
The old keywords 'mod', 'in', 'is', and 'within' are present only for backwards compatibility. The preferred form is to use '%' for modulo, and '=' or '!=' for the relations, with the operand 'i' instead of within. (The difference between in and within is that in only includes integers in the specified range, while within includes all values.)
The modulus (% or mod) is a remainder operation as defined in Java; for example, where n = 4.3 the result of n mod 3 is 1.3.
The values of relations are defined according to the operand as follows. Importantly, the results may depend on the visible decimals in the source, including trailing zeros.
Rules | Comments |
---|---|
one: n = 1 few: n = 2..4 |
This defines two rules, for 'one' and 'few'. The condition for 'one' is "n = 1" which means that the number must be equal to 1 for this condition to pass. The condition for 'few' is "n = 2..4" which means that the number must be between 2 and 4 inclusive for this condition to pass. All other numbers are assigned the keyword 'other' by the default rule. |
zero: n = 0 or n != 1 and n mod 100 = 1..19 one: n = 1 |
Each rule must not overlap with other rules. Also note that a modulus is applied to n in the 'zero' rule, so its condition holds for 119, 219, 319… |
one: n = 1 few: n mod 10 = 2..4 and n mod 100 != 12..14 |
This illustrates conjunction and negation. The condition for 'few' has two parts, both of which must be met: "n mod 10 = 2..4" and "n mod 100 != 12..14". The first part applies a modulus to n before the test as in the previous example. The second part applies a different modulus and also uses negation, thus it matches all numbers not in 12, 13, 14, 112, 113, 114, 212, 213, 214… |
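The relation semantics above (comma-separated range lists, `..` ranges that contain only integers, `!=` negating the whole relation, remainder, and `and` binding tighter than `or`) can be sketched as a small evaluator. This is a hypothetical Python sketch, not a complete implementation of the grammar: it accepts only `=` and `!=` relations (with `%` or `mod`), expects single spaces around `and`/`or`, and takes operand values as `Decimal`. Python's `Decimal` remainder keeps the sign of the dividend, as Java's `%` does.

```python
import re
from decimal import Decimal

# operand, optional modulus, '=' or '!=', then a range list
RELATION = re.compile(r"^([nivwft])(?:\s*(?:%|mod)\s*(\d+))?\s*(!=|=)\s*(.+)$")

def in_range_list(x: Decimal, range_list: str) -> bool:
    for item in range_list.split(","):
        item = item.strip()
        if ".." in item:
            low, high = item.split("..")
            # a..b stands for the integers a, a+1, ..., b only
            if x == x.to_integral_value() and Decimal(low) <= x <= Decimal(high):
                return True
        elif x == Decimal(item):
            return True
    return False

def eval_relation(relation: str, operands: dict) -> bool:
    match = RELATION.match(relation.strip())
    if match is None:
        raise ValueError(f"unsupported relation: {relation!r}")
    name, modulus, op, range_list = match.groups()
    x = operands[name]
    if modulus is not None:
        x = x % Decimal(modulus)  # remainder, e.g. 4.3 % 3 == 1.3
    hit = in_range_list(x, range_list)
    return hit if op == "=" else not hit  # '!=' negates the whole relation

def eval_condition(condition: str, operands: dict) -> bool:
    # 'and' binds tighter than 'or'
    return any(
        all(eval_relation(rel, operands) for rel in and_part.split(" and "))
        for and_part in condition.split(" or ")
    )

few = "n mod 10 = 2..4 and n mod 100 != 12..14"
print([k for k in range(1, 30) if eval_condition(few, {"n": Decimal(k)})])
# [2, 3, 4, 22, 23, 24]
```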
Samples are provided if sample indicator (@integer or @decimal) is present on any rule. (CLDR always provides samples.)
Where samples are provided, the absence of one of the sample indicators indicates that no numeric values can satisfy that rule. For example, the rule "i = 1 and v = 0" can only have integer samples, so @decimal must not occur.
The sampleRanges have a special notation: start~end. The start and end values must have the same number of decimal digits. The range encompasses all and only those values v where start ≤ v ≤ end, and where v has the same number of decimal places as start and end.
Samples must indicate whether they are infinite or not. The '…' marker must be present if and only if infinitely many values (integer or decimal) can satisfy the rule. If a set is not infinite, it must list all the possible values.
Samples | Comments |
---|---|
@integer 1, 3~5 | 1, 3, 4, 5. |
@integer 3~5, 103~105, … | Infinite set: 3, 4, 5, 103, 104, 105, … |
@decimal 1.3~1.5, 1.03~1.05, … | Infinite set: 1.3, 1.4, 1.5, 1.03, 1.04, 1.05, … |
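The start~end notation can be expanded with decimal arithmetic, stepping by one unit in the last decimal place so that every value keeps the same number of decimal digits. This is a hypothetical Python sketch; the helper names are illustrative, and it takes the sample list without its @integer or @decimal indicator.

```python
from decimal import Decimal

def expand_sample_range(sample_range: str) -> list:
    """Expand 'start~end' (or a single value) into all values it covers."""
    if "~" not in sample_range:
        return [sample_range]
    start_text, end_text = sample_range.split("~")
    start, end = Decimal(start_text), Decimal(end_text)
    exponent = start.as_tuple().exponent  # -(number of decimal digits)
    if exponent != end.as_tuple().exponent:
        raise ValueError("start and end must have the same number of decimal digits")
    step = Decimal(1).scaleb(exponent)    # one unit in the last decimal place
    values, value = [], start
    while value <= end:
        values.append(str(value))
        value += step
    return values

def expand_sample_list(sample_list: str):
    """Return (values, is_infinite) for a sample list such as '3~5, 103~105, …'."""
    values, infinite = [], False
    for part in (p.strip() for p in sample_list.split(",")):
        if part in ("…", "..."):
            infinite = True
        elif part:
            values.extend(expand_sample_range(part))
    return values, infinite

print(expand_sample_list("1.3~1.5, 1.03~1.05, …"))
# (['1.3', '1.4', '1.5', '1.03', '1.04', '1.05'], True)
```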
In determining whether a set of samples is infinite, leading zero integer digits and trailing zero decimals are not significant. Thus "i = 1 and f = 0" is satisfied by 01, 1, 1.0, 1.00, 1.000, etc. but is still considered finite.
Elements such as <currencyFormats>, <currency> and <unit> provide selection among subelements designating various localized cardinal plural forms by tagging each of the relevant subelements with a different count value, or with no count value in some cases. Note that the plural forms for a specific currencyFormat, unit type, or currency type may not use all of the different plural-form tags defined for the language. To format a currency or unit type for a particular numeric value, determine the count value according to the plural rules for the language, then select the appropriate display form for the currency format, currency type or unit type using the rules in those sections:
<!ELEMENT pluralRanges (pluralRange*) >
<!ATTLIST
pluralRanges locales NMTOKENS #REQUIRED >
<!ELEMENT pluralRange ( #PCDATA ) >
<!ATTLIST
pluralRange start (zero|one|two|few|many|other) #IMPLIED >
<!ATTLIST pluralRange end (zero|one|two|few|many|other) #IMPLIED
>
<!ATTLIST pluralRange result
(zero|one|two|few|many|other) #REQUIRED >
Often ranges of numbers are presented to users, such as in “Length: 3.2–4.5 centimeters”. This means any length from 3.2 cm to 4.5 cm, inclusive. However, different languages have different conventions for the pluralization given to a range: should it be “0–1 centimeter” or “0–1 centimeters”? This becomes much more complicated for languages that have many different plural forms, such as Russian or Arabic.
The pluralRanges element provides information allowing an implementation to derive the plural category of a range from the plural categories of the start and end values. If there is no value for a <start,end> pair, the default result is end. However, where that result has been verified for a given language, it is included in the CLDR data.
The data has been gathered presuming that in any usage, the start value is strictly less than the end value, and that no values are negative. Results for any cases that do not meet these criteria are undefined.
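The lookup with its default can be sketched as follows. The function names are hypothetical, the category function is a simplified English rule, and the single table entry is invented for illustration rather than taken from CLDR data. Inputs outside the stated assumptions (negative values, or start not less than end) are rejected, since their results are undefined.

```python
def english_category(value) -> str:
    # Simplified English cardinal rule: "one" only for exactly 1 (assumption).
    return "one" if value == 1 else "other"

def range_category(start_value, end_value, category_of, ranges: dict) -> str:
    """Plural category of the range from start_value to end_value."""
    if not 0 <= start_value < end_value:
        # The CLDR data does not define results for these cases.
        raise ValueError("expected 0 <= start < end")
    start, end = category_of(start_value), category_of(end_value)
    # Use the locale's explicit <start, end> result if present, else 'end'.
    return ranges.get((start, end), end)

# Hypothetical pluralRange data, for illustration only.
RANGES = {("one", "other"): "other"}

print(range_category(1, 3, english_category, RANGES))  # other (explicit entry)
print(range_category(0, 1, english_category, RANGES))  # one (default: end)
```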
<!ELEMENT rbnf ( alias | rulesetGrouping*) >
<!ELEMENT rulesetGrouping ( alias | ruleset*) >
<!ATTLIST rulesetGrouping type NMTOKEN #REQUIRED>
<!ELEMENT ruleset ( alias | rbnfrule*) >
<!ATTLIST
ruleset type NMTOKEN #REQUIRED>
<!ATTLIST ruleset
access ( public | private ) #IMPLIED >
<!ELEMENT rbnfrule ( #PCDATA ) >
<!ATTLIST rbnfrule
value CDATA #REQUIRED >
<!ATTLIST rbnfrule radix CDATA
#IMPLIED >
<!ATTLIST rbnfrule decexp CDATA #IMPLIED
>
The rule-based number format (RBNF) encapsulates a set of rules for mapping binary numbers to and from a readable representation. They are typically used for spelling out numbers, but can also be used for other number systems like Roman numerals, Chinese numerals, or for ordinal numbers (1st, 2nd, 3rd,…).
Where, however, the CLDR plurals or ordinals can be used, their usage is recommended in preference to the RBNF data. First, the RBNF data is not completely fleshed out over all languages that otherwise have modern coverage. Second, the alternate forms are neither complete nor useful without additional information. For example, for German there are spellout-cardinal-masculine and spellout-cardinal-feminine. But a complete solution would have all genders (masculine/feminine/neuter), all cases (nominative, accusative, dative, genitive), plus context (with strong or weak determiner or none). Moreover, even for the alternate forms that do exist, CLDR does not supply any data for when to use one vs. another (e.g., when to use spellout-cardinal-masculine vs. spellout-cardinal-feminine). So these data are inappropriate for general-purpose software.
There are 4 common spellout rules. Some languages may provide more than these 4 types:
In addition to the spellout rules, there are also numbering system rules. Even though they may be derived from a specific culture, they are typically not translated, and the rules are in root.
An example is the Roman numeral rules, where the value 8 comes out as VIII.
With regard to the number range supported for all these number types, the rules try to support the largest possible range, but some languages may not have words for large numbers. For example, the old Roman numbering system can't support the value 5000 and beyond. For those unsupported cases, the default number format from CLDR is used.
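For illustration only, the Roman numeral behavior described above (8 as VIII, no support from 5000 upward, fallback to a default format) can be mimicked with a plain greedy conversion. This is not the RBNF rule set from root and does not use the RBNF syntax; the 1..4999 limit and the use of Python's grouped integer formatting as the fallback are assumptions standing in for CLDR's behavior.

```python
ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]

def format_roman(n: int) -> str:
    """Format n in Roman numerals, falling back to a default format."""
    if not 1 <= n <= 4999:
        # Outside the supported range: use an ordinary grouped decimal format.
        return f"{n:,}"
    out = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)  # greedily take the largest values first
        out.append(symbol * count)
    return "".join(out)

print(format_roman(8))     # VIII
print(format_roman(1976))  # MCMLXXVI
print(format_roman(5000))  # 5,000
```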
Any rules marked as private should never be referenced externally. Frequently they only support a subrange of numbers that are used in the public rules.
The syntax used in the CLDR representation of rules is intended to be simply a transcription of ICU-based RBNF rules into an XML-compatible syntax. The rules are fairly sophisticated; for details see Rule-Based Number Formatter [RBNF].
<rulesetGrouping>
Used to group rules into functional sets for use with ICU. Currently, the valid types of rule set groupings are "SpelloutRules", "OrdinalRules", and "NumberingSystemRules".
<ruleset>
This element denotes a specific rule set to the number formatter. The ruleset is assumed to be a public ruleset unless the attribute access="private" is specified.
<rbnfrule>
Contains the actual formatting rule for a particular number or sequence of numbers. The "value" attribute is used to indicate the starting number to which the rule applies. The actual text of the rule is identical to the ICU syntax, with the exception that Unicode left and right arrow characters are used to replace < and > in the rule text, since < and > are reserved characters in XML. The "radix" attribute is used to indicate an alternate radix to be used in calculating the prefix and postfix values for number formatting. Alternate radix values are typically used for formatting year numbers in formal documents, such as "nineteen hundred seventy-six" instead of "one thousand nine hundred seventy-six".
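Because only the arrow characters differ, converting CLDR rule text to ICU rule text is a character substitution. A minimal sketch, assuming the arrows are U+2190 LEFTWARDS ARROW (←) and U+2192 RIGHTWARDS ARROW (→); the function name is hypothetical and the sample rule text is illustrative rather than copied from CLDR data.

```python
# Map the XML-safe arrows back to ICU's '<' and '>' (assumed code points).
ARROWS_TO_ICU = str.maketrans({"\u2190": "<", "\u2192": ">"})

def cldr_rule_to_icu(rule_text: str) -> str:
    """Replace the arrow characters in CLDR rule text with ICU's < and >."""
    return rule_text.translate(ARROWS_TO_ICU)

print(cldr_rule_to_icu("←← hundred[ →→];"))  # << hundred[ >>];
```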
The following elements are relevant to determining the value of a parsed number:
Other characters should either be ignored, or indicate the end of input, depending on the application. The key point is to disambiguate the sets of characters that might serve in more than one position, based on context. For example, a period might be either the decimal separator, or part of a currency symbol (for example, "NA f."). Similarly, an "E" could be an exponent indicator, or a currency symbol (the Swaziland Lilangeni uses "E" in the "en" locale). An apostrophe might be the decimal separator, or might be the grouping separator.
Here is a set of heuristic rules that may be helpful:
Note: In some environments, applications may independently wish to restrict the decimal digit set to prevent security problems. See [UTR36].
Copyright © 2001–2018 Unicode, Inc. All Rights Reserved. The Unicode Consortium makes no expressed or implied warranty of any kind, and assumes no liability for errors or omissions. No liability is assumed for incidental and consequential damages in connection with or arising out of the use of the information or programs contained or accompanying this technical report. The Unicode Terms of Use apply.
Unicode and the Unicode logo are trademarks of Unicode, Inc., and are registered in some jurisdictions.