No. Date Rel. Note Data Charts Spec Delta SVN Tag DTD Diffs
30 2016-10-05 v30 CLDR30 Charts30 LDML30 Δ30 release-30 ΔDTD30

Overview

Unicode CLDR 30 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release:

Structural additions and changes

The new data fields being added to the release are:

In locale data, 
new element <relativePeriod>,
new attribute count for <dateFormatItem>;
in supplemental data,
new element <weekOfPreference>
For generated periods like “the week of August 10”. Data examples:
  <availableFormats>
    <dateFormatItem id="MMMMW" count=...>'week' W 'of' MMM</dateFormatItem>
    <dateFormatItem id="yw" count=...>'week' w 'of' y</dateFormatItem>
...
<field type="week">
  <relativePeriod>the week of {0}</relativePeriod>
Note: the structure and intended usage for these items are still being refined; see Warnings and Errata.
New relative data items for weekdays,
and for “this hour”, “this minute”
Examples:
<field type="sun">
  <relativeTime type="future">
    <relativeTimePattern count=...>in {0} Sundays</relativeTimePattern>
...
<field type="hour">
  <relative type="0">this hour</relative>
New unit patterns for
“per square kilometer”, “per square mile”
 
In locale data, new elements
<characterLabel>,
<characterLabelPatterns>
To generate labels for groups of related characters in character pickers
In annotation data,
new attribute type for <annotation>,
deprecated attribute tts for <annotation>
Restructured to make the difference clearer between short names (text-to-speech) and other keywords (for predictive typing, search, etc.). See detail below.
New data file
ExtendedPictographic.txt
Specifies property data for “future-proofing” emoji segmentation.
The structure for annotations has changed to make processing simpler:
OLD: <annotation cp='[😀]' tts='grinning face'>face; grin</annotation>
NEW: <annotation cp="😀">face | grin</annotation>
<annotation cp="😀" type="tts">grinning face</annotation>
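
As an illustration of the new structure, the following is a minimal Python sketch that reads keywords and the text-to-speech short name for each character. The `<annotations>` wrapper around the sample, the function name `read_annotations`, and the shape of the returned dictionary are assumptions made for this example; this is not a reference implementation.

```python
import xml.etree.ElementTree as ET

# Sample data in the NEW structure; the wrapper element is assumed for the sketch.
SAMPLE = """<annotations>
  <annotation cp="😀">face | grin</annotation>
  <annotation cp="😀" type="tts">grinning face</annotation>
</annotations>"""


def read_annotations(xml_text):
    """Return {cp: {"keywords": [...], "tts": str or None}} (hypothetical shape)."""
    result = {}
    root = ET.fromstring(xml_text)
    for node in root.iter("annotation"):
        entry = result.setdefault(node.get("cp"), {"keywords": [], "tts": None})
        text = (node.text or "").strip()
        if node.get("type") == "tts":
            # Short name, intended for text-to-speech.
            entry["tts"] = text
        else:
            # Other keywords are separated by "|".
            entry["keywords"] = [k.strip() for k in text.split("|") if k.strip()]
    return result


annotations = read_annotations(SAMPLE)
print(annotations["😀"])  # {'keywords': ['face', 'grin'], 'tts': 'grinning face'}
```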

Other changes:

  • The formerly deprecated number symbols element <currencyGroup> was un-deprecated, since it is needed for de_AT.
  • The <transformNames> element and its subelements have been deprecated. In their stead, some additional key/type names are added.
For more details, see DTD Deltas and Modifications.

Data additions and changes

Unicode 9

Support was added for Unicode 9.0; this includes:
  • Updating to Unicode 9.0 Unihan data (significant kMandarin reading improvements) for the pinyin collation and Han-Latin transform
  • New script codes for Adlam, Bhaiksuki, Marchen, Newa, Osage
  • New numbering systems adlm, bhks, newa

Locale item codes and names

Other updates for locale item codes and names, in addition to those for Unicode 9:
  • Some support for new region codes EZ, UN (though names for EZ are not available in languages other than English).
  • Support for new timezones uasf "Europe/Astrakhan", rubax "Asia/Barnaul", rukvx "Europe/Kirov", rutof "Asia/Tomsk", ruuly "Europe/Ulyanovs
  • Support for new Belarusian ruble code BYN
  • Updated English names for bn/Beng (“Bangla”), mic (“Mi'kmaq”), and or (“Odia”)
  • Documented the use of script subtag “Zxxx” to indicate spoken or otherwise unwritten content.
  • The set of language and script names for which translations are requested was revamped, leading to a substantial increase in the number of such names.
  • Substantial new data has been added for likely subtags (e.g. to get the main script for each language).
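
To illustrate the likely subtags item above, here is a minimal Python sketch of looking up the main script for a language. The three entries are a small hand-copied sample for illustration only (the full data is in the supplemental likelySubtags.xml file), and the names `LIKELY_SUBTAGS_SAMPLE` and `main_script` are hypothetical.

```python
# Illustrative sample only; not the full CLDR likely subtags data.
LIKELY_SUBTAGS_SAMPLE = {
    "en": "en_Latn_US",
    "ru": "ru_Cyrl_RU",
    "zh": "zh_Hans_CN",
}


def main_script(language, table=LIKELY_SUBTAGS_SAMPLE):
    """Return the script subtag of the maximized locale, or None if unknown."""
    maximized = table.get(language)
    if maximized is None:
        return None
    # Maximized form is language_Script_REGION; the script is the second field.
    return maximized.split("_")[1]


print(main_script("ru"))  # Cyrl
```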

Emoji

Note: The emoji data items listed below were updated to cover the upcoming Emoji 4.0 release, based on a draft version of the Emoji 4.0 data. The Emoji 4.0 data may be revised after CLDR 30 is released.
  • Emoji short names and keywords were fleshed out for nearly 80 languages.
  • Added data files to support improved and “future-proofed” text segmentation behavior for emoji.
  • Updated emoji collation.

Other

Some other items of note:
  • Support for IANA timezone data through 2016g
  • The time separator for Norwegian locales (nb, nn) was changed to be ':' throughout.
  • Number symbols (plus, minus, percent) and formats (currency, percent) were updated for ar, fa, he in an effort to produce better results. Note that depending on locale and numbering system, those number symbols and currency patterns may contain bidi controls LRM, RLM, and ALM. Use of the ALM is new in CLDR 30; CLDR clients who will be using this data on systems that do not yet support the bidi algorithm of Unicode 6.3 and later should map instances of ALM (U+061C) to RLM (U+200F).
  • Use “lei” as narrow currency symbol for Romanian Leu (RON)
  • Collation: Uzbek (uz) collation added, Church Slavic (cu) collation fixed, Czech (cs) search collation now uses the standard search collator.
  • Some number spellout (RBNF) fixes for French, Portuguese, Spanish, Hebrew, Vietnamese.
  • Transforms:
    • Adds transform from Zawgyi-encoded Myanmar (Burmese) to standard Unicode representation.
    • Adds transliteration of 10 Indic languages to Urdu and Arabic.
    • Adds phonetic transcription of Cherokee to IPA, as well as transliteration of 21 languages to Cherokee.

Growth

CLDR 30 included a Survey Tool data collection phase, and collection of annotation data. The following shows the growth of locale data over time.


This graph does not include data outside of the main and annotations directories, such as sorting order, transliterations, validity data, and so forth. The following gives the total overview of the change in data items in CLDR.

added items 9.32%
deleted items* 0.12%
changed items 5.90%
total items 818,314

The measurement of the number of items reflects the different ways that the information is represented. A single data field (element or attribute value) may result in multiple data items. For example, plural rules may be shared by multiple languages, and a single data field contains all the languages to which those rules apply. Sometimes a changed item appears as a deletion+addition, and sequences of items (such as sort order) are not counted as different even if the order changes. For more details, see the Delta Data charts.

JSON data

The JSON-format data and details about it are not yet available, but will be soon.

Survey Tool

  • Added 8-vote requirement to et, fa, en_GB, en_AU.
  • The Survey Tool now shows a link to where inherited data is aliased from. [#9489]

Specification changes

  • Documented the use of script subtag “Zxxx” to indicate spoken or otherwise unwritten content.
  • Documented the Unicode locale extension key "em" to control emoji presentation.
  • Described the DTD annotations used in CLDR.
  • Noted that number patterns may contain bidi controls.
  • Described how to handle fractional seconds S when matching skeletons and adjusting corresponding patterns.
For details, see Modifications.
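
As a rough illustration of where the "em" key appears, the following Python sketch extracts the key/type pairs of the -u- extension from a language tag. The function name and the sample tag (including its "emoji" value) are assumptions chosen for this example, and the sketch deliberately ignores private-use (-x-) sections and other edge cases covered by the specification.

```python
def unicode_extension_keywords(tag):
    """Return the key/type pairs of the -u- extension in a BCP 47 style tag."""
    subtags = tag.lower().split("-")
    if "u" not in subtags:
        return {}
    keywords = {}
    key = None
    for subtag in subtags[subtags.index("u") + 1:]:
        if len(subtag) == 1:
            break  # another singleton starts a different extension
        if len(subtag) == 2:
            # Two-character subtags are keys, such as "em" or "nu".
            key = subtag
            keywords[key] = []
        elif key is not None:
            # Longer subtags are type values for the most recent key.
            keywords[key].append(subtag)
    return {k: "-".join(v) for k, v in keywords.items()}


print(unicode_extension_keywords("en-u-em-emoji"))  # {'em': 'emoji'}
```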

Migration

Users of the annotation data need to move to the new structure, described above.

There are changes to the bidirectional control characters in number symbols and number patterns for number systems 'arab' and 'arabext', and for number system 'latn' in some locales. These include use of the ALM (Arabic Letter Mark) character, which was new in Unicode 6.3.
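
For clients on systems that do not yet support the bidi algorithm of Unicode 6.3 and later, the ALM-to-RLM mapping recommended in this release can be sketched in Python as follows. The function name `map_alm_to_rlm` and the sample string are assumptions for illustration; where to apply the mapping (at data load time or at format time) is left to the client.

```python
ALM = "\u061C"  # ARABIC LETTER MARK, new in Unicode 6.3
RLM = "\u200F"  # RIGHT-TO-LEFT MARK


def map_alm_to_rlm(text):
    """Replace every ALM in a number symbol or pattern with RLM."""
    return text.replace(ALM, RLM)


# Hypothetical symbol value containing an ALM before a minus sign.
symbol = ALM + "-"
mapped = map_alm_to_rlm(symbol)
print([hex(ord(ch)) for ch in mapped])  # ['0x200f', '0x2d']
```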

Known Issues

“Week of” structure

The structure and intended usage for the “week x of y” patterns is still being refined and may change. This applies especially to dateFormatItems such as the following:

    <dateFormatItem id="MMMMW" count=...>'week' W 'of' MMM</dateFormatItem>
    <dateFormatItem id="yw" count=...>'week' w 'of' y</dateFormatItem>

Areas of discussion include the use of the count attribute and the use of ordinal vs. cardinal numbers.
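
For the simpler <relativePeriod> pattern listed under the structural additions ("the week of {0}"), placeholder substitution can be sketched in Python as below. The function name is hypothetical, the date text is assumed to have been formatted elsewhere, and the structure itself may still change as noted in this section.

```python
def format_relative_period(pattern, date_text):
    """Substitute an already formatted date into a relativePeriod pattern."""
    return pattern.replace("{0}", date_text)


print(format_relative_period("the week of {0}", "August 10"))  # the week of August 10
```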

Emoji data

The emoji-related data in CLDR 30 is based on a draft version of Emoji 4.0 data, which may change before it is finalized.

Emoji annotation short names

The process described in LDML 30 for synthesizing short names of emoji sequences may be updated; if so, details will be provided here.

Key

  • The Release Note contains a general description of the contents of the release, and any relevant notes about the release.
  • The Data link points to a set of zip files containing the contents of the release (the files are complete in themselves, and do not require files from earlier releases -- for the structure of the zip file, see Repository Organization).
  • The Spec is the version of UTS #35: LDML that corresponds to the release.
  • The Delta document points to a list of all the bug fixes and features in the release, which can be used to get the precise corresponding file changes using BugDiffs.
  • The SVN Tag can be used to get the files via Repository Access.
  • For more details see CLDR Releases (Downloads).

The Unicode Terms of Use apply to CLDR data; in particular, see Exhibit 1.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts.