Unicode CLDR Survey Tool


Most data in the Unicode Common Locale Data Repository is gathered and processed via what is called the Survey Tool, an online tool that can be used to view data for different languages and propose additions or changes. This tool provides a way to propose new localized data, see what others have proposed, and communicate with them to resolve differences.

During each submission period, contributors from Unicode Consortium members, other organizations, and the public at large are invited to review the data for their languages and countries and to propose new translations or modifications, including data for languages entirely new to the repository. For the release schedule, see CLDR Project.

In this release, new structure has been added for plurals, simple duration formats, and finer control over the formatting of locale names. The tool also has a number of usability changes: for example, only the time zone names that are important to translate are shown. There are also new items for translation, such as new territory codes. We would also like contributors to focus on getting enough votes for unapproved items so that they become approved.

The following provides a brief description of the process.

Accounts

You don’t need an account to view data for a particular language. If you wish to propose changes or additions, you will need an account: see Survey Tool Accounts.

Locale List

The main screen of the Survey Tool is located at http://unicode.org/cldr/apps/survey. It displays a list of the languages currently available. Languages may vary by script (Arabic vs. Latin, or Simplified vs. Traditional Chinese), and occasionally by country. For historical reasons, this combination of a language with a script or country is known as a locale.
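To make the idea of a locale concrete, here is a minimal Python sketch that splits an identifier such as zh_Hant_TW into its language, script, and country parts. The names `LocaleId` and `parse_locale_id` are hypothetical, and the sketch assumes underscore-separated identifiers built from a language code, an optional four-letter script code, and an optional two-letter or three-digit region code; it ignores variants and other extensions.

```python
from typing import NamedTuple, Optional


class LocaleId(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def parse_locale_id(locale_id: str) -> LocaleId:
    """Split e.g. 'zh_Hant_TW' into language, script and region (simplified)."""
    parts = locale_id.replace("-", "_").split("_")
    language = parts[0].lower()
    script: Optional[str] = None
    region: Optional[str] = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()  # script codes are title-cased, e.g. Hant
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()  # two-letter country or three-digit area code
    return LocaleId(language, script, region)


print(parse_locale_id("zh_Hant_TW"))
print(parse_locale_id("en_AU"))
```

A locale such as English (Australia) [en_AU] has no script part, while Traditional Chinese as used in Taiwan carries both a script and a country.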

For each language, the content is what is appropriate for the most populous country; thus the content for English [en] is whatever is appropriate for the United States. Any variation by country for that language is represented in a country locale: thus content appropriate for Australia that differs from what is in English [en] goes in the sublocale English (Australia) [en_AU].
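The relationship between a language locale and its country sublocales can be sketched as a lookup that walks from the country locale up to the language locale, and then to root, until some locale supplies the item. This is a simplified illustration: the helper names, item keys, and sample values are hypothetical, and it uses plain truncation of the identifier (en_AU to en to root), ignoring any explicit parent-locale exceptions that CLDR may define.

```python
from typing import Dict, Optional

LocaleData = Dict[str, Dict[str, str]]


def parent_locale(locale_id: str) -> Optional[str]:
    """Simplified truncation inheritance: en_AU -> en -> root -> None."""
    if locale_id == "root":
        return None
    if "_" in locale_id:
        return locale_id.rsplit("_", 1)[0]
    return "root"


def lookup(data: LocaleData, locale_id: str, key: str) -> Optional[str]:
    """Return the first value for `key` found along the inheritance chain."""
    current: Optional[str] = locale_id
    while current is not None:
        value = data.get(current, {}).get(key)
        if value is not None:
            return value
        current = parent_locale(current)
    return None  # not supplied anywhere in the chain


# Illustrative data: en_AU stores only the item that differs from en.
DATA: LocaleData = {
    "root": {},
    "en": {"territory.CH": "Switzerland", "dateFormat.short": "M/d/yy"},
    "en_AU": {"dateFormat.short": "d/MM/yy"},
}

print(lookup(DATA, "en_AU", "territory.CH"))      # inherited from en
print(lookup(DATA, "en_AU", "dateFormat.short"))  # overridden in en_AU
```

Because en_AU only holds its differences, any later fix to an item in English [en] is automatically picked up by the Australian locale.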

Click on the languages (and optionally countries) that you would like to view. You can always return to this page by clicking on Locales at the top left of the page.

  • If you have permission to modify a locale, you will see a hand symbol after the locale name.
  • If you would like to add data for a new locale, please notify your CLDR contact (see Survey Tool Accounts). He or she can add an empty locale which can then have data added into it.

Reviewing and Submitting Data

A key explaining how the windows are laid out is available at Survey Tool Windows; you should review it before starting. Then go through each section in turn: languages, scripts, territories, ... all the way to supplemental.
  • As you go through the sections, you will generally review the Priority items (if any), and fix or add translations.
    • Click on the right option, if it is there.
    • Otherwise click on "change to", then type in the fixed or new text.
    • Important: before you leave any page, click on the Save button to save your changes. It's also a good idea to do this if you are spending a lot of time on a page, just in case there is a problem.
    • Control-F (find on page) is very useful for moving around these pages, as are the Page Up and Page Down keys. You can also switch between ordering items by Priority and by Code.
  • More information is in the "Zoomed" view, so be sure to look at that if you have any questions. It's also a good idea to zoom in on at least one item in each section, to review any information for that section.

The locale data should be in the customary form for the target language, in the form that is in most common usage. For example, for the territory name in English one would use "Switzerland" instead of "Swiss Confederation", and use "United Kingdom" instead of "The United Kingdom of Great Britain and Northern Ireland".

Coverage

The warnings about missing items are based on your coverage level. This level ranges from comprehensive (all possible items) down to basic (a very minimal set of items). Locales that don't meet at least the basic level may not be complete enough to be included in the official release (although the data will be kept in the working repository).

  • You can go to My Options (in the top-left corner of each page), and set your coverage level explicitly.
  • If you are from a Unicode member organization, your default coverage level will be set for you. However, you may want to increase your coverage (on My Options) in order to get more warnings about the next-priority items.
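The way a coverage level filters the missing-item warnings can be sketched as follows: each item is assigned the lowest level at which it is expected, and a warning is raised for every absent item whose level is at or below the level you have chosen. This is a hypothetical illustration, not the Survey Tool's code; only basic and comprehensive are named above, so the intermediate level names, the item keys, and their level assignments are all assumptions.

```python
from enum import IntEnum
from typing import Dict, List, Set


class Coverage(IntEnum):
    # Assumed ordering, from the minimal level up to the complete one.
    BASIC = 1
    MODERATE = 2
    MODERN = 3
    COMPREHENSIVE = 4


def missing_item_warnings(
    required: Dict[str, Coverage], present: Set[str], level: Coverage
) -> List[str]:
    """List items expected at or below `level` that the locale does not supply."""
    return sorted(
        item for item, needed in required.items()
        if needed <= level and item not in present
    )


# Hypothetical item keys and level assignments.
REQUIRED = {
    "language.uk": Coverage.BASIC,
    "territory.PL": Coverage.MODERATE,
    "territory.MD": Coverage.MODERN,
}
present = {"language.uk"}

print(missing_item_warnings(REQUIRED, present, Coverage.BASIC))   # no warnings
print(missing_item_warnings(REQUIRED, present, Coverage.MODERN))  # both territories
```

Raising the level only widens the set of items that are checked; it never changes which items count as present.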

Caution: these warnings are generated mechanically and do not substitute for your judgment: you may want to translate more items based on your knowledge. For example, a Ukrainian speaker may want to translate the names of the neighboring countries, even if they do not produce warnings at the current coverage level.

Country-Specific Information

The language locale should contain the most broadly used data for that language, appropriate for the most populous region. Region-specific locales should contain data only where they need to override individual items, because the data inherited from the language locale would not be customary in that region.

Once you've looked over all the sections in your language, go back to the Locale window and scroll to your language. To the right of it you'll see the countries in which it is used. If usage of your language varies by country, you can make those changes now. You only need to do this where usage in a country differs from the main language locale.

Each language locale serves as the default content for one of the countries that use the language (for example, English [en] for the United States). You won't be able to edit that country locale; instead, make any modifications in the main language locale.
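The rule that a country locale should hold only its overrides can be expressed as a small filter: keep an item only if its value differs from the one the country would inherit from the language locale. The function name, item keys, and values below are hypothetical and for illustration only.

```python
from typing import Dict


def minimize_region_locale(
    region_data: Dict[str, str], inherited: Dict[str, str]
) -> Dict[str, str]:
    """Keep only the items whose value differs from what the region inherits."""
    return {
        key: value
        for key, value in region_data.items()
        if inherited.get(key) != value
    }


# Illustrative values: a draft en_AU repeats one item from en and overrides another.
en = {"territory.CH": "Switzerland", "dateFormat.short": "M/d/yy"}
en_au_draft = {"territory.CH": "Switzerland", "dateFormat.short": "d/MM/yy"}

print(minimize_region_locale(en_au_draft, en))
```

Only the date format survives, because "Switzerland" is already inherited from English [en] and repeating it in en_AU would add nothing.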

Resolving Differences among Translators

After the data submission phase, any differences in the submitted data will be resolved according to the data resolution process. However, even during the submission phase, you should collaborate with the other translators where you have questions, via email and the forums.

Problems?

The tool has undergone substantial revisions based on feedback we received during the last release. There are still some rough edges, and we ask for your patience with any problems that occur. In particular, the tool is not designed to handle a large number of people working at the same time, so if it appears unresponsive, please try again later (and save your work as you go).

If you find a problem, you may want to review Known Bugs to see whether it has already been reported (and whether there is a work-around). If not, or if you have suggestions for improvements, please file a bug using the Feedback link at the bottom of each window. If there are other issues, you can raise them on the Unicode CLDR Mailing List.

Special Considerations

Character Repertoire

The data in the locale repository should contain the most appropriate choice of characters for the representation of the text. It may thus include Unicode characters that are not included in a given legacy character set. In particular, the data may contain curly quotes and apostrophes (such as in “can’t”), and similar characters such as the letter modifiers in ʻōlelo Hawaiʻi.

These characters provide more distinctions than are available with the generic ASCII repertoire. They may be “downcast” to the best available characters when the data is imported into systems with a more limited repertoire of supported characters. (Downcasting information is provided with character fallback substitutions.)
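Downcasting can be sketched as a character-by-character substitution: any character the target system supports is kept, and any other character is replaced by its fallback when one is known. The `FALLBACKS` table below is a small illustrative subset chosen for the examples in this section, not CLDR's actual character fallback data, and `downcast` is a hypothetical helper.

```python
from typing import Callable, Dict

# Illustrative subset only; not CLDR's character fallback data.
FALLBACKS: Dict[str, str] = {
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK (also used as apostrophe)
    "\u201C": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',  # RIGHT DOUBLE QUOTATION MARK
    "\u02BB": "'",  # MODIFIER LETTER TURNED COMMA (Hawaiian okina)
    "\u014D": "o",  # LATIN SMALL LETTER O WITH MACRON
}


def downcast(text: str, is_supported: Callable[[str], bool]) -> str:
    """Replace unsupported characters with a fallback where one is known."""
    out = []
    for ch in text:
        if is_supported(ch):
            out.append(ch)
        else:
            out.append(FALLBACKS.get(ch, ch))  # keep ch if no fallback is known
    return "".join(out)


# Downcast to plain ASCII.
print(downcast("can\u2019t", str.isascii))
print(downcast("\u02BB\u014Dlelo Hawai\u02BBi", str.isascii))
```

Keeping the richer characters in the repository and downcasting only at import time preserves the distinctions for systems that can display them.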

Hong Kong, Macau

The territory codes HK and MO are to be translated with the native equivalents of “Hong Kong SAR China” and “Macao SAR China”, respectively. SAR stands for “Special Administrative Region” and can be represented with an acronym in the target language. There are also alternative short versions of these names, which omit "SAR China"; these should be translated as well.
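The relationship between the full and short territory names can be sketched as a lookup that prefers a requested alternative form and falls back to the default name when no alternative exists. The `(code, alt)` keying, the "short" tag, and the `territory_name` helper are assumptions for illustration; the English values follow the description above.

```python
from typing import Dict, Optional, Tuple

# (territory code, alternative form) -> display name; None marks the default form.
TerritoryNames = Dict[Tuple[str, Optional[str]], str]

EN_NAMES: TerritoryNames = {
    ("HK", None): "Hong Kong SAR China",
    ("HK", "short"): "Hong Kong",
    ("MO", None): "Macao SAR China",
    ("MO", "short"): "Macao",
    ("CH", None): "Switzerland",
}


def territory_name(names: TerritoryNames, code: str, alt: Optional[str] = None) -> str:
    """Return the requested alternative name if present, else the default name."""
    if alt is not None and (code, alt) in names:
        return names[(code, alt)]
    return names[(code, None)]


print(territory_name(EN_NAMES, "HK"))           # full form
print(territory_name(EN_NAMES, "HK", "short"))  # short form
print(territory_name(EN_NAMES, "CH", "short"))  # no short form, so the default
```

Because the short form falls back to the full one, translating the full names first ensures that every territory always has some name to display.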

