Unicode CLDR Survey Tool
Most data in the Unicode Common Locale Data Repository (CLDR) is
gathered and processed through the Survey Tool, an online tool for
viewing data for different languages and proposing additions or
changes. It lets you propose new localized data, see what others have
proposed, and communicate with them to resolve differences.
During each submission period, contributors from Unicode Consortium
members, other organizations and the public at large are invited to
review the data for their languages and countries, and propose new
translations of terms or modifications, including language translations
entirely new to the repository. For the release schedule, see CLDR
Project.
In this release, new structure has been added to provide for plurals,
simple duration formats, and more control over the formatting of locale
names. There are a number of usability changes in the tool: for
example, only the timezone names that are important to translate are
shown. There are also new items for translation, such as new territory
codes. We would also like people to focus on getting enough votes for
the unapproved items to make them approved.
The following provides a brief description of the process.
Accounts
You don’t need an account to view data for a
particular language. If you wish to propose changes or additions, you
will need an account: see Survey
Tool Accounts.
Locale List
The main screen of the survey tool is located at
http://unicode.org/cldr/apps/survey.
It displays a list of the languages currently available. Languages may
vary by script (Arabic vs. Latin, or Simplified vs. Traditional
Chinese), and occasionally by country. For historical reasons, this
combination of language with script or country is known as a
locale.
For each language, the content is what is appropriate for the most
populous country; thus the content for English [en] is whatever is
appropriate for the United States. Any variation by country for that
language is represented in a country locale: thus content appropriate
for Australia that differs from what is in English [en] will be in the
sublocale English (Australia) [en_AU].
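The relationship between a language locale and its country sublocale can be sketched as a simple fallback lookup. This is only an illustration: the data, the key names, and the helper functions below are hypothetical, and the chain is built by plain truncation of the locale code, whereas actual CLDR inheritance has additional rules that are not shown here.

```python
# Hypothetical sample data. The sublocale en_AU stores only the items that
# differ from the language locale en; everything else is inherited.
LOCALE_DATA = {
    "en": {"date_order": "month/day/year", "language_name_de": "German"},
    "en_AU": {"date_order": "day/month/year"},
}


def fallback_chain(locale_id):
    """Return the locale followed by its parents, e.g. en_AU -> en.

    Simplification: parents are found by dropping trailing subtags.
    """
    parts = locale_id.split("_")
    return ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]


def lookup(locale_id, key):
    """Return the first value found along the fallback chain, or None."""
    for candidate in fallback_chain(locale_id):
        value = LOCALE_DATA.get(candidate, {}).get(key)
        if value is not None:
            return value
    return None


print(lookup("en_AU", "date_order"))        # overridden in en_AU
print(lookup("en_AU", "language_name_de"))  # inherited from en
```

In this sketch an item missing from en_AU is simply taken from en, which is why the sublocale only needs the differences.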
Click on the languages (optionally countries) that
you would like to view. You can always get back to this page by
clicking on Locales at the top left of the page.
- If you have permission to modify a locale,
a symbol is shown after the locale name.
- If you would like to add data for a new
locale, please notify your CLDR contact (see Survey
Tool Accounts). Your contact can add an empty locale, to which
data can then be added.
Reviewing and Submitting Data
There is a key explaining how the windows are laid out at Survey Tool
Windows. You should review this before starting. You will then go
through each section in turn: languages, scripts, territories, and so
on, all the way to supplemental.
- As you go through the sections, you will
generally review the Priority items (if any), and fix or add
translations.
- Click on the right option, if it is there.
- Otherwise click on "change to",
then type in the fixed or new text.
- Important:
before you leave any page, click on the Save button
to save your changes. It's also a good idea to do this if you are
spending a lot of time on a page, just in case there is a problem.
- Control-F to find something on the page is
really useful in moving around on these pages, as are Page up and Page
down keys. You can also switch between ordering items by Priority
vs Code.
- More information is in the "Zoomed" view, so be
sure to look at that if you have any questions. It's also a good idea
to zoom in on at least one item in each section, to review any
information for that section.
The locale data should be in the customary form
for the target language, in the form that is in most common usage. For
example, for the territory name in English one would use "Switzerland"
instead of "Swiss Confederation", and use "United Kingdom" instead of
"The United Kingdom of Great Britain and Northern Ireland".
Coverage
The warnings about missing items are based on your
coverage level. This level can be from comprehensive
(all possible items) down to basic (a very minimal
set of items). Locales that don't meet at least basic
level may not be complete enough to be in the official release
(although the data will be kept in the working repository).
- You can go to My Options
(in the top-left corner of each page), and set your coverage level
explicitly.
- If you are from a Unicode member organization,
your default coverage level will be set for you. However, you may want
to increase your coverage (on My Options)
in order to get more warnings about the next-priority items.
Caution: these warnings are
mechanically generated, and do not substitute for your judgment: you
may want to translate more items based on your knowledge. For example,
a Ukrainian speaker may want to translate the names of the neighboring
countries, even if those items do not trigger warnings at the current
coverage level.
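As a rough illustration of how coverage-based warnings behave, the sketch below filters missing items by coverage level. Only the level names "basic" and "comprehensive" come from the text above; the intermediate level, the item identifiers, and the level assigned to each item are placeholders invented for this example, not the Survey Tool's actual rules.

```python
# Ordered from least to most demanding. "basic" and "comprehensive" are named
# in the text; "moderate" stands in for the levels in between (assumption).
LEVELS = ["basic", "moderate", "comprehensive"]

# Hypothetical items, each tagged with the lowest level that requires it.
REQUIRED_AT = {
    "territory/UA": "basic",
    "territory/PL": "moderate",
    "territory/MD": "comprehensive",
}


def missing_warnings(translated, coverage):
    """List items required at `coverage` or below that lack a translation."""
    limit = LEVELS.index(coverage)
    return sorted(
        item
        for item, level in REQUIRED_AT.items()
        if LEVELS.index(level) <= limit and item not in translated
    )


done = {"territory/UA"}
print(missing_warnings(done, "basic"))          # nothing missing at basic
print(missing_warnings(done, "comprehensive"))  # more items are flagged
```

Raising the coverage level only adds warnings; it never removes the freedom to translate items that are not flagged.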
Country-Specific Information
The language locale should contain the most
broadly used data for that language, and should be appropriate for the
most populous region; other specific region locales should only contain
data where they need to override individual items, when the "inherited"
language locale data would not be customary in that region.
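The rule that a region locale holds only its overrides can be sketched as a dictionary difference. The function name and the sample values are hypothetical and serve only to illustrate the idea.

```python
def minimal_overrides(language_data, region_data):
    """Keep only the region items whose values differ from the language locale."""
    return {
        key: value
        for key, value in region_data.items()
        if language_data.get(key) != value
    }


# Hypothetical values: one item matches the language locale, one differs.
language = {"item_a": "shared value", "item_b": "language value"}
region = {"item_a": "shared value", "item_b": "regional value"}

# Only item_b needs to be stored in the region locale; item_a is inherited.
print(minimal_overrides(language, region))
```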
Once you've looked over all the sections in your language, go back to
the Locale window and scroll to your language. You'll see different
countries there on the right side of your language. If usage of your
language varies by country, you can enter those variations now. You
only need to do this for cases where the usage in a country differs
from the main language.
Each language has the default content for one of the countries using
the language. You won't be able to edit that country locale; instead,
any modifications should go in the main language locale.
Resolving Differences among Translators
After the data submission phase, any differences
in the submitted data will be resolved according to the data resolution
process. However,
even during the submission phase, you should collaborate with the other
translators where you have questions, via email and the forums.
Problems?
The tool has undergone substantial revisions based on feedback we
received during the last release. There are still some rough edges and
we ask for your patience with problems that occur. In particular, the
tool is not designed to handle a large number of people working at the
same time, so if it appears unresponsive, please try again later on
(and save your work as you go).
If you find a problem, you may want to review Known
Bugs
to see whether it has already been reported (and whether there is a
work-around). If not, or if you have suggestions for improvements,
please file a bug using the Feedback link at the bottom of each window.
If there are other issues, you can raise them on the Unicode
CLDR Mailing List.
The data in the locale repository
should contain the most appropriate choice of characters for the
representation of the text. It may thus include Unicode characters that
are not included in a given legacy character set. In particular, the
data may contain curly quotes and apostrophes (such as in “can’t”), and
similar characters such as the letter modifiers in ʻōlelo Hawaiʻi.
These characters provide more
distinctions than are available with the generic ASCII repertoire. They
may be “downcast” to the best available characters when the data is
imported into systems with a more limited repertoire of supported
characters. (Downcasting information is provided with character
fallback substitutions.)
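Downcasting can be sketched as a per-character substitution applied only to characters the target system cannot represent. The mapping table below is a small hypothetical sample written for this example (the actual substitutions come from the character fallback data), and "supported" is assumed here to mean plain ASCII.

```python
# Hypothetical fallback table: a few characters and their ASCII stand-ins.
FALLBACKS = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (curly apostrophe)
    "\u201C": '"',  # left double quotation mark
    "\u201D": '"',  # right double quotation mark
    "\u02BB": "'",  # modifier letter turned comma
}


def is_ascii(ch):
    """Assumed definition of 'supported' for a limited legacy system."""
    return ord(ch) < 128


def downcast(text, supported=is_ascii):
    """Replace unsupported characters that have a fallback; keep the rest."""
    return "".join(
        ch if supported(ch) else FALLBACKS.get(ch, ch)
        for ch in text
    )


print(downcast("\u201Ccan\u2019t\u201D"))
print(downcast("\u02BB\u014Dlelo Hawai\u02BBi"))
```

Characters with no entry in the table (such as the macron vowel in the second example) pass through unchanged in this sketch; a real importer would need its own policy for them.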
Hong Kong, Macau
The territory codes HK and MO are to
be translated with the native equivalent of “Hong Kong SAR China” and
“Macao SAR China”, respectively. SAR stands for “Special Administrative
Region” and can be represented with an acronym in the target language.
There are alternative, short versions of these that should also be
translated; those omit the "SAR China".
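One way to picture the two variants is a small table keyed by territory code. The English strings are taken from the text above; the dictionary layout, the variant names "standard" and "short", and the helper function are assumptions made for this sketch.

```python
# English names from the text; a translator would supply native equivalents.
TERRITORY_NAMES = {
    "HK": {"standard": "Hong Kong SAR China", "short": "Hong Kong"},
    "MO": {"standard": "Macao SAR China", "short": "Macao"},
}


def territory_name(code, short=False):
    """Return the short variant if requested, otherwise the standard name."""
    names = TERRITORY_NAMES[code]
    return names["short"] if short else names["standard"]


print(territory_name("HK"))
print(territory_name("MO", short=True))
```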