# ******************************************************************************
# *
# *   Copyright (C) 1995-2014, International Business Machines
# *   Corporation and others.  All Rights Reserved.
# *
# ******************************************************************************

# If this converter alias table looks very confusing, a much easier to
# understand view can be found at this demo:
# http://demo.icu-project.org/icu-bin/convexp

# IMPORTANT NOTE
#
# This file is not read directly by ICU. If you change it, you need to
# run gencnval, and eventually run pkgdata to update the representation that
# ICU uses for aliases. The gencnval tool will normally compile this file into
# cnvalias.icu. The gencnval -v verbose option will help you when you edit
# this file.

# Please be friendly to the rest of us who edit this table by
# keeping this table free of tabs.

# This is an alias file used by the character set converter.
# A lot of converter information can be found in unicode/ucnv.h, but here
# is more information about this file.
#
# If you are adding a new converter to this list and want to include it in the
# icu data library, please be sure to add an entry to the appropriate ucm*.mk file
# (see ucmfiles.mk for more information).
#
# Here is the file format using BNF-like syntax:
#
# converterTable ::= tags { converterLine* }
# converterLine ::= converterName [ tags ] { taggedAlias* }'\n'
# taggedAlias ::= alias [ tags ]
# tags ::= '{' { tag+ } '}'
# tag ::= standard['*']
# converterName ::= [0-9a-zA-Z:_'-']+
# alias ::= converterName
#
# Except for the converter name, aliases are case insensitive.
# Names are separated by whitespace.
# Line continuation and comment syntax are similar to the GNU make syntax.
# Any lines beginning with whitespace (e.g. U+0020 SPACE or U+0009 HORIZONTAL
# TABULATION) are presumed to be a continuation of the previous line.
# The # symbol starts a comment and the comment continues until the end of
# the line.
#
# The converter
#
# All names can be tagged by including a space-separated list of tags in
# curly braces, as in ISO_8859-1:1987{IANA*} iso-8859-1 { MIME* } or
# some-charset{MIME* IANA*}. The order of tags does not matter, and
# whitespace is allowed between the tagged name and the tags list.
#
# The tags can be used to get standard names using ucnv_getStandardName().
#
# The complete list of recognized tags used in this file is defined in
# the affinity list near the beginning of the file.
#
# The * after the standard tag denotes that the previous alias is the
# preferred (default) charset name for that standard. There can only
# be one of these default charset names per converter.
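#
# As a minimal sketch of the syntax described above, an entry could look like
# the following. The names are hypothetical and are not part of this table:
#
#   example-converter-1
#       Example-Charset { MIME* IANA* }
#       csExampleCharset { IANA }
#       x-example
#
# Under the rules above, example-converter-1 would be the converter name,
# Example-Charset would be the default name for both the MIME and IANA
# standards, csExampleCharset would be an additional IANA alias, and
# x-example would be an untagged alias. The indented lines are
# continuations of the first line.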



# The world is getting more complicated...
# Supporting XML parsers, HTML, MIME, and similar applications
# that mark encodings with a charset name can be difficult.
# Many of these applications and operating systems will update
# their codepages over time.

# This means that a new codepage, one that differs from an
# old one by changing a code point, e.g., to the Euro sign,
# must not get an old alias, because it would mean that
# old files with this alias would be interpreted differently.

# If a codepage gets updated by assigning characters to previously
# unassigned code points, then a new name is not necessary.
# Also, some codepages map unassigned codepage byte values
# to the same numbers in Unicode for roundtripping. It may be
# industry practice to keep the encoding name in such a case, too
# (example: Windows codepages).

# The aliases listed in the list of character sets
# that is maintained by the IANA (http://www.iana.org/) must
# not be changed to mean encodings different from what this
# list shows. Currently, the IANA list is at
# http://www.iana.org/assignments/character-sets
# It should also be mentioned that the exact mapping table used for each
# IANA name usually isn't specified. This means that some other applications
# and operating systems are left to interpret the exact mappings for the
# underspecified aliases. For instance, Shift-JIS on a Solaris platform
# may be different from Shift-JIS on a Windows platform. This is why
# some of the aliases can be tagged to differentiate different mapping
# tables with the same alias. If an alias is given to more than one converter,
# it is considered to be an ambiguous alias, and the affinity list will
# choose the converter to use when a standard isn't specified with the alias.

# Name matching is case-insensitive. Also, dashes '-', underscores '_'
# and spaces ' ' are ignored in names (thus cs-iso_latin-1, csisolatin1
# and "cs iso latin 1" are the same).
# However, the names in the left column are used directly as file names
# or names of algorithmic converters, and their case must not
# be changed - or else code and/or file names must also be changed.
# For example, the converter ibm-921 is expected to be the file ibm-921.cnv.



# The immediately following list is the affinity list of supported standard tags.
# When multiple converters have the same alias under different standards,
# the standard nearest to the top of this list with that alias will
# be the first converter that will be opened. The ordering of the aliases
# after this affinity list does not affect the preferred alias, but it may
# affect the order of the returned list of aliases for a given converter.
#
# The general ordering is from specific and frequently used to more general
# or rarely used at the bottom.
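#
# As an illustrative sketch of this rule, suppose two converters shared an
# alias under different standards. The names are hypothetical and are not
# entries of this table:
#
#   example-converter-a
#       shared-name { IANA }
#   example-converter-b
#       shared-name { MIME }
#
# IANA is listed above MIME in the affinity list below, so opening
# shared-name without specifying a standard would be expected to select
# example-converter-a.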
{
    UTR22           # Name format specified by http://www.unicode.org/unicode/reports/tr22/
    HTML            # WHATWG's encoding spec; https://encoding.spec.whatwg.org
    IANA            # Source: http://www.iana.org/assignments/character-sets
    MIME            # Source: http://www.iana.org/assignments/character-sets
    }

UTF-8 { MIME* HTML* }
    unicode-1-1-utf-8
    utf8

utf-16be { MIME* HTML* }

utf-16le { MIME* HTML* }
    utf-16

ibm866-html
    IBM866 { MIME* HTML* }
    866
    cp866
    csibm866

iso-8859-2-html
    ISO-8859-2 { MIME* HTML* }
    csisolatin2
    iso-ir-101
    iso8859-2
    iso88592
    iso_8859-2
    iso_8859-2:1987
    l2
    latin2

iso-8859-3-html
    ISO-8859-3 { MIME* HTML* }
    csisolatin3
    iso-ir-109
    iso8859-3
    iso88593
    iso_8859-3
    iso_8859-3:1988
    l3
    latin3

iso-8859-4-html
    ISO-8859-4 { MIME* HTML* }
    csisolatin4
    iso-ir-110
    iso8859-4
    iso88594
    iso_8859-4
    iso_8859-4:1988
    l4
    latin4

iso-8859-5-html
    ISO-8859-5 { MIME* HTML* }
    csisolatincyrillic
    cyrillic
    iso-ir-144
    iso8859-5
    iso88595
    iso_8859-5
    iso_8859-5:1988

iso-8859-6-html
    ISO-8859-6 { MIME* HTML* }
    arabic
    asmo-708
    csiso88596e
    csiso88596i
    csisolatinarabic
    ecma-114
    iso-8859-6-e
    iso-8859-6-i
    iso-ir-127
    iso8859-6
    iso88596
    iso_8859-6
    iso_8859-6:1987

iso-8859-7-html
    ISO-8859-7 { MIME* HTML* }
    csisolatingreek
    ecma-118
    elot_928
    greek
    greek8
    iso-ir-126
    iso8859-7
    iso88597
    iso_8859-7
    iso_8859-7:1987
    sun_eu_greek

iso-8859-8-html
    ISO-8859-8 { MIME* HTML* }
    csiso88598e { MIME }
    csisolatinhebrew
    hebrew
    ISO-8859-8-E
    ISO-8859-8-I
    iso-ir-138
    iso8859-8
    iso88598
    iso_8859-8
    iso_8859-8:1988
    visual
    # adding this one leads to a failure in encoding-labels.html
#   csiso88598i


# This alias has to be dealt with by TextCodecICU unless
# multiple encodings can share a single mapping table.
#ISO-8859-8-I { MIME* HTML* }
#   csiso88598i
#   logical

iso-8859-10-html
    ISO-8859-10 { MIME* HTML* }
    csisolatin6
    iso-ir-157
    iso8859-10
    iso885910
    l6
    latin6

iso-8859-13-html
    ISO-8859-13 { MIME* HTML* }
    iso8859-13
    iso885913

iso-8859-14-html
    ISO-8859-14 { MIME* HTML* }
    iso8859-14
    iso885914

iso-8859-15-html
    ISO-8859-15 { MIME* HTML* }
    csisolatin9
    iso8859-15
    iso885915
    iso_8859-15
    l9

iso-8859-16-html
    ISO-8859-16 { MIME* HTML* }

koi8-r-html
    KOI8-R { MIME* HTML* }
    cskoi8r
    koi
    koi8
    koi8_r

koi8-u-html
    KOI8-U { MIME* HTML* }
    koi8-ru

macintosh-html
    macintosh { MIME* HTML* }
    csmacintosh
    mac
    x-mac-roman

windows-874-html
    windows-874 { MIME* HTML* }
    dos-874
    iso-8859-11
    iso8859-11
    iso885911
    tis-620

windows-1250-html
    windows-1250 { MIME* HTML* }
    cp1250
    x-cp1250

windows-1251-html
    windows-1251 { MIME* HTML* }
    cp1251
    x-cp1251

windows-1252-html
    windows-1252 { MIME* HTML* }
    ansi_x3.4-1968
    ascii
    cp1252
    cp819
    csisolatin1
    ibm819
    iso-8859-1
    iso-ir-100
    iso8859-1
    iso88591
    iso_8859-1
    iso_8859-1:1987
    l1
    latin1
    us-ascii
    x-cp1252

windows-1253-html
    windows-1253 { MIME* HTML* }
    cp1253
    x-cp1253

windows-1254-html
    windows-1254 { MIME* HTML* }
    cp1254
    csisolatin5
    iso-8859-9
    iso-ir-148
    iso8859-9
    iso88599
    iso_8859-9
    iso_8859-9:1989
    l5
    latin5
    x-cp1254

windows-1255-html
    windows-1255 { MIME* HTML* }
    cp1255
    x-cp1255

windows-1256-html
    windows-1256 { MIME* HTML* }
    cp1256
    x-cp1256

windows-1257-html
    windows-1257 { MIME* HTML* }
    cp1257
    x-cp1257

windows-1258-html
    windows-1258 { MIME* HTML* }
    cp1258
    x-cp1258

x-mac-cyrillic-html
    x-mac-cyrillic { MIME* HTML* }
    x-mac-ukrainian

# Keep GBK and GB18030 separate for now until we decide
# what to do about them: crbug.com/339862
# The encoding spec requires that decoding to Unicode should use GB18030
# while encoding from Unicode should use GBK.

windows-936-2000
                       GBK { MIME* IANA* }
                       chinese { IANA }
                       iso-ir-58 { IANA }
                       GB2312 { IANA MIME }
                       GB_2312-80 { IANA }
                       gb_2312
                       csGB2312 { IANA }
                       csiso58gb231280
                       x-gbk

# GB 18030 is partly algorithmic, using the MBCS converter
gb18030 { IANA* }      gb18030 { HTML* MIME* }

big5-html
    Big5 { MIME* HTML* }
    cn-big5
    csbig5
    x-x-big5
    Big5-HKSCS

euc-jp-html
    EUC-JP { MIME* HTML* }
    cseucpkdfmtjapanese
    x-euc-jp

ISO_2022,locale=ja,version=0
    ISO-2022-JP { MIME* HTML* }
    csiso2022jp

shift_jis-html
    Shift_JIS { MIME* HTML* }
    csshiftjis
    ms_kanji
    ms932
    shift-jis
    sjis
    windows-31j
    x-sjis

euc-kr-html
    EUC-KR { MIME* HTML* }
    cseuckr
    csksc56011987
    iso-ir-149
    korean
    ks_c_5601-1987
    ks_c_5601-1989
    ksc5601
    ksc_5601
    windows-949

# We need to keep these aliases so that documents labelled with them
# are converted to a single U+FFFD instead of being rendered as gibberish.
ISO-2022-KR { HTML* MIME* } csISO2022KR { IANA }
ISO-2022-CN { IANA* HTML* } csISO2022CN  x-ISO-2022-CN-GB
ISO-2022-CN-EXT { IANA* HTML* }
HZ-GB-2312 { HTML* IANA* } HZ