# ******************************************************************************
# *
# *   Copyright (C) 1995-2014, International Business Machines
# *   Corporation and others.  All Rights Reserved.
# *
# ******************************************************************************

# If this converter alias table looks very confusing, a much easier to
# understand view can be found at this demo:
# http://demo.icu-project.org/icu-bin/convexp

# IMPORTANT NOTE
#
# This file is not read directly by ICU. If you change it, you need to
# run gencnval, and eventually run pkgdata to update the representation that
# ICU uses for aliases. The gencnval tool will normally compile this file into
# cnvalias.icu. The gencnval -v verbose option will help you when you edit
# this file.

# Please be friendly to the rest of us that edit this table by
# keeping this table free of tabs.

# This is an alias file used by the character set converter.
# A lot of converter information can be found in unicode/ucnv.h, but here
# is more information about this file.
#
# If you are adding a new converter to this list and want to include it in the
# icu data library, please be sure to add an entry to the appropriate ucm*.mk file
# (see ucmfiles.mk for more information).
#
# Here is the file format using BNF-like syntax:
#
# converterTable ::= tags { converterLine* }
# converterLine  ::= converterName [ tags ] { taggedAlias* }'\n'
# taggedAlias    ::= alias [ tags ]
# tags           ::= '{' { tag+ } '}'
# tag            ::= standard['*']
# converterName  ::= [0-9a-zA-Z:_'-']+
# alias          ::= converterName
#
# Except for the converter name, aliases are case insensitive.
# Names are separated by whitespace.
# Line continuation and comment syntax are similar to the GNU make syntax.
# Any lines beginning with whitespace (e.g. U+0020 SPACE or U+0009 HORIZONTAL
# TABULATION) are presumed to be a continuation of the previous line.
# The # symbol starts a comment and the comment continues till the end of
# the line.
#
# The converter
#
# All names can be tagged by including a space-separated list of tags in
# curly braces, as in ISO_8859-1:1987{IANA*} iso-8859-1 { MIME* } or
# some-charset{MIME* IANA*}. The order of tags does not matter, and
# whitespace is allowed between the tagged name and the tags list.
#
# The tags can be used to get standard names using ucnv_getStandardName().
#
# The complete list of recognized tags used in this file is defined in
# the affinity list near the beginning of the file.
#
# The * after the standard tag denotes that the previous alias is the
# preferred (default) charset name for that standard. There can only
# be one of these default charset names per converter.



# The world is getting more complicated...
# Supporting XML parsers, HTML, MIME, and similar applications
# that mark encodings with a charset name can be difficult.
# Many of these applications and operating systems will update
# their codepages over time.

# It means that a new codepage, one that differs from an
# old one by changing a code point, e.g., to the Euro sign,
# must not get an old alias, because it would mean that
# old files with this alias would be interpreted differently.

# If a codepage gets updated by assigning characters to previously
# unassigned code points, then a new name is not necessary.
# Also, some codepages map unassigned codepage byte values
# to the same numbers in Unicode for roundtripping. It may be
# industry practice to keep the encoding name in such a case, too
# (example: Windows codepages).

# The aliases listed in the list of character sets
# that is maintained by the IANA (http://www.iana.org/) must
# not be changed to mean encodings different from what this
# list shows. Currently, the IANA list is at
# http://www.iana.org/assignments/character-sets
# It should also be mentioned that the exact mapping table used for each
# IANA name usually isn't specified. This means that some other applications
# and operating systems are left to interpret the exact mappings for the
# underspecified aliases. For instance, Shift-JIS on a Solaris platform
# may be different from Shift-JIS on a Windows platform. This is why
# some of the aliases can be tagged to differentiate different mapping
# tables with the same alias. If an alias is given to more than one converter,
# it is considered to be an ambiguous alias, and the affinity list will
# choose the converter to use when a standard isn't specified with the alias.

# Name matching is case-insensitive. Also, dashes '-', underscores '_'
# and spaces ' ' are ignored in names (thus cs-iso_latin-1, csisolatin1
# and "cs iso latin 1" are the same).
# However, the names in the left column are directly file names
# or names of algorithmic converters, and their case must not
# be changed - or else code and/or file names must also be changed.
# For example, the converter ibm-921 is expected to be the file ibm-921.cnv.



# The immediately following list is the affinity list of supported standard tags.
# When multiple converters have the same alias under different standards,
# the standard nearest to the top of this list with that alias will
# be the first converter that will be opened. The ordering of the aliases
# after this affinity list does not affect the preferred alias, but it may
# affect the order of the returned list of aliases for a given converter.
#
# The general ordering is from specific and frequently used to more general
# or rarely used at the bottom.
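#
# As a rough illustration of the affinity rule (the converter and alias names
# here are hypothetical, and the lines are commented out so that gencnval
# ignores them), suppose the affinity list is { UTR22 HTML IANA MIME } and
# the table contains:
#
#     example-conv-a
#         shared-alias { MIME }
#
#     example-conv-b
#         shared-alias { IANA }
#
# Opening shared-alias without naming a standard would then be expected to
# select example-conv-b, because IANA appears above MIME in the affinity list.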
{
    UTR22 # Name format specified by http://www.unicode.org/unicode/reports/tr22/
    HTML  # WHATWG's encoding spec; https://encoding.spec.whatwg.org
    IANA  # Source: http://www.iana.org/assignments/character-sets
    MIME  # Source: http://www.iana.org/assignments/character-sets
    }

UTF-8 { MIME* HTML* }
    unicode-1-1-utf-8
    utf8

utf-16be { MIME* HTML* }

utf-16le { MIME* HTML* }
    utf-16

ibm866-html
    IBM866 { MIME* HTML* }
    866
    cp866
    csibm866

iso-8859-2-html
    ISO-8859-2 { MIME* HTML* }
    csisolatin2
    iso-ir-101
    iso8859-2
    iso88592
    iso_8859-2
    iso_8859-2:1987
    l2
    latin2

iso-8859-3-html
    ISO-8859-3 { MIME* HTML* }
    csisolatin3
    iso-ir-109
    iso8859-3
    iso88593
    iso_8859-3
    iso_8859-3:1988
    l3
    latin3

iso-8859-4-html
    ISO-8859-4 { MIME* HTML* }
    csisolatin4
    iso-ir-110
    iso8859-4
    iso88594
    iso_8859-4
    iso_8859-4:1988
    l4
    latin4

iso-8859-5-html
    ISO-8859-5 { MIME* HTML* }
    csisolatincyrillic
    cyrillic
    iso-ir-144
    iso8859-5
    iso88595
    iso_8859-5
    iso_8859-5:1988

iso-8859-6-html
    ISO-8859-6 { MIME* HTML* }
    arabic
    asmo-708
    csiso88596e
    csiso88596i
    csisolatinarabic
    ecma-114
    iso-8859-6-e
    iso-8859-6-i
    iso-ir-127
    iso8859-6
    iso88596
    iso_8859-6
    iso_8859-6:1987

iso-8859-7-html
    ISO-8859-7 { MIME* HTML* }
    csisolatingreek
    ecma-118
    elot_928
    greek
    greek8
    iso-ir-126
    iso8859-7
    iso88597
    iso_8859-7
    iso_8859-7:1987
    sun_eu_greek

iso-8859-8-html
    ISO-8859-8 { MIME* HTML* }
    csiso88598e { MIME }
    csisolatinhebrew
    hebrew
    ISO-8859-8-E
    ISO-8859-8-I
    iso-ir-138
    iso8859-8
    iso88598
    iso_8859-8
    iso_8859-8:1988
    visual
    # adding this one leads to a failure in encoding-labels.html
#    csiso88598i


# This alias has to be dealt with by TextCodecICU unless
# multiple encodings can share a single mapping table.
#ISO-8859-8-I { MIME* HTML* }
#    csiso88598i
#    logical

iso-8859-10-html
    ISO-8859-10 { MIME* HTML* }
    csisolatin6
    iso-ir-157
    iso8859-10
    iso885910
    l6
    latin6

iso-8859-13-html
    ISO-8859-13 { MIME* HTML* }
    iso8859-13
    iso885913

iso-8859-14-html
    ISO-8859-14 { MIME* HTML* }
    iso8859-14
    iso885914

iso-8859-15-html
    ISO-8859-15 { MIME* HTML* }
    csisolatin9
    iso8859-15
    iso885915
    iso_8859-15
    l9

iso-8859-16-html
    ISO-8859-16 { MIME* HTML* }

koi8-r-html
    KOI8-R { MIME* HTML* }
    cskoi8r
    koi
    koi8
    koi8_r

koi8-u-html
    KOI8-U { MIME* HTML* }
    koi8-ru

macintosh-html
    macintosh { MIME* HTML* }
    csmacintosh
    mac
    x-mac-roman

windows-874-html
    windows-874 { MIME* HTML* }
    dos-874
    iso-8859-11
    iso8859-11
    iso885911
    tis-620

windows-1250-html
    windows-1250 { MIME* HTML* }
    cp1250
    x-cp1250

windows-1251-html
    windows-1251 { MIME* HTML* }
    cp1251
    x-cp1251

windows-1252-html
    windows-1252 { MIME* HTML* }
    ansi_x3.4-1968
    ascii
    cp1252
    cp819
    csisolatin1
    ibm819
    iso-8859-1
    iso-ir-100
    iso8859-1
    iso88591
    iso_8859-1
    iso_8859-1:1987
    l1
    latin1
    us-ascii
    x-cp1252

windows-1253-html
    windows-1253 { MIME* HTML* }
    cp1253
    x-cp1253

windows-1254-html
    windows-1254 { MIME* HTML* }
    cp1254
    csisolatin5
    iso-8859-9
    iso-ir-148
    iso8859-9
    iso88599
    iso_8859-9
    iso_8859-9:1989
    l5
    latin5
    x-cp1254

windows-1255-html
    windows-1255 { MIME* HTML* }
    cp1255
    x-cp1255

windows-1256-html
    windows-1256 { MIME* HTML* }
    cp1256
    x-cp1256

windows-1257-html
    windows-1257 { MIME* HTML* }
    cp1257
    x-cp1257

windows-1258-html
    windows-1258 { MIME* HTML* }
    cp1258
    x-cp1258

x-mac-cyrillic-html
    x-mac-cyrillic { MIME* HTML* }
    x-mac-ukrainian

# Keep GBK and GB18030 separate for now until we decide
# what to do about them: crbug.com/339862
# The encoding spec requires that decoding to Unicode should use GB18030
# while encoding from Unicode should use GBK.

windows-936-2000
    GBK { MIME* IANA* }
    chinese { IANA }
    iso-ir-58 { IANA }
    GB2312 { IANA MIME }
    GB_2312-80 { IANA }
    gb_2312
    csGB2312 { IANA }
    csiso58gb231280
    x-gbk

# GB 18030 is partly algorithmic, using the MBCS converter
gb18030 { IANA* } gb18030 { HTML* MIME* }

big5-html
    Big5 { MIME* HTML* }
    cn-big5
    csbig5
    x-x-big5
    Big5-HKSCS

euc-jp-html
    EUC-JP { MIME* HTML* }
    cseucpkdfmtjapanese
    x-euc-jp

ISO_2022,locale=ja,version=0
    ISO-2022-JP { MIME* HTML* }
    csiso2022jp

shift_jis-html
    Shift_JIS { MIME* HTML* }
    csshiftjis
    ms_kanji
    ms932
    shift-jis
    sjis
    windows-31j
    x-sjis

euc-kr-html
    EUC-KR { MIME* HTML* }
    cseuckr
    csksc56011987
    iso-ir-149
    korean
    ks_c_5601-1987
    ks_c_5601-1989
    ksc5601
    ksc_5601
    windows-949

# We need to keep these aliases so that documents labelled with them
# are converted to a single U+FFFD instead of being rendered as gibberish.
ISO-2022-KR { HTML* MIME* } csISO2022KR { IANA }
ISO-2022-CN { IANA* HTML* } csISO2022CN x-ISO-2022-CN-GB
ISO-2022-CN-EXT { IANA* HTML* }
HZ-GB-2312 { HTML* IANA* } HZ