# *******************************************************************************
# *
# *   Copyright (C) 1995-2001, International Business Machines
# *   Corporation and others.  All Rights Reserved.
# *
# *******************************************************************************

# IMPORTANT NOTE
#
# This file is not read directly by ICU. If you change it, you need to
# run gencnval, and eventually pkgdata, to update the representation that
# ICU uses for aliases.

# This is an alias file used by the character set converter.
#
# Format:
#
# Actual file name || Algorithm name     alias1 alias2 ...
#
# Names are case-insensitive, except for column 1 (file names), and are
# separated by whitespace.
#
# All names can be tagged by including a space-separated list of tags in
# curly braces, as in ISO_8859-1:1987{IANA} iso-8859-1 { MIME } or
# some-charset{MIME IANA}. The order of tags does not matter, and
# whitespace is allowed between the tagged name and the tags list.
#
# The tags can be used to get standard names using ucnv_getStandardName().
#
# Here is a list of tags used in this file:
#
# IANA    The IANA charset name, as documented in RFC 1700.
# MIME    The MIME charset name, used for content type tagging.

# The world is getting more complicated...
# Supporting XML parsers, HTML, MIME, and similar applications
# that mark encodings with unique charset names, we are forced to
# make this table much more static than before.
# It means that a new encoding, one that differs from an
# old one by changing a code point, e.g., to the Euro sign,
# must not get an old alias, because it would mean that
# old files with this alias would be interpreted differently.

# If an encoding gets updated by assigning characters to previously
# unassigned code points, then a new name is not necessary.
# Also, some codepages map unassigned codepage byte values
# to the same numbers in Unicode for roundtripping.
# It may be industry practice to keep the encoding name in such a case, too
# (example: Windows codepages).

# In particular, the aliases listed in the list of character sets
# that is maintained by the IANA (http://www.iana.org/) must
# not be changed to mean encodings different from what this
# list shows.
# Currently, the IANA list is at
# http://www.isi.edu/in-notes/iana/assignments/character-sets

# Name matching is case-insensitive. Also, dashes '-', underscores '_'
# and spaces ' ' are ignored in names (thus cs-iso-latin-1 and csisolatin1
# are the same).
# However, the names in the left column are directly file names
# or names of algorithmic converters, and their case must not
# be changed - or else code and/or file names must also be changed.

# Fully algorithmic converters
UTF-8 { MIME } ibm-1208 cp1208
UTF-16BE { MIME } UTF16_BigEndian x-utf-16be
UTF-16LE { MIME } UTF16_LittleEndian x-utf-16le
# The ICU UTF-16 converter uses the current platform's endianness.
# It does not autodetect endianness from a BOM.
UTF-16 { MIME } UTF16_PlatformEndian ISO-10646-UCS-2 { IANA } csUnicode ibm-17584 ibm-13488 ibm-1200 cp1200 ucs-2
UTF16_OppositeEndian
UTF-32BE { MIME } UTF32_BigEndian
UTF-32LE { MIME } UTF32_LittleEndian
# The ICU UTF-32 converter uses the current platform's endianness.
# It does not autodetect endianness from a BOM.
UTF-32 { MIME } UTF32_PlatformEndian ISO-10646-UCS-4 { IANA } csUCS4 ucs-4 ibm-1232
UTF32_OppositeEndian
UTF-7 { IANA MIME }
# On UTF-7:
# RFC 2152 (http://www.imc.org/rfc2152) allows some US-ASCII characters
# to be encoded either directly or in base64. In particular, the characters in set O
# as defined in the RFC (!"#$%&*;<=>@[]^_`{|}) may be encoded directly but are not
# allowed in, e.g., email headers.
# By default, the ICU UTF-7 converter encodes set O directly.
# By choosing the option "version=1", set O will be escaped instead.
# For example (errorCode is assumed to be a UErrorCode set to U_ZERO_ERROR):
#     utf7Converter=ucnv_open("UTF-7,version=1", &errorCode);
SCSU { IANA }
ISO-8859-1 { MIME } LATIN_1 ibm-819 cp819 latin1 8859-1 csisolatin1 iso-ir-100 cp367 ISO_8859-1:1987 { IANA } l1 ANSI_X3.110-1983 819 #!!!!! There's a whole lot of names for this
US-ASCII { MIME } ascii ascii-7 ANSI_X3.4-1968 { IANA } ANSI_X3.4-1986 ISO_646.irv:1991 iso646-us us csASCII 646 iso-ir-6