Interface for the encoding conversion functions needed for XML basic encoding and iconv() support.

Related specs are:
RFC 2044 (UTF-8 and UTF-16), F. Yergeau, Alis Technologies
[ISO-10646] UTF-8 and UTF-16 in Annexes
[ISO-8859-1] ISO Latin-1 character codes.
[UNICODE] The Unicode Consortium, "The Unicode Standard -- Worldwide Character Encoding -- Version 1.0", Addison-Wesley, Volume 1, 1991, Volume 2, 1992. UTF-8 is described in Unicode Technical Report #4.
[US-ASCII] Coded Character Set--7-bit American Standard Code for Information Interchange, ANSI X3.4-1986.

Table of Contents
#define ICU_PIVOT_BUF_SIZE
Structure uconv_t struct _uconv_t
Enum xmlCharEncoding
Structure xmlCharEncodingHandler struct _xmlCharEncodingHandler
Typedef xmlCharEncodingHandler * xmlCharEncodingHandlerPtr
int UTF8Toisolat1 (unsigned char * out, int * outlen, const unsigned char * in, int * inlen)
int isolat1ToUTF8 (unsigned char * out, int * outlen, const unsigned char * in, int * inlen)
int xmlAddEncodingAlias (const char * name, const char * alias)
int xmlCharEncCloseFunc (xmlCharEncodingHandler * handler)
int xmlCharEncFirstLine (xmlCharEncodingHandler * handler, xmlBufferPtr out, xmlBufferPtr in)
int xmlCharEncInFunc (xmlCharEncodingHandler * handler, xmlBufferPtr out, xmlBufferPtr in)
int xmlCharEncOutFunc (xmlCharEncodingHandler * handler, xmlBufferPtr out, xmlBufferPtr in)
Function type: xmlCharEncodingInputFunc
int xmlCharEncodingInputFunc (unsigned char * out, int * outlen, const unsigned char * in, int * inlen)
Function type: xmlCharEncodingOutputFunc
int xmlCharEncodingOutputFunc (unsigned char * out, int * outlen, const unsigned char * in, int * inlen)
void xmlCleanupCharEncodingHandlers (void)
void xmlCleanupEncodingAliases (void)
int xmlDelEncodingAlias (const char * alias)
xmlCharEncoding xmlDetectCharEncoding (const unsigned char * in, int len)
xmlCharEncodingHandlerPtr xmlFindCharEncodingHandler (const char * name)
xmlCharEncodingHandlerPtr xmlGetCharEncodingHandler (xmlCharEncoding enc)
const char * xmlGetCharEncodingName (xmlCharEncoding enc)
const char * xmlGetEncodingAlias (const char * alias)
void xmlInitCharEncodingHandlers (void)
xmlCharEncodingHandlerPtr xmlNewCharEncodingHandler (const char * name, xmlCharEncodingInputFunc input, xmlCharEncodingOutputFunc output)
xmlCharEncoding xmlParseCharEncoding (const char * name)
void xmlRegisterCharEncodingHandler (xmlCharEncodingHandlerPtr handler)
Description
Macro: ICU_PIVOT_BUF_SIZE
#define ICU_PIVOT_BUF_SIZE
Structure uconv_t struct _uconv_t {
UConverter * uconv : for conversion between an encoding and
UConverter * utf8 : for conversion between UTF-8 and UTF-16
UChar pivot_buf[ICU_PIVOT_BUF_SIZE]
UChar * pivot_source
UChar * pivot_target
}
Enum xmlCharEncoding {
XML_CHAR_ENCODING_ERROR = -1 : No char encoding detected
XML_CHAR_ENCODING_NONE = 0 : No char encoding detected
XML_CHAR_ENCODING_UTF8 = 1 : UTF-8
XML_CHAR_ENCODING_UTF16LE = 2 : UTF-16 little endian
XML_CHAR_ENCODING_UTF16BE = 3 : UTF-16 big endian
XML_CHAR_ENCODING_UCS4LE = 4 : UCS-4 little endian
XML_CHAR_ENCODING_UCS4BE = 5 : UCS-4 big endian
XML_CHAR_ENCODING_EBCDIC = 6 : EBCDIC uh!
XML_CHAR_ENCODING_UCS4_2143 = 7 : UCS-4 unusual ordering
XML_CHAR_ENCODING_UCS4_3412 = 8 : UCS-4 unusual ordering
XML_CHAR_ENCODING_UCS2 = 9 : UCS-2
XML_CHAR_ENCODING_8859_1 = 10 : ISO-8859-1 ISO Latin 1
XML_CHAR_ENCODING_8859_2 = 11 : ISO-8859-2 ISO Latin 2
XML_CHAR_ENCODING_8859_3 = 12 : ISO-8859-3
XML_CHAR_ENCODING_8859_4 = 13 : ISO-8859-4
XML_CHAR_ENCODING_8859_5 = 14 : ISO-8859-5
XML_CHAR_ENCODING_8859_6 = 15 : ISO-8859-6
XML_CHAR_ENCODING_8859_7 = 16 : ISO-8859-7
XML_CHAR_ENCODING_8859_8 = 17 : ISO-8859-8
XML_CHAR_ENCODING_8859_9 = 18 : ISO-8859-9
XML_CHAR_ENCODING_2022_JP = 19 : ISO-2022-JP
XML_CHAR_ENCODING_SHIFT_JIS = 20 : Shift_JIS
XML_CHAR_ENCODING_EUC_JP = 21 : EUC-JP
XML_CHAR_ENCODING_ASCII = 22 : pure ASCII
}
Structure xmlCharEncodingHandler struct _xmlCharEncodingHandler {
char * name
xmlCharEncodingInputFunc input
xmlCharEncodingOutputFunc output
iconv_t iconv_in
iconv_t iconv_out
uconv_t * uconv_in
uconv_t * uconv_out
}
Function: UTF8Toisolat1
int UTF8Toisolat1 (unsigned char * out, int * outlen, const unsigned char * in, int * inlen)
Take a block of UTF-8 chars in and try to convert it to an ISO Latin 1 block of chars out.
out: | a pointer to an array of bytes to store the result |
outlen: | the length of @out |
in: | a pointer to an array of UTF-8 chars |
inlen: | the length of @in |
Returns: | the number of bytes written on success, -2 if the transcoding fails, or -1 otherwise. The value of @inlen after return is the number of octets consumed if the return value is positive, else unpredictable. The value of @outlen after return is the number of octets produced. |
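The calling convention above (sizes passed in through @outlen and @inlen, counts passed back through the same pointers, -2 on a transcoding failure) can be illustrated with a small self-contained sketch. The function name `sketch_utf8_to_latin1` is hypothetical and this is not libxml2's implementation; it assumes that only code points U+0000 to U+00FF are representable, that a NULL @in is treated as an initialization call, and that a sequence truncated at the end of the input is left unconsumed for a later call.

```c
#include <stddef.h>

/* Hypothetical sketch, not libxml2 code.
 * On entry *outlen and *inlen hold the buffer sizes; on return they hold
 * the number of bytes produced and consumed. */
int sketch_utf8_to_latin1(unsigned char *out, int *outlen,
                          const unsigned char *in, int *inlen)
{
    int produced = 0;
    int consumed = 0;

    if (out == NULL || outlen == NULL || inlen == NULL)
        return -1;
    if (in == NULL) {           /* initialization call: nothing to emit */
        *outlen = 0;
        *inlen = 0;
        return 0;
    }

    while (consumed < *inlen && produced < *outlen) {
        unsigned char c = in[consumed];

        if (c < 0x80) {
            /* ASCII maps to itself */
            out[produced++] = c;
            consumed += 1;
        } else if (c == 0xC2 || c == 0xC3) {
            /* two-byte sequences with these lead bytes cover U+0080..U+00FF */
            unsigned char d;

            if (consumed + 1 >= *inlen)
                break;          /* truncated sequence: wait for more input */
            d = in[consumed + 1];
            if ((d & 0xC0) != 0x80) {
                *outlen = produced;
                *inlen = consumed;
                return -2;      /* malformed continuation byte */
            }
            out[produced++] = (unsigned char)(((c & 0x03) << 6) | (d & 0x3F));
            consumed += 2;
        } else {
            /* malformed UTF-8 or a code point outside Latin 1 */
            *outlen = produced;
            *inlen = consumed;
            return -2;
        }
    }

    *outlen = produced;
    *inlen = consumed;
    return produced;
}
```

Unlike the documented contract, which leaves @inlen unpredictable on failure, this sketch chooses to report the progress made before the failing sequence.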
Function: isolat1ToUTF8
int isolat1ToUTF8 (unsigned char * out, int * outlen, const unsigned char * in, int * inlen)
Take a block of ISO Latin 1 chars in and try to convert it to a UTF-8 block of chars out.
out: | a pointer to an array of bytes to store the result |
outlen: | the length of @out |
in: | a pointer to an array of ISO Latin 1 chars |
inlen: | the length of @in |
Returns: | the number of bytes written on success, or -1 otherwise. The value of @inlen after return is the number of octets consumed if the return value is positive, else unpredictable. The value of @outlen after return is the number of octets produced. |
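As an illustration of the opposite direction, the following sketch converts ISO Latin 1 bytes to UTF-8 using the same in/out length convention. The name `sketch_latin1_to_utf8` is hypothetical and the code is not taken from libxml2; it assumes that a full output buffer simply stops the conversion, with partial progress reported through @inlen and @outlen.

```c
#include <stddef.h>

/* Hypothetical sketch, not libxml2 code.
 * Every Latin 1 byte is a valid code point, so the only failure mode
 * handled here is an invalid argument. */
int sketch_latin1_to_utf8(unsigned char *out, int *outlen,
                          const unsigned char *in, int *inlen)
{
    int produced = 0;
    int consumed = 0;

    if (out == NULL || outlen == NULL || in == NULL || inlen == NULL)
        return -1;

    while (consumed < *inlen) {
        unsigned char c = in[consumed];
        int need = (c < 0x80) ? 1 : 2;   /* bytes needed in UTF-8 */

        if (produced + need > *outlen)
            break;                       /* output full: stop here */

        if (c < 0x80) {
            out[produced++] = c;
        } else {
            /* U+0080..U+00FF become a two-byte sequence 110xxxxx 10xxxxxx */
            out[produced++] = (unsigned char)(0xC0 | (c >> 6));
            out[produced++] = (unsigned char)(0x80 | (c & 0x3F));
        }
        consumed++;
    }

    *outlen = produced;
    *inlen = consumed;
    return produced;
}
```

Because a Latin 1 byte can expand to two UTF-8 bytes, a caller that wants a single-pass conversion would size the output buffer at twice the input length.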
Function: xmlAddEncodingAlias
int xmlAddEncodingAlias (const char * name, const char * alias)
Registers an alias @alias for an encoding named @name. An existing alias will be overwritten.
name: | the encoding name as parsed, in UTF-8 format (ASCII actually) |
alias: | the alias name as parsed, in UTF-8 format (ASCII actually) |
Returns: | 0 in case of success, -1 in case of error |
Function: xmlCharEncCloseFunc
int xmlCharEncCloseFunc (xmlCharEncodingHandler * handler)
Generic front-end for the encoding handler close function.
handler: | char encoding transformation data structure |
Returns: | 0 on success, or -1 in case of error |
Function: xmlCharEncFirstLine
int xmlCharEncFirstLine (xmlCharEncodingHandler * handler, xmlBufferPtr out, xmlBufferPtr in)
Front-end for the encoding handler input function that handles only the very first line, i.e. it limits itself to 45 chars.
handler: | char encoding transformation data structure |
out: | an xmlBuffer for the output |
in: | an xmlBuffer for the input |
Returns: | the number of bytes written on success, -1 on general error, or -2 if the transcoding fails (because *in is not a valid UTF-8 string or the result of the transformation can't fit into the encoding we want) |
Function: xmlCharEncInFunc
int xmlCharEncInFunc (xmlCharEncodingHandler * handler, xmlBufferPtr out, xmlBufferPtr in)
Generic front-end for the encoding handler input function.
handler: | char encoding transformation data structure |
out: | an xmlBuffer for the output |
in: | an xmlBuffer for the input |
Returns: | the number of bytes written on success, -1 on general error, or -2 if the transcoding fails (because *in is not a valid UTF-8 string or the result of the transformation can't fit into the encoding we want) |
Function: xmlCharEncOutFunc
int xmlCharEncOutFunc (xmlCharEncodingHandler * handler, xmlBufferPtr out, xmlBufferPtr in)
Generic front-end for the encoding handler output function. A first call with @in == NULL has to be made to initiate the output, for non-stateless encodings that need to initialize their state or emit leading output (like the BOM in UTF-16). In case of UTF-8 sequence conversion errors for the given encoder, the content will be automatically remapped to a CharRef sequence.
handler: | char encoding transformation data structure |
out: | an xmlBuffer for the output |
in: | an xmlBuffer for the input |
Returns: | the number of bytes written on success, -1 on general error, or -2 if the transcoding fails (because *in is not a valid UTF-8 string or the result of the transformation can't fit into the encoding we want) |
Function type: xmlCharEncodingInputFunc
int xmlCharEncodingInputFunc (unsigned char * out, int * outlen, const unsigned char * in, int * inlen)
Take a block of chars in the original encoding and try to convert it to a UTF-8 block of chars out.
out: | a pointer to an array of bytes to store the UTF-8 result |
outlen: | the length of @out |
in: | a pointer to an array of chars in the original encoding |
inlen: | the length of @in |
Returns: | the number of bytes written, -1 if lack of space, or -2 if the transcoding failed. The value of @inlen after return is the number of octets consumed if the return value is positive, else unpredictable. The value of @outlen after return is the number of octets produced. |
Function type: xmlCharEncodingOutputFunc
int xmlCharEncodingOutputFunc (unsigned char * out, int * outlen, const unsigned char * in, int * inlen)
Take a block of UTF-8 chars in and try to convert it to another encoding. Note: a first call designed to produce heading info is made with in = NULL. If the encoding is stateful, this call should also initialize the encoder state.
out: | a pointer to an array of bytes to store the result |
outlen: | the length of @out |
in: | a pointer to an array of UTF-8 chars |
inlen: | the length of @in |
Returns: | the number of bytes written, -1 if lack of space, or -2 if the transcoding failed. The value of @inlen after return is the number of octets consumed if the return value is positive, else unpredictable. The value of @outlen after return is the number of octets produced. |
Function: xmlCleanupCharEncodingHandlers
void xmlCleanupCharEncodingHandlers (void)
Clean up the memory allocated for the char encoding support. It unregisters all the encoding handlers and the aliases.
Function: xmlCleanupEncodingAliases
void xmlCleanupEncodingAliases (void)
Unregisters all aliases.
Function: xmlDelEncodingAlias
int xmlDelEncodingAlias (const char * alias)
Unregisters an encoding alias @alias.
alias: | the alias name as parsed, in UTF-8 format (ASCII actually) |
Returns: | 0 in case of success, -1 in case of error |
Function: xmlDetectCharEncoding
xmlCharEncoding xmlDetectCharEncoding (const unsigned char * in, int len)
Guess the encoding of the entity using the first bytes of the entity content according to the non-normative appendix F of the XML-1.0 recommendation.
in: | a pointer to the first bytes of the XML entity, must be at least 2 bytes long (at least 4 if the encoding is a UCS-4 variant) |
len: | the length of the buffer |
Returns: | one of the XML_CHAR_ENCODING_... values |
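The kind of guess described above can be sketched in a few lines of self-contained C. The enum and the function `sketch_detect_encoding` are hypothetical names, not libxml2's own, and the sketch covers only a subset of the byte patterns listed in appendix F: the UCS-4, UTF-16 and UTF-8 forms of a leading `<` or `<?xm`, plus the UTF-8 and UTF-16 byte order marks. EBCDIC, the unusual UCS-4 byte orders and the UCS-4 byte order marks are left out.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch only. */
typedef enum {
    SKETCH_ENC_NONE = 0,
    SKETCH_ENC_UTF8,
    SKETCH_ENC_UTF16LE,
    SKETCH_ENC_UTF16BE,
    SKETCH_ENC_UCS4LE,
    SKETCH_ENC_UCS4BE
} sketch_encoding;

/* Returns 1 if the first four bytes of `in` equal a, b, c, d. */
static int sketch_match4(const unsigned char *in, unsigned char a,
                         unsigned char b, unsigned char c, unsigned char d)
{
    return in[0] == a && in[1] == b && in[2] == c && in[3] == d;
}

sketch_encoding sketch_detect_encoding(const unsigned char *in, int len)
{
    if (in == NULL)
        return SKETCH_ENC_NONE;

    if (len >= 4) {
        /* '<' or "<?" or "<?xm" spelled out in each candidate encoding */
        if (sketch_match4(in, 0x00, 0x00, 0x00, 0x3C))
            return SKETCH_ENC_UCS4BE;
        if (sketch_match4(in, 0x3C, 0x00, 0x00, 0x00))
            return SKETCH_ENC_UCS4LE;
        if (sketch_match4(in, 0x3C, 0x3F, 0x78, 0x6D))
            return SKETCH_ENC_UTF8;
        if (sketch_match4(in, 0x3C, 0x00, 0x3F, 0x00))
            return SKETCH_ENC_UTF16LE;
        if (sketch_match4(in, 0x00, 0x3C, 0x00, 0x3F))
            return SKETCH_ENC_UTF16BE;
    }
    if (len >= 3) {
        /* UTF-8 byte order mark */
        if (in[0] == 0xEF && in[1] == 0xBB && in[2] == 0xBF)
            return SKETCH_ENC_UTF8;
    }
    if (len >= 2) {
        /* UTF-16 byte order marks */
        if (in[0] == 0xFE && in[1] == 0xFF)
            return SKETCH_ENC_UTF16BE;
        if (in[0] == 0xFF && in[1] == 0xFE)
            return SKETCH_ENC_UTF16LE;
    }
    return SKETCH_ENC_NONE;
}
```

The length checks mirror the requirement stated for @in: two bytes are enough for a UTF-16 byte order mark, while the four-byte patterns need at least four.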
Function: xmlFindCharEncodingHandler
xmlCharEncodingHandlerPtr xmlFindCharEncodingHandler (const char * name)
Search the registered set for the handler able to read/write that encoding.
name: | a string describing the char encoding |
Returns: | the handler or NULL if not found |
Function: xmlGetCharEncodingHandler
xmlCharEncodingHandlerPtr xmlGetCharEncodingHandler (xmlCharEncoding enc)
Search the registered set for the handler able to read/write that encoding.
Function: xmlGetCharEncodingName
const char * xmlGetCharEncodingName (xmlCharEncoding enc)
The "canonical" name for XML encoding. Cf. http://www.w3.org/TR/REC-xml#charencoding Section 4.3.3 Character Encoding in Entities
enc: | the encoding |
Returns: | the canonical name for the given encoding |
Function: xmlGetEncodingAlias
const char * xmlGetEncodingAlias (const char * alias)
Look up an encoding name for the given alias.
alias: | the alias name as parsed, in UTF-8 format (ASCII actually) |
Returns: | NULL if not found, otherwise the original name |
Function: xmlInitCharEncodingHandlers
void xmlInitCharEncodingHandlers (void)
Initialize the char encoding support. It registers the default supported encodings. NOTE: while public, this function usually doesn't need to be called in normal processing.
Function: xmlNewCharEncodingHandler
xmlCharEncodingHandlerPtr xmlNewCharEncodingHandler (const char * name, xmlCharEncodingInputFunc input, xmlCharEncodingOutputFunc output)
Creates and registers an xmlCharEncodingHandler.
Function: xmlParseCharEncoding
xmlCharEncoding xmlParseCharEncoding (const char * name)
Compare the string to the encoding schemes already known. Note that the comparison is case-insensitive, in accordance with section [XML] 4.3.3 Character Encoding in Entities.
name: | the encoding name as parsed, in UTF-8 format (ASCII actually) |
Returns: | one of the XML_CHAR_ENCODING_... values or XML_CHAR_ENCODING_NONE if not recognized |
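A case-insensitive lookup of this kind can be sketched as a small table scan. Everything named here is hypothetical (the enum, `sketch_equal_nocase`, `sketch_parse_name` and the three-entry table); the real function knows many more names, and this is only meant to show the comparison rule.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical identifiers for this sketch only. */
typedef enum {
    SKETCH_NAME_NONE = 0,
    SKETCH_NAME_UTF8,
    SKETCH_NAME_LATIN1,
    SKETCH_NAME_ASCII
} sketch_name_id;

/* Returns 1 if a and b are equal when ASCII case is ignored. */
static int sketch_equal_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* equal only if both strings ended at the same position */
    return *a == *b;
}

sketch_name_id sketch_parse_name(const char *name)
{
    /* Illustrative subset of encoding names. */
    static const struct {
        const char *name;
        sketch_name_id id;
    } table[] = {
        { "UTF-8", SKETCH_NAME_UTF8 },
        { "ISO-8859-1", SKETCH_NAME_LATIN1 },
        { "US-ASCII", SKETCH_NAME_ASCII }
    };
    size_t i;

    if (name == NULL)
        return SKETCH_NAME_NONE;

    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (sketch_equal_nocase(name, table[i].name))
            return table[i].id;
    }
    return SKETCH_NAME_NONE;
}
```

With this table, `sketch_parse_name("utf-8")` and `sketch_parse_name("UTF-8")` give the same result, and an unknown name falls through to `SKETCH_NAME_NONE`.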
Function: xmlRegisterCharEncodingHandler
void xmlRegisterCharEncodingHandler (xmlCharEncodingHandlerPtr handler)
Register the char encoding handler, surprising, isn't it?
Daniel Veillard