<!-- ##### SECTION Title ##### -->
Unicode Manipulation

<!-- ##### SECTION Short_Description ##### -->
functions operating on Unicode characters and UTF-8 strings

<!-- ##### SECTION Long_Description ##### -->
<para>
This section describes a number of functions for dealing with
Unicode characters and strings. There are analogues of the
traditional <filename>ctype.h</filename> character classification
and case conversion functions, UTF-8 analogues of some string utility
functions, functions to perform normalization, case conversion and
collation on UTF-8 strings, and finally functions to convert between
the UTF-8, UTF-16 and UCS-4 encodings of Unicode.
</para>

<para>
The implementations of the Unicode functions in GLib are based
on the Unicode Character Data tables, which are available from
<ulink url="http://www.unicode.org/">www.unicode.org</ulink>.
GLib 2.8 supports Unicode 4.0, GLib 2.10 supports Unicode 4.1,
GLib 2.12 supports Unicode 5.0, and GLib 2.16.3 supports Unicode 5.1.
</para>

<!-- ##### SECTION See_Also ##### -->
<para>
<variablelist>

<varlistentry>
<term>g_locale_to_utf8(), g_locale_from_utf8()</term>
<listitem><para>
Convenience functions for converting between UTF-8 and the locale encoding.
</para></listitem>
</varlistentry>

</variablelist>
</para>

<!-- ##### SECTION Stability_Level ##### -->


<!-- ##### TYPEDEF gunichar ##### -->
<para>
A type which can hold any UTF-32 or UCS-4 character code, also known
as a Unicode code point.
</para>
<para>
To print/scan values of this type to/from text you need to convert
to/from UTF-8, using g_ucs4_to_utf8()/g_utf8_to_ucs4().
</para>
<para>
To print/scan values of this type as integer, use
%G_GINT32_MODIFIER and/or %G_GUINT32_FORMAT.
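As a minimal sketch of both macros, the two helpers below format a code
point as a decimal and as a hexadecimal integer. The helper names are made
up for this illustration and are not part of GLib. The returned strings
must be freed with g_free().
<informalexample>
<programlisting>
#include &lt;glib.h&gt;

/* Hypothetical helper: decimal representation of a code point. */
static gchar *
code_point_to_decimal (gunichar c)
{
  return g_strdup_printf ("%" G_GUINT32_FORMAT, c);
}

/* Hypothetical helper: lowercase hexadecimal representation. */
static gchar *
code_point_to_hex (gunichar c)
{
  return g_strdup_printf ("%" G_GINT32_MODIFIER "x", c);
}
</programlisting>
</informalexample>
For U+20AC these helpers would be expected to give "8364" and "20ac".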
</para>
<para>
The notation to express a Unicode code point in running text is as a
hexadecimal number with four to six digits and uppercase letters, prefixed
by the string "U+". Leading zeros are omitted, unless the code point would
have fewer than four hexadecimal digits.
For example, "U+0041 LATIN CAPITAL LETTER A".
To print a code point in the U+-notation, use the format string
"U+%04"G_GINT32_MODIFIER"X".
To scan, use the format string "U+%06"G_GINT32_MODIFIER"X".
<informalexample>
<programlisting>
gunichar c;
sscanf ("U+0041", "U+%06" G_GINT32_MODIFIER "X", &amp;c);
g_print ("Read U+%04" G_GINT32_MODIFIER "X", c);
</programlisting>
</informalexample>
</para>


<!-- ##### TYPEDEF gunichar2 ##### -->
<para>
A type which can hold any UTF-16 code
unit<footnote id="utf16_surrogate_pairs">UTF-16 also has so-called
<firstterm>surrogate pairs</firstterm> to encode characters beyond the
BMP as pairs of 16-bit numbers. Surrogate pairs cannot be stored in a
single gunichar2 field, but all GLib functions accepting gunichar2 arrays
will correctly interpret surrogate pairs.</footnote>.
</para>
<para>
To print/scan values of this type to/from text you need to convert
to/from UTF-8, using g_utf16_to_utf8()/g_utf8_to_utf16().
</para>
<para>
To print/scan values of this type as integer, use
%G_GINT16_MODIFIER and/or %G_GUINT16_FORMAT.
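To illustrate the integer form together with surrogate pairs, the following
sketch encodes a single code point as UTF-16 with g_ucs4_to_utf16() and
formats each resulting #gunichar2 in hexadecimal. The helper name
utf16_units_to_hex() is hypothetical, and the result must be freed with
g_free().
<informalexample>
<programlisting>
#include &lt;glib.h&gt;

/* Hypothetical helper: returns the UTF-16 code units of c as
 * space-separated hexadecimal numbers, or NULL if c cannot be
 * encoded (for example, if it is itself a surrogate). */
static gchar *
utf16_units_to_hex (gunichar c)
{
  glong n_units = 0;
  glong i;
  gunichar2 *units;
  GString *out;

  units = g_ucs4_to_utf16 (&amp;c, 1, NULL, &amp;n_units, NULL);
  if (units == NULL)
    return NULL;

  out = g_string_new (NULL);
  for (i = 0; i &lt; n_units; i++)
    g_string_append_printf (out, "%s%04" G_GINT16_MODIFIER "X",
                            i > 0 ? " " : "", units[i]);

  g_free (units);
  return g_string_free (out, FALSE);
}
</programlisting>
</informalexample>
A code point in the BMP such as U+0041 should give a single unit ("0041"),
while U+1D11E, which lies beyond the BMP, should give the surrogate pair
"D834 DD1E".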
</para>


<!-- ##### FUNCTION g_unichar_validate ##### -->
<para>

</para>

@ch:
@Returns:


<!-- ##### FUNCTION g_unichar_isalnum ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_isalpha ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_iscntrl ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_isdefined ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_isdigit ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_isgraph ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_islower ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_ismark ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_isprint ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_ispunct ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_isspace ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_istitle ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_isupper ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_isxdigit ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_iswide ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_iswide_cjk ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_iszerowidth ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_toupper ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_tolower ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_totitle ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_digit_value ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_xdigit_value ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### ENUM GUnicodeType ##### -->
<para>
These are the possible character classifications from the
Unicode specification.
See <ulink url="http://www.unicode.org/Public/UNIDATA/UnicodeData.html"
>http://www.unicode.org/Public/UNIDATA/UnicodeData.html</ulink>.
</para>

@G_UNICODE_CONTROL: General category "Other, Control" (Cc)
@G_UNICODE_FORMAT: General category "Other, Format" (Cf)
@G_UNICODE_UNASSIGNED: General category "Other, Not Assigned" (Cn)
@G_UNICODE_PRIVATE_USE: General category "Other, Private Use" (Co)
@G_UNICODE_SURROGATE: General category "Other, Surrogate" (Cs)
@G_UNICODE_LOWERCASE_LETTER: General category "Letter, Lowercase" (Ll)
@G_UNICODE_MODIFIER_LETTER: General category "Letter, Modifier" (Lm)
@G_UNICODE_OTHER_LETTER: General category "Letter, Other" (Lo)
@G_UNICODE_TITLECASE_LETTER: General category "Letter, Titlecase" (Lt)
@G_UNICODE_UPPERCASE_LETTER: General category "Letter, Uppercase" (Lu)
@G_UNICODE_COMBINING_MARK: General category "Mark, Spacing Combining" (Mc)
@G_UNICODE_ENCLOSING_MARK: General category "Mark, Enclosing" (Me)
@G_UNICODE_NON_SPACING_MARK: General category "Mark, Nonspacing" (Mn)
@G_UNICODE_DECIMAL_NUMBER: General category "Number, Decimal Digit" (Nd)
@G_UNICODE_LETTER_NUMBER: General category "Number, Letter" (Nl)
@G_UNICODE_OTHER_NUMBER: General category "Number, Other" (No)
@G_UNICODE_CONNECT_PUNCTUATION: General category "Punctuation, Connector" (Pc)
@G_UNICODE_DASH_PUNCTUATION: General category "Punctuation, Dash" (Pd)
@G_UNICODE_CLOSE_PUNCTUATION: General category "Punctuation, Close" (Pe)
@G_UNICODE_FINAL_PUNCTUATION: General category "Punctuation, Final quote" (Pf)
@G_UNICODE_INITIAL_PUNCTUATION: General category "Punctuation, Initial quote" (Pi)
@G_UNICODE_OTHER_PUNCTUATION: General category "Punctuation, Other" (Po)
@G_UNICODE_OPEN_PUNCTUATION: General category "Punctuation, Open" (Ps)
@G_UNICODE_CURRENCY_SYMBOL: General category "Symbol, Currency" (Sc)
@G_UNICODE_MODIFIER_SYMBOL: General category "Symbol, Modifier" (Sk)
@G_UNICODE_MATH_SYMBOL: General category "Symbol, Math" (Sm)
@G_UNICODE_OTHER_SYMBOL: General category "Symbol, Other" (So)
@G_UNICODE_LINE_SEPARATOR: General category "Separator, Line" (Zl)
@G_UNICODE_PARAGRAPH_SEPARATOR: General category "Separator, Paragraph" (Zp)
@G_UNICODE_SPACE_SEPARATOR: General category "Separator, Space" (Zs)

<!-- ##### FUNCTION g_unichar_type ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### ENUM GUnicodeBreakType ##### -->
<para>
These are the possible line break classifications.
The five Hangul types were added in Unicode 4.1, and were therefore
introduced in GLib 2.10. Note that new types may be added in the future.
Applications should be ready to handle unknown values.
They may be regarded as %G_UNICODE_BREAK_UNKNOWN.
See <ulink url="http://www.unicode.org/unicode/reports/tr14/"
>http://www.unicode.org/unicode/reports/tr14/</ulink>.
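One way to follow this advice can be sketched as below. It assumes that the
application only knows the types documented for this enumeration, and that
values added by later GLib versions come after
%G_UNICODE_BREAK_HANGUL_LVT_SYLLABLE, so anything larger is mapped to
%G_UNICODE_BREAK_UNKNOWN. The helper name known_break_type() is hypothetical.
<informalexample>
<programlisting>
#include &lt;glib.h&gt;

/* Hypothetical helper: returns the break type of c, or
 * G_UNICODE_BREAK_UNKNOWN for values this application does not
 * know about. Assumes newer values are appended to the enumeration. */
static GUnicodeBreakType
known_break_type (gunichar c)
{
  GUnicodeBreakType type = g_unichar_break_type (c);

  if ((guint) type > (guint) G_UNICODE_BREAK_HANGUL_LVT_SYLLABLE)
    return G_UNICODE_BREAK_UNKNOWN;

  return type;
}
</programlisting>
</informalexample>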
</para>

@G_UNICODE_BREAK_MANDATORY: Mandatory Break (BK)
@G_UNICODE_BREAK_CARRIAGE_RETURN: Carriage Return (CR)
@G_UNICODE_BREAK_LINE_FEED: Line Feed (LF)
@G_UNICODE_BREAK_COMBINING_MARK: Attached Characters and Combining Marks (CM)
@G_UNICODE_BREAK_SURROGATE: Surrogates (SG)
@G_UNICODE_BREAK_ZERO_WIDTH_SPACE: Zero Width Space (ZW)
@G_UNICODE_BREAK_INSEPARABLE: Inseparable (IN)
@G_UNICODE_BREAK_NON_BREAKING_GLUE: Non-breaking ("Glue") (GL)
@G_UNICODE_BREAK_CONTINGENT: Contingent Break Opportunity (CB)
@G_UNICODE_BREAK_SPACE: Space (SP)
@G_UNICODE_BREAK_AFTER: Break Opportunity After (BA)
@G_UNICODE_BREAK_BEFORE: Break Opportunity Before (BB)
@G_UNICODE_BREAK_BEFORE_AND_AFTER: Break Opportunity Before and After (B2)
@G_UNICODE_BREAK_HYPHEN: Hyphen (HY)
@G_UNICODE_BREAK_NON_STARTER: Nonstarter (NS)
@G_UNICODE_BREAK_OPEN_PUNCTUATION: Opening Punctuation (OP)
@G_UNICODE_BREAK_CLOSE_PUNCTUATION: Closing Punctuation (CL)
@G_UNICODE_BREAK_QUOTATION: Ambiguous Quotation (QU)
@G_UNICODE_BREAK_EXCLAMATION: Exclamation/Interrogation (EX)
@G_UNICODE_BREAK_IDEOGRAPHIC: Ideographic (ID)
@G_UNICODE_BREAK_NUMERIC: Numeric (NU)
@G_UNICODE_BREAK_INFIX_SEPARATOR: Infix Separator (Numeric) (IS)
@G_UNICODE_BREAK_SYMBOL: Symbols Allowing Break After (SY)
@G_UNICODE_BREAK_ALPHABETIC: Ordinary Alphabetic and Symbol Characters (AL)
@G_UNICODE_BREAK_PREFIX: Prefix (Numeric) (PR)
@G_UNICODE_BREAK_POSTFIX: Postfix (Numeric) (PO)
@G_UNICODE_BREAK_COMPLEX_CONTEXT: Complex Context Dependent (South East Asian) (SA)
@G_UNICODE_BREAK_AMBIGUOUS: Ambiguous (Alphabetic or Ideographic) (AI)
@G_UNICODE_BREAK_UNKNOWN: Unknown (XX)
@G_UNICODE_BREAK_NEXT_LINE: Next Line (NL)
@G_UNICODE_BREAK_WORD_JOINER: Word Joiner (WJ)
@G_UNICODE_BREAK_HANGUL_L_JAMO: Hangul L Jamo (JL)
@G_UNICODE_BREAK_HANGUL_V_JAMO: Hangul V Jamo (JV)
@G_UNICODE_BREAK_HANGUL_T_JAMO: Hangul T Jamo (JT)
@G_UNICODE_BREAK_HANGUL_LV_SYLLABLE: Hangul LV Syllable (H2)
@G_UNICODE_BREAK_HANGUL_LVT_SYLLABLE: Hangul LVT Syllable (H3)

<!-- ##### FUNCTION g_unichar_break_type ##### -->
<para>

</para>

@c:
@Returns:


<!-- ##### FUNCTION g_unichar_combining_class ##### -->
<para>

</para>

@uc:
@Returns:


<!-- ##### FUNCTION g_unicode_canonical_ordering ##### -->
<para>

</para>

@string:
@len:


<!-- ##### FUNCTION g_unicode_canonical_decomposition ##### -->
<para>

</para>

@ch:
@result_len:
@Returns:


<!-- ##### FUNCTION g_unichar_get_mirror_char ##### -->
<para>

</para>

@ch:
@mirrored_ch:
@Returns:


<!-- ##### ENUM GUnicodeScript ##### -->
<para>
The #GUnicodeScript enumeration identifies different writing
systems. The values correspond to the names as defined in the
Unicode standard. The enumeration was added in GLib 2.14,
and is interchangeable with #PangoScript.
Note that new types may be added in the future. Applications
should be ready to handle unknown values.
See <ulink
url="http://www.unicode.org/reports/tr24/">Unicode Standard Annex
#24: Script names</ulink>.
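A simple way to stay robust against new values is to give every switch over
#GUnicodeScript a default branch. The sketch below does this for a made-up
helper, script_label(), which is not part of GLib and only distinguishes
three scripts chosen arbitrarily for the illustration.
<informalexample>
<programlisting>
#include &lt;glib.h&gt;

/* Hypothetical helper: returns a label for the few scripts this
 * application treats specially, and a fallback for everything else,
 * including values added to GUnicodeScript in later GLib versions. */
static const gchar *
script_label (gunichar c)
{
  switch (g_unichar_get_script (c))
    {
    case G_UNICODE_SCRIPT_LATIN:
      return "Latin";
    case G_UNICODE_SCRIPT_GREEK:
      return "Greek";
    case G_UNICODE_SCRIPT_HAN:
      return "Han";
    default:
      return "other";
    }
}
</programlisting>
</informalexample>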
</para>

@G_UNICODE_SCRIPT_INVALID_CODE: a value never returned from g_unichar_get_script()
@G_UNICODE_SCRIPT_COMMON: a character used by multiple different scripts
@G_UNICODE_SCRIPT_INHERITED: a mark glyph that takes its script from the
                             base glyph to which it is attached
@G_UNICODE_SCRIPT_ARABIC: Arabic
@G_UNICODE_SCRIPT_ARMENIAN: Armenian
@G_UNICODE_SCRIPT_BENGALI: Bengali
@G_UNICODE_SCRIPT_BOPOMOFO: Bopomofo
@G_UNICODE_SCRIPT_CHEROKEE: Cherokee
@G_UNICODE_SCRIPT_COPTIC: Coptic
@G_UNICODE_SCRIPT_CYRILLIC: Cyrillic
@G_UNICODE_SCRIPT_DESERET: Deseret
@G_UNICODE_SCRIPT_DEVANAGARI: Devanagari
@G_UNICODE_SCRIPT_ETHIOPIC: Ethiopic
@G_UNICODE_SCRIPT_GEORGIAN: Georgian
@G_UNICODE_SCRIPT_GOTHIC: Gothic
@G_UNICODE_SCRIPT_GREEK: Greek
@G_UNICODE_SCRIPT_GUJARATI: Gujarati
@G_UNICODE_SCRIPT_GURMUKHI: Gurmukhi
@G_UNICODE_SCRIPT_HAN: Han
@G_UNICODE_SCRIPT_HANGUL: Hangul
@G_UNICODE_SCRIPT_HEBREW: Hebrew
@G_UNICODE_SCRIPT_HIRAGANA: Hiragana
@G_UNICODE_SCRIPT_KANNADA: Kannada
@G_UNICODE_SCRIPT_KATAKANA: Katakana
@G_UNICODE_SCRIPT_KHMER: Khmer
@G_UNICODE_SCRIPT_LAO: Lao
@G_UNICODE_SCRIPT_LATIN: Latin
@G_UNICODE_SCRIPT_MALAYALAM: Malayalam
@G_UNICODE_SCRIPT_MONGOLIAN: Mongolian
@G_UNICODE_SCRIPT_MYANMAR: Myanmar
@G_UNICODE_SCRIPT_OGHAM: Ogham
@G_UNICODE_SCRIPT_OLD_ITALIC: Old Italic
@G_UNICODE_SCRIPT_ORIYA: Oriya
@G_UNICODE_SCRIPT_RUNIC: Runic
@G_UNICODE_SCRIPT_SINHALA: Sinhala
@G_UNICODE_SCRIPT_SYRIAC: Syriac
@G_UNICODE_SCRIPT_TAMIL: Tamil
@G_UNICODE_SCRIPT_TELUGU: Telugu
@G_UNICODE_SCRIPT_THAANA: Thaana
@G_UNICODE_SCRIPT_THAI: Thai
@G_UNICODE_SCRIPT_TIBETAN: Tibetan
@G_UNICODE_SCRIPT_CANADIAN_ABORIGINAL:
                             Canadian Aboriginal
@G_UNICODE_SCRIPT_YI: Yi
@G_UNICODE_SCRIPT_TAGALOG: Tagalog
@G_UNICODE_SCRIPT_HANUNOO: Hanunoo
@G_UNICODE_SCRIPT_BUHID: Buhid
@G_UNICODE_SCRIPT_TAGBANWA: Tagbanwa
@G_UNICODE_SCRIPT_BRAILLE: Braille
@G_UNICODE_SCRIPT_CYPRIOT: Cypriot
@G_UNICODE_SCRIPT_LIMBU: Limbu
@G_UNICODE_SCRIPT_OSMANYA: Osmanya
@G_UNICODE_SCRIPT_SHAVIAN: Shavian
@G_UNICODE_SCRIPT_LINEAR_B: Linear B
@G_UNICODE_SCRIPT_TAI_LE: Tai Le
@G_UNICODE_SCRIPT_UGARITIC: Ugaritic
@G_UNICODE_SCRIPT_NEW_TAI_LUE: New Tai Lue
@G_UNICODE_SCRIPT_BUGINESE: Buginese
@G_UNICODE_SCRIPT_GLAGOLITIC: Glagolitic
@G_UNICODE_SCRIPT_TIFINAGH: Tifinagh
@G_UNICODE_SCRIPT_SYLOTI_NAGRI: Syloti Nagri
@G_UNICODE_SCRIPT_OLD_PERSIAN: Old Persian
@G_UNICODE_SCRIPT_KHAROSHTHI: Kharoshthi
@G_UNICODE_SCRIPT_UNKNOWN: an unassigned code point
@G_UNICODE_SCRIPT_BALINESE: Balinese
@G_UNICODE_SCRIPT_CUNEIFORM: Cuneiform
@G_UNICODE_SCRIPT_PHOENICIAN: Phoenician
@G_UNICODE_SCRIPT_PHAGS_PA: Phags-pa
@G_UNICODE_SCRIPT_NKO: N'Ko
@G_UNICODE_SCRIPT_KAYAH_LI: Kayah Li. Since 2.16.3
@G_UNICODE_SCRIPT_LEPCHA: Lepcha. Since 2.16.3
@G_UNICODE_SCRIPT_REJANG: Rejang. Since 2.16.3
@G_UNICODE_SCRIPT_SUNDANESE: Sundanese. Since 2.16.3
@G_UNICODE_SCRIPT_SAURASHTRA: Saurashtra. Since 2.16.3
@G_UNICODE_SCRIPT_CHAM: Cham. Since 2.16.3
@G_UNICODE_SCRIPT_OL_CHIKI: Ol Chiki. Since 2.16.3
@G_UNICODE_SCRIPT_VAI: Vai. Since 2.16.3
@G_UNICODE_SCRIPT_CARIAN: Carian. Since 2.16.3
@G_UNICODE_SCRIPT_LYCIAN: Lycian. Since 2.16.3
@G_UNICODE_SCRIPT_LYDIAN: Lydian. Since 2.16.3

<!-- ##### FUNCTION g_unichar_get_script ##### -->
<para>

</para>

@ch:
@Returns:


<!-- ##### MACRO g_utf8_next_char ##### -->
<para>
Skips to the next character in a UTF-8 string. The string must be
valid; this macro is as fast as possible, and has no error-checking.
You would use this macro to iterate over a string character by
character. The macro returns the start of the next UTF-8 character.
Before using this macro, use g_utf8_validate() to validate strings
that may contain invalid UTF-8.
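The validate-then-iterate pattern can be sketched as follows. The helper
count_uppercase() is a made-up example, not a GLib function. It rejects
invalid input with g_utf8_validate() and only then walks the string with
g_utf8_next_char(), reading each character with g_utf8_get_char().
<informalexample>
<programlisting>
#include &lt;glib.h&gt;

/* Hypothetical helper: counts the uppercase characters in a
 * nul-terminated string, or returns -1 if it is not valid UTF-8. */
static glong
count_uppercase (const gchar *text)
{
  const gchar *p;
  glong count = 0;

  /* g_utf8_next_char() does no error checking, so validate first. */
  if (!g_utf8_validate (text, -1, NULL))
    return -1;

  for (p = text; *p != '\0'; p = g_utf8_next_char (p))
    {
      if (g_unichar_isupper (g_utf8_get_char (p)))
        count++;
    }

  return count;
}
</programlisting>
</informalexample>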
</para>

@p: Pointer to the start of a valid UTF-8 character.


<!-- ##### FUNCTION g_utf8_get_char ##### -->
<para>

</para>

@p:
@Returns:


<!-- ##### FUNCTION g_utf8_get_char_validated ##### -->
<para>

</para>

@p:
@max_len:
@Returns:


<!-- ##### FUNCTION g_utf8_offset_to_pointer ##### -->
<para>

</para>

@str:
@offset:
@Returns:


<!-- ##### FUNCTION g_utf8_pointer_to_offset ##### -->
<para>

</para>

@str:
@pos:
@Returns:


<!-- ##### FUNCTION g_utf8_prev_char ##### -->
<para>

</para>

@p:
@Returns:


<!-- ##### FUNCTION g_utf8_find_next_char ##### -->
<para>

</para>

@p:
@end:
@Returns:


<!-- ##### FUNCTION g_utf8_find_prev_char ##### -->
<para>

</para>

@str:
@p:
@Returns:


<!-- ##### FUNCTION g_utf8_strlen ##### -->
<para>

</para>

@p:
@max:
@Returns:


<!-- ##### FUNCTION g_utf8_strncpy ##### -->
<para>

</para>

@dest:
@src:
@n:
@Returns:


<!-- ##### FUNCTION g_utf8_strchr ##### -->
<para>

</para>

@p:
@len:
@c:
@Returns:


<!-- ##### FUNCTION g_utf8_strrchr ##### -->
<para>

</para>

@p:
@len:
@c:
@Returns:


<!-- ##### FUNCTION g_utf8_strreverse ##### -->
<para>

</para>

@str:
@len:
@Returns:


<!-- ##### FUNCTION g_utf8_validate ##### -->
<para>

</para>

@str:
@max_len:
@end:
@Returns:


<!-- ##### FUNCTION g_utf8_strup ##### -->
<para>

</para>

@str:
@len:
@Returns:


<!-- ##### FUNCTION g_utf8_strdown ##### -->
<para>

</para>

@str:
@len:
@Returns:


<!-- ##### FUNCTION g_utf8_casefold ##### -->
<para>

</para>

@str:
@len:
@Returns:


<!-- ##### FUNCTION g_utf8_normalize ##### -->
<para>

</para>

@str:
@len:
@mode:
@Returns:


<!-- ##### ENUM GNormalizeMode ##### -->
<para>
Defines how a Unicode string is transformed into a canonical
form, standardizing such issues as whether a character with an accent is
represented as a base character and combining accent or as a single precomposed
character. Unicode strings should generally be normalized before comparing them.
</para>

@G_NORMALIZE_DEFAULT: standardize differences that do not affect the
  text content, such as the above-mentioned accent representation.
@G_NORMALIZE_NFD: another name for %G_NORMALIZE_DEFAULT.
@G_NORMALIZE_DEFAULT_COMPOSE: like %G_NORMALIZE_DEFAULT, but with composed
  forms rather than a maximally decomposed form.
@G_NORMALIZE_NFC: another name for %G_NORMALIZE_DEFAULT_COMPOSE.
@G_NORMALIZE_ALL: beyond %G_NORMALIZE_DEFAULT also standardize the
  "compatibility" characters in Unicode, such as SUPERSCRIPT THREE to the
  standard forms (in this case DIGIT THREE). Formatting information may be
  lost but for most text operations such characters should be considered the
  same.
@G_NORMALIZE_NFKD: another name for %G_NORMALIZE_ALL.
@G_NORMALIZE_ALL_COMPOSE: like %G_NORMALIZE_ALL, but with composed
  forms rather than a maximally decomposed form.
@G_NORMALIZE_NFKC: another name for %G_NORMALIZE_ALL_COMPOSE.

<!-- ##### FUNCTION g_utf8_collate ##### -->
<para>

</para>

@str1:
@str2:
@Returns:


<!-- ##### FUNCTION g_utf8_collate_key ##### -->
<para>

</para>

@str:
@len:
@Returns:


<!-- ##### FUNCTION g_utf8_collate_key_for_filename ##### -->
<para>

</para>

@str:
@len:
@Returns:


<!-- ##### FUNCTION g_utf8_to_utf16 ##### -->
<para>

</para>

@str:
@len:
@items_read:
@items_written:
@error:
@Returns:


<!-- ##### FUNCTION g_utf8_to_ucs4 ##### -->
<para>

</para>

@str:
@len:
@items_read:
@items_written:
@error:
@Returns:


<!-- ##### FUNCTION g_utf8_to_ucs4_fast ##### -->
<para>

</para>

@str:
@len:
@items_written:
@Returns:


<!-- ##### FUNCTION g_utf16_to_ucs4 ##### -->
<para>

</para>

@str:
@len:
@items_read:
@items_written:
@error:
@Returns:


<!-- ##### FUNCTION g_utf16_to_utf8 ##### -->
<para>

</para>

@str:
@len:
@items_read:
@items_written:
@error:
@Returns:


<!-- ##### FUNCTION g_ucs4_to_utf16 ##### -->
<para>

</para>

@str:
@len:
@items_read:
@items_written:
@error:
@Returns:


<!-- ##### FUNCTION g_ucs4_to_utf8 ##### -->
<para>

</para>

@str:
@len:
@items_read:
@items_written:
@error:
@Returns:


<!-- ##### FUNCTION g_unichar_to_utf8 ##### -->
<para>

</para>

@c:
@outbuf:
@Returns:

