//
// Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
// Distributed under the Boost Software License, Version 1.0. (See
// accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//

// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page conversions Text Conversions

Boost.Locale provides a set of functions for basic string conversions:
upper, lower and \ref term_title_case "title case" conversions, \ref term_case_folding "case folding"
and Unicode \ref term_normalization "normalization". These are \ref boost::locale::to_upper "to_upper", \ref boost::locale::to_lower "to_lower", \ref boost::locale::to_title "to_title", \ref boost::locale::fold_case "fold_case" and \ref boost::locale::normalize "normalize".

All these functions take an \c std::locale object as a parameter, or use the global locale when none is given.

The examples below use the global locale, which is assumed to have been created with \c boost::locale::generator and installed with \c std::locale::global.

\section conversions_case Case Handling

For example:
\code
    std::string grussen = "grüßEN";
    std::cout << "Upper " << boost::locale::to_upper(grussen)  << std::endl
              << "Lower " << boost::locale::to_lower(grussen)  << std::endl
              << "Title " << boost::locale::to_title(grussen)  << std::endl
              << "Fold "  << boost::locale::fold_case(grussen) << std::endl;
\endcode

This would print:

\verbatim
Upper GRÜSSEN
Lower grüßen
Title Grüßen
Fold grüssen
\endverbatim

You may notice that the Boost.StringAlgo library already provides \c to_upper and \c to_lower functions.
The difference is that the Boost.Locale functions operate on the entire string, whereas the Boost.StringAlgo ones convert one character at a time, which gives incorrect results for many languages.
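
The following minimal sketch, in plain standard C++ and independent of Boost, illustrates why a character-by-character conversion cannot be correct. The helper names \c upper_bytewise and \c demo are hypothetical. The sketch upper-cases a UTF-8 string one \c char at a time with \c std::toupper, assuming the default "C" locale that every program starts in: the multi-byte sequences for "ü" and "ß" pass through unchanged, and because each byte maps to exactly one byte, "ß" can never expand to "SS".

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases a string one byte at a time, the way a
// naive per-character algorithm does. In the default "C" locale std::toupper
// only maps 'a'..'z', so every byte of a multi-byte UTF-8 sequence is
// returned unchanged.
std::string upper_bytewise(std::string s)
{
    for (char& c : s) {
        // Cast to unsigned char first: passing a negative char value to
        // std::toupper is undefined behavior.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s; // always exactly as long as the input
}

// Hypothetical demo. The literals are split after \x9F so that the hex
// escape does not swallow the following 'e' or 'E'.
void demo()
{
    const std::string grussen = "gr\xC3\xBC\xC3\x9F" "en"; // "grüßen" in UTF-8
    const std::string upper = upper_bytewise(grussen);
    assert(upper == "GR\xC3\xBC\xC3\x9F" "EN"); // "GRüßEN": only ASCII letters changed
    assert(upper.size() == grussen.size());      // "ß" did not become "SS"
}
```

A conversion that sees the whole string, such as \ref boost::locale::to_upper "to_upper", is free to return a result of a different length.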

Compare the output of the two libraries on a wide string:

\code
    std::wstring grussen = L"grüßen";
    std::wcout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
\endcode

This would print:

\verbatim
GRÜßEN GRÜSSEN
\endverbatim

In the first case the letter "ß" was not converted to a double "S" because of a limitation of the \c std::ctype facet, which maps each character to exactly one character.

The problem is even worse for UTF-8 encodings, where non-US-ASCII characters span several \c char units and are therefore not converted at all.
For example, this code

\code
    std::string grussen = "grüßen";
    std::cout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
\endcode

modifies only the ASCII characters in the first case:

\verbatim
GRüßEN GRÜSSEN
\endverbatim

\section conversions_normalization Unicode Normalization

Unicode normalization is the process of converting strings to a standard form, suitable for text processing and
comparison. For example, the character "ü" can be represented either by a single code point (U+00FC) or by the letter "u" (U+0075)
followed by the combining diaeresis (U+0308). Normalization is an important part of Unicode text processing.

Unicode defines four normalization forms. The form is selected by a flag passed
to the \ref boost::locale::normalize() "normalize" function:

- NFD - Canonical decomposition - boost::locale::norm_nfd
- NFC - Canonical decomposition followed by canonical composition - boost::locale::norm_nfc or boost::locale::norm_default
- NFKD - Compatibility decomposition - boost::locale::norm_nfkd
- NFKC - Compatibility decomposition followed by canonical composition - boost::locale::norm_nfkc

For more details on normalization forms, see <a href="http://unicode.org/reports/tr15/#Norm_Forms">Unicode Standard Annex #15</a>.

\section conversions_notes Notes

- \ref boost::locale::normalize() "normalize" operates only on Unicode-encoded strings, i.e. UTF-8, UTF-16 or UTF-32, depending on the
  character width.
  Be careful when using non-UTF encodings, as such strings may be processed incorrectly.
- \ref boost::locale::fold_case() "fold_case" is generally a locale-independent operation, but it takes a locale as a parameter to
  determine the 8-bit encoding.
- All of these functions accept an STL string, a NUL-terminated string, or a range defined by two pointers. They always
  return a newly created STL string.
- The length of the string may change: for example, converting "grüßen" to upper case yields "GRÜSSEN", which is one character longer.
*/