//
//  Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
//  Distributed under the Boost Software License, Version 1.0. (See
//  accompanying file LICENSE_1_0.txt or copy at
//  http://www.boost.org/LICENSE_1_0.txt)
//

// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page conversions Text Conversions

There is a set of functions that perform basic string conversion operations:
upper, lower and \ref term_title_case "title case" conversions, \ref term_case_folding "case folding"
and Unicode \ref term_normalization "normalization". These are \ref boost::locale::to_upper "to_upper", \ref boost::locale::to_lower "to_lower", \ref boost::locale::to_title "to_title", \ref boost::locale::fold_case "fold_case" and \ref boost::locale::normalize "normalize".

All of these functions take an \c std::locale object as a parameter, or use the global locale by default.

The global locale is used in all of the examples below.

\section conversions_case Case Handling

For example:
\code
    std::string grussen = "grüßEN";
    std::cout   <<"Upper "<< boost::locale::to_upper(grussen) << std::endl
                <<"Lower "<< boost::locale::to_lower(grussen) << std::endl
                <<"Title "<< boost::locale::to_title(grussen) << std::endl
                <<"Fold  "<< boost::locale::fold_case(grussen) << std::endl;
\endcode

This would print:

\verbatim
Upper GRÜSSEN
Lower grüßen
Title Grüßen
Fold  grüssen
\endverbatim
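To see what case folding is for, here is a minimal sketch of a caseless comparison built on folding. It does not use Boost.Locale and only assumes the C++ standard library: \c toy_fold_case and \c toy_equal_ignoring_case are hypothetical helpers that fold just a handful of characters (ASCII A-Z, "Ü" and "ß") in a \c std::u32string, whereas the real \c fold_case applies the full Unicode case folding rules.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: a tiny subset of Unicode full case folding.
//   A-Z    -> a-z
//   U+00DC -> U+00FC  ("Ü" -> "ü")
//   U+00DF -> "ss"    ("ß" expands to two characters)
std::u32string toy_fold_case(const std::u32string& in)
{
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c >= U'A' && c <= U'Z')
            out += static_cast<char32_t>(c - U'A' + U'a');
        else if (c == U'\u00DC')
            out += U'\u00FC';
        else if (c == U'\u00DF')
            out += U"ss";
        else
            out += c;                      // everything else is kept as-is
    }
    return out;
}

// Folding both sides first turns a caseless comparison into a plain ==.
bool toy_equal_ignoring_case(const std::u32string& a, const std::u32string& b)
{
    return toy_fold_case(a) == toy_fold_case(b);
}
```

With this sketch, "grüßEN" folds to "grüssen", matching the Fold line above, and "STRASSE" compares equal to "straße".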

You may notice that the Boost.StringAlgo library already provides \c to_upper and \c to_lower functions.
The difference is that the Boost.Locale functions operate on the entire string instead of performing incorrect character-by-character conversions.

For example:

\code
    std::wstring grussen = L"grüßen";
    std::wcout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
\endcode

The output would be:

\verbatim
GRÜßEN GRÜSSEN
\endverbatim

Here the letter "ß" was not converted to the double-S "SS" in the first case because of a limitation of the \c std::ctype facet, which maps each character to exactly one character.
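The one-character-in, one-character-out shape is the core of the problem. The sketch below contrasts it with a whole-string conversion, assuming only the C++ standard library: \c toy_upper_char, \c toy_to_upper_per_char and \c toy_to_upper are hypothetical helpers, not Boost.Locale functions, and they know only ASCII letters, "ü" and "ß".

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical per-character mapping (char32_t -> char32_t), the same shape
// as std::ctype::toupper. It cannot return two characters, so "ß" (U+00DF)
// is left unchanged.
char32_t toy_upper_char(char32_t c)
{
    if (c >= U'a' && c <= U'z')
        return static_cast<char32_t>(c - U'a' + U'A');
    if (c == U'\u00FC')                    // "ü" -> "Ü"
        return U'\u00DC';
    return c;
}

// Character-by-character conversion: the string length can never change.
std::u32string toy_to_upper_per_char(std::u32string s)
{
    std::transform(s.begin(), s.end(), s.begin(), toy_upper_char);
    return s;
}

// Hypothetical whole-string conversion: the output is built separately,
// so "ß" can expand to "SS" and the string can grow.
std::u32string toy_to_upper(const std::u32string& in)
{
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c == U'\u00DF')
            out += U"SS";
        else
            out += toy_upper_char(c);
    }
    return out;
}
```

For "grüßen" the per-character version yields "GRÜßEN" while the whole-string version yields "GRÜSSEN", one character longer than the input.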

This is even more problematic with UTF-8 encodings, where non-US-ASCII characters are not converted at all.
For example, this code

\code
    std::string grussen = "grüßen";
    std::cout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
\endcode

would modify only the ASCII characters:

\verbatim
GRüßEN GRÜSSEN
\endverbatim
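The UTF-8 behavior can be reproduced with the standard library alone. This sketch stands in for \c boost::algorithm::to_upper_copy with a per-character \c std::toupper call on the \c std::ctype<char> facet of the classic "C" locale; \c bytewise_upper is a hypothetical helper, and the UTF-8 bytes are written as explicit escapes so the result does not depend on the source file encoding.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <string>

// Hypothetical helper: upper-cases a byte string one char at a time through
// the std::ctype<char> facet of the classic "C" locale. Each byte is mapped
// on its own, so the UTF-8 sequences for "ü" (C3 BC) and "ß" (C3 9F) are
// treated as unrelated non-ASCII bytes and left unchanged; only a-z change.
std::string bytewise_upper(std::string s)
{
    const std::locale& classic = std::locale::classic();
    std::transform(s.begin(), s.end(), s.begin(),
                   [&classic](char b) { return std::toupper(b, classic); });
    return s;
}
```

Applied to the UTF-8 bytes of "grüßen", this produces the bytes of "GRüßEN", the same result as the first column above.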

\section conversions_normalization Unicode Normalization

Unicode normalization is the process of converting strings to a standard form, suitable for text processing and
comparison. For example, the character "ü" can be represented either by a single code point or by a combination of the character "u" and the
combining diaeresis "¨" (U+0308). Normalization is an important part of Unicode text processing.
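The two spellings of "ü" can be compared directly. The sketch below, using only \c std::u32string, shows that they are different code point sequences and that composing the decomposed form recovers the single code point. \c toy_compose is a hypothetical helper that handles only this one pair; real NFC composition, as performed by \c normalize, needs the full Unicode data tables and canonical reordering of combining marks.

```cpp
#include <cassert>
#include <string>

// Two spellings of "ü": one precomposed code point (U+00FC), or "u" followed
// by U+0308 COMBINING DIAERESIS. They display the same but compare unequal.
const std::u32string precomposed = U"\u00FC";
const std::u32string decomposed  = U"u\u0308";

// Hypothetical helper: a toy canonical composition that only knows
// "u" + U+0308 -> U+00FC. Every other code point is copied unchanged.
std::u32string toy_compose(const std::u32string& in)
{
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c == U'\u0308' && !out.empty() && out.back() == U'u')
            out.back() = U'\u00FC';        // merge the mark into its base letter
        else
            out += c;
    }
    return out;
}
```

After composition both spellings compare equal, which is why strings are usually normalized before comparison or searching.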

Unicode defines four normalization forms. Each form is selected by a flag passed
to the \ref boost::locale::normalize() "normalize" function:

- NFD - Canonical decomposition - boost::locale::norm_nfd
- NFC - Canonical decomposition followed by canonical composition - boost::locale::norm_nfc or boost::locale::norm_default
- NFKD - Compatibility decomposition - boost::locale::norm_nfkd
- NFKC - Compatibility decomposition followed by canonical composition - boost::locale::norm_nfkc
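The difference between the canonical and compatibility forms is easiest to see on a character that has only a compatibility decomposition. In the sketch below, \c toy_form and \c toy_decompose are hypothetical stand-ins for the \c norm_nfd and \c norm_nfkd flags, not the real implementation, and they cover just two characters: "ü" (U+00FC), which decomposes canonically into "u" plus U+0308, and the ligature "ﬁ" (U+FB01), which becomes "fi" only under compatibility decomposition. Canonical reordering of combining marks is omitted.

```cpp
#include <cassert>
#include <string>

// Hypothetical flag for this sketch only (stands in for norm_nfd / norm_nfkd).
enum class toy_form { canonical, compatibility };

// Hypothetical helper: toy decomposition for two characters.
//   U+00FC "ü"  -> "u" + U+0308   (canonical: applied by NFD and NFKD)
//   U+FB01 "ﬁ"  -> "fi"           (compatibility: applied by NFKD only)
std::u32string toy_decompose(const std::u32string& in, toy_form form)
{
    std::u32string out;
    for (char32_t c : in) {
        if (c == U'\u00FC')
            out += U"u\u0308";
        else if (c == U'\uFB01' && form == toy_form::compatibility)
            out += U"fi";
        else
            out += c;
    }
    return out;
}
```

Both forms split "ü", but only the compatibility form replaces the ligature with two ordinary letters, which is why NFKD and NFKC can lose formatting distinctions that NFD and NFC preserve.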

For more details on normalization forms, read <a href="http://unicode.org/reports/tr15/#Norm_Forms">this article</a>.

\section conversions_notes Notes

-   \ref boost::locale::normalize() "normalize" operates only on Unicode-encoded strings, i.e. UTF-8, UTF-16 or UTF-32, depending on the
    character width. Be careful when using non-UTF encodings, as they may be processed incorrectly.
-   \ref boost::locale::fold_case() "fold_case" is generally a locale-independent operation, but it takes a locale as a parameter to
    determine the 8-bit encoding.
-   All of these functions can work with an STL string, a NUL-terminated string, or a range defined by two pointers. They always
    return a newly created STL string.
-   The length of the string may change; see the "grüßen" example above, where "ß" becomes "SS" when converted to upper case.
*/