//
//  Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
//  Distributed under the Boost Software License, Version 1.0. (See
//  accompanying file LICENSE_1_0.txt or copy at
//  http://www.boost.org/LICENSE_1_0.txt)
//

// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page std_locales Introduction to C++ Standard Library localization support

\section std_locales_basics Getting familiar with standard C++ Locales

The C++ standard library offers a simple and powerful way to provide locale-specific information. It is done via the \c
std::locale class, the container that holds all the required information about a specific culture, such as number formatting
patterns, date and time formatting, currency, case conversion, etc.

All this information is provided by facets, special classes derived from the \c std::locale::facet base class. Such facets are
packed into the \c std::locale class and allow you to provide arbitrary information about the locale. The \c std::locale class
keeps reference counters on installed facets and can be efficiently copied.

Each facet that was installed into the \c std::locale object can be fetched using the \c std::use_facet function. For example,
the \c std::ctype<Char> facet provides rules for case conversion, so you can convert a character to upper-case like this:

\code
std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(some_locale);
char upper_a = ctype_facet.toupper('a');
\endcode
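
The same facet can convert a whole buffer at once. The following is a minimal sketch, not part of any library: the helper name \c upper_copy is hypothetical, and it relies only on the standard \c std::has_facet, \c std::use_facet and the range overload of \c std::ctype<char>::toupper.

```cpp
#include <locale>
#include <string>

// Hypothetical helper: upper-case every character of `text` using the
// std::ctype<char> facet installed in `loc`.
std::string upper_copy(std::string text, std::locale const &loc)
{
    // has_facet reports whether the facet is installed; use_facet would
    // throw std::bad_cast for a missing facet.
    if(!std::has_facet<std::ctype<char> >(loc))
        return text;
    std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(loc);
    // The range overload converts the characters in [begin, end) in place.
    if(!text.empty())
        ctype_facet.toupper(&text[0], &text[0] + text.size());
    return text;
}
```

Calling <tt>upper_copy("hello", std::locale::classic())</tt> uses the rules of the classic "C" locale. Note that \c std::ctype<char> works on single \c char values, so it cannot convert multi-byte UTF-8 characters.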

A locale object can be imbued into an \c iostream so that it formats data according to that locale:

\code
std::cout.imbue(std::locale("en_US.UTF-8"));
std::cout << 1345.45 << std::endl;
std::cout.imbue(std::locale("ru_RU.UTF-8"));
std::cout << 1345.45 << std::endl;
\endcode

On a system where both locales are available, this would display:

\verbatim
    1,345.45
    1 345,45
\endverbatim
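
Named locales such as the ones above are not guaranteed to exist on every system, and the \c std::locale constructor throws \c std::runtime_error when it does not recognize a name. The following is a minimal sketch of guarding against that; the helpers \c make_locale_or_classic and \c format_number are hypothetical names introduced only for this illustration.

```cpp
#include <locale>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: return the named locale if the platform provides
// it, otherwise fall back to the classic "C" locale.
std::locale make_locale_or_classic(char const *name)
{
    try {
        return std::locale(name);
    }
    catch(std::runtime_error const &) {
        // Thrown by std::locale when the name is not recognized.
        return std::locale::classic();
    }
}

// Format a number using whichever locale could be obtained.
std::string format_number(double value, char const *locale_name)
{
    std::ostringstream out;
    out.imbue(make_locale_or_classic(locale_name));
    out << value;
    return out.str();
}
```

With this approach the program keeps running when a locale is missing, at the cost of silently producing "C"-style output, so a real application would probably want to report the fallback.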

You can also create your own facets and install them into existing locale objects. For example:

\code
    class measure : public std::locale::facet {
    public:
        typedef enum { inches, ... } measure_type;
        // Every facet needs a static id so that std::locale can look it up
        static std::locale::id id;
        measure(measure_type m,size_t refs=0);
        double from_metric(double value) const;
        std::string name() const;
        ...
    };
\endcode

And now you can simply provide this information to a locale:

\code
    // Create a default locale built from the en_US locale and add the measure facet.
    std::locale::global(std::locale(std::locale("en_US.UTF-8"),new measure(measure::inches)));
\endcode

Now you can print a distance according to the correct locale:

\code
    void print_distance(std::ostream &out,double value)
    {
        // Fetch locale information from the stream
        measure const &m = std::use_facet<measure>(out.getloc());
        out << m.from_metric(value) << " " << m.name();
    }
\endcode
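
For reference, here is one way the pieces of the \c measure example could fit together in a complete program. It is only a sketch under stated assumptions: the second unit \c centimeters, the conversion factor, the unit names "in" and "cm", and the helper \c distance_in_inches are all illustrative choices, not part of the original example. It installs the facet into a stream-local locale rather than the global one.

```cpp
#include <cstddef>
#include <locale>
#include <ostream>
#include <sstream>
#include <string>

// Sketch of the measure facet. The units, the conversion factor and the
// returned names are illustrative assumptions.
class measure : public std::locale::facet {
public:
    enum measure_type { inches, centimeters };

    // Every facet type needs a static id so that std::locale can index it.
    static std::locale::id id;

    explicit measure(measure_type m, std::size_t refs = 0) :
        std::locale::facet(refs),
        type_(m)
    {
    }

    // Assumption: the metric input is given in centimeters.
    double from_metric(double value) const
    {
        return type_ == inches ? value / 2.54 : value;
    }

    std::string name() const
    {
        return type_ == inches ? "in" : "cm";
    }

private:
    measure_type type_;
};

std::locale::id measure::id;

void print_distance(std::ostream &out, double value)
{
    // Fetch the facet from the locale imbued into the stream.
    measure const &m = std::use_facet<measure>(out.getloc());
    out << m.from_metric(value) << " " << m.name();
}

// Hypothetical helper: format a distance with a stream-local locale
// instead of changing the global one.
std::string distance_in_inches(double centimeters)
{
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new measure(measure::inches)));
    print_distance(out, centimeters);
    return out.str();
}
```

Because the facet is created with the default <tt>refs = 0</tt>, the locale takes ownership of it and deletes it when the last locale referring to it is destroyed.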

This technique was adopted by the Boost.Locale library in order to provide powerful and correct localization. Instead of using
the very limited C++ standard library facets, it uses ICU under the hood to create its own much more powerful ones.

\section std_locales_common Common Critical Problems with the Standard Library

Several issues in the standard library prevent you from using its full power:

-   Setting the global locale has bad side effects.
    \n
    Consider the following code:
    \n
    \code
        #include <fstream>
        #include <locale>

        int main()
        {
            // Set the system's default locale as global
            std::locale::global(std::locale(""));
            std::ofstream csv("test.csv");
            csv << 1.1 << ","  << 1.3 << std::endl;
        }
    \endcode
    \n
    What would be the content of \c test.csv ? It may be "1.1,1.3", or it may be "1,1,1,3"
    rather than what you expected.
    \n
    Moreover, it even affects \c printf and libraries like \c boost::lexical_cast, giving
    incorrect or unexpected formatting. In fact, many third-party libraries are broken in such a
    situation.
    \n
    Unlike the standard localization library, Boost.Locale never changes the basic number formatting,
    even when it uses \c std based localization backends, so by default, numbers are always
    formatted using the C-style locale. Localized number formatting requires specific flags.
    \n
-   Number formatting is broken on some locales.
    \n
    Some locales use the non-breaking space character U+00A0 as the thousands separator, so
    in the \c ru_RU.UTF-8 locale the number 1024 should be displayed as "1 024", where the space
    is the Unicode character with code point U+00A0. Unfortunately, many libraries don't handle
    this correctly: for example, GCC and SunStudio output only "\xC2", the first byte
    of the UTF-8 sequence "\xC2\xA0" that represents this code point, and
    thus generate invalid UTF-8.
    \n
-   Locale names are not standardized. For example, under MSVC you need to provide the name
    \c en-US or \c English_USA.1252 , whereas on POSIX platforms it would be \c en_US.UTF-8
    or \c en_US.ISO-8859-1 .
    \n
    Moreover, MSVC does not support UTF-8 locales at all.
    \n
-   Many standard libraries provide only the C and POSIX locales. For example, GCC supports localization
    only under Linux. On all other platforms, attempting to create locales other than "C" or
    "POSIX" would fail.
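
The first problem in the list, a global locale leaking into machine-readable output, can be reproduced and worked around without relying on any installed system locale. The sketch below is illustrative only: \c comma_decimal is a hypothetical facet that stands in for a locale using ',' as the decimal point, and \c csv_row and \c csv_row_under_comma_locale are hypothetical helpers. The workaround shown is to imbue the stream with \c std::locale::classic() explicitly.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet that mimics a locale using ',' as the decimal point,
// so the sketch does not depend on which system locales are installed.
class comma_decimal : public std::numpunct<char> {
protected:
    char do_decimal_point() const { return ','; }
};

// Write two values as one CSV row. When pin_to_classic is true the
// stream ignores the global locale and uses the classic "C" locale.
std::string csv_row(double a, double b, bool pin_to_classic)
{
    std::ostringstream csv;
    if(pin_to_classic)
        csv.imbue(std::locale::classic());
    csv << a << "," << b;
    return csv.str();
}

// Install a comma-decimal global locale, produce the row, then restore
// the previous global locale.
std::string csv_row_under_comma_locale(double a, double b, bool pin_to_classic)
{
    std::locale previous = std::locale::global(
        std::locale(std::locale::classic(), new comma_decimal));
    std::string row = csv_row(a, b, pin_to_classic);
    std::locale::global(previous);
    return row;
}
```

Without pinning, the row for 1.1 and 1.3 comes out as "1,1,1,3"; with the classic locale imbued it is "1.1,1.3". Pinning protects only the streams you control, so code inside third-party libraries can still pick up the global locale.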

*/