//
// Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
// Distributed under the Boost Software License, Version 1.0. (See
// accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//

// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page std_locales Introduction to C++ Standard Library localization support

\section std_locales_basics Getting familiar with standard C++ Locales

The C++ standard library offers a simple and powerful way to provide locale-specific information. It is done via the
\c std::locale class, the container that holds all the required information about a specific culture, such as number
formatting patterns, date and time formatting, currency, case conversion etc.

All this information is provided by facets, special classes derived from the \c std::locale::facet base class. Such facets are
packed into the \c std::locale class and allow you to provide arbitrary information about the locale. The \c std::locale class
keeps reference counters on installed facets and can be efficiently copied.

Each facet that was installed into the \c std::locale object can be fetched using the \c std::use_facet function. For example,
the \c std::ctype<Char> facet provides rules for case conversion, so you can convert a character to upper-case like this:

\code
std::ctype<char> const &ctype_facet = std::use_facet<std::ctype<char> >(some_locale);
char upper_a = ctype_facet.toupper('a');
\endcode

A locale object can be imbued into an \c iostream so that it formats information according to the locale:

\code
std::cout.imbue(std::locale("en_US.UTF-8"));
std::cout << 1345.45 << std::endl;
std::cout.imbue(std::locale("ru_RU.UTF-8"));
std::cout << 1345.45 << std::endl;
\endcode

This would display:

\verbatim
    1,345.45
    1.345,45
\endverbatim

You can also create your own facets and install them into existing locale objects.
For example:

\code
    class measure : public std::locale::facet {
    public:
        typedef enum { inches, ... } measure_type;
        measure(measure_type m,size_t refs=0);
        double from_metric(double value) const;
        std::string name() const;
        // Every facet needs a static id member so that std::use_facet can look it up
        static std::locale::id id;
        ...
    };
\endcode
And now you can simply provide this information to a locale:

\code
    // Create the default locale built from the en_US locale and add the measure facet.
    std::locale::global(std::locale(std::locale("en_US.UTF-8"),new measure(measure::inches)));
\endcode


Now you can print a distance according to the correct locale:

\code
    void print_distance(std::ostream &out,double value)
    {
        // Fetch locale information from stream
        measure const &m = std::use_facet<measure>(out.getloc());
        out << m.from_metric(value) << " " << m.name();
    }
\endcode

This technique was adopted by the Boost.Locale library in order to provide powerful and correct localization. Instead of using
the very limited C++ standard library facets, it uses ICU under the hood to create its own much more powerful ones.

\section std_locales_common Common Critical Problems with the Standard Library

A number of critical issues in the standard library prevent the use of its full power:

- Setting the global locale has bad side effects.
  \n
  Consider the following code:
  \n
  \code
  int main()
  {
      // Set system's default locale as global
      std::locale::global(std::locale(""));
      std::ofstream csv("test.csv");
      csv << 1.1 << "," << 1.3 << std::endl;
  }
  \endcode
  \n
  What would be the content of \c test.csv ? It may be "1.1,1.3" or it may be "1,1,1,3"
  rather than what you had expected.
  \n
  More than that, it affects even \c printf and libraries like \c boost::lexical_cast, giving
  incorrect or unexpected formatting. In fact many third-party libraries are broken in such a
  situation.
  \n
  Unlike the standard localization library, Boost.Locale never changes the basic number formatting,
  even when it uses \c std based localization backends, so by default, numbers are always
  formatted using the C-style locale. Localized number formatting requires specific flags.
  \n
- Number formatting is broken on some locales.
  \n
  Some locales use the non-breaking space character U+00A0 as the thousands separator, thus
  in the \c ru_RU.UTF-8 locale the number 1024 should be displayed as "1 024", where the space
  is the Unicode character with code point U+00A0. Unfortunately many libraries don't handle
  this correctly: for example, GCC and SunStudio emit only "\xC2", the first byte of
  the UTF-8 sequence "\xC2\xA0" that represents this code point, and
  thus generate invalid UTF-8.
  \n
- Locale names are not standardized. For example, under MSVC you need to provide the name
  \c en-US or \c English_USA.1252 , whereas on POSIX platforms it would be \c en_US.UTF-8
  or \c en_US.ISO-8859-1 .
  \n
  More than that, MSVC does not support UTF-8 locales at all.
  \n
- Many standard libraries provide only the C and POSIX locales; for example, GCC supports localization
  only under Linux. On all other platforms, attempting to create locales other than "C" or
  "POSIX" would fail.

*/
