//
// Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
// Distributed under the Boost Software License, Version 1.0. (See
// accompanying file LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
//

// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page messages_formatting Messages Formatting (Translation)

- \ref messages_formatting_into
- \ref msg_loading_dictionaries
- \ref message_translation
    - \ref indirect_message_translation
    - \ref plural_forms
    - \ref adding_context_information
    - \ref multiple_gettext_domain
    - \ref direct_message_translation
- \ref extracting_messages_from_code
- \ref custom_file_system_support
- \ref msg_non_ascii_keys
- \ref msg_qna

\section messages_formatting_into Introduction

Message formatting is probably the most important part of
localization: making your application speak the user's language.

Boost.Locale uses the <a href="http://www.gnu.org/software/gettext/">GNU Gettext</a> localization model.
We recommend reading the general <a href="http://www.gnu.org/software/gettext/manual/gettext.html">documentation</a>
of GNU Gettext, as it is outside the scope of this document.

The model is as follows:

-   First, our application \c foo is prepared for localization by calling the \ref boost::locale::translate() "translate" function
    for each message used in the user interface.
    \n
    For example:
    \code
        cout << "Hello World" << endl;
    \endcode
    is changed to
    \n
    \code
        cout << translate("Hello World") << endl;
    \endcode
-   Then all messages are extracted from the source code and a special \c foo.po file is generated that contains all of the
    original English strings.
    \n
    \verbatim
        ...
        msgid "Hello World"
        msgstr ""
        ...
    \endverbatim
-   The \c foo.po file is translated for the supported locales. For example, \c de.po, \c ar.po, \c en_CA.po, and \c he.po.
    \n
    \verbatim
        ...
        msgid "Hello World"
        msgstr "שלום עולם"
    \endverbatim
    The translated files are then compiled to the binary \c mo format and stored in the following file structure:
    \n
    \verbatim
        de
        de/LC_MESSAGES
        de/LC_MESSAGES/foo.mo
        en_CA/
        en_CA/LC_MESSAGES
        en_CA/LC_MESSAGES/foo.mo
        ...
    \endverbatim
    \n
    When the application starts, it loads the required dictionaries. Then when the \c translate function is called and the message is written
    to an output stream, a dictionary lookup is performed and the localized message is written out instead.

\section msg_loading_dictionaries Loading dictionaries

All the dictionaries are loaded by the \ref boost::locale::generator "generator" class.
Using localized strings in the application requires specifying
the following parameters:

-# The search path of the dictionaries
-# The application domain (or name)

This is done by calling the following member functions of the \ref boost::locale::generator "generator" class:

-   \ref boost::locale::generator::add_messages_path() "add_messages_path" - add the root path to the dictionaries.
    \n
    For example: if the dictionary is located at \c /usr/share/locale/ar/LC_MESSAGES/foo.mo, then the path should be \c /usr/share/locale.
    \n
-   \ref boost::locale::generator::add_messages_domain() "add_messages_domain" - add the domain (name) of the application. In the above case it would be "foo".

\note At least one domain and one path should be specified in order to load dictionaries.
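
To make the relation between the search path, the locale folder and the domain concrete, the following sketch
builds a catalog file name from these three parts, following the file layout described above. The helper
\c catalog_file_name is hypothetical and is not part of Boost.Locale; the library's own lookup may try more than
one locale folder, so treat this only as an illustration of the layout.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Boost.Locale function): compose the file
// name of a compiled catalog from the root path, the locale folder and
// the domain, following the layout <root>/<locale>/LC_MESSAGES/<domain>.mo
inline std::string catalog_file_name(const std::string& root,
                                     const std::string& locale_folder,
                                     const std::string& domain)
{
    return root + "/" + locale_folder + "/LC_MESSAGES/" + domain + ".mo";
}

// Usage check based on the example path given in the text
inline void check_catalog_file_name()
{
    assert(catalog_file_name("/usr/share/locale", "ar", "foo")
           == "/usr/share/locale/ar/LC_MESSAGES/foo.mo");
}
```

Here the first argument corresponds to the value passed to \c add_messages_path and the last one to the value
passed to \c add_messages_domain.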

This is an example of our first fully localized program:

\code
#include <boost/locale.hpp>
#include <iostream>

using namespace std;
using namespace boost::locale;

int main()
{
    generator gen;

    // Specify location of dictionaries
    gen.add_messages_path(".");
    gen.add_messages_domain("hello");

    // Generate locales and imbue them to iostream
    locale::global(gen(""));
    cout.imbue(locale());

    // Display a message using current system locale
    cout << translate("Hello World") << endl;
}
\endcode


\section message_translation Message Translation

There are two ways to translate messages:

-   Using the \ref boost_locale_translate_family "boost::locale::translate()" family of functions:
    \n
    These functions create a special proxy object \ref boost::locale::basic_message "basic_message"
    that can be converted to a string according to a given locale or written to a \c std::ostream,
    formatting the message in the \c std::ostream's locale.
    \n
    This is very convenient for working with \c std::ostream objects and for postponing message
    translation.
-   Using the \ref boost_locale_gettext_family "boost::locale::gettext()" family of functions:
    \n
    These functions are used for direct message translation: they receive as a parameter
    an original message or a key and convert it to a \c std::basic_string in the given locale.
    \n
    These functions have names similar to those used in the GNU Gettext library.

\subsection indirect_message_translation Indirect Message Translation

The basic way to translate a message is the \ref boost_locale_translate_family "boost::locale::translate()" family of functions.

These functions use a character type \c CharType as a template parameter and receive either <tt>CharType const *</tt> or <tt>std::basic_string<CharType></tt> as input.

These functions receive an original message and return a special proxy
object - \ref boost::locale::basic_message "basic_message<CharType>".
This object holds all the required information for the message formatting.

When this object is written to an output \c ostream, it performs a dictionary lookup of the message according to the locale
imbued in that \c ostream.

If the message is found in the dictionary it is written to the output stream,
otherwise the original string is written to the stream.

For example:

\code
// Translate a simple message "Hello World!"
std::cout << boost::locale::translate("Hello World!") << std::endl;
\endcode

This allows the program to postpone translation of the message until the translation is actually needed, even to different
locale targets.

\code
// Several output streams that we write a message to:
// English, Japanese, Hebrew etc.
// Each one of them has a std::locale object installed that represents
// its specific locale
std::ofstream en,ja,he,de,ar;

// Send single message to multiple streams
void send_to_all(message const &msg)
{
    // in each of the cases below
    // the message is translated to a different
    // language
    en << msg;
    ja << msg;
    he << msg;
    de << msg;
    ar << msg;
}

int main()
{
    ...
    send_to_all(translate("Hello World"));
}
\endcode

\note

-   \ref boost::locale::basic_message "basic_message" can be implicitly converted
    to an appropriate std::basic_string using
    the global locale:
    \n
    \code
    std::wstring msg = translate(L"Do you want to open the file?");
    \endcode
-   \ref boost::locale::basic_message "basic_message" can be explicitly converted
    to a string using the \ref boost::locale::basic_message::str() "str()" member function for a specific locale.
    \n
    \code
    std::locale ru_RU = ... ;
    std::string msg = translate("Do you want to open the file?").str(ru_RU);
    \endcode


\subsection plural_forms Plural Forms

GNU Gettext catalogs have simple, robust and yet powerful plural forms support. We recommend reading the
original GNU documentation <a href="http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms">here</a>.

Let's try to solve a simple problem, displaying a message to the user:

\code
    if(files == 1)
        cout << translate("You have 1 file in the directory") << endl;
    else
        cout << format(translate("You have {1} files in the directory")) % files << endl;
\endcode

This very simple task becomes quite complicated when we deal with languages other than English. Many languages have more
than two plural forms. For example, in Hebrew there are special forms for single, double, plural, and plural above 10.
They can't be distinguished by the simple rule "is n 1 or not".

The correct solution is to give the translator the ability to choose the plural form. Thus the translate
function can receive two additional parameters, the English plural form and a number: <tt>translate(single,plural,count)</tt>

For example:

\code
cout << format(translate( "You have {1} file in the directory",
                          "You have {1} files in the directory",
                          files)) % files << endl;
\endcode

A special entry in the dictionary specifies the rule to choose the correct plural form in the target language.
For example, the Slavic language family has 3 plural forms, which can be chosen using the following equation:

\code
    plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
\endcode

Such an equation is stored in the message catalog itself and is evaluated during translation to supply the correct form.
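
To see how such a rule selects a form, the Slavic equation quoted above can be written out as a plain C++ function.
This is only a sketch for illustration: the name \c slavic_plural_index is made up here, and the code does not
show how Boost.Locale itself evaluates the expression stored in the catalog.

```cpp
#include <cassert>

// Index of the plural form (0, 1 or 2) for the count n, mirroring the
// Slavic rule quoted above. The function name is illustrative only.
inline int slavic_plural_index(unsigned long n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0;   // 1, 21, 31, ... but not 11
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;   // 2-4, 22-24, ... but not 12-14
    return 2;       // everything else: 0, 5-20, 25-30, ...
}

// Small self-check of the rule for a few counts
inline void check_slavic_plural_index()
{
    assert(slavic_plural_index(1) == 0);
    assert(slavic_plural_index(3) == 1);
    assert(slavic_plural_index(5) == 2);
    assert(slavic_plural_index(11) == 2);
    assert(slavic_plural_index(22) == 1);
}
```

The returned index corresponds to the <tt>msgstr[N]</tt> entry that a translator fills in for each plural form.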

So the plural <tt>translate(single,plural,count)</tt> call shown earlier would display 3 different forms in the Russian locale for values of 1, 3 and 5:

\verbatim
У вас есть 1 файл в каталоге
У вас есть 3 файла в каталоге
У вас есть 5 файлов в каталоге
\endverbatim

And for Japanese, which does not have plural forms at all, it would display the same message
for any numeric value.

For more detailed information please refer to GNU Gettext: <a href="http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms">11.2.6 Additional functions for plural forms</a>


\subsection adding_context_information Adding Context Information

In many cases it is not sufficient to provide only the original English string to get the correct translation.
You sometimes need to provide some context information. In German, for example, a button labeled "open" is translated to
"öffnen" in the context of "opening a file", or to "aufbauen" in the context of opening an internet connection.

In these cases you must attach some context information to the original string:

\code
button->setLabel(translate("File","open"));
\endcode

The context information is provided as the first parameter to the \ref boost::locale::translate() "translate"
function in both singular and plural forms. The translator would see this context information and would be able to translate the
"open" string correctly.

For example, this is how the \c po file would look:

\code
msgctxt "File"
msgid "open"
msgstr "öffnen"

msgctxt "Internet Connection"
msgid "open"
msgstr "aufbauen"
\endcode

\note Context information requires more recent versions of the gettext tools (>=0.15) for extracting strings and
formatting message catalogs.


\subsection multiple_gettext_domain Working with multiple messages domains

In some cases it is useful to work with multiple message domains.

For example, if an application consists of several independent modules, it may
have several domains - a separate domain for each module.

For instance, when developing a FooBar office suite we might have:

- a FooBar Word Processor, using the "foobarwriter" domain
- a FooBar Spreadsheet, using the "foobarspreadsheet" domain
- a FooBar Spell Checker, using the "foobarspell" domain
- a FooBar File handler, using the "foobarodt" domain

There are three ways to use non-default domains:

-   When working with \c iostream, you can use the parameterized manipulator
    \ref boost::locale::as::domain "as::domain(std::string const &)", which allows switching domains in a stream:
    \n
    \code
    cout << as::domain("foo") << translate("Hello") << as::domain("bar") << translate("Hello");
    // First translation is taken from dictionary foo and the other from dictionary bar
    \endcode
-   You can specify the domain explicitly when converting a \c message object to a string:
    \code
    std::wstring foo_msg = translate(L"Hello World").str("foo");
    std::wstring bar_msg = translate(L"Hello World").str("bar");
    \endcode
-   You can specify the domain directly using a \ref direct_message_translation "convenience" interface:
    \code
    MessageBox(dgettext("gui","Error Occurred"));
    \endcode

\subsection direct_message_translation Direct translation (Convenience Interface)

Many applications do not write messages directly to an output stream or use only one locale in the process, so
calling <tt>translate("Hello World").str()</tt> for a single message would be annoying. Thus Boost.Locale provides
GNU Gettext-like localization functions for direct translation of the messages. However, unlike the GNU Gettext functions,
the Boost.Locale translation functions provide an additional optional parameter (locale), and support wide, u16 and u32 strings.

The prototypes of the GNU Gettext-like functions can be found \ref boost_locale_gettext_family "in this section".


All of these functions can have different prefixes for different forms:

- \c d - translation in a specific domain
- \c n - plural form translation
- \c p - translation in a specific context

\code
    MessageBoxW(0,pgettext(L"File Dialog",L"Open?").c_str(),gettext(L"Question").c_str(),MB_YESNO);
\endcode


\section extracting_messages_from_code Extracting messages from the source code

There are many tools to extract messages from the source code into the \c .po file format. The most
popular and "native" tool is \c xgettext which is installed by default on most Unix systems and freely downloadable
for Windows (see \ref gettext_for_windows).

For example, we have a source file called \c dir.cpp that prints:

\code
    cout << format(translate("Listing of catalog {1}:")) % file_name << endl;
    cout << format(translate("Catalog {1} contains 1 file","Catalog {1} contains {2,num} files",files_no))
            % file_name % files_no << endl;
\endcode

Now we run:

\verbatim
xgettext --keyword=translate:1,1t --keyword=translate:1,2,3t dir.cpp
\endverbatim

And a file called \c messages.po is created that looks like this (approximately):

\code
#: dir.cpp:1
msgid "Listing of catalog {1}:"
msgstr ""

#: dir.cpp:2
msgid "Catalog {1} contains 1 file"
msgid_plural "Catalog {1} contains {2,num} files"
msgstr[0] ""
msgstr[1] ""
\endcode

This file can be given to translators to adapt it to specific languages.

We used the \c --keyword parameter of \c xgettext to make it suitable for extracting messages from
source code localized with Boost.Locale, searching for <tt>translate()</tt> function calls instead of the default <tt>gettext()</tt>
and <tt>ngettext()</tt> ones.
The first parameter <tt>--keyword=translate:1,1t</tt> provides the template for basic messages: a \c translate function that is
called with 1 argument (1t), where the first argument is taken as the key. The second one <tt>--keyword=translate:1,2,3t</tt> is used
for plural forms.
It tells \c xgettext to use a <tt>translate()</tt> function call with 3 parameters (3t) and take the 1st and 2nd parameters as keys. An
additional marker \c Nc can be used to mark context information.

The full set of xgettext parameters suitable for Boost.Locale is:

\code
xgettext --keyword=translate:1,1t --keyword=translate:1c,2,2t \
         --keyword=translate:1,2,3t --keyword=translate:1c,2,3,4t \
         --keyword=gettext:1 --keyword=pgettext:1c,2 \
         --keyword=ngettext:1,2 --keyword=npgettext:1c,2,3 \
         source_file_1.cpp ... source_file_N.cpp
\endcode

Of course, if you do not use "gettext" like translation you
may ignore some of these parameters.

\section custom_file_system_support Custom Filesystem Support

When access to the actual file system is limited, as in ActiveX controls, or
when the developer wants to ship an all-in-one executable file,
it is useful to be able to load \c gettext catalogs from a custom location -
a custom file system.

Boost.Locale provides an option to install the boost::locale::message_format facet
with customized options provided in the boost::locale::gnu_gettext::messages_info structure.

This structure contains a \c boost::function based
\ref boost::locale::gnu_gettext::messages_info::callback_type "callback"
that allows the user to provide custom functionality to load message catalog files.
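
As an illustration of what such a callback might look like, the sketch below serves catalogs from an in-memory map
instead of the real file system. It assumes that the callback receives the file name and the encoding of that name
and returns the raw file content, with an empty vector meaning "file not found"; please verify this against the
\c callback_type reference. The map \c embedded_catalogs and the key used in the check are hypothetical, and the
exact file names passed to the callback depend on the configured paths and domains.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory "file system": file name -> raw catalog bytes.
// A real program might fill it from resources linked into the executable.
inline std::map<std::string, std::vector<char> >& embedded_catalogs()
{
    static std::map<std::string, std::vector<char> > catalogs;
    return catalogs;
}

// Loader with the shape assumed here for the callback: it takes the file
// name and the encoding of that name, and returns the file content, or an
// empty vector when the file does not exist.
inline std::vector<char> some_file_loader(const std::string& file_name,
                                          const std::string& encoding)
{
    (void)encoding; // the in-memory store does not depend on the encoding
    auto it = embedded_catalogs().find(file_name);
    if (it == embedded_catalogs().end())
        return std::vector<char>();
    return it->second;
}

// Usage check with a made-up file name and dummy content
inline void check_some_file_loader()
{
    embedded_catalogs()["/he/LC_MESSAGES/my_app.mo"] = std::vector<char>(4, 'x');
    assert(some_file_loader("/he/LC_MESSAGES/my_app.mo", "UTF-8").size() == 4);
    assert(some_file_loader("/missing.mo", "UTF-8").empty());
}
```

A function of this shape can then be assigned to the \c callback member of the \c messages_info structure.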

For example:

\code
// Configure all options for message catalog
namespace blg = boost::locale::gnu_gettext;
blg::messages_info info;
info.language = "he";
info.country = "IL";
info.encoding = "UTF-8";
info.paths.push_back(""); // At least one path is required, even an empty one
info.domains.push_back(blg::messages_info::domain("my_app"));
info.callback = some_file_loader; // Provide a callback

// Create a basic locale without messages support
boost::locale::generator gen;
std::locale base_locale = gen("he_IL.UTF-8");

// Install messages catalogs for "char" support to the final locale
// we are going to use
std::locale real_locale(base_locale,blg::create_messages_facet<char>(info));
\endcode

In order to set up \ref boost::locale::gnu_gettext::messages_info::language "language", \ref boost::locale::gnu_gettext::messages_info::country "country" and other members you may use the \ref boost::locale::info facet for convenience:

\code
// Configure all options for message catalog
namespace blg = boost::locale::gnu_gettext;
blg::messages_info info;

info.paths.push_back(""); // At least one path is required, even an empty one
info.domains.push_back(blg::messages_info::domain("my_app"));
info.callback = some_file_loader; // Provide a callback

// Create an object with default locale
boost::locale::generator gen;
std::locale base_locale = gen("");

// Use boost::locale::info to configure all parameters

boost::locale::info const &properties = std::use_facet<boost::locale::info>(base_locale);
info.language = properties.language();
info.country = properties.country();
info.encoding = properties.encoding();
info.variant = properties.variant();

// Install messages catalogs to the final locale
std::locale real_locale(base_locale,blg::create_messages_facet<char>(info));
\endcode

\section msg_non_ascii_keys Non US-ASCII Keys

Boost.Locale assumes that you use English for original text messages.
The best practice is to use US-ASCII characters for original keys.

However, in some cases it is useful to insert some Unicode characters in the text, like
for example the Copyright "©" character.

As long as your narrow character string encoding is UTF-8 nothing further should be done.

Boost.Locale assumes that your sources are encoded in UTF-8 and the input narrow
strings use UTF-8 - which is the default for most compilers around (with the notable
exception of Microsoft Visual C++).

However, if the encoding of narrow strings in the source file is not UTF-8 but some other
encoding like windows-1252, the string would be misinterpreted.

You can specify the character set of the original strings when you specify the
domain name for the application.

\code
#include <boost/locale.hpp>
#include <iostream>

using namespace std;
using namespace boost::locale;

int main()
{
    generator gen;

    // Specify location of dictionaries
    gen.add_messages_path(".");
    // Specify the encoding of the source string
    gen.add_messages_domain("copyrighted/windows-1255");

    // Generate locales and imbue them to iostream
    locale::global(gen(""));
    cout.imbue(locale());

    // In Windows 1255 (C) symbol is encoded as 0xA9
    cout << translate("© 2001 All Rights Reserved") << endl;
}
\endcode

Thus if the program runs in a UTF-8 locale the copyright symbol would
be automatically converted to an appropriate UTF-8 sequence if the
key is missing in the dictionary.


\section msg_qna Questions and Answers

-   Do I need GNU Gettext to use Boost.Locale?
    \n
    Boost.Locale provides a run-time environment to load and use GNU Gettext message catalogs, but it does
    not provide tools for generation, translation, compilation and management of these catalogs.
    Boost.Locale only reimplements the GNU Gettext libintl.
    \n
    You would probably need:
    \n
    -#  Boost.Locale itself, for runtime use.
    -#  A tool for extracting strings from source code, and managing them: GNU Gettext provides good tools, but other
        implementations are available as well.
    -#  A good translation program like <a href="http://userbase.kde.org/Lokalize">Lokalize</a>, <a href="http://www.poedit.net/">Poedit</a> or <a href="http://projects.gnome.org/gtranslator/">GTranslator</a>.

-   Why doesn't Boost.Locale provide tools for extracting and managing message catalogs? Why should
    I use GPL-ed software? Are my programs or message catalogs affected by its license?
    \n
    -#  Boost.Locale does not link to or use any of the GNU Gettext code, so you need not worry about your code as
        the runtime library is fully reimplemented.
    -#  You may freely use GPL-ed software for extracting and managing catalogs, the same way as you are free to use
        a GPL-ed editor. It does not affect your message catalogs or your code.
    -#  I see no reason to reimplement well debugged, working tools like \c xgettext, \c msgfmt, \c msgmerge that
        do a very fine job, especially as they are freely available for download and support almost any platform.
        All Linux distributions, BSD flavors, Mac OS X and other Unix-like operating systems provide GNU Gettext tools
        as a standard package.\n
        Windows users can get GNU Gettext utilities via the MinGW project. See \ref gettext_for_windows.


-   Is there any reason to prefer the Boost.Locale implementation to the original GNU Gettext runtime library?
    In either case I would probably need some of the GNU tools.
    \n
    There are three important differences between the GNU Gettext runtime library and the Boost.Locale implementation:
    \n
    -#  The GNU Gettext runtime supports only one locale per process. It is not thread-safe to use multiple locales
        and encodings in the same process.
        This is perfectly fine for applications that interact directly with
        a single user like most GUI applications, but is problematic for services and servers.
    -#  The GNU Gettext API supports only 8-bit encodings, making it irrelevant in environments that natively use
        wide strings.
    -#  The GNU Gettext runtime library is distributed under the LGPL license, which may not be convenient for some users.

*/