//
//  Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
//  Distributed under the Boost Software License, Version 1.0. (See
//  accompanying file LICENSE_1_0.txt or copy at
//  http://www.boost.org/LICENSE_1_0.txt)
//

// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page messages_formatting Messages Formatting (Translation)

- \ref messages_formatting_into
- \ref msg_loading_dictionaries
- \ref message_translation
    - \ref indirect_message_translation
    - \ref plural_forms
    - \ref adding_context_information
    - \ref multiple_gettext_domain
    - \ref direct_message_translation
- \ref extracting_messages_from_code
- \ref custom_file_system_support
- \ref msg_non_ascii_keys
- \ref msg_qna

\section messages_formatting_into Introduction

Message formatting is probably the most important part of
localization: making your application speak the user's language.

Boost.Locale uses the <a href="http://www.gnu.org/software/gettext/">GNU Gettext</a> localization model.
We recommend reading the general <a href="http://www.gnu.org/software/gettext/manual/gettext.html">documentation</a>
of GNU Gettext, as it is outside the scope of this document.

The model is as follows:

-   First, our application \c foo is prepared for localization by calling the \ref boost::locale::translate() "translate" function
    for each message used in the user interface.
    \n
    For example:
    \code
    cout << "Hello World" << endl;
    \endcode
    is changed to
    \n
    \code
    cout << translate("Hello World") << endl;
    \endcode
-   Then all messages are extracted from the source code and a special \c foo.po file is generated that contains all of the
    original English strings.
    \n
    \verbatim
    ...
    msgid "Hello World"
    msgstr ""
    ...
    \endverbatim
-   The \c foo.po file is translated for the supported locales. For example, \c de.po, \c ar.po, \c en_CA.po, and \c he.po.
    \n
    \verbatim
    ...
    msgid "Hello World"
    msgstr "שלום עולם"
    \endverbatim
    The translations are then compiled to the binary \c mo format and stored in the following file structure:
    \n
    \verbatim
    de
    de/LC_MESSAGES
    de/LC_MESSAGES/foo.mo
    en_CA/
    en_CA/LC_MESSAGES
    en_CA/LC_MESSAGES/foo.mo
    ...
    \endverbatim
    \n
    When the application starts, it loads the required dictionaries. Then when the \c translate function is called and the message is written
    to an output stream, a dictionary lookup is performed and the localized message is written out instead.

\section msg_loading_dictionaries Loading dictionaries

All the dictionaries are loaded by the \ref boost::locale::generator "generator" class.
Using localized strings in the application requires specifying
the following parameters:

-# The search path of the dictionaries
-# The application domain (or name)

This is done by calling the following member functions of the \ref boost::locale::generator "generator" class:

-   \ref boost::locale::generator::add_messages_path() "add_messages_path" - add the root path to the dictionaries.
    \n
    For example: if the dictionary is located at \c /usr/share/locale/ar/LC_MESSAGES/foo.mo, then the path should be \c /usr/share/locale.
    \n
-   \ref boost::locale::generator::add_messages_domain() "add_messages_domain" - add the domain (name) of the application. In the above case it would be "foo".

\note At least one domain and one path should be specified in order to load dictionaries.
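
The relation between the messages path, the locale name and the domain can be sketched in plain C++.
The helper \c catalog_path below is hypothetical and is not part of Boost.Locale; it only composes a file name
that follows the directory layout shown in the introduction, and it ignores any fallback between locale names
that the library may perform.

```cpp
#include <string>

// Hypothetical helper, not a Boost.Locale function: composes the catalog
// file name <root>/<locale>/LC_MESSAGES/<domain>.mo.
inline std::string catalog_path(const std::string &root,
                                const std::string &locale_name,
                                const std::string &domain)
{
    return root + "/" + locale_name + "/LC_MESSAGES/" + domain + ".mo";
}
```

With the root \c /usr/share/locale, the locale name \c ar and the domain \c foo, the helper yields the
file name used in the example above.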

This is an example of our first fully localized program:

\code
#include <boost/locale.hpp>
#include <iostream>

using namespace std;
using namespace boost::locale;

int main()
{
    generator gen;

    // Specify location of dictionaries
    gen.add_messages_path(".");
    gen.add_messages_domain("hello");

    // Generate the locale and imbue it into the iostream
    locale::global(gen(""));
    cout.imbue(locale());

    // Display a message using the current system locale
    cout << translate("Hello World") << endl;
}
\endcode


\section message_translation Message Translation

There are two ways to translate messages:

-   Using the \ref boost_locale_translate_family "boost::locale::translate()" family of functions:
    \n
    These functions create a special proxy object, \ref boost::locale::basic_message "basic_message",
    that can be converted to a string according to a given locale, or written to a \c std::ostream,
    in which case the message is formatted in the locale of that \c std::ostream.
    \n
    This is very convenient for working with \c std::ostream objects and for postponing message
    translation.
-   Using the \ref boost_locale_gettext_family "boost::locale::gettext()" family of functions:
    \n
    These functions are used for direct message translation: they receive
    an original message or a key as a parameter and convert it to a \c std::basic_string in the given locale.
    \n
    These functions have names similar to those used in the GNU Gettext library.

\subsection indirect_message_translation Indirect Message Translation

The basic way to translate a message is the \ref boost_locale_translate_family  "boost::locale::translate()" family of functions.

These functions use a character type \c CharType as a template parameter and receive either <tt>CharType const *</tt> or <tt>std::basic_string<CharType></tt> as input.

These functions receive an original message and return a special proxy
object - \ref boost::locale::basic_message "basic_message<CharType>".
This object holds all the information required for formatting the message.

When this object is written to an output \c ostream, it performs a dictionary lookup of the message according to the locale
imbued in that stream.

If the message is found in the dictionary, it is written to the output stream;
otherwise the original string is written to the stream.

For example:

\code
// Translate a simple message "Hello World!"
std::cout << boost::locale::translate("Hello World!") << std::endl;
\endcode

This allows the program to postpone translation of the message until the translation is actually needed, even to different
locale targets.

\code
// Several output streams that we write a message to:
// English, Japanese, Hebrew etc.
// Each of them has a std::locale object installed that represents
// its specific locale
std::ofstream en,ja,he,de,ar;

// Send a single message to multiple streams
void send_to_all(message const &msg)
{
    // In each of the cases below
    // the message is translated to a different
    // language
    en << msg;
    ja << msg;
    he << msg;
    de << msg;
    ar << msg;
}

int main()
{
    ...
    send_to_all(translate("Hello World"));
}
\endcode
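
To illustrate why postponing the lookup is useful, here is a minimal sketch that uses only the standard library.
The names \c dictionary, \c deferred_message and \c make_deferred are hypothetical stand-ins and do not reflect how
\c basic_message is implemented: the proxy merely stores the original text, and the translation happens only when
a concrete dictionary is supplied, falling back to the original string when the key is missing.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a loaded message catalog: original -> translation.
typedef std::map<std::string, std::string> dictionary;

// Hypothetical proxy that stores only the original text.
struct deferred_message {
    std::string id;

    // The lookup happens here, not when the proxy is created.
    std::string str(const dictionary &dict) const
    {
        dictionary::const_iterator it = dict.find(id);
        // Fall back to the original text when no translation exists.
        return it == dict.end() ? id : it->second;
    }
};

// Creates the proxy without consulting any dictionary.
inline deferred_message make_deferred(const std::string &text)
{
    deferred_message msg;
    msg.id = text;
    return msg;
}
```

The same \c deferred_message value can then be resolved against several dictionaries, which mirrors writing one
\c message object to several differently imbued streams.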

\note

-   \ref boost::locale::basic_message "basic_message" can be implicitly converted
    to an appropriate std::basic_string using
    the global locale:
    \n
    \code
    std::wstring msg = translate(L"Do you want to open the file?");
    \endcode
-   \ref boost::locale::basic_message "basic_message" can be explicitly converted
    to a string using the \ref boost::locale::basic_message::str() "str()" member function for a specific locale.
    \n
    \code
    std::locale ru_RU = ... ;
    std::string msg = translate("Do you want to open the file?").str(ru_RU);
    \endcode


\subsection plural_forms Plural Forms

GNU Gettext catalogs have simple, robust and yet powerful plural forms support. We recommend reading the
original GNU documentation <a href="http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms">here</a>.

Let's try to solve a simple problem, displaying a message to the user:

\code
    if(files == 1)
        cout << translate("You have 1 file in the directory") << endl;
    else
        cout << format(translate("You have {1} files in the directory")) % files << endl;
\endcode

This very simple task becomes quite complicated when we deal with languages other than English. Many languages have more
than two plural forms. For example, in Hebrew there are special forms for single, double, plural, and plural above 10.
They can't be distinguished by the simple rule "is n 1 or not".

The correct solution is to give the translator the ability to choose the plural form. Thus the translate
function can receive two additional parameters, the English plural form and a number: <tt>translate(single,plural,count)</tt>

For example:

\code
cout << format(translate( "You have {1} file in the directory",
                          "You have {1} files in the directory",
                          files)) % files << endl;
\endcode

A special entry in the dictionary specifies the rule to choose the correct plural form in the target language.
For example, the Slavic language family has 3 plural forms, which can be chosen using the following expression:

\code
    plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
\endcode

Such an expression is stored in the message catalog itself and is evaluated during translation to supply the correct form.

So the code above would display 3 different forms in a Russian locale for the values 1, 3 and 5:

\verbatim
У вас есть 1 файл в каталоге
У вас есть 3 файла в каталоге
У вас есть 5 файлов в каталоге
\endverbatim

For Japanese, which does not have plural forms at all, it would display the same message
for any numeric value.
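
The selection rule is ordinary C-like arithmetic, so it can be checked in isolation. The function
\c slavic_plural_index below is a hypothetical helper written for illustration only; it restates the three-form
rule quoted above as a C++ function and is not something Boost.Locale exposes.

```cpp
// Hypothetical helper: the three-form rule quoted above, written as a C++ function.
// Returns the index of the msgstr[] entry to use for the count n.
inline int slavic_plural_index(unsigned long n)
{
    if (n % 10 == 1 && n % 100 != 11)
        return 0; // 1, 21, 31, ...
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1; // 2-4, 22-24, ...
    return 2;     // 0, 5-20, 25-30, ...
}
```

For the counts 1, 3 and 5 it returns 0, 1 and 2 respectively, and for 11 it returns 2 because the first test
excludes numbers ending in 11.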

For more detailed information please refer to GNU Gettext: <a href="http://www.gnu.org/software/gettext/manual/gettext.html#Plural-forms">11.2.6 Additional functions for plural forms</a>.


\subsection adding_context_information Adding Context Information

In many cases it is not sufficient to provide only the original English string to get the correct translation.
You sometimes need to provide some context information. In German, for example, a button labeled "open" is translated to
"öffnen" in the context of "opening a file", or to "aufbauen" in the context of opening an internet connection.

In these cases you must add some context information to the original string:

\code
button->setLabel(translate("File","open"));
\endcode

The context information is provided as the first parameter to the \ref boost::locale::translate() "translate"
function in both singular and plural forms. The translator sees this context information and is able to translate the
"open" string correctly.

For example, this is how the \c po file would look:

\code
msgctxt "File"
msgid "open"
msgstr "öffnen"

msgctxt "Internet Connection"
msgid "open"
msgstr "aufbauen"
\endcode

\note Context information requires more recent versions of the gettext tools (>=0.15) for extracting strings and
formatting message catalogs.
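
One way to picture the effect of the context is a dictionary keyed by the pair of context and original string
instead of the original string alone. This is a conceptual sketch in standard C++; \c context_dictionary and
\c lookup are hypothetical names, and the sketch makes no claim about how catalogs store contexts internally.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical dictionary keyed by (context, original string).
typedef std::map<std::pair<std::string, std::string>, std::string> context_dictionary;

// Returns the translation for the given context, or the original string
// when the (context, id) pair is not present.
inline std::string lookup(const context_dictionary &dict,
                          const std::string &context,
                          const std::string &id)
{
    context_dictionary::const_iterator it = dict.find(std::make_pair(context, id));
    return it == dict.end() ? id : it->second;
}
```

With entries for the contexts "File" and "Internet Connection", the same original string "open" resolves to two
different translations, and an unknown context falls back to the original string.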


\subsection multiple_gettext_domain Working with multiple message domains

In some cases it is useful to work with multiple message domains.

For example, if an application consists of several independent modules, it may
have several domains - a separate domain for each module.

For example, when developing a FooBar office suite we might have:

- a FooBar Word Processor, using the "foobarwriter" domain
- a FooBar Spreadsheet, using the "foobarspreadsheet" domain
- a FooBar Spell Checker, using the "foobarspell" domain
- a FooBar File handler, using the "foobarodt" domain

There are three ways to use non-default domains:

-   When working with \c iostream, you can use the parameterized manipulator \ref
    boost::locale::as::domain "as::domain(std::string const &)", which allows switching domains in a stream:
    \n
    \code
    cout << as::domain("foo") << translate("Hello") << as::domain("bar") << translate("Hello");
    // The first translation is taken from dictionary foo and the other from dictionary bar
    \endcode
-   You can specify the domain explicitly when converting a \c message object to a string:
    \code
    std::wstring foo_msg = translate(L"Hello World").str("foo");
    std::wstring bar_msg = translate(L"Hello World").str("bar");
    \endcode
-   You can specify the domain directly using a \ref direct_message_translation "convenience" interface:
    \code
    MessageBox(dgettext("gui","Error Occurred"));
    \endcode

\subsection direct_message_translation Direct translation (Convenience Interface)

Many applications do not write messages directly to an output stream or use only one locale in the process, so
calling <tt>translate("Hello World").str()</tt> for every single message would be tedious. Thus Boost.Locale provides
GNU Gettext-like localization functions for direct translation of messages. However, unlike the GNU Gettext functions,
the Boost.Locale translation functions accept an additional optional parameter (the locale) and support wide, u16 and u32 strings.

The prototypes of the GNU Gettext-like functions can be found \ref boost_locale_gettext_family "in this section".


All of these functions can have different prefixes for different forms:

-  \c d - translation in a specific domain
-  \c n - plural form translation
-  \c p - translation in a specific context

\code
    MessageBoxW(0,pgettext(L"File Dialog",L"Open?").c_str(),gettext(L"Question").c_str(),MB_YESNO);
\endcode


\section extracting_messages_from_code Extracting messages from the source code

There are many tools to extract messages from the source code into the \c .po file format. The most
popular and "native" tool is \c xgettext, which is installed by default on most Unix systems and freely downloadable
for Windows (see \ref gettext_for_windows).

For example, we have a source file called \c dir.cpp that prints:

\code
    cout << format(translate("Listing of catalog {1}:")) % file_name << endl;
    cout << format(translate("Catalog {1} contains 1 file","Catalog {1} contains {2,num} files",files_no))
            % file_name % files_no << endl;
\endcode

Now we run:

\verbatim
xgettext --keyword=translate:1,1t --keyword=translate:1,2,3t dir.cpp
\endverbatim

This creates a file called \c messages.po that looks approximately like this:

\code
#: dir.cpp:1
msgid "Listing of catalog {1}:"
msgstr ""

#: dir.cpp:2
msgid "Catalog {1} contains 1 file"
msgid_plural "Catalog {1} contains {2,num} files"
msgstr[0] ""
msgstr[1] ""
\endcode

This file can be given to translators to adapt it to specific languages.

We used the \c --keyword  parameter of \c xgettext to make it suitable for extracting messages from
source code localized with Boost.Locale, searching for <tt>translate()</tt> function calls instead of the default <tt>gettext()</tt>
and <tt>ngettext()</tt> ones.
The first parameter <tt>--keyword=translate:1,1t</tt> provides the template for basic messages: a \c translate function that is
called with 1 argument (1t) and whose first argument is taken as the key. The second one, <tt>--keyword=translate:1,2,3t</tt>, is used
for plural forms.
It tells \c xgettext to match a <tt>translate()</tt> function call with 3 parameters (3t) and take the 1st and 2nd parameters as keys. An
additional marker \c Nc can be used to mark context information.

The full set of xgettext parameters suitable for Boost.Locale is:

\code
xgettext --keyword=translate:1,1t --keyword=translate:1c,2,2t       \
         --keyword=translate:1,2,3t --keyword=translate:1c,2,3,4t   \
         --keyword=gettext:1 --keyword=pgettext:1c,2                \
         --keyword=ngettext:1,2 --keyword=npgettext:1c,2,3          \
         source_file_1.cpp ... source_file_N.cpp
\endcode

Of course, if you do not use the "gettext"-like translation functions, you
may omit some of these parameters.

\section custom_file_system_support Custom Filesystem Support

When access to the actual file system is limited, as in ActiveX controls, or
when the developer wants to ship an all-in-one executable file,
it is useful to be able to load \c gettext  catalogs from a custom location -
a custom file system.

Boost.Locale provides an option to install the boost::locale::message_format facet
with customized options provided in the boost::locale::gnu_gettext::messages_info structure.

This structure contains a \c boost::function based
\ref boost::locale::gnu_gettext::messages_info::callback_type "callback"
that allows the user to provide custom functionality for loading message catalog files.

For example:

\code
// Configure all options for message catalog
namespace blg = boost::locale::gnu_gettext;
blg::messages_info info;
info.language = "he";
info.country = "IL";
info.encoding="UTF-8";
info.paths.push_back(""); // At least one path is required, even an empty one
info.domains.push_back(blg::messages_info::domain("my_app"));
info.callback = some_file_loader; // Provide a callback

// Create a basic locale without messages support
boost::locale::generator gen;
std::locale base_locale = gen("he_IL.UTF-8");

// Install messages catalogs for "char" support to the final locale
// we are going to use
std::locale real_locale(base_locale,blg::create_messages_facet<char>(info));
\endcode

To set up \ref boost::locale::gnu_gettext::messages_info::language "language", \ref boost::locale::gnu_gettext::messages_info::country "country" and other members, you may use the \ref boost::locale::info facet for convenience:

\code
// Configure all options for message catalog
namespace blg = boost::locale::gnu_gettext;
blg::messages_info info;

info.paths.push_back(""); // At least one path is required, even an empty one
info.domains.push_back(blg::messages_info::domain("my_app"));
info.callback = some_file_loader; // Provide a callback

// Create an object with the default locale
boost::locale::generator gen;
std::locale base_locale = gen("");

// Use boost::locale::info to configure all parameters

boost::locale::info const &properties = std::use_facet<boost::locale::info>(base_locale);
info.language = properties.language();
info.country  = properties.country();
info.encoding = properties.encoding();
info.variant  = properties.variant();

// Install messages catalogs to the final locale
std::locale real_locale(base_locale,blg::create_messages_facet<char>(info));
\endcode
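
The examples above leave \c some_file_loader undefined. A minimal sketch of such a loader might keep the compiled
catalogs in memory. It assumes that the callback receives a file name and an encoding and returns the raw bytes
of the catalog as a <tt>std::vector<char></tt>, with an empty vector meaning that the file was not found; please check the
\ref boost::locale::gnu_gettext::messages_info::callback_type "callback_type" reference for the exact signature.
The storage function \c embedded_catalogs is hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory storage: file name -> raw bytes of a compiled .mo catalog.
inline std::map<std::string, std::vector<char> > &embedded_catalogs()
{
    static std::map<std::string, std::vector<char> > files;
    return files;
}

// Assumed callback shape: returns the catalog bytes, or an empty vector
// when the requested file is not available.
inline std::vector<char> some_file_loader(const std::string &file_name,
                                          const std::string &encoding)
{
    (void)encoding; // this sketch ignores the requested encoding
    std::map<std::string, std::vector<char> >::const_iterator it =
        embedded_catalogs().find(file_name);
    if (it == embedded_catalogs().end())
        return std::vector<char>();
    return it->second;
}
```

The application would fill \c embedded_catalogs() at startup, for example from data compiled into the executable,
using whatever file names the library requests for the configured paths and domains.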

\section msg_non_ascii_keys Non US-ASCII Keys

Boost.Locale assumes that you use English for original text messages, and the best
practice is to use US-ASCII characters for original keys.

However, in some cases it is useful to insert some Unicode characters in the text,
for example the copyright character "©".

As long as your narrow character string encoding is UTF-8, nothing further needs to be done.

Boost.Locale assumes that your sources are encoded in UTF-8 and that the input narrow
strings use UTF-8, which is the default for most compilers (with the notable
exception of Microsoft Visual C++).

However, if the encoding of narrow strings in your source file is not UTF-8 but some other
encoding like windows-1252, the strings would be misinterpreted.

You can specify the character set of the original strings when you specify the
domain name for the application.

\code
#include <boost/locale.hpp>
#include <iostream>

using namespace std;
using namespace boost::locale;

int main()
{
    generator gen;

    // Specify location of dictionaries
    gen.add_messages_path(".");
    // Specify the encoding of the source string
    gen.add_messages_domain("copyrighted/windows-1255");

    // Generate the locale and imbue it into the iostream
    locale::global(gen(""));
    cout.imbue(locale());

    // In windows-1255 the (C) symbol is encoded as 0xA9
    cout << translate("© 2001 All Rights Reserved") << endl;
}
\endcode

Thus if the program runs in a UTF-8 locale, the copyright symbol is
automatically converted to the appropriate UTF-8 sequence if the
key is missing in the dictionary.
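
The byte-level effect of such a conversion can be illustrated for the simplest case. The hypothetical function
\c latin1_to_utf8 below is not part of Boost.Locale and handles only ISO-8859-1, where every byte equals its
Unicode code point; code pages such as windows-1252 or windows-1255 differ from it for some bytes, although the
copyright sign 0xA9 used in the example maps to U+00A9 in all three.

```cpp
#include <string>

// Hypothetical helper: converts ISO-8859-1 bytes to UTF-8.
inline std::string latin1_to_utf8(const std::string &input)
{
    std::string out;
    for (std::string::size_type i = 0; i < input.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(input[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);                 // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte of a 2-byte sequence
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return out;
}
```

The single byte 0xA9 becomes the two bytes 0xC2 0xA9, which is the UTF-8 encoding of the copyright sign.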


\section msg_qna Questions and Answers

-   Do I need GNU Gettext to use Boost.Locale?
    \n
    Boost.Locale provides a run-time environment to load and use GNU Gettext message catalogs, but it does
    not provide tools for generation, translation, compilation and management of these catalogs.
    Boost.Locale only reimplements the GNU Gettext runtime library (libintl).
    \n
    You would probably need:
    \n
    -#  Boost.Locale itself -- for runtime.
    -#  A tool for extracting strings from source code, and managing them: GNU Gettext provides good tools, but other
        implementations are available as well.
    -#  A good translation program like <a href="http://userbase.kde.org/Lokalize">Lokalize</a>, <a href="http://www.poedit.net/">Poedit</a> or <a href="http://projects.gnome.org/gtranslator/">GTranslator</a>.

-   Why doesn't Boost.Locale provide tools for extracting and managing message catalogs? Why should
    I use GPL-ed software? Are my programs or message catalogs affected by its license?
    \n
    -#  Boost.Locale does not link to or use any of the GNU Gettext code, so you need not worry about your code, as
        the runtime library is fully reimplemented.
    -#  You may freely use GPL-ed software for extracting and managing catalogs, the same way as you are free to use
        a GPL-ed editor. It does not affect your message catalogs or your code.
    -#  I see no reason to reimplement well-debugged, working tools like \c xgettext, \c msgfmt, \c msgmerge that
        do a very fine job, especially as they are freely available for download and support almost any platform.
    All Linux distributions, BSD flavors, Mac OS X and other Unix-like operating systems provide GNU Gettext tools
    as a standard package.\n
    Windows users can get GNU Gettext utilities via the MinGW project. See \ref gettext_for_windows.


-   Is there any reason to prefer the Boost.Locale implementation to the original GNU Gettext runtime library?
    In either case I would probably need some of the GNU tools.
    \n
    There are three important differences between the GNU Gettext runtime library and the Boost.Locale implementation:
    \n
    -#  The GNU Gettext runtime supports only one locale per process. It is not thread-safe to use multiple locales
        and encodings in the same process. This is perfectly fine for applications that interact directly with
        a single user, like most GUI applications, but is problematic for services and servers.
    -#  The GNU Gettext API supports only 8-bit encodings, making it irrelevant in environments that natively use
        wide strings.
    -#  The GNU Gettext runtime library is distributed under the LGPL license, which may be inconvenient for some users.

*/