//
//  Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
//
//  Distributed under the Boost Software License, Version 1.0. (See
//  accompanying file LICENSE_1_0.txt or copy at
//  http://www.boost.org/LICENSE_1_0.txt)
//

// vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 filetype=cpp.doxygen
/*!
\page recommendations_and_myths Recommendations and Myths

\section recommendations Recommendations

-   The first and most important recommendation: prefer the UTF-8 encoding for narrow strings. It can represent every
    Unicode character and is more convenient for general use than legacy encodings such as Latin1.
-   Remember that there are many different cultures, and you can assume very little about the user's language. The
    user's calendar may not have a "January". It may not be possible to convert strings to integers with \c atoi,
    because the text may not use the "ordinary" digits 0..9 at all. You cannot assume that space characters are
    frequent: in Chinese, for example, spaces do not separate words. Text may be written right-to-left or
    top-to-bottom, and so on.
-   When using message formatting, provide as much context information as you can. Prefer translating entire
    sentences over single words, and when you do translate single words, \b always add some context information.
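
A minimal sketch of the \c atoi pitfall, assuming UTF-8 input: the number 123 written with Arabic-Indic digits
(U+0661 U+0662 U+0663) is not recognized by \c std::atoi, which only understands the ASCII digits 0..9, so no
conversion takes place and the result is 0. The helper \c parse_with_atoi and the sample strings are hypothetical
names used only for illustration.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: parses a decimal integer with the C library's atoi.
// After optional leading whitespace, atoi accepts only an optional sign and
// the ASCII digits '0'..'9'; if none follow, no conversion is performed and
// the result is 0, with no error reported.
int parse_with_atoi(const std::string &text) {
    return std::atoi(text.c_str());
}

// "123" written with the ASCII digits.
const std::string ascii_123 = "123";

// "123" written with the Arabic-Indic digits U+0661 U+0662 U+0663,
// UTF-8 encoded as the byte pairs D9 A1, D9 A2, D9 A3.
const std::string arabic_indic_123 = "\xD9\xA1\xD9\xA2\xD9\xA3";
```

Here \c parse_with_atoi(ascii_123) yields 123, while \c parse_with_atoi(arabic_indic_123) silently yields 0.
Text like this needs locale-aware parsing rather than \c atoi.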
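
To illustrate why context matters, the sketch below uses a hypothetical, hard-coded French catalog keyed by a
(context, message) pair; it is not Boost.Locale's catalog format. The English word "Open" is translated as "Ouvrir"
when it names an action in a file menu, but as "Ouvert" when it describes a state, such as a shop being open.
Boost.Locale's \c translate() has an overload that takes a context string as its first argument for this purpose.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical French catalog keyed by (context, message). This is NOT the
// Boost.Locale catalog format; it only shows why context is needed.
using catalog_key = std::pair<std::string, std::string>;

const std::map<catalog_key, std::string> french_catalog = {
    {{"File menu", "Open"}, "Ouvrir"},   // an action: "open a file"
    {{"Shop status", "Open"}, "Ouvert"}, // a state: "the shop is open"
};

// Looks up the translation of `message` in `context`, falling back to the
// untranslated English text when the catalog has no entry for that pair.
std::string translate_in_context(const std::string &context,
                                 const std::string &message) {
    const auto it = french_catalog.find(catalog_key{context, message});
    return it != french_catalog.end() ? it->second : message;
}
```

Without the context key, a catalog could hold only one translation of "Open", and one of the two uses would
inevitably be wrong.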

\section myths Myths

\subsection myths_wide To use Unicode in my application I should use wide strings everywhere.

Unicode is not limited to wide strings. Both \c std::string and \c std::wstring
can hold and process Unicode text. Moreover, the semantics of \c std::string
are much cleaner in multi-platform applications, because Unicode narrow strings are
UTF-8 on every platform (following the recommendation above). "Wide" strings may be
encoded in UTF-16 or UTF-32 depending on the platform (\c wchar_t is 16 bits wide on
Windows and 32 bits wide on most Unix-like systems), so they may be even less convenient
for handling Unicode than \c char based strings.
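
As a small illustration that ordinary \c std::string operations work on UTF-8 text, the hypothetical helper
\c split_utf8 below splits a string on an ASCII delimiter using only \c find and \c substr. This is safe because
every byte of a multi-byte UTF-8 sequence is 0x80 or above, so it can never be confused with an ASCII character
such as ','. The sketch assumes the delimiter itself is an ASCII character.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits UTF-8 text on an ASCII delimiter (< 0x80).
// Lead and continuation bytes of multi-byte UTF-8 sequences are all >= 0x80,
// so a byte-wise search can never match inside a non-ASCII character.
std::vector<std::string> split_utf8(const std::string &text, char delimiter) {
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = text.find(delimiter, start);
        if (pos == std::string::npos) {
            parts.push_back(text.substr(start)); // last (or only) field
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1; // skip the one-byte delimiter
    }
}
```

For example, splitting "Привет,世界" on ',' yields the two UTF-8 strings "Привет" and "世界".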

\subsection myths_utf16 UTF-16 is the best encoding to work with.

There is a common assumption that UTF-16 is the best encoding for storing text because it gives the "shortest"
representation of strings.

In fact, it is probably the most error-prone encoding to work with. The biggest issue is code points that lie outside the
Basic Multilingual Plane (BMP), which must be represented with surrogate pairs. Such characters are relatively rare, so many
applications are never tested with them.
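
The surrogate-pair mechanism can be sketched as follows, assuming the input is a valid Unicode scalar value: a code
point above 0xFFFF is reduced by 0x10000, and the remaining 20 bits are split into a high surrogate (0xD800 plus the
top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits). The function name \c encode_utf16 is hypothetical.

```cpp
#include <string>

// Hypothetical helper: encodes one code point as UTF-16.
// Assumes a valid scalar value: <= 0x10FFFF and not in 0xD800..0xDFFF.
std::u16string encode_utf16(char32_t code_point) {
    if (code_point < 0x10000) {
        // Inside the BMP: a single 16-bit code unit.
        return std::u16string(1, static_cast<char16_t>(code_point));
    }
    // Outside the BMP: 20 significant bits remain after the offset.
    const char32_t offset = code_point - 0x10000;
    const char16_t high = static_cast<char16_t>(0xD800 + (offset >> 10));  // top 10 bits
    const char16_t low = static_cast<char16_t>(0xDC00 + (offset & 0x3FF)); // bottom 10 bits
    return std::u16string{high, low};
}
```

For instance, U+1F600 becomes the two code units 0xD83D 0xDE00, while U+20AC stays a single unit.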

For example:

-   Qt3 could not handle characters outside the BMP.
-   Editing a character with a code point above 0xFFFF often exposes unpleasant bugs: for example, to erase
    such a character in Windows Notepad you have to press Backspace twice.
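
The Notepad bug above is what happens when an editor removes one UTF-16 code unit instead of one code point. A
minimal sketch of a correct "erase last character" operation, assuming well-formed UTF-16, checks whether the string
ends with a surrogate pair. It works at code-point granularity only; user-perceived characters (grapheme clusters)
can span several code points, which this sketch does not handle. The function names are hypothetical.

```cpp
#include <string>

// True for UTF-16 code units in the high (leading) surrogate range.
bool is_high_surrogate(char16_t unit) { return unit >= 0xD800 && unit <= 0xDBFF; }

// True for UTF-16 code units in the low (trailing) surrogate range.
bool is_low_surrogate(char16_t unit) { return unit >= 0xDC00 && unit <= 0xDFFF; }

// Hypothetical helper: removes the last code point, not merely the last code
// unit. A naive text.pop_back() would leave a lone high surrogate behind.
void erase_last_code_point(std::u16string &text) {
    if (text.empty()) {
        return;
    }
    const bool ends_with_pair =
        text.size() >= 2 &&
        is_low_surrogate(text[text.size() - 1]) &&
        is_high_surrogate(text[text.size() - 2]);
    // erase(pos) removes everything from pos to the end of the string.
    text.erase(text.size() - (ends_with_pair ? 2 : 1));
}
```

A naive \c pop_back() on a string ending in U+1F600 leaves a lone high surrogate, which is exactly the
"press Backspace twice" symptom.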

So UTF-16 can be used for Unicode (in fact, ICU and many other libraries and applications use UTF-16 as their internal
Unicode representation), but you should be very careful and never assume that one code point equals one UTF-16 code unit.
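
To make the distinction concrete, the hypothetical helper below counts code points in a UTF-16 string by treating
each valid surrogate pair as a single code point; its result differs from \c size() whenever the text contains
characters outside the BMP. Unpaired surrogates, which indicate malformed input, are simply counted as one each in
this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-16 string. A high
// surrogate followed by a low surrogate counts once; every other unit
// (including an unpaired surrogate) counts as one.
std::size_t count_code_points(const std::u16string &text) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t unit = text[i];
        const bool is_high = unit >= 0xD800 && unit <= 0xDBFF;
        if (is_high && i + 1 < text.size() &&
            text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            ++i; // skip the trailing half of the pair
        }
        ++count;
    }
    return count;
}
```

For example, a string holding "a", U+20AC and U+1D11E has \c size() 4 but contains 3 code points.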

*/