<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.6"/>
<title>Boost.Locale: Text Conversions</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="navtree.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="resize.js"></script>
<script type="text/javascript" src="navtree.js"></script>
<script type="text/javascript">
  $(document).ready(initResizable);
  $(window).load(resizeHeight);
</script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen!
-->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
 <tbody>
 <tr style="height: 56px;">
  <td id="projectlogo"><img alt="Logo" src="boost-small.png"/></td>
  <td style="padding-left: 0.5em;">
   <div id="projectname">Boost.Locale
   </div>
  </td>
 </tr>
 </tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.6 -->
  <div id="navrow1" class="tabs">
    <ul class="tablist">
      <li><a href="index.html"><span>Main Page</span></a></li>
      <li class="current"><a href="pages.html"><span>Related Pages</span></a></li>
      <li><a href="modules.html"><span>Modules</span></a></li>
      <li><a href="namespaces.html"><span>Namespaces</span></a></li>
      <li><a href="annotated.html"><span>Classes</span></a></li>
      <li><a href="files.html"><span>Files</span></a></li>
      <li><a href="examples.html"><span>Examples</span></a></li>
    </ul>
  </div>
</div><!-- top -->
<div id="side-nav" class="ui-resizable side-nav-resizable">
  <div id="nav-tree">
    <div id="nav-tree-contents">
      <div id="nav-sync" class="sync"></div>
    </div>
  </div>
  <div id="splitbar" style="-moz-user-select:none;"
       class="ui-resizable-handle">
  </div>
</div>
<script type="text/javascript">
$(document).ready(function(){initNavTree('conversions.html','');});
</script>
<div id="doc-content">
<div class="header">
  <div class="headertitle">
<div class="title">Text Conversions </div>  </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>There is a set of functions that perform basic string conversion operations: upper, lower and <a class="el" href="glossary.html#term_title_case">title case</a> conversions, <a class="el" href="glossary.html#term_case_folding">case folding</a> and Unicode <a class="el" href="glossary.html#term_normalization">normalization</a>.
These are <a class="el" href="group__convert.html#ga2ceae621801e8cf4f77c60d1e3047ae8">to_upper</a>, <a class="el" href="group__convert.html#ga4a3eb15f42f5cbae7bdd00c9e9cac222">to_lower</a>, <a class="el" href="group__convert.html#ga684efb375e060c71cd3e1799a6329f7f">to_title</a>, <a class="el" href="group__convert.html#gadf59d16355babd955766deef89d470ea">fold_case</a> and <a class="el" href="group__convert.html#ga867733c9d4455aaa13a42cf67367d575">normalize</a>.</p>
<p>All of these functions take a <code>std::locale</code> object as a parameter, or use the global locale by default.</p>
<p>The global locale is used in all of the examples below.</p>
<h1><a class="anchor" id="conversions_case"></a>
Case Handling</h1>
<p>For example: </p>
<div class="fragment"><div class="line">std::string grussen = <span class="stringliteral">"grüßEN"</span>;</div>
<div class="line">std::cout &lt;&lt;<span class="stringliteral">"Upper "</span>&lt;&lt; <a class="code" href="group__convert.html#ga2ceae621801e8cf4f77c60d1e3047ae8">boost::locale::to_upper</a>(grussen) &lt;&lt; std::endl</div>
<div class="line">          &lt;&lt;<span class="stringliteral">"Lower "</span>&lt;&lt; <a class="code" href="group__convert.html#ga4a3eb15f42f5cbae7bdd00c9e9cac222">boost::locale::to_lower</a>(grussen) &lt;&lt; std::endl</div>
<div class="line">          &lt;&lt;<span class="stringliteral">"Title "</span>&lt;&lt; <a class="code" href="group__convert.html#ga684efb375e060c71cd3e1799a6329f7f">boost::locale::to_title</a>(grussen) &lt;&lt; std::endl</div>
<div class="line">          &lt;&lt;<span class="stringliteral">"Fold "</span>&lt;&lt; <a class="code" href="group__convert.html#gadf59d16355babd955766deef89d470ea">boost::locale::fold_case</a>(grussen) &lt;&lt; std::endl;</div>
</div><!-- fragment --><p>This would print:</p>
<pre class="fragment">Upper GRÜSSEN
Lower grüßen
Title Grüßen
Fold grüssen
</pre><p>You may notice that functions named <code>to_upper</code> and <code>to_lower</code> already exist in the Boost.StringAlgo library.
The difference is that the Boost.Locale functions operate on the entire string, rather than converting it character by character, which can give incorrect results.</p>
<p>For example:</p>
<div class="fragment"><div class="line">std::wstring grussen = L<span class="stringliteral">"grüßen"</span>;</div>
<div class="line">std::wcout &lt;&lt; boost::algorithm::to_upper_copy(grussen) &lt;&lt; <span class="stringliteral">" "</span> &lt;&lt; <a class="code" href="group__convert.html#ga2ceae621801e8cf4f77c60d1e3047ae8">boost::locale::to_upper</a>(grussen) &lt;&lt; std::endl;</div>
</div><!-- fragment --><p>This would output:</p>
<pre class="fragment">GRÜßEN GRÜSSEN
</pre><p>Here the letter "ß" was not converted to "SS" in the first case, because of a limitation of the <code>std::ctype</code> facet, which converts one character at a time.</p>
<p>This is even more problematic with UTF-8 encoded strings, where non-ASCII characters are not converted at all. For example, this code</p>
<div class="fragment"><div class="line">std::string grussen = <span class="stringliteral">"grüßen"</span>;</div>
<div class="line">std::cout &lt;&lt; boost::algorithm::to_upper_copy(grussen) &lt;&lt; <span class="stringliteral">" "</span> &lt;&lt; <a class="code" href="group__convert.html#ga2ceae621801e8cf4f77c60d1e3047ae8">boost::locale::to_upper</a>(grussen) &lt;&lt; std::endl;</div>
</div><!-- fragment --><p>would modify only the ASCII characters in the first case:</p>
<pre class="fragment">GRüßEN GRÜSSEN
</pre><h1><a class="anchor" id="conversions_normalization"></a>
Unicode Normalization</h1>
<p>Unicode normalization is the process of converting strings to a standard form, suitable for text processing and comparison. For example, the character "ü" can be represented either by a single code point or by the combination of the character "u" and the combining diaeresis "¨". Normalization is an important part of Unicode text processing.</p>
<p>Unicode defines four normalization forms.
Each form is selected by a flag passed to the <a class="el" href="group__convert.html#ga867733c9d4455aaa13a42cf67367d575">normalize</a> function:</p>
<ul>
<li>NFD - Canonical decomposition - <a class="el" href="group__convert.html#gga6a595a415b83b8a0c8f14c34eb66cc9fa6648d0eabb931f2e9d258570b297e98f" title="Canonical decomposition. ">boost::locale::norm_nfd</a></li>
<li>NFC - Canonical decomposition followed by canonical composition - <a class="el" href="group__convert.html#gga6a595a415b83b8a0c8f14c34eb66cc9faf6fe7be275e5e13df415ab258105ada0" title="Canonical decomposition followed by canonical composition. ">boost::locale::norm_nfc</a> or <a class="el" href="group__convert.html#gga6a595a415b83b8a0c8f14c34eb66cc9faa29173d73d9be7fefcbb18c8712465d2" title="Default normalization - canonical decomposition followed by canonical composition. ">boost::locale::norm_default</a></li>
<li>NFKD - Compatibility decomposition - <a class="el" href="group__convert.html#gga6a595a415b83b8a0c8f14c34eb66cc9fa0fbc2ac042fc6f58af5818bfd06d5379" title="Compatibility decomposition. ">boost::locale::norm_nfkd</a></li>
<li>NFKC - Compatibility decomposition followed by canonical composition - <a class="el" href="group__convert.html#gga6a595a415b83b8a0c8f14c34eb66cc9fa0305c1f3405ea70facf4c6a5ffa40583" title="Compatibility decomposition followed by canonical composition. ">boost::locale::norm_nfkc</a></li>
</ul>
<p>For more details on normalization forms, see <a href="http://unicode.org/reports/tr15/#Norm_Forms">Unicode Standard Annex #15, Unicode Normalization Forms</a>.</p>
<h1><a class="anchor" id="conversions_notes"></a>
Notes</h1>
<ul>
<li><a class="el" href="group__convert.html#ga867733c9d4455aaa13a42cf67367d575">normalize</a> operates only on Unicode-encoded strings, i.e. UTF-8, UTF-16 or UTF-32, depending on the character width.
Be careful when using non-UTF encodings, as such strings may be processed incorrectly.</li>
<li><a class="el" href="group__convert.html#gadf59d16355babd955766deef89d470ea">fold_case</a> is generally a locale-independent operation, but it takes a locale as a parameter in order to determine the 8-bit encoding.</li>
<li>All of these functions can work with an STL string, a NUL-terminated string, or a range defined by two pointers. They always return a newly created STL string.</li>
<li>The length of the string may change, as in the conversion of "ß" to "SS" in the example above.</li>
</ul>
</div></div><!-- contents -->
</div><!-- doc-content -->

    <li class="footer">
© Copyright 2009-2012 Artyom Beilis, Distributed under the <a href="http://www.boost.org/LICENSE_1_0.txt">Boost Software License</a>, Version 1.0.
    </li>
   </ul>
 </div>
</body>
</html>