<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Unicode and Boost.Regex</title>
<link rel="stylesheet" href="../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Boost.Regex 5.1.4">
<link rel="up" href="../index.html" title="Boost.Regex 5.1.4">
<link rel="prev" href="intro.html" title="Introduction and Overview">
<link rel="next" href="captures.html" title="Understanding Marked Sub-Expressions and Captures">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="intro.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="captures.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="boost_regex.unicode"></a><a class="link" href="unicode.html" title="Unicode and Boost.Regex">Unicode and Boost.Regex</a>
</h2></div></div></div>
<p>
      There are two ways to use Boost.Regex with
      Unicode strings:
    </p>
<h5>
<a name="boost_regex.unicode.h0"></a>
      <span class="phrase"><a name="boost_regex.unicode.rely_on_wchar_t"></a></span><a class="link" href="unicode.html#boost_regex.unicode.rely_on_wchar_t">Rely
      on wchar_t</a>
    </h5>
<p>
      If your platform's <code class="computeroutput"><span class="keyword">wchar_t</span></code> type
      can hold Unicode characters, and your platform's C/C++ runtime correctly handles
      wide character constants (when passed to <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">iswspace</span></code>,
      <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">iswlower</span></code>, etc.), then you can use <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">wregex</span></code>
      to process Unicode. However, there are several disadvantages to this approach:
    </p>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
        It's not portable: there's no guarantee on the width of <code class="computeroutput"><span class="keyword">wchar_t</span></code>,
        or even whether the runtime treats wide characters as Unicode at all. Most
        Windows compilers do so, but many Unix systems do not.
      </li>
<li class="listitem">
        There's no support for Unicode-specific character classes: <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Nd</span><span class="special">:]]</span></code>, <code class="computeroutput"><span class="special">[[:</span><span class="identifier">Po</span><span class="special">:]]</span></code>,
        etc.
      </li>
<li class="listitem">
        You can only search strings that are encoded as sequences of wide characters;
        it is not possible to search UTF-8, or even UTF-16, on many platforms.
      </li>
</ul></div>
<h5>
<a name="boost_regex.unicode.h1"></a>
      <span class="phrase"><a name="boost_regex.unicode.use_a_unicode_aware_regular_expr"></a></span><a class="link" href="unicode.html#boost_regex.unicode.use_a_unicode_aware_regular_expr">Use
      a Unicode-Aware Regular Expression Type</a>
    </h5>
<p>
      If you have the <a href="http://www.ibm.com/software/globalization/icu/" target="_top">ICU
      library</a>, then Boost.Regex can be <a class="link" href="install.html#boost_regex.install.building_with_unicode_and_icu_su">configured
      to make use of it</a> and provide a distinct regular expression type (boost::u32regex)
      that supports both Unicode-specific character properties and the searching
      of text that is encoded in UTF-8, UTF-16, or UTF-32. See: <a class="link" href="ref/non_std_strings/icu.html" title="Working With Unicode and ICU String Types">ICU
      string class support</a>.
    </p>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright © 1998-2013 John Maddock<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="intro.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="captures.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>