[/
 / Copyright (c) 2008 Eric Niebler
 /
 / Distributed under the Boost Software License, Version 1.0. (See accompanying
 / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 /]

[section String Substitutions]

Regular expressions are not only good for searching text; they're also good at ['manipulating] it. And one of the
most common text manipulation tasks is search-and-replace. xpressive provides the _regex_replace_ algorithm for
searching and replacing.

[h2 regex_replace()]

Performing search-and-replace using _regex_replace_ is simple. All you need is an input sequence, a regex object,
and a format string or a formatter object. There are several versions of the _regex_replace_ algorithm. Some accept
the input sequence as a bidirectional container such as `std::string` and return the result in a new container
of the same type. Others accept the input as a null-terminated string and return a `std::string`. Still others
accept the input sequence as a pair of iterators and write the result into an output iterator. The substitution
may be specified as a string with format sequences or as a formatter object. Below are some simple examples of
using string-based substitutions.

    std::string input("This is his face");
    sregex re = as_xpr("his");                // find all occurrences of "his" ...
    std::string format("her");                // ... and replace them with "her"

    // use the version of regex_replace() that operates on strings
    std::string output = regex_replace( input, re, format );
    std::cout << output << '\n';

    // use the version of regex_replace() that operates on iterators
    std::ostream_iterator< char > out_iter( std::cout );
    regex_replace( out_iter, input.begin(), input.end(), re, format );

The above program prints out the following:

[pre
Ther is her face
Ther is her face
]

Notice that ['all] the occurrences of `"his"` have been replaced with `"her"`, including the one inside `"This"`.

Click [link boost_xpressive.user_s_guide.examples.replace_all_sub_strings_that_match_a_regex here] to see
a complete example program that shows how to use _regex_replace_, and check the _regex_replace_ reference
for a complete list of the available overloads.

[h2 Replace Options]

The _regex_replace_ algorithm takes an optional bitmask parameter to control the formatting. The
possible values of the bitmask are:

[table Format Flags
    [[Flag]                 [Meaning]]
    [[`format_default`]     [Recognize the ECMA-262 format sequences (see below).]]
    [[`format_first_only`]  [Only replace the first match, not all of them.]]
    [[`format_no_copy`]     [Don't copy the parts of the input sequence that didn't match the regex
                             to the output sequence.]]
    [[`format_literal`]     [Treat the format string as a literal; that is, don't recognize any
                             escape sequences.]]
    [[`format_perl`]        [Recognize the Perl format sequences (see below).]]
    [[`format_sed`]         [Recognize the sed format sequences (see below).]]
    [[`format_all`]         [In addition to the Perl format sequences, recognize some
                             Boost-specific format sequences.]]
]

These flags live in the `xpressive::regex_constants` namespace. If the substitution parameter is
a function object instead of a string, the flags `format_literal`, `format_perl`, `format_sed`, and
`format_all` are ignored.

[h2 The ECMA-262 Format Sequences]

When you haven't specified a substitution string dialect with one of the format flags above,
you get the dialect defined by ECMA-262, the standard for ECMAScript. The table below shows
the escape sequences recognized in ECMA-262 mode.

[table Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^$1], [^$2], etc.]   [the corresponding sub-match]]
    [[[^$&]]                [the full match]]
    [[[^$\`]]               [the match prefix]]
    [[[^$']]                [the match suffix]]
    [[[^$$]]                [a literal `'$'` character]]
]

Any other sequence beginning with `'$'` simply represents itself.
For example, if the format string were
`"$a"` then `"$a"` would be inserted into the output sequence.

[h2 The Sed Format Sequences]

When specifying the `format_sed` flag to _regex_replace_, the following escape sequences
are recognized:

[table Sed Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
    [[[^&]]                 [The full match]]
    [[[^\\a]]               [A literal `'\a'`]]
    [[[^\\e]]               [A literal `char_type(27)`]]
    [[[^\\f]]               [A literal `'\f'`]]
    [[[^\\n]]               [A literal `'\n'`]]
    [[[^\\r]]               [A literal `'\r'`]]
    [[[^\\t]]               [A literal `'\t'`]]
    [[[^\\v]]               [A literal `'\v'`]]
    [[[^\\xFF]]             [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
    [[[^\\x{FFFF}]]         [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
    [[[^\\cX]]              [The control character [^['X]]]]
]

[h2 The Perl Format Sequences]

When specifying the `format_perl` flag to _regex_replace_, the following escape sequences
are recognized:

[table Perl Format Escape Sequences
    [[Escape Sequence]      [Meaning]]
    [[[^$1], [^$2], etc.]
                            [The corresponding sub-match]]
    [[[^$&]]                [The full match]]
    [[[^$\`]]               [The match prefix]]
    [[[^$']]                [The match suffix]]
    [[[^$$]]                [A literal `'$'` character]]
    [[[^\\a]]               [A literal `'\a'`]]
    [[[^\\e]]               [A literal `char_type(27)`]]
    [[[^\\f]]               [A literal `'\f'`]]
    [[[^\\n]]               [A literal `'\n'`]]
    [[[^\\r]]               [A literal `'\r'`]]
    [[[^\\t]]               [A literal `'\t'`]]
    [[[^\\v]]               [A literal `'\v'`]]
    [[[^\\xFF]]             [A literal `char_type(0xFF)`, where [^['F]] is any hex digit]]
    [[[^\\x{FFFF}]]         [A literal `char_type(0xFFFF)`, where [^['F]] is any hex digit]]
    [[[^\\cX]]              [The control character [^['X]]]]
    [[[^\\l]]               [Make the next character lowercase]]
    [[[^\\L]]               [Make the rest of the substitution lowercase until the next [^\\E]]]
    [[[^\\u]]               [Make the next character uppercase]]
    [[[^\\U]]               [Make the rest of the substitution uppercase until the next [^\\E]]]
    [[[^\\E]]               [Terminate [^\\L] or [^\\U]]]
    [[[^\\1], [^\\2], etc.] [The corresponding sub-match]]
    [[[^\\g<name>]]         [The named backref /name/]]
]

[h2 The Boost-Specific Format Sequences]

When specifying the `format_all` flag to _regex_replace_, the escape sequences
recognized are the same as those above for `format_perl`. In addition, conditional
expressions of the following form are recognized:

[pre
?Ntrue-expression:false-expression
]

where /N/ is a decimal digit representing a sub-match. If the corresponding sub-match
participated in the full match, then the substitution is /true-expression/. Otherwise,
it is /false-expression/. In this mode, you can use parens [^()] for grouping. If you
want a literal paren, you must escape it as [^\\(].

[h2 Formatter Objects]

Format strings are not always expressive enough for all your text substitution
needs. Consider the simple example of wanting to map input strings to output
strings, as you may want to do with environment variables.
Rather than a format
/string/, for this you would use a formatter /object/. Consider the following
code, which finds embedded environment variables of the form `"$(XYZ)"` and
computes the substitution string by looking up the environment variable in a
map.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
    using namespace xpressive;

    std::map<std::string, std::string> env;

    // use the first sub-match (the variable name) as a key into the global map
    std::string const &format_fun(smatch const &what)
    {
        return env[what[1].str()];
    }

    int main()
    {
        env["X"] = "this";
        env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        // replace strings like "$(XYZ)" with the result of env["XYZ"]
        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, format_fun);
        std::cout << output << std::endl;
    }

In this case, we use a function, `format_fun()`, to compute the substitution string
on the fly. It accepts a _match_results_ object which contains the results of the
current match. `format_fun()` uses the first submatch as a key into the global `env`
map. The above code displays:

[pre
"this" has the value "that"
]

The formatter need not be an ordinary function. It may be an object of class type.
And rather than return a string, it may accept an output iterator into which it
writes the substitution. Consider the following, which is functionally equivalent
to the above.

    #include <map>
    #include <string>
    #include <iostream>
    #include <algorithm> // for std::copy
    #include <boost/xpressive/xpressive.hpp>
    using namespace boost;
    using namespace xpressive;

    struct formatter
    {
        typedef std::map<std::string, std::string> env_map;
        env_map env;

        // write the replacement for the current match into out
        template<typename Out>
        Out operator()(smatch const &what, Out out) const
        {
            env_map::const_iterator where = env.find(what[1]);
            if(where != env.end())
            {
                std::string const &sub = where->second;
                out = std::copy(sub.begin(), sub.end(), out);
            }
            return out;
        }

    };

    int main()
    {
        formatter fmt;
        fmt.env["X"] = "this";
        fmt.env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, fmt);
        std::cout << output << std::endl;
    }

The formatter must be a callable object -- a function or a function object --
that has one of three possible signatures, detailed in the table below. For
the table, `fmt` is a function pointer or function object, `what` is a
_match_results_ object, `out` is an OutputIterator, and `flags` is a value
of `regex_constants::match_flag_type`:

[table Formatter Signatures
[
    [Formatter Invocation]
    [Return Type]
    [Semantics]
]
[
    [`fmt(what)`]
    [Range of characters (e.g. `std::string`) or null-terminated string]
    [The string matched by the regex is replaced with the string returned by
     the formatter.]
]
[
    [`fmt(what, out)`]
    [OutputIterator]
    [The formatter writes the replacement string into `out` and returns `out`.]
]
[
    [`fmt(what, out, flags)`]
    [OutputIterator]
    [The formatter writes the replacement string into `out` and returns `out`.
     The `flags` parameter is the value of the match flags passed to the
     _regex_replace_ algorithm.]
]
]

[h2 Formatter Expressions]

In addition to format /strings/ and formatter /objects/, _regex_replace_ also
accepts formatter /expressions/. A formatter expression is a lambda expression
that generates a string. It uses the same syntax as that for
[link boost_xpressive.user_s_guide.semantic_actions_and_user_defined_assertions
Semantic Actions], which are covered later. The above example, which uses
_regex_replace_ to substitute strings for environment variables, is repeated
here using a formatter expression.

    #include <map>
    #include <string>
    #include <iostream>
    #include <boost/xpressive/xpressive.hpp>
    #include <boost/xpressive/regex_actions.hpp>
    using namespace boost::xpressive;

    int main()
    {
        std::map<std::string, std::string> env;
        env["X"] = "this";
        env["Y"] = "that";

        std::string input("\"$(X)\" has the value \"$(Y)\"");

        sregex envar = "$(" >> (s1 = +_w) >> ')';
        std::string output = regex_replace(input, envar, ref(env)[s1]);
        std::cout << output << std::endl;
    }

In the above, the formatter expression is `ref(env)[s1]`. This means to use the
value of the first submatch, `s1`, as a key into the `env` map. The purpose of
`xpressive::ref()` here is to make the reference to the `env` local variable /lazy/
so that the index operation is deferred until we know what to replace `s1` with.

[endsect]