[/
  Copyright 2006-2007 John Maddock.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]

[section:partial_matches Partial Matches]

The [match_flag_type] `match_partial` can be passed to the following algorithms:
[regex_match], [regex_search], and [regex_grep], and used with the
iterator [regex_iterator]. When used, it indicates that partial as
well as full matches should be found. A partial match is one that
matched one or more characters at the end of the text input, but
did not match all of the regular expression (although it may have done
so had more input been available). Partial matches are typically used
either when validating data input (checking each character as it is
entered on the keyboard), or when searching texts that are too long
to load into memory (or even into a memory-mapped file) or that are of
indeterminate length (for example, the source may be a socket or similar).
Partial and full matches can be differentiated as shown in the following
table (the variable M represents an instance of [match_results] as filled in
by [regex_match], [regex_search] or [regex_grep]):

[table
[[ ][Result][M\[0\].matched][M\[0\].first][M\[0\].second]]
[[No match][False][Undefined][Undefined][Undefined]]
[[Partial match][True][False][Start of partial match.][End of partial match (end of text).]]
[[Full match][True][True][Start of full match.][End of full match.]]
]

Be aware that using partial matches can sometimes result in somewhat
imperfect behavior:

* There are some expressions, such as ".\*abc", that will always produce a partial match. This problem can be reduced by careful construction of the regular expressions used, or by setting flags like `match_not_dot_newline` so that expressions like .\* can't match past line boundaries.
* Boost.Regex currently prefers leftmost matches to full matches, so for example matching "abc|b" against "ab" produces a partial match against the "ab" rather than a full match against "b". It's more efficient to work this way, but it may not be the behavior you want in all situations.
* There are situations where full matches are found even though partial matches are also possible: for example, if the partial string terminates with "abc" and the regular expression is "\w+", then a full match is found even though there may be more word characters to come. This particular case can be detected by checking whether the match found terminates at the end of the current input string. However, there are situations where that is not possible: for example, an expression such as "abc.*123" may always have longer matches available, since it could conceivably match the entire input string (no matter how long it may be).

The following example tests whether the text could be a valid
credit card number. As the user presses a key, the character entered
is added to the string being built up and passed to `is_possible_card_number`.
If this returns true, then the text has the form of a complete card number, so the
user interface's OK button would be enabled. If it returns false, then
this is not yet a valid card number but could become one with more input, so
the user interface would disable the OK button. Finally, if the function
throws an exception, the input could never become a valid number, so the
character just entered must be discarded and a suitable error indication
displayed to the user.

   #include <string>
   #include <iostream>
   #include <stdexcept>
   #include <boost/regex.hpp>

   boost::regex e("(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");

   bool is_possible_card_number(const std::string& input)
   {
      //
      // return false for partial match, true for full match, or throw for
      // impossible match based on what we have so far...
      boost::match_results<std::string::const_iterator> what;
      if(0 == boost::regex_match(input, what, e, boost::match_default | boost::match_partial))
      {
         // the input so far could not possibly be valid so reject it:
         throw std::runtime_error(
            "Invalid data entered - this could not possibly be a valid card number");
      }
      // OK so far so good, but have we finished?
      if(what[0].matched)
      {
         // excellent, we have a result:
         return true;
      }
      // what we have so far is only a partial match...
      return false;
   }

In the following example, text input is taken from a stream containing an
unknown amount of text; this example simply counts the number of HTML tags
encountered in the stream. The text is loaded into a buffer and searched one
part at a time. If a partial match is encountered, then the partial match
is searched a second time as the start of the next batch of text:

   #include <iostream>
   #include <fstream>
   #include <sstream>
   #include <string>
   #include <cstring>
   #include <boost/regex.hpp>

   // match some kind of html tag:
   boost::regex e("<[^>]*>");
   // count how many:
   unsigned int tags = 0;

   void search(std::istream& is)
   {
      // buffer we'll be searching in:
      char buf[4096];
      // saved start position of a trailing partial match
      // (end of buffer when there is none):
      const char* next_pos = buf + sizeof(buf);
      // flag to indicate whether there is more input to come:
      bool have_more = true;

      while(have_more)
      {
         // how much do we copy forward from last try:
         unsigned leftover = (buf + sizeof(buf)) - next_pos;
         // and how much is left to fill:
         unsigned size = next_pos - buf;
         // copy forward whatever we have left:
         std::memmove(buf, next_pos, leftover);
         // fill the rest from the stream:
         is.read(buf + leftover, size);
         unsigned read = is.gcount();
         // check to see if we've run out of text:
         have_more = read == size;
         // reset next_pos:
         next_pos = buf + sizeof(buf);
         // and then iterate:
         boost::cregex_iterator a(
            buf,
            buf + read + leftover,
            e,
            boost::match_default | boost::match_partial);
         boost::cregex_iterator b;

         while(a != b)
         {
            if((*a)[0].matched == false)
            {
               // Partial match, save position and break:
               next_pos = (*a)[0].first;
               break;
            }
            else
            {
               // full match:
               ++tags;
            }

            // move to next match:
            ++a;
         }
         // if one partial match fills the whole buffer, the next pass would
         // have no room to read more input (size == 0) and would loop forever,
         // so discard the over-long candidate instead:
         if(next_pos == buf)
            next_pos = buf + sizeof(buf);
      }
   }

[endsect]