Lines Matching full:is

9 \fBpcre2test\fP is a test program for the PCRE2 regular expression libraries,
23 The input for \fBpcre2test\fP is a sequence of regular expression patterns and
28 subject is processed, and what output is produced.
62 Input to \fBpcre2test\fP is processed line by line, either by calling the C
64 below). The input is processed using C's string functions, so must not
68 further data is read.
70 For maximum portability, therefore, it is safest to avoid non-printing
71 characters in \fBpcre2test\fP input files. There is a facility for specifying
82 If the 8-bit library has been built, this option causes it to be used (this is
88 the 16-bit library has been built, this is the default. If the 16-bit library
93 the 32-bit library has been built, this is the default. If the 32-bit library
98 internal binary form of the pattern is output after compilation.
107 functionality is intended for use in scripts such as \fBRunTest\fP. The
113 exit code is always 0
115 exit code is set to the link size
118 exit code is always 0
121 exit code is always 0
126 backslash-C \eC is supported (not locked out)
128 jit just-in-time support is available
132 unicode Unicode support is available
134 If an unknown option is given, an error message is output; the exit code is 0.
138 form and information about the compiled pattern is output after compilation;
139 \fB-d\fP is equivalent to \fB-b -i\fP.
142 Behave as if each subject line has the \fBdfa\fP modifier; matching is done
149 then exit with zero exit code. The numbers may be positive or negative. This is
157 compiled pattern is given after compilation.
161 compilation, each pattern is passed to the just-in-time compiler, if available.
178 times per compile or match. When JIT is used, separate times are given for the
182 default is to iterate 500,000 times.
185 This is like \fB-t\fP except that it times only the matching phase, not the
199 If \fBpcre2test\fP is given two filename arguments, it reads from the first and
200 writes to the second. If the first name is "-", input is taken from the
201 standard input. If \fBpcre2test\fP is given only one argument, it reads from
205 When \fBpcre2test\fP is built, a configuration option can specify that it
207 is done, if the input is from a terminal, it is read using the \fBreadline()\fP
217 and Perl is the same.
219 When the input is a terminal, \fBpcre2test\fP prompts for each line of input,
224 Each subject line is matched separately and independently. If you want to do
227 newline sequences. There is no limit on the length of subject lines; the input
228 buffer is automatically extended if it is too small. There are replication
233 test, at which point a new pattern or command line is expected if there is
240 In between sets of test data, a line that begins with # is interpreted as a
241 command line. If the first character is followed by white space or an
242 exclamation mark, the line is treated as a comment, and ignored. Otherwise, the
251 which are still supported when PCRE2_UTF is not set, but which require Unicode
254 This is a trigger guard that is used in test files to ensure that UTF or
256 Unicode support is not included in the library. Setting PCRE2_NEVER_UTF and
258 the difference is that \fB#forbid_utf\fP cannot be unset, and the automatic
264 This command is used to load a set of precompiled patterns from a file, as
273 When PCRE2 is built, a default newline convention can be specified. This
276 pattern is compiled. The standard test files contain tests of various newline
279 when PCRE2 is compiled with either CR or CRLF as the default newline.
287 If the default newline is in the list, this command has no effect. Otherwise,
289 first newline convention in the list (LF in the above example) is added to any
291 list is empty, the feature is turned off. This command is present in a number
294 When the POSIX API is being tested there is no way to override the default
295 newline convention, though it is possible to set the newline convention from
296 within the pattern. A warning is given if the \fBposix\fP modifier is used when
307 checked for compatibility with the \fBperltest.sh\fP script, which is used to
326 This command is used to save a set of compiled patterns to a file, as described
344 in a modifier list is ignored. Some modifiers may be given for both patterns
355 first item is not recognized as a long modifier name, it is interpreted as a
360 This is a pattern line whose modifier list starts with two one-letter modifiers
372 This is interpreted as the pattern's delimiter. A regular expression may be
374 included within it. It is possible to include the delimiter within the pattern
381 interpretation. If the terminating delimiter is immediately followed by a
386 then a backslash is added to the end of the pattern. This is done to provide a
401 Before each subject line is passed to \fBpcre2_match()\fP or
402 \fBpcre2_dfa_match()\fP, leading and trailing white space is removed, and the
403 line is scanned for backslash escapes. The following provide a means of
420 The use of \ex{hh...} is not dependent on the use of the \fButf\fP modifier on
421 the pattern. It is recognized always. There may be any number of hexadecimal
426 purposes. On the other hand, \ex{hh} is interpreted as a UTF-8 character in
427 UTF-8 mode, generating more than one byte if the value is greater than 127.
437 There is a special backslash sequence that specifies replication of one or more
455 If the subject string is empty and \e= is followed by whitespace, the line is
456 treated as a comment line, and is not used for matching. For example:
458 \e= This is a comment.
459 abc\e= This is an invalid modifier list.
463 the very last character in the line is a backslash (and there is no modifier
464 list), it is ignored. This gives a way of passing an empty line as data, since
557 The \fBbsr\fP modifier specifies what \eR in a pattern should match. If it is
558 set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode",
559 \eR matches any Unicode newline sequence. The default is specified when PCRE2
570 The \fBdebug\fP modifier is a shorthand for \fBinfo,fullbincode\fP, requesting
575 values, which ensures that the same output is generated for different internal
580 offset values. This is used in a few special tests that run only for specific
581 code unit widths and link sizes, and is also useful for one-off tests.
584 (whether it is anchored, has a fixed first character, and so on). The
585 information is obtained from the \fBpcre2_pattern_info()\fP function. Here are
605 options are the same, just a single "options" line is output; if there are no
606 options, the line is omitted. "First code unit" is where any match must start;
607 if there is more than one they are listed as "starting code units". "Last code
608 unit" is the last literal code unit that must be present in any match. This is
613 the pattern. A list of them is output at the end of any other information that
614 is requested. For each callout, either its number or string is given, followed
622 the \fBnull_context\fP modifier is set, however, NULL is passed. This is for
632 of hexadecimal digits. This feature is provided as a way of creating patterns
633 that contain binary zeros and other non-printing characters. White space is
644 Either single or double quotes may be used. There is no way of including
650 pattern is passed.
659 \fBexpand\fP modifier is present on a pattern, parts of the pattern that have
664 are expanded before the pattern is passed to \fBpcre2_compile()\fP. For
665 example, \e[AB]{6000} is expanded to "ABAB..." 6000 times. This construction
666 cannot be nested. An initial "\e[" sequence is recognized only if "]{" followed
667 by decimal digits and "}" is found later in the pattern. If not, the characters
670 If part of an expanded pattern looks like an expansion, but is really part of
672 the quantifier. For example, \e[AB]{6000,6000} is not recognized as an
675 If the \fBinfo\fP modifier is set on an expanded pattern, the result of the
676 expansion is included in the information that is output.
682 Just-in-time (JIT) compiling is a heavyweight optimization that can greatly
691 different code is generated for the different cases. See the \fBpartial\fP
699 JIT compilation is requested by the \fB/jit\fP pattern modifier, which may
718 If no number is given, 7 is assumed. The phrase "partial matching" means a call
727 If JIT compilation is successful, the compiled JIT code will automatically be
728 used when an appropriate type of match is run, except when incompatible
736 If the \fBjitfast\fP modifier is specified, matching is done using the JIT
739 JIT is not supported. If \fBjitfast\fP is specified without \fBjit\fP, jit=7 is
742 If the \fBjitverify\fP modifier is specified, information about the compiled
744 \fBjitverify\fP is specified without \fBjit\fP, jit=7 is assumed. If JIT
745 compilation is successful when \fBjitverify\fP is set, the text "(JIT)" is
757 The given locale is set, \fBpcre2_maketables()\fP is called to build a set of
758 character tables for the locale, and this is then passed to
762 \fB#pattern\fP command if a default is needed. Setting a locale and alternate
771 \fBpcre2_code\fP block; it is just the actual compiled data. If the pattern is
772 subsequently passed to the JIT compiler, the size of the JIT compiled code is
773 also output. Here is an example:
786 The default for the library is set when PCRE2 is built, but \fBpcre2test\fP
787 sets its own default of 220, which is required for running the standard test
796 causes a compilation error. The default is the largest number a PCRE2_SIZE
805 \fBposix_nosub\fP is used, the POSIX option REG_NOSUB is passed to
827 buffer is too small for the error message. If this modifier has not been set, a
828 large buffer is used.
838 The \fB/stackguard\fP modifier is used to test the use of
839 \fBpcre2_set_compile_recursion_guard()\fP, a function that is provided to
844 documentation for details). If the number specified by the modifier is greater
845 than zero, \fBpcre2_set_compile_recursion_guard()\fP is called to set up
847 receives is the current nesting parenthesis depth; if this is greater than the
848 value given by the modifier, non-zero is returned, causing the compilation to
857 \fBpcre2_compile()\fP. This is used in the PCRE2 tests to check behaviour with
875 are applied to every subject line that is processed with that pattern. They may
899 When a pattern with the \fBpush\fP modifier is successfully compiled, it is
902 facility is used when saving compiled patterns to a file, as described in the
906 below. If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled
907 pattern is stacked, leaving the original as current, ready to match the
914 \fBreplace\fP, which causes an error. Note that \fBjitverify\fP, which is
964 in which case they apply to every subject line that is matched against that
1011 addition output the remainder of the subject string. This is useful for tests
1014 well as the main matched substring. In each case the remainder is output on the
1019 feature is not supported for JIT matching, and if requested with JIT it is
1021 there is a lookbehind at the start of a match, or a lookahead at the end, or if
1022 \eK is used in the pattern. Characters that precede or follow the start and end
1024 underneath them. Here is an example:
1031 This shows that the matched string is "abc", with the preceding and following
1036 be indicated, if it is different to the start of the matched string. The only
1037 time when this occurs is when \eK has been processed as part of the match. In
1038 this situation, the output for the matched string is displayed from the
1058 are output as "<unset>". This modifier is not relevant for DFA matching (which
1059 does no capturing); it is ignored, with a warning message, if present.
1065 A callout function is supplied when \fBpcre2test\fP calls the library matching
1066 functions, unless \fBcallout_none\fP is specified. If \fBcallout_capture\fP is
1069 The \fBcallout_fail\fP modifier can be given one or two numbers. If there is
1070 only one number, 1 is returned instead of 0 when a callout of that number is
1071 reached. If two numbers are given, 1 is returned when callout <n> is reached
1077 This is set as the "user data" that is passed to the matching function, and
1078 passed back when the callout function is invoked. Any value other than zero is
1087 function is called again to search the remainder of the subject. The difference
1088 between \fBglobal\fP and \fBaltglobal\fP is that the former uses the
1090 to start searching at a new point within the entire string (which is what Perl
1095 If an empty string is matched, the next match is done with the
1098 fails, the start offset is advanced, and the normal match is retried. This
1100 the \fBsplit()\fP function. Normally, the start offset is advanced by one
1102 current character is CR followed by LF, an advance of two characters occurs.
1115 If the \fB#subject\fP command is used to set default copy and/or get lists,
1122 If the subject line is successfully matched, the substrings extracted by the
1124 instead of a colon. This is in addition to the normal full list. The string
1125 length (that is, the return from the extraction function) is given in
1133 If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
1135 cannot contain commas, because a comma signifies the end of a modifier. This is
1139 for escape sequences. In UTF mode, a replacement string is checked to see if it
1140 is a valid UTF-8 string. If so, it is correctly converted to a UTF string of
1141 the appropriate code unit width. If it is not a valid UTF-8 string, the
1155 After a successful substitution, the modified string is output, preceded by the
1156 number of replacements. This may be zero if there were no matches. Here is a
1168 number in square brackets, that number is passed to \fBpcre2_substitute()\fP as
1170 character. Here is an example that tests the edge case:
1178 The default action of \fBpcre2_substitute()\fP is to return
1179 PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
1180 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
1183 size of buffer that is required. When this happens, \fBpcre2test\fP shows the
1191 A replacement string is ignored with POSIX and DFA matching. Specifying partial
1200 that is used by the just-in-time optimization code. It is ignored if JIT
1201 optimization is not being used. The value is a number of kilobytes. Providing a
1202 stack that is larger than the default 32K is necessary only for very
1211 \fBfind_limits\fP modifier is specified.
1217 If the \fBfind_limits\fP modifier is present, \fBpcre2test\fP calls
1223 If JIT is being used, only the match limit is relevant. If DFA matching is
1224 being used, neither limit is relevant, and this modifier is ignored (with a
1227 The \fImatch_limit\fP number is a measure of the amount of backtracking
1229 simple matches, the number is quite small, but for patterns with very large
1231 increasing length of subject string. The \fImatch_limit_recursion\fP number is
1232 a measure of how much stack (or, if PCRE2 is compiled with NO_RECURSE, how much
1233 heap) memory is needed to complete the match attempt.
1241 are returned from calls to \fBpcre2_match()\fP to be displayed. If a mark is
1243 For a match, it is on a line by itself, tagged with "MK:". Otherwise, it
1258 matching starts. Its value is a number of code units, not characters.
1266 return is given. The data value is a number of code units, not characters. When
1267 this modifier is used, the \fBuse_offset_limit\fP modifier must have been set
1268 for the pattern; if not, an error is generated.
1277 available for storing matching information. The default is 15.
1279 A value of zero is useful when testing the POSIX API because it causes
1281 POSIX API, a value of zero is used to cause
1283 match block of exactly the right size for the pattern. (It is not possible to
1284 create a match block with a zero-length ovector; there is always at least one
1291 By default, the subject string is passed to a native API matching function with
1293 string, the \fBzero_terminate\fP modifier is provided. It causes the length to
1295 this modifier has no effect, as there is no facility for passing a length.)
1306 modifier is set, however, NULL is passed. This is for testing that the matching
1325 If the \fBdfa\fP modifier is set, the alternative matching function is used.
1327 however, the \fBdfa_shortest\fP modifier is set, processing stops after the
1328 first match is found. This is always the shortest possible match.
1335 \fBpcre2_match()\fP, is being used.
1339 Otherwise, it outputs "No match" when the return is PCRE2_ERROR_NOMATCH, or
1341 return is PCRE2_ERROR_PARTIAL. (Note that this is the
1347 and a short descriptive phrase. If the error is a failed UTF string check, the
1348 code unit offset of the start of the failing character is also output. Here is
1361 Unset capturing substrings that are not followed by one that is set are not
1362 shown by \fBpcre2test\fP unless the \fBallcaptures\fP modifier is specified. In
1364 data line is matched, the second, unset substring is not shown. An "internal"
1365 unset substring is shown as "<unset>", as for the second data line.
1377 escapes if the value is less than 256 and UTF mode is not set. Otherwise they
1379 characters. If the \fB/aftertext\fP modifier is set, the output for substring
1380 0 is followed by the rest of the subject string, identified by "0+" like
1388 If global matching is requested, the results of successive matching attempts
1400 "No match" is output only if the first match attempt fails. Here is an example
1401 of a failure message (the offset 4 that is specified by the \fBoffset\fP
1402 modifier is past the end of the subject string):
1409 prompt is used for continuations), subject lines may not. However newlines can
1418 When the alternative matching function, \fBpcre2_dfa_match()\fP, is used, the
1420 the subject where there is at least one match. For example:
1429 longest matching string is always given first (and numbered zero). After a
1430 PCRE2_ERROR_PARTIAL return, the output is "Partial match:", followed by the
1431 partially matching substring. Note that this is the entire substring that was
1433 match start if a lookbehind assertion, \eb, or \eB was involved. (\eK is not
1436 If global matching is requested, the search for further matches resumes
1477 function is called during matching unless \fBcallout_none\fP is specified.
1493 arguments is slightly different.
1509 one circumflex is output if the start and current positions are the same, or if
1511 callout is in a lookbehind assertion.
1515 showing the callout number, the offset in the pattern, preceded by a plus, is
1527 If a pattern contains (*MARK) items, an additional line is output whenever
1528 a change of latest mark is passed to the callout function. For example:
1542 of the match, so nothing more is output. If, as a result of backtracking, the
1543 mark reverts to being unset, the text "<unset>" is output.
1549 The output for a callout with a string argument is similar, except that instead
1552 the subject string, and the subject string is reflected for each callout. For
1571 When \fBpcre2test\fP is outputting text in the compiled version of a pattern,
1575 When \fBpcre2test\fP is outputting text that is a matched part of a subject
1578 \fBisprint()\fP function is used to distinguish printing and non-printing
1587 It is possible to save compiled patterns on disc or elsewhere, and reload them
1592 serialized, that is, converted to a stream of bytes. A single byte stream may
1594 character tables. A single copy of the tables is included in the byte stream
1595 (its size is 1088 bytes).
1605 When a pattern with \fBpush\fP modifier is successfully compiled, it is pushed
1641 The JIT modifiers are, however, permitted. Here is an example that saves and
1654 If \fBjitverify\fP is used with #pop, it does not automatically imply
1655 \fBjit\fP, which is different behaviour from when it is used on a pattern.
1657 The #popcopy command is analogous to the \fBpushcopy\fP modifier in that it