Lines Matching +full:set +full:- +full:output
3 pcre2test - a program for testing Perl-compatible regular expressions.
7 .B pcre2test "[options] [input file [output file]]"
25 defaults and controlling some special actions. The output shows the result of
28 subject is processed, and what output is produced.
37 .SH "PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES"
41 strings that are encoded in 8-bit, 16-bit, or 32-bit code units. One, two, or
44 input and output are always in 8-bit format. When testing the 16-bit or 32-bit
45 libraries, patterns and subject strings are converted to 16-bit or 32-bit
47 to 8-bit code units for output.
65 contain binary zeros, even though in Unix-like environments, \fBfgets()\fP
70 for specifying some or all of the 8-bit input characters as hexadecimal pairs,
74 .SS "Input for the 16-bit and 32-bit libraries"
77 When testing the 16-bit or 32-bit libraries, there is a need to be able to
85 below) is set, the pattern and any following subject lines are interpreted as
86 UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
88 For non-UTF testing of wide characters, the \fButf8_input\fP modifier can be
89 used. This is mutually exclusive with \fButf\fP, and is allowed only in 16-bit
90 or 32-bit mode. It causes the pattern and following subject lines to be treated
91 as UTF-8 according to the original definition (RFC 2279), which allows for
92 character values up to 0x7fffffff. Each character is placed in one 16-bit or
93 32-bit code unit (in the 16-bit case, values greater than 0xffff cause an error
96 UTF-8 (in its original definition) is not capable of encoding values greater
97 than 0x7fffffff, but such values can be handled by the 32-bit library. When
98 testing this library in non-UTF mode with \fButf8_input\fP set, if any
99 character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
108 \fB-8\fP
109 If the 8-bit library has been built, this option causes it to be used (this is
110 the default). If the 8-bit library has not been built, this option causes an
113 \fB-16\fP
114 If the 16-bit library has been built, this option causes it to be used. If only
115 the 16-bit library has been built, this is the default. If the 16-bit library
118 \fB-32\fP
119 If the 32-bit library has been built, this option causes it to be used. If only
120 the 32-bit library has been built, this is the default. If the 32-bit library
123 \fB-ac\fP
127 \fB-AC\fP
128 As for \fB-ac\fP, but in addition behave as if each subject line has the
132 \fB-b\fP
134 internal binary form of the pattern is output after compilation.
136 \fB-C\fP
137 Output the version number of the PCRE2 library, and all available information
139 code. All other options are ignored. If both -C and -LM are present, whichever
142 \fB-C\fP \fIoption\fP
143 Output information about a specific build-time option, then exit. This
145 following options output the value and set the exit code as indicated:
147 ebcdic-nl the code for LF (= NL) in an EBCDIC environment:
152 exit code is set to the link size
160 The following options output 1 for true or 0 for false, and set the exit code
163 backslash-C \eC is supported (not locked out)
165 jit just-in-time support is available
166 pcre2-16 the 16-bit library was built
167 pcre2-32 the 32-bit library was built
168 pcre2-8 the 8-bit library was built
171 If an unknown option is given, an error message is output; the exit code is 0.
173 \fB-d\fP
175 form and information about the compiled pattern is output after compilation;
176 \fB-d\fP is equivalent to \fB-b -i\fP.
178 \fB-dfa\fP
183 \fB-error\fP \fInumber[,number,...]\fP
185 comma-separated list, display the resulting messages on the standard output,
189 \fB-help\fP
190 Output a brief summary of these options and then exit.
192 \fB-i\fP
196 \fB-jit\fP
198 compilation, each pattern is passed to the just-in-time compiler, if available.
200 \fB-jitfast\fP
202 successful compilation, each pattern is passed to the just-in-time compiler, if
206 \fB-jitverify\fP
208 successful compilation, each pattern is passed to the just-in-time compiler, if
211 \fB-LM\fP
213 standard output, then exit with zero exit code. All other options are ignored.
214 If both -C and any -Lx options are present, whichever is first is recognized.
216 \fB-LP\fP
218 output, then exit with zero exit code. All other options are ignored. If both
219 -C and any -Lx options are present, whichever is first is recognized.
221 \fB-LS\fP
223 output, then exit with zero exit code. All other options are ignored. If both
224 -C and any -Lx options are present, whichever is first is recognized.
226 \fB-pattern\fP \fImodifier-list\fP
229 \fB-q\fP
230 Do not output the version number of \fBpcre2test\fP at the start of execution.
232 \fB-S\fP \fIsize\fP
233 On Unix-like systems, set the size of the run-time stack to \fIsize\fP
236 \fB-subject\fP \fImodifier-list\fP
239 \fB-t\fP
240 Run each compile and match many times with a timer, and output the resulting
243 that are used for timing by following \fB-t\fP with a number (as a separate
244 item on the command line). For example, "-t 1000" iterates 1000 times. The
247 \fB-tm\fP
248 This is like \fB-t\fP except that it times only the matching phase, not the
251 \fB-T\fP \fB-TM\fP
252 These behave like \fB-t\fP and \fB-tm\fP, but in addition, at the end of a run,
253 the total times for all compiles and matches are output.
255 \fB-version\fP
256 Output the PCRE2 version number and then exit.
263 writes to the second. If the first name is "-", input is taken from the
271 function. This provides line-editing and history facilities. The output from
272 the \fB-help\fP option states whether or not \fBreadline()\fP will be used.
274 The program handles any number of tests, each of which consists of a set of
275 input lines. Each set starts with a regular expression pattern, followed by any
289 multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
312 options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and
315 which are still supported when PCRE2_UTF is not set, but which require Unicode
324 output.
328 This command is used to load a set of precompiled patterns from a file, as
337 This command is used to load a set of binary character tables that can be
339 \fBpcre2_dftables\fP program with the -b option.
341 #newline_default [<newline-list>]
365 newline convention, though it is possible to set the newline convention from
367 modifier is used when \fB#newline_default\fP would set a default for the
368 non-POSIX API.
370 #pattern <modifier-list>
383 supported. Comment lines, #pattern commands, and #subject commands that set or
403 This command is used to save a set of compiled patterns to a file, as described
410 #subject <modifier-list>
437 This is a pattern line whose modifier list starts with two one-letter modifiers
438 (/i and /g). The lower-case abbreviated modifiers are the same as used in Perl.
445 excluding pattern meta-characters):
447 / ! " ' ` - = _ : ; , % & @ ~
457 since the delimiters are all non-alphanumeric, the inclusion of the backslash
483 \fBsubject_literal\fP modifier was set for the pattern. The following provide a
484 means of encoding non-printing characters in a visible way:
495 a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
504 Note that \exhh specifies one byte rather than one character in UTF-8 mode;
505 this makes it possible to construct invalid UTF-8 sequences for testing
506 purposes. On the other hand, \ex{hh} is interpreted as a UTF-8 character in
507 UTF-8 mode, generating more than one byte if the value is greater than 127.
508 When testing the 8-bit library not in UTF-8 mode, \ex{hh} generates one byte
511 In UTF-16 mode, all 4-digit \ex{hhhh} values are accepted. This makes it
512 possible to construct invalid UTF-16 sequences for testing purposes.
514 In UTF-32 mode, all 4- to 8-digit \ex{...} values are accepted. This makes it
515 possible to construct invalid UTF-32 sequences for testing purposes.
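The difference between the one-byte \exhh form and the character form \ex{hh} in UTF-8 mode can be illustrated outside \fBpcre2test\fP. The following is a minimal Python sketch, not part of \fBpcre2test\fP itself; the helper names utf8_encode and is_valid_utf8 are hypothetical and exist only for this illustration. It assumes the value 0xc4 as the example code point.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode a Unicode scalar value as UTF-8 (illustrative helper)."""
    return chr(code_point).encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether a byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Analogue of \xc4: exactly one raw byte, regardless of UTF mode.
single_byte = bytes([0xC4])

# Analogue of \x{c4} in UTF-8 mode: the character U+00C4, which is
# greater than 127 and therefore needs more than one byte.
as_character = utf8_encode(0xC4)

print(len(single_byte), is_valid_utf8(single_byte))
print(len(as_character), is_valid_utf8(as_character))
```

The lone byte 0xc4 is a UTF-8 lead byte with no continuation byte, so on its own it is an invalid sequence. That is the kind of input the one-byte form makes it possible to construct for testing.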
541 A backslash followed by any other non-alphanumeric character just escapes that
547 If the \fBsubject_literal\fP modifier is set for a pattern, all subject lines
549 No replication is possible, and any subject modifiers must be set as defaults
558 pattern's modifier list can add to or override default modifiers that were set
566 The following modifiers set options for \fBpcre2_compile()\fP. Most of them set
568 PCRE2_EXTRA are additional options that are set in the compile context. For the
569 main options, there are some single-letter abbreviations that are the same as
579 allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
580 allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
581 allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
582 alt_bsux set PCRE2_ALT_BSUX
583 alt_circumflex set PCRE2_ALT_CIRCUMFLEX
584 alt_verbnames set PCRE2_ALT_VERBNAMES
585 anchored set PCRE2_ANCHORED
586 auto_callout set PCRE2_AUTO_CALLOUT
587 bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
588 /i caseless set PCRE2_CASELESS
589 dollar_endonly set PCRE2_DOLLAR_ENDONLY
590 /s dotall set PCRE2_DOTALL
591 dupnames set PCRE2_DUPNAMES
592 endanchored set PCRE2_ENDANCHORED
593 escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
594 /x extended set PCRE2_EXTENDED
595 /xx extended_more set PCRE2_EXTENDED_MORE
596 extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX
597 firstline set PCRE2_FIRSTLINE
598 literal set PCRE2_LITERAL
599 match_line set PCRE2_EXTRA_MATCH_LINE
600 match_invalid_utf set PCRE2_MATCH_INVALID_UTF
601 match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
602 match_word set PCRE2_EXTRA_MATCH_WORD
603 /m multiline set PCRE2_MULTILINE
604 never_backslash_c set PCRE2_NEVER_BACKSLASH_C
605 never_ucp set PCRE2_NEVER_UCP
606 never_utf set PCRE2_NEVER_UTF
607 /n no_auto_capture set PCRE2_NO_AUTO_CAPTURE
608 no_auto_possess set PCRE2_NO_AUTO_POSSESS
609 no_dotstar_anchor set PCRE2_NO_DOTSTAR_ANCHOR
610 no_start_optimize set PCRE2_NO_START_OPTIMIZE
611 no_utf_check set PCRE2_NO_UTF_CHECK
612 ucp set PCRE2_UCP
613 ungreedy set PCRE2_UNGREEDY
614 use_offset_limit set PCRE2_USE_OFFSET_LIMIT
615 utf set PCRE2_UTF
618 non-printing characters in output strings to be printed using the \ex{hh...}
619 notation. Otherwise, those less than 0x100 are output in hex without the curly
620 brackets. Setting \fButf\fP in 16-bit or 32-bit mode also causes pattern and
621 subject strings to be translated to UTF-16 or UTF-32, respectively, before
630 about the pattern. There are single-letter abbreviations for some that are
637 convert_glob_escape=c set glob escape character
638 convert_glob_separator=c set glob separator character
639 convert_length set convert buffer length
649 max_pattern_length=<n> set the maximum pattern length
651 newline=<type> set newline type
653 parens_nest_limit=<n> set maximum parentheses depth
661 use_length do not zero-terminate the pattern
662 utf8_input treat input as UTF-8
671 set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode",
673 PCRE2 is built; if it is not, the default is set to Unicode.
687 output after compilation. This information does not contain length and offset
688 values, which ensures that the same output is generated for different internal
694 code unit widths and link sizes, and is also useful for one-off tests.
718 options are the same, just a single "options" line is output; if there are no
724 \fBno_start_optimize\fP is set because the minimum length is not calculated
732 the pattern. A list of them is output at the end of any other information that
741 the \fBnull_context\fP modifier is set, however, NULL is passed. This is for
752 that contain binary zeros and other non-printing characters. White space is
771 By default, patterns are passed to the compiling functions as zero-terminated
772 strings but can be passed by length instead of being zero-terminated. The
774 automatically (whether or not \fBuse_length\fP is set) when \fBhex\fP is set,
785 .SS "Specifying wide characters in 16-bit and 32-bit modes"
788 In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
789 translated to UTF-16 or UTF-32 when the \fButf\fP modifier is set. For testing
790 the 16-bit and 32-bit libraries in non-UTF mode, the \fButf8_input\fP modifier
792 interpreted as UTF-8 as a means of specifying wide characters. More details are
824 If the \fBinfo\fP modifier is set on an expanded pattern, the result of the
825 expansion is included in the information that is output.
831 Just-in-time (JIT) compiling is a heavyweight optimization that can greatly
838 this to optimized machine code. It needs to know whether the match-time options
853 1 compile JIT code for non-partial matching
869 PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
872 matching (for example, jit=2) but do not set the \fBpartial\fP modifier on a
874 non-partial matching.
878 run-time options are specified. For more details, see the
894 compilation is successful when \fBjitverify\fP is set, the text "(JIT)" is
895 added to the first output line after a match or non match when JIT-compiled
906 The given locale is set, \fBpcre2_maketables()\fP is called to build a set of
919 the compiled pattern to be output. This does not include the size of the
922 also output. Here is an example:
935 The default for the library is set when PCRE2 is built, but \fBpcre2test\fP
956 \fBregcomp()\fP. The POSIX wrapper supports only the 8-bit library. Note that
961 documentation. The following pattern modifiers set options for the
977 buffer is too small for the error message. If this modifier has not been set, a
984 The pattern is passed to \fBregcomp()\fP as a zero-terminated string by
985 default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the
999 than zero, \fBpcre2_set_compile_recursion_guard()\fP is called to set up
1002 value given by the modifier, non-zero is returned, causing the compilation to
1010 1, 2, or 3. It causes a specific set of built-in character tables to be passed
1017 2 a set of tables defining ISO 8859 characters
1018 3 a set of tables loaded by the #loadtables command
1042 jitstack=<n> set size of JIT stack
1058 defaults, set them in a \fB#subject\fP command.
1066 backslashes. It is not possible to set subject modifiers on such lines, but any
1067 that are set as defaults by a \fB#subject\fP command are recognized.
1099 setting the \fBconvert\fP modifier. Its argument is a colon-separated list of
1100 options, which set the equivalent option for the \fBpcre2_pattern_convert()\fP
1110 The "unset" value is useful for turning off a default that has been set by a
1111 \fB#pattern\fP command. When one of these options is set, the input pattern is
1113 result is reflected in the output and then passed to \fBpcre2_compile()\fP. The
1114 normal \fButf\fP and \fBno_utf_check\fP options, if set, cause the
1119 output. However, if the \fBconvert_length\fP modifier is set to a value greater
1125 overriding the defaults, which are operating-system dependent.
1139 The following modifiers set options for \fBpcre2_match()\fP or
1146 anchored set PCRE2_ANCHORED
1147 endanchored set PCRE2_ENDANCHORED
1148 dfa_restart set PCRE2_DFA_RESTART
1149 dfa_shortest set PCRE2_DFA_SHORTEST
1150 no_jit set PCRE2_NO_JIT
1151 no_utf_check set PCRE2_NO_UTF_CHECK
1152 notbol set PCRE2_NOTBOL
1153 notempty set PCRE2_NOTEMPTY
1154 notempty_atstart set PCRE2_NOTEMPTY_ATSTART
1155 noteol set PCRE2_NOTEOL
1156 partial_hard (or ph) set PCRE2_PARTIAL_HARD
1157 partial_soft (or ps) set PCRE2_PARTIAL_SOFT
1163 causing the POSIX wrapper API to be used, the only option-setting modifiers
1169 ignored (with a warning) if used for non-POSIX matching.
1197 allusedtext show all consulted text (non-JIT only)
1200 callout_data=<n> set a value to pass via callouts
1207 depth_limit=<n> set a depth limit
1214 heap_limit=<n> set a limit on heap memory (Kbytes)
1215 jitstack=<n> set size of JIT stack
1217 match_limit=<n> set a match limit
1222 offset=<n> set starting offset
1223 offset_limit=<n> set offset limit
1224 ovector=<n> set size of output vector
1239 zero_terminate pass the subject as zero-terminated
1252 addition output the remainder of the subject string. This is useful for tests
1255 well as the main matched substring. In each case the remainder is output on the
1262 modifier affects the output if there is a lookbehind at the start of a match,
1265 match are indicated in the output by '<' or '>' characters underneath them.
1284 this situation, the output for the matched string is displayed from the
1301 captured parentheses be output after a match. By default, only those up to the
1302 highest one actually used in the match are output (corresponding to the return
1304 are output as "<unset>". This modifier is not relevant for DFA matching (which
1315 successful complete non-DFA match. This modifier, which acts after any match
1319 of a capturing pair, "<unchanged>" is output. After a successful match, this
1322 elements are the only ones that should be set. After a DFA match, the amount of
1359 PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
1360 another, non-empty, match at the same point in the subject. If this match
1378 If the \fB#subject\fP command is used to set default copy and/or get lists,
1386 convenience functions are output with C, G, or L after the string number
1397 If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
1409 is a valid UTF-8 string. If so, it is correctly converted to a UTF string of
1410 the appropriate code unit width. If it is not a valid UTF-8 string, the
1412 invalid UTF-8 string for testing purposes.
1414 The following modifiers set options (in addition to the normal match options)
1432 After a successful substitution, the modified string is output, preceded by the
1443 characters) for substitution tests, as fixed-size buffers are used. To make it
1446 the size of the output buffer, with the replacement string starting at the next
1453 Failed: error -47: no more memory
1456 PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
1457 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
1466 Failed: error -47: no more memory: 10 code units are needed
1476 If the \fBsubstitute_callout\fP modifier is set, a substitution callout
1477 function is set up. The \fBnull_context\fP modifier must not be set, because
1480 and output strings are output. For example:
1489 parenthesized number is the number of pairs that are set in the ovector (that
1490 is, one more than the number of capturing groups that were set). Then are
1498 of that number, and similarly \fBsubstitute_stop\fP returns -1. These cause the
1499 replacement to be rejected, and -1 causes no further matching to take place. If
1500 either of them is set, \fBsubstitute_callout\fP is assumed. For example:
1511 If both are set for the same number, stop takes precedence. Only a single skip
1519 that is used by the just-in-time optimization code. It is ignored if JIT
1523 patterns. If \fBjitstack\fP is set non-zero on a subject line it overrides any
1524 value that was set on the pattern.
1530 The \fBheap_limit\fP, \fBmatch_limit\fP, and \fBdepth_limit\fP modifiers set
1553 an in-pattern limit; they cannot increase it.
1555 For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
1561 For non-DFA matching, the \fImatch_limit\fP number is a measure of the amount
1567 and non-recursive, to the internal matching function, thus controlling the
1580 returned for a match, non-match, or partial match, \fBpcre2test\fP shows it.
1582 is added to the non-match message.
1592 allocation on the stack, so in many cases there will be no output. No heap
1594 \fBnull_context\fP modifier must not be set on both the pattern and the
1595 subject, though it can be set on one or the other.
1611 this modifier is used, the \fBuse_offset_limit\fP modifier must have been set
1615 .SS "Setting the size of the output vector"
1619 appears, though of course it can also be used to set a default in a
1628 create a match block with a zero-length ovector; there is always at least one
1632 .SS "Passing the subject as zero-terminated"
1636 its correct length. In order to test the facility for passing a zero-terminated
1642 passing the replacement string as zero-terminated.
1650 If the \fBnull_context\fP modifier is set, however, NULL is passed. This is for
1657 \fBnull_replacement\fP modifier is set, the subject or replacement string
1674 If the \fBdfa\fP modifier is set, the alternative matching function is used.
1676 however, the \fBdfa_shortest\fP modifier is set, processing stops after the
1680 .SH "DEFAULT OUTPUT FROM pcre2test"
1683 This section describes the output when the normal matching function,
1697 code unit offset of the start of the failing character is also output. Here is
1701 PCRE2 version 10.22 2016-07-29
1710 Unset capturing substrings that are not followed by one that is set are not
1725 If the strings contain any non-printing characters, they are output as \exhh
1726 escapes if the value is less than 256 and UTF mode is not set. Otherwise they
1727 are output as \ex{hh...} escapes. See below for the definition of non-printing
1728 characters. If the \fBaftertext\fP modifier is set, the output for substring
1738 are output in sequence, like this:
1749 "No match" is output only if the first match attempt fails. Here is an example
1755 Error -24 (bad offset value)
1764 .SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION"
1768 output consists of a list of all the matches that start at the first point in
1779 PCRE2_ERROR_PARTIAL return, the output is "Partial match:", followed by the
1829 differences in behaviour. The output for callouts with numerical arguments and
1840 --->pqrabcdef
1843 This output indicates that callout number 0 occurred for a match attempt
1846 one circumflex is output if the start and current positions are the same, or if
1853 output. For example:
1855 re> /\ed?[A-E]\e*/auto_callout
1857 --->E*
1859 +3 ^ [A-E]
1864 If a pattern contains (*MARK) items, an additional line is output whenever
1869 --->abc
1879 of the match, so nothing more is output. If, as a result of backtracking, the
1880 mark reverts to being unset, the text "<unset>" is output.
1886 The output for a callout with a string argument is similar, except that instead
1888 string and its offset in the pattern string are output before the reflection of
1895 --->abcdefg
1898 --->abcdefg
1911 If the \fBcallout_capture\fP modifier is set, the current captured groups are
1912 output when a callout occurs. This is useful only for non-DFA matching, as
1916 The normal callout output, showing the callout number or pattern offset (as
1917 described above) is suppressed if the \fBcallout_no_where\fP modifier is set.
1920 setting the \fBcallout_extra\fP modifier causes additional output from
1923 output. If there has been a backtrack since the last callout (or start of
1924 matching if this is the first callout), "Backtrack" is output, followed by "No
1931 --->aac
1937 --->aac
1943 --->aac
1951 --->aac
1957 --->aac
1983 aborted. If both these modifiers are set for the same callout number,
1988 This is set as the "user data" that is passed to the matching function, and
2002 .SH "NON-PRINTING CHARACTERS"
2006 bytes other than 32-126 are always treated as non-printing characters and are
2010 string, it behaves in the same way, unless a different locale has been set for
2012 \fBisprint()\fP function is used to distinguish printing and non-printing
2032 for serializing and de-serializing. They are described in the
2063 reads the data in the file, and then arranges for it to be de-serialized, with
2078 option-setting modifiers.
2125 Copyright (c) 1997-2022 University of Cambridge.