Lines Matching +full:set +full:- +full:output
3 pcre2test - a program for testing Perl-compatible regular expressions.
7 .B pcre2test "[options] [input file [output file]]"
25 defaults and controlling some special actions. The output shows the result of
28 subject is processed, and what output is produced.
37 .SH "PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES"
41 strings that are encoded in 8-bit, 16-bit, or 32-bit code units. One, two, or
44 input and output are always in 8-bit format. When testing the 16-bit or 32-bit
45 libraries, patterns and subject strings are converted to 16-bit or 32-bit
47 to 8-bit code units for output.
65 zeros, even though in Unix-like environments, \fBfgets()\fP treats any bytes
70 or all of the 8-bit input characters as hexadecimal pairs, which makes it
74 .SS "Input for the 16-bit and 32-bit libraries"
77 When testing the 16-bit or 32-bit libraries, there is a need to be able to
85 below) is set, the pattern and any following subject lines are interpreted as
86 UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
88 For non-UTF testing of wide characters, the \fButf8_input\fP modifier can be
89 used. This is mutually exclusive with \fButf\fP, and is allowed only in 16-bit
90 or 32-bit mode. It causes the pattern and following subject lines to be treated
91 as UTF-8 according to the original definition (RFC 2279), which allows for
92 character values up to 0x7fffffff. Each character is placed in one 16-bit or
93 32-bit code unit (in the 16-bit case, values greater than 0xffff cause an error
96 UTF-8 (in its original definition) is not capable of encoding values greater
97 than 0x7fffffff, but such values can be handled by the 32-bit library. When
98 testing this library in non-UTF mode with \fButf8_input\fP set, if any
99 character is preceded by the byte 0xff (which is an invalid byte in UTF-8)
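The original UTF-8 definition mentioned above (RFC 2279) allows sequences of up to six bytes, which is how values up to 0x7fffffff can be written. The following Python sketch illustrates that encoding only; it is not taken from pcre2test. The function names `decode_rfc2279` and `check_code_unit_width` are hypothetical, and the sketch does not reject overlong encodings.

```python
def decode_rfc2279(data):
    """Decode bytes as original-definition UTF-8 (1 to 6 bytes per value)."""
    values = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte determines the sequence length and the payload bits.
        if lead < 0x80:
            length, value = 1, lead
        elif 0xC0 <= lead < 0xE0:
            length, value = 2, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:
            length, value = 3, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:
            length, value = 4, lead & 0x07
        elif 0xF8 <= lead < 0xFC:
            length, value = 5, lead & 0x03
        elif 0xFC <= lead < 0xFE:
            length, value = 6, lead & 0x01
        else:
            # 0x80-0xbf are continuation bytes; 0xfe and 0xff are never valid.
            raise ValueError("invalid lead byte 0x%02x at offset %d" % (lead, i))
        if i + length > len(data):
            raise ValueError("truncated sequence at offset %d" % i)
        for b in data[i + 1:i + length]:
            if b & 0xC0 != 0x80:
                raise ValueError("bad continuation byte 0x%02x" % b)
            # Each continuation byte contributes six payload bits.
            value = (value << 6) | (b & 0x3F)
        values.append(value)
        i += length
    return values


def check_code_unit_width(values, width):
    """Each value must fit in one code unit; in the 16-bit case, > 0xffff is an error."""
    if width == 16:
        for v in values:
            if v > 0xFFFF:
                raise ValueError("value 0x%x does not fit in a 16-bit code unit" % v)
    return values
```

For instance, the six-byte sequence `fd bf bf bf bf bf` decodes to 0x7fffffff, which fits a 32-bit code unit but not a 16-bit one.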
108 \fB-8\fP
109 If the 8-bit library has been built, this option causes it to be used (this is
110 the default). If the 8-bit library has not been built, this option causes an
113 \fB-16\fP
114 If the 16-bit library has been built, this option causes it to be used. If the
115 8-bit library has not been built, this is the default. If the 16-bit library
118 \fB-32\fP
119 If the 32-bit library has been built, this option causes it to be used. If no
120 other library has been built, this is the default. If the 32-bit library has
123 \fB-ac\fP
127 \fB-AC\fP
128 As for \fB-ac\fP, but in addition behave as if each subject line has the
132 \fB-b\fP
134 internal binary form of the pattern is output after compilation.
136 \fB-C\fP
137 Output the version number of the PCRE2 library, and all available information
139 code. All other options are ignored. If both -C and -LM are present, whichever
142 \fB-C\fP \fIoption\fP
143 Output information about a specific build-time option, then exit. This
145 following options output the value and set the exit code as indicated:
147 ebcdic-nl the code for LF (= NL) in an EBCDIC environment:
152 exit code is set to the link size
160 The following options output 1 for true or 0 for false, and set the exit code
163 backslash-C \eC is supported (not locked out)
165 jit just-in-time support is available
166 pcre2-16 the 16-bit library was built
167 pcre2-32 the 32-bit library was built
168 pcre2-8 the 8-bit library was built
171 If an unknown option is given, an error message is output; the exit code is 0.
173 \fB-d\fP
175 form and information about the compiled pattern is output after compilation;
176 \fB-d\fP is equivalent to \fB-b -i\fP.
178 \fB-dfa\fP
183 \fB-error\fP \fInumber[,number,...]\fP
185 comma-separated list, display the resulting messages on the standard output,
189 \fB-help\fP
190 Output a brief summary of these options and then exit.
192 \fB-i\fP
196 \fB-jit\fP
198 compilation, each pattern is passed to the just-in-time compiler, if available.
200 \fB-jitfast\fP
202 successful compilation, each pattern is passed to the just-in-time compiler, if
206 \fB-jitverify\fP
208 successful compilation, each pattern is passed to the just-in-time compiler, if
211 \fB-LM\fP
213 standard output, then exit with zero exit code. All other options are ignored.
214 If both -C and any -Lx options are present, whichever is first is recognized.
216 \fB-LP\fP
218 output, then exit with zero exit code. All other options are ignored. If both
219 -C and any -Lx options are present, whichever is first is recognized.
221 \fB-LS\fP
223 output, then exit with zero exit code. All other options are ignored. If both
224 -C and any -Lx options are present, whichever is first is recognized.
226 \fB-pattern\fP \fImodifier-list\fP
229 \fB-q\fP
230 Do not output the version number of \fBpcre2test\fP at the start of execution.
232 \fB-S\fP \fIsize\fP
233 On Unix-like systems, set the size of the run-time stack to \fIsize\fP
236 \fB-subject\fP \fImodifier-list\fP
239 \fB-t\fP
240 Run each compile and match many times with a timer, and output the resulting
243 that are used for timing by following \fB-t\fP with a number (as a separate
244 item on the command line). For example, "-t 1000" iterates 1000 times. The
247 \fB-tm\fP
248 This is like \fB-t\fP except that it times only the matching phase, not the
251 \fB-T\fP \fB-TM\fP
252 These behave like \fB-t\fP and \fB-tm\fP, but in addition, at the end of a run,
253 the total times for all compiles and matches are output.
255 \fB-version\fP
256 Output the PCRE2 version number and then exit.
263 writes to the second. If the first name is "-", input is taken from the
271 function. This provides line-editing and history facilities. The output from
272 the \fB-help\fP option states whether or not \fBreadline()\fP will be used.
274 The program handles any number of tests, each of which consists of a set of
275 input lines. Each set starts with a regular expression pattern, followed by any
289 multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
312 options set, which locks out the use of the PCRE2_UTF and PCRE2_UCP options and
315 which are still supported when PCRE2_UTF is not set, but which require Unicode
324 output.
328 This command is used to load a set of precompiled patterns from a file, as
337 This command is used to load a set of binary character tables that can be
339 \fBpcre2_dftables\fP program with the -b option.
341 #newline_default [<newline-list>]
365 newline convention, though it is possible to set the newline convention from
367 modifier is used when \fB#newline_default\fP would set a default for the
368 non-POSIX API.
370 #pattern <modifier-list>
383 supported. Comment lines, #pattern commands, and #subject commands that set or
403 This command is used to save a set of compiled patterns to a file, as described
410 #subject <modifier-list>
437 This is a pattern line whose modifier list starts with two one-letter modifiers
438 (/i and /g). The lower-case abbreviated modifiers are the same as used in Perl.
445 excluding pattern meta-characters):
447 / ! " ' ` - = _ : ; , % & @ ~
457 since the delimiters are all non-alphanumeric, the inclusion of the backslash
483 \fBsubject_literal\fP modifier was set for the pattern. The following provide a
484 means of encoding non-printing characters in a visible way:
495 a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
504 Note that \exhh specifies one byte rather than one character in UTF-8 mode;
505 this makes it possible to construct invalid UTF-8 sequences for testing
506 purposes. On the other hand, \ex{hh} is interpreted as a UTF-8 character in
507 UTF-8 mode, generating more than one byte if the value is greater than 127.
508 When testing the 8-bit library not in UTF-8 mode, \ex{hh} generates one byte
511 In UTF-16 mode, all 4-digit \ex{hhhh} values are accepted. This makes it
512 possible to construct invalid UTF-16 sequences for testing purposes.
514 In UTF-32 mode, all 4- to 8-digit \ex{...} values are accepted. This makes it
515 possible to construct invalid UTF-32 sequences for testing purposes.
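The difference between \exhh (one byte) and \ex{hh} (one UTF-8 character) described above for UTF-8 mode can be illustrated with a short Python sketch. It models only the byte counts; the helper names `x_plain` and `x_braces_utf8` are hypothetical and are not part of pcre2test.

```python
def x_plain(value):
    # Models \xhh: always exactly one byte, whatever the mode.
    return bytes([value])


def x_braces_utf8(value):
    # Models \x{hh} in UTF-8 mode: the value is encoded as a UTF-8 character,
    # so anything greater than 127 occupies more than one byte.
    return chr(value).encode("utf-8")


def is_valid_utf8(data):
    # True if the byte string decodes cleanly as UTF-8.
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

With these helpers, `x_braces_utf8(0xe9)` yields the two bytes `c3 a9`, whereas `x_plain(0xe9)` yields the single byte `e9`, which on its own is not valid UTF-8. That is the kind of invalid sequence the one-byte form makes it possible to construct.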
541 A backslash followed by any other non-alphanumeric character just escapes that
547 If the \fBsubject_literal\fP modifier is set for a pattern, all subject lines
549 No replication is possible, and any subject modifiers must be set as defaults
558 pattern's modifier list can add to or override default modifiers that were set
566 The following modifiers set options for \fBpcre2_compile()\fP. Most of them set
568 PCRE2_EXTRA are additional options that are set in the compile context.
569 Some of these options have single-letter abbreviations. There is special
578 allow_empty_class set PCRE2_ALLOW_EMPTY_CLASS
579 allow_lookaround_bsk set PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
580 allow_surrogate_escapes set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
581 alt_bsux set PCRE2_ALT_BSUX
582 alt_circumflex set PCRE2_ALT_CIRCUMFLEX
583 alt_verbnames set PCRE2_ALT_VERBNAMES
584 anchored set PCRE2_ANCHORED
585 /a ascii_all set all ASCII options
586 ascii_bsd set PCRE2_EXTRA_ASCII_BSD
587 ascii_bss set PCRE2_EXTRA_ASCII_BSS
588 ascii_bsw set PCRE2_EXTRA_ASCII_BSW
589 ascii_digit set PCRE2_EXTRA_ASCII_DIGIT
590 ascii_posix set PCRE2_EXTRA_ASCII_POSIX
591 auto_callout set PCRE2_AUTO_CALLOUT
592 bad_escape_is_literal set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
593 /i caseless set PCRE2_CASELESS
594 /r caseless_restrict set PCRE2_EXTRA_CASELESS_RESTRICT
595 dollar_endonly set PCRE2_DOLLAR_ENDONLY
596 /s dotall set PCRE2_DOTALL
597 dupnames set PCRE2_DUPNAMES
598 endanchored set PCRE2_ENDANCHORED
599 escaped_cr_is_lf set PCRE2_EXTRA_ESCAPED_CR_IS_LF
600 /x extended set PCRE2_EXTENDED
601 /xx extended_more set PCRE2_EXTENDED_MORE
602 extra_alt_bsux set PCRE2_EXTRA_ALT_BSUX
603 firstline set PCRE2_FIRSTLINE
604 literal set PCRE2_LITERAL
605 match_line set PCRE2_EXTRA_MATCH_LINE
606 match_invalid_utf set PCRE2_MATCH_INVALID_UTF
607 match_unset_backref set PCRE2_MATCH_UNSET_BACKREF
608 match_word set PCRE2_EXTRA_MATCH_WORD
609 /m multiline set PCRE2_MULTILINE
610 never_backslash_c set PCRE2_NEVER_BACKSLASH_C
611 never_ucp set PCRE2_NEVER_UCP
612 never_utf set PCRE2_NEVER_UTF
613 /n no_auto_capture set PCRE2_NO_AUTO_CAPTURE
614 no_auto_possess set PCRE2_NO_AUTO_POSSESS
615 no_dotstar_anchor set PCRE2_NO_DOTSTAR_ANCHOR
616 no_start_optimize set PCRE2_NO_START_OPTIMIZE
617 no_utf_check set PCRE2_NO_UTF_CHECK
618 ucp set PCRE2_UCP
619 ungreedy set PCRE2_UNGREEDY
620 use_offset_limit set PCRE2_USE_OFFSET_LIMIT
621 utf set PCRE2_UTF
624 non-printing characters in output strings to be printed using the \ex{hh...}
625 notation. Otherwise, those less than 0x100 are output in hex without the curly
626 brackets. Setting \fButf\fP in 16-bit or 32-bit mode also causes pattern and
627 subject strings to be translated to UTF-16 or UTF-32, respectively, before
636 about the pattern. There are single-letter abbreviations for some that are
643 convert_glob_escape=c set glob escape character
644 convert_glob_separator=c set glob separator character
645 convert_length set convert buffer length
655 max_pattern_compiled ) set maximum compiled pattern
657 max_pattern_length=<n> set maximum pattern length (code units)
658 max_varlookbehind=<n> set maximum variable lookbehind length
660 newline=<type> set newline type
663 parens_nest_limit=<n> set maximum parentheses depth
671 use_length do not zero-terminate the pattern
672 utf8_input treat input as UTF-8
681 set to "anycrlf", \eR matches CR, LF, or CRLF only. If it is set to "unicode",
683 PCRE2 is built; if it is not, the default is set to Unicode.
697 output after compilation. This information does not contain length and offset
698 values, which ensures that the same output is generated for different internal
704 code unit widths and link sizes, and is also useful for one-off tests.
728 options are the same, just a single "options" line is output; if there are no
734 \fBno_start_optimize\fP is set because the minimum length is not calculated
741 subject modifier is set.
744 the pattern. A list of them is output at the end of any other information that
753 the \fBnull_context\fP modifier is set, however, NULL is passed. This is for
763 passed is the default PCRE2_ZERO_TERMINATED unless \fBuse_length\fP is set.
773 that contain binary zeros and other non-printing characters. White space is
792 By default, patterns are passed to the compiling functions as zero-terminated
793 strings but can be passed by length instead of being zero-terminated. The
795 automatically (whether or not \fBuse_length\fP is set) when \fBhex\fP is set,
811 whose default can be set at build time, with an ultimate default of 255. The
817 .SS "Specifying wide characters in 16-bit and 32-bit modes"
820 In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
821 translated to UTF-16 or UTF-32 when the \fButf\fP modifier is set. For testing
822 the 16-bit and 32-bit libraries in non-UTF mode, the \fButf8_input\fP modifier
824 interpreted as UTF-8 as a means of specifying wide characters. More details are
856 If the \fBinfo\fP modifier is set on an expanded pattern, the result of the
857 expansion is included in the information that is output.
863 Just-in-time (JIT) compiling is a heavyweight optimization that can greatly
870 this to optimized machine code. It needs to know whether the match-time options
885 1 compile JIT code for non-partial matching
901 PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
904 matching (for example, jit=2) but do not set the \fBpartial\fP modifier on a
906 non-partial matching.
910 run-time options are specified. For more details, see the
926 compilation is successful when \fBjitverify\fP is set, the text "(JIT)" is
927 added to the first output line after a match or non match when JIT-compiled
938 The given locale is set, \fBpcre2_maketables()\fP is called to build a set of
951 the compiled pattern to be output. This does not include the size of the
954 also output. Here is an example:
967 The default for the library is set when PCRE2 is built, but \fBpcre2test\fP
997 \fBregcomp()\fP. The POSIX wrapper supports only the 8-bit library. Note that
1002 documentation. The following pattern modifiers set options for the
1018 buffer is too small for the error message. If this modifier has not been set, a
1025 The pattern is passed to \fBregcomp()\fP as a zero-terminated string by
1026 default, but if the \fBuse_length\fP or \fBhex\fP modifiers are set, the
1040 than zero, \fBpcre2_set_compile_recursion_guard()\fP is called to set up
1043 value given by the modifier, non-zero is returned, causing the compilation to
1051 1, 2, or 3. It causes a specific set of built-in character tables to be passed
1058 2 a set of tables defining ISO 8859 characters
1059 3 a set of tables loaded by the #loadtables command
1084 jitstack=<n> set size of JIT stack
1100 defaults, set them in a \fB#subject\fP command.
1108 backslashes. It is not possible to set subject modifiers on such lines, but any
1109 that are set as defaults by a \fB#subject\fP command are recognized.
1141 setting the \fBconvert\fP modifier. Its argument is a colon-separated list of
1142 options, which set the equivalent option for the \fBpcre2_pattern_convert()\fP
1152 The "unset" value is useful for turning off a default that has been set by a
1153 \fB#pattern\fP command. When one of these options is set, the input pattern is
1155 result is reflected in the output and then passed to \fBpcre2_compile()\fP. The
1156 normal \fButf\fP and \fBno_utf_check\fP options, if set, cause the
1161 output. However, if the \fBconvert_length\fP modifier is set to a value greater
1167 overriding the defaults, which are operating-system dependent.
1181 The following modifiers set options for \fBpcre2_match()\fP or
1188 anchored set PCRE2_ANCHORED
1189 endanchored set PCRE2_ENDANCHORED
1190 dfa_restart set PCRE2_DFA_RESTART
1191 dfa_shortest set PCRE2_DFA_SHORTEST
1192 disable_recurseloop_check set PCRE2_DISABLE_RECURSELOOP_CHECK
1193 no_jit set PCRE2_NO_JIT
1194 no_utf_check set PCRE2_NO_UTF_CHECK
1195 notbol set PCRE2_NOTBOL
1196 notempty set PCRE2_NOTEMPTY
1197 notempty_atstart set PCRE2_NOTEMPTY_ATSTART
1198 noteol set PCRE2_NOTEOL
1199 partial_hard (or ph) set PCRE2_PARTIAL_HARD
1200 partial_soft (or ps) set PCRE2_PARTIAL_SOFT
1206 causing the POSIX wrapper API to be used, the only option-setting modifiers
1212 ignored (with a warning) if used for non-POSIX matching.
1240 allusedtext show all consulted text (non-JIT only)
1243 callout_data=<n> set a value to pass via callouts
1250 depth_limit=<n> set a depth limit
1258 heap_limit=<n> set a limit on heap memory (Kbytes)
1259 jitstack=<n> set size of JIT stack
1261 match_limit=<n> set a match limit
1266 offset=<n> set starting offset
1267 offset_limit=<n> set offset limit
1268 ovector=<n> set size of output vector
1283 zero_terminate pass the subject as zero-terminated
1296 addition output the remainder of the subject string. This is useful for tests
1299 well as the main matched substring. In each case the remainder is output on the
1306 modifier affects the output if there is a lookbehind at the start of a match,
1309 match are indicated in the output by '<' or '>' characters underneath them.
1328 this situation, the output for the matched string is displayed from the
1345 captured parentheses be output after a match. By default, only those up to the
1346 highest one actually used in the match are output (corresponding to the return
1348 are output as "<unset>". This modifier is not relevant for DFA matching (which
1359 successful complete non-DFA match. This modifier, which acts after any match
1363 of a capturing pair, "<unchanged>" is output. After a successful match, this
1366 elements are the only ones that should be set. After a DFA match, the amount of
1403 PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search for
1404 another, non-empty, match at the same point in the subject. If this match
1422 If the \fB#subject\fP command is used to set default copy and/or get lists,
1430 convenience functions are output with C, G, or L after the string number
1441 If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
1453 is a valid UTF-8 string. If so, it is correctly converted to a UTF string of
1454 the appropriate code unit width. If it is not a valid UTF-8 string, the
1456 invalid UTF-8 string for testing purposes.
1458 The following modifiers set options (in addition to the normal match options)
1476 After a successful substitution, the modified string is output, preceded by the
1487 characters) for substitution tests, as fixed-size buffers are used. To make it
1490 the size of the output buffer, with the replacement string starting at the next
1497 Failed: error -47: no more memory
1500 PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
1501 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
1510 Failed: error -47: no more memory: 10 code units are needed
1520 If the \fBsubstitute_callout\fP modifier is set, a substitution callout
1521 function is set up. The \fBnull_context\fP modifier must not be set, because
1524 and output strings are output. For example:
1533 parenthesized number is the number of pairs that are set in the ovector (that
1534 is, one more than the number of capturing groups that were set). Then are
1542 of that number, and similarly \fBsubstitute_stop\fP returns -1. These cause the
1543 replacement to be rejected, and -1 causes no further matching to take place. If
1544 either of them is set, \fBsubstitute_callout\fP is assumed. For example:
1555 If both are set for the same number, stop takes precedence. Only a single skip
1563 that is used by the just-in-time optimization code. It is ignored if JIT
1567 patterns. If \fBjitstack\fP is set non-zero on a subject line it overrides any
1568 value that was set on the pattern.
1574 The \fBheap_limit\fP, \fBmatch_limit\fP, and \fBdepth_limit\fP modifiers set
1597 an in-pattern limit; they cannot increase it.
1599 For non-DFA matching, the minimum \fIdepth_limit\fP number is a measure of how
1605 For non-DFA matching, the \fImatch_limit\fP number is a measure of the amount
1611 and non-recursive, to the internal matching function, thus controlling the
1624 returned for a match, non-match, or partial match, \fBpcre2test\fP shows it.
1626 is added to the non-match message.
1636 allocation on the stack, so in many cases there will be no output. No heap
1638 \fBnull_context\fP modifier must not be set on both the pattern and the
1639 subject, though it can be set on one or the other.
1654 unless there was a previous non-JIT match. Note that specifying a size of zero
1655 for the output vector (see below) causes \fBpcre2test\fP to free its match data
1672 this modifier is used, the \fBuse_offset_limit\fP modifier must have been set
1676 .SS "Setting the size of the output vector"
1680 appears, though of course it can also be used to set a default in a
1689 to create a match block with a zero-length ovector; there is always at least
1693 .SS "Passing the subject as zero-terminated"
1697 its correct length. In order to test the facility for passing a zero-terminated
1703 passing the replacement string as zero-terminated.
1711 If the \fBnull_context\fP modifier is set, however, NULL is passed. This is for
1718 \fBnull_replacement\fP modifier is set, the subject or replacement string
1735 If the \fBdfa\fP modifier is set, the alternative matching function is used.
1737 however, the \fBdfa_shortest\fP modifier is set, processing stops after the
1741 .SH "DEFAULT OUTPUT FROM pcre2test"
1744 This section describes the output when the normal matching function,
1758 code unit offset of the start of the failing character is also output. Here is
1762 PCRE2 version 10.22 2016-07-29
1771 Unset capturing substrings that are not followed by one that is set are not
1786 If the strings contain any non-printing characters, they are output as \exhh
1787 escapes if the value is less than 256 and UTF mode is not set. Otherwise they
1788 are output as \ex{hh...} escapes. See below for the definition of non-printing
1789 characters. If the \fBaftertext\fP modifier is set, the output for substring 0
1798 are output in sequence, like this:
1809 "No match" is output only if the first match attempt fails. Here is an example
1815 Error -24 (bad offset value)
1824 .SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION"
1828 output consists of a list of all the matches that start at the first point in
1839 PCRE2_ERROR_PARTIAL return, the output is "Partial match:", followed by the
1889 differences in behaviour. The output for callouts with numerical arguments and
1900 --->pqrabcdef
1903 This output indicates that callout number 0 occurred for a match attempt
1906 one circumflex is output if the start and current positions are the same, or if
1913 output. For example:
1915 re> /\ed?[A-E]\e*/auto_callout
1917 --->E*
1919 +3 ^ [A-E]
1924 If a pattern contains (*MARK) items, an additional line is output whenever
1929 --->abc
1939 of the match, so nothing more is output. If, as a result of backtracking, the
1940 mark reverts to being unset, the text "<unset>" is output.
1946 The output for a callout with a string argument is similar, except that instead
1948 string and its offset in the pattern string are output before the reflection of
1955 --->abcdefg
1958 --->abcdefg
1971 If the \fBcallout_capture\fP modifier is set, the current captured groups are
1972 output when a callout occurs. This is useful only for non-DFA matching, as
1976 The normal callout output, showing the callout number or pattern offset (as
1977 described above) is suppressed if the \fBcallout_no_where\fP modifier is set.
1980 setting the \fBcallout_extra\fP modifier causes additional output from
1983 output. If there has been a backtrack since the last callout (or start of
1984 matching if this is the first callout), "Backtrack" is output, followed by "No
1991 --->aac
1997 --->aac
2003 --->aac
2011 --->aac
2017 --->aac
2043 aborted. If both these modifiers are set for the same callout number,
2048 This is set as the "user data" that is passed to the matching function, and
2062 .SH "NON-PRINTING CHARACTERS"
2066 bytes other than 32-126 are always treated as non-printing characters and are
2070 string, it behaves in the same way, unless a different locale has been set for
2072 \fBisprint()\fP function is used to distinguish printing and non-printing
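The escaping rule stated earlier (\exhh for values below 256 when UTF mode is not set, \ex{hh...} otherwise) can be sketched in Python as follows. This is an illustration under stated assumptions, not pcre2test's code: it treats only 32-126 as printing, ignoring the locale-dependent \fBisprint()\fP case, and the function name `show` and the two-digit lower-case hex formatting are assumptions.

```python
def show(code_units, utf=False):
    """Render code units, escaping non-printing values in a visible way."""
    out = []
    for c in code_units:
        if 32 <= c <= 126:
            # Assumed printing range; a locale could change this for subjects.
            out.append(chr(c))
        elif c < 256 and not utf:
            # Small value, UTF mode not set: hex without curly brackets.
            out.append("\\x%02x" % c)
        else:
            # UTF mode set, or value of 256 or more: curly-bracket form.
            out.append("\\x{%02x}" % c)
    return "".join(out)
```

Under these assumptions, a newline after the letter "a" is rendered as `a\x0a` without UTF mode and as `a\x{0a}` with it.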
2092 for serializing and de-serializing. They are described in the
2123 reads the data in the file, and then arranges for it to be de-serialized, with
2138 option-setting modifiers.
2185 Copyright (c) 1997-2024 University of Cambridge.