1PCRE2TEST(1)                General Commands Manual               PCRE2TEST(1)
2
3
4
5NAME
6       pcre2test - a program for testing Perl-compatible regular expressions.
7
8SYNOPSIS
9
10       pcre2test [options] [input file [output file]]
11
12       pcre2test is a test program for the PCRE2 regular expression libraries,
13       but it can also be used for  experimenting  with  regular  expressions.
14       This  document  describes the features of the test program; for details
15       of the regular expressions themselves, see the pcre2pattern  documenta-
16       tion.  For  details  of  the  PCRE2  library  function  calls and their
17       options, see the pcre2api documentation.
18
19       The input for pcre2test is a sequence of  regular  expression  patterns
20       and  subject  strings  to  be matched. There are also command lines for
21       setting defaults and controlling some special actions. The output shows
22       the  result  of  each  match attempt. Modifiers on external or internal
23       command lines, the patterns, and the subject lines specify PCRE2  func-
24       tion  options, control how the subject is processed, and what output is
25       produced.
26
27       As the original fairly simple PCRE library evolved,  it  acquired  many
28       different  features,  and  as  a  result, the original pcretest program
29       ended up with a lot of options in a messy, arcane syntax,  for  testing
30       all the features. The move to the new PCRE2 API provided an opportunity
31       to re-implement the test program as pcre2test, with a cleaner  modifier
32       syntax.  Nevertheless,  there are still many obscure modifiers, some of
33       which are specifically designed for use in conjunction  with  the  test
34       script  and  data  files that are distributed as part of PCRE2. All the
35       modifiers are documented here, some  without  much  justification,  but
36       many  of  them  are  unlikely  to  be  of  use  except when testing the
37       libraries.
38
39
40PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
41
42       Different versions of the PCRE2 library can be built to support charac-
43       ter  strings  that  are encoded in 8-bit, 16-bit, or 32-bit code units.
44       One, two, or  all  three  of  these  libraries  may  be  simultaneously
45       installed. The pcre2test program can be used to test all the libraries.
46       However, its own input and output are  always  in  8-bit  format.  When
47       testing  the  16-bit  or 32-bit libraries, patterns and subject strings
48       are converted to 16- or  32-bit  format  before  being  passed  to  the
49       library  functions.  Results are converted back to 8-bit code units for
50       output.
51
       In the rest of this document, the names of library functions and struc-
       tures  are  given  in  generic  form, for example, pcre2_compile(). The
       actual names used in the libraries have a suffix _8, _16,  or  _32,  as
       appropriate.
56
57
58INPUT ENCODING
59
60       Input  to  pcre2test is processed line by line, either by calling the C
61       library's fgets() function, or via the libreadline library (see below).
       The  input  is  processed  using  C's string functions, so it must not
       contain binary zeroes, even though in Unix-like  environments,  fgets()
64       treats any bytes other than newline as data characters. In some Windows
65       environments character 26 (hex 1A) causes an immediate end of file, and
66       no further data is read.
67
68       For  maximum portability, therefore, it is safest to avoid non-printing
69       characters in pcre2test input files. There is a facility for specifying
70       some or all of a pattern's characters as hexadecimal pairs, thus making
71       it possible to include binary zeroes in a pattern for testing purposes.
72       Subject  lines are processed for backslash escapes, which makes it pos-
73       sible to include any data value.
74
75
76COMMAND LINE OPTIONS
77
78       -8        If the 8-bit library has been built, this option causes it to
79                 be  used  (this is the default). If the 8-bit library has not
80                 been built, this option causes an error.
81
82       -16       If the 16-bit library has been built, this option  causes  it
83                 to  be  used. If only the 16-bit library has been built, this
84                 is the default. If the 16-bit library  has  not  been  built,
85                 this option causes an error.
86
87       -32       If  the  32-bit library has been built, this option causes it
88                 to be used. If only the 32-bit library has been  built,  this
89                 is  the  default.  If  the 32-bit library has not been built,
90                 this option causes an error.
91
92       -b        Behave as if each pattern has the /fullbincode modifier;  the
93                 full internal binary form of the pattern is output after com-
94                 pilation.
95
96       -C        Output the version number  of  the  PCRE2  library,  and  all
97                 available  information  about  the optional features that are
98                 included, and then  exit  with  zero  exit  code.  All  other
99                 options are ignored.
100
101       -C option Output  information  about a specific build-time option, then
102                 exit. This functionality is intended for use in scripts  such
103                 as  RunTest.  The  following options output the value and set
104                 the exit code as indicated:
105
106                   ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
107                                0x15 or 0x25
108                                0 if used in an ASCII environment
109                                exit code is always 0
110                   linksize   the configured internal link size (2, 3, or 4)
111                                exit code is set to the link size
112                   newline    the default newline setting:
113                                CR, LF, CRLF, ANYCRLF, or ANY
114                                exit code is always 0
115                   bsr        the default setting for what \R matches:
116                                ANYCRLF or ANY
117                                exit code is always 0
118
119                 The following options output 1 for true or 0 for  false,  and
120                 set the exit code to the same value:
121
122                   backslash-C  \C is supported (not locked out)
123                   ebcdic       compiled for an EBCDIC environment
124                   jit          just-in-time support is available
125                   pcre2-16     the 16-bit library was built
126                   pcre2-32     the 32-bit library was built
127                   pcre2-8      the 8-bit library was built
128                   unicode      Unicode support is available
129
130                 If  an  unknown  option is given, an error message is output;
131                 the exit code is 0.
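
       A script can use these exit codes to probe a build. The following
       Python sketch is illustrative only: the helper name has_feature and
       its injectable runner argument are assumptions made for this example,
       and it presumes that a pcre2test binary is on the PATH. It relies
       only on the behaviour described above, namely that the true/false
       options exit with 1 for true and 0 for false.

```python
import subprocess

# The -C options that report 1 (true) or 0 (false), as listed above.
BOOLEAN_FEATURES = {
    "backslash-C", "ebcdic", "jit", "pcre2-16", "pcre2-32", "pcre2-8", "unicode",
}

def has_feature(name, runner=subprocess.run, program="pcre2test"):
    """Return True if `pcre2test -C <name>` exits with code 1.

    `runner` is a hypothetical hook (defaulting to subprocess.run) so that
    the function can be exercised without a real pcre2test binary.
    """
    if name not in BOOLEAN_FEATURES:
        raise ValueError(f"not a true/false -C option: {name}")
    result = runner([program, "-C", name], capture_output=True, text=True)
    return result.returncode == 1
```

       For example, has_feature("jit") could guard tests that need the
       just-in-time compiler.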
132
       -d        Behave as if each pattern has the debug modifier; the  inter-
                 nal  form of, and information about, the compiled pattern are
                 output after compilation; -d is equivalent to -b -i.
136
137       -dfa      Behave as if each subject line has the dfa modifier; matching
138                 is  done  using the pcre2_dfa_match() function instead of the
139                 default pcre2_match().
140
141       -error number[,number,...]
142                 Call pcre2_get_error_message() for each of the error  numbers
143                 in  the  comma-separated list, display the resulting messages
144                 on the standard output, then exit with zero  exit  code.  The
145                 numbers  may  be  positive or negative. This is a convenience
146                 facility for PCRE2 maintainers.
147
       -help     Output a brief summary of these options and then exit.
149
150       -i        Behave as if each pattern has the /info modifier; information
151                 about the compiled pattern is given after compilation.
152
153       -jit      Behave  as  if  each pattern line has the jit modifier; after
154                 successful compilation, each pattern is passed to  the  just-
155                 in-time compiler, if available.
156
157       -pattern modifier-list
158                 Behave as if each pattern line contains the given modifiers.
159
160       -q        Do not output the version number of pcre2test at the start of
161                 execution.
162
163       -S size   On Unix-like systems, set the size of the run-time  stack  to
164                 size megabytes.
165
166       -subject modifier-list
167                 Behave as if each subject line contains the given modifiers.
168
169       -t        Run  each compile and match many times with a timer, and out-
170                 put the resulting times per compile or  match.  When  JIT  is
171                 used,  separate  times  are given for the initial compile and
172                 the JIT compile. You can control  the  number  of  iterations
173                 that  are used for timing by following -t with a number (as a
174                 separate item on the command line). For  example,  "-t  1000"
175                 iterates 1000 times. The default is to iterate 500,000 times.
176
177       -tm       This is like -t except that it times only the matching phase,
178                 not the compile phase.
179
180       -T -TM    These behave like -t and -tm, but in addition, at the end  of
181                 a  run, the total times for all compiles and matches are out-
182                 put.
183
184       -version  Output the PCRE2 version number and then exit.
185
186
187DESCRIPTION
188
189       If pcre2test is given two filename arguments, it reads from  the  first
190       and writes to the second. If the first name is "-", input is taken from
191       the standard input. If pcre2test is given only one argument,  it  reads
192       from that file and writes to stdout. Otherwise, it reads from stdin and
193       writes to stdout.
194
195       When pcre2test is built, a configuration option  can  specify  that  it
196       should  be linked with the libreadline or libedit library. When this is
197       done, if the input is from a terminal, it is read using the  readline()
198       function. This provides line-editing and history facilities. The output
199       from the -help option states whether or not readline() will be used.
200
201       The program handles any number of tests, each of which  consists  of  a
202       set  of input lines. Each set starts with a regular expression pattern,
203       followed by any number of subject lines to be matched against that pat-
204       tern. In between sets of test data, command lines that begin with # may
205       appear. This file format, with some restrictions, can also be processed
206       by  the perltest.sh script that is distributed with PCRE2 as a means of
207       checking that the behaviour of PCRE2 and Perl is the same.
208
209       When the input is a terminal, pcre2test prompts for each line of input,
210       using  "re>"  to prompt for regular expression patterns, and "data>" to
211       prompt for subject lines. Command lines starting with # can be  entered
212       only in response to the "re>" prompt.
213
214       Each  subject line is matched separately and independently. If you want
215       to do multi-line matches, you have to use the \n escape sequence (or \r
216       or  \r\n,  etc.,  depending on the newline setting) in a single line of
217       input to encode the newline sequences. There is no limit on the  length
218       of  subject  lines; the input buffer is automatically extended if it is
       too small. There are replication features that  make  it  possible  to
220       generate  long  repetitive  pattern  or subject lines without having to
221       supply them explicitly.
222
223       An empty line or the end of the file signals the  end  of  the  subject
224       lines  for  a  test,  at  which  point a new pattern or command line is
225       expected if there is still input to be read.
226
227
228COMMAND LINES
229
230       In between sets of test data, a line that begins with # is  interpreted
231       as a command line. If the first character is followed by white space or
232       an exclamation mark, the line is treated as  a  comment,  and  ignored.
233       Otherwise, the following commands are recognized:
234
235         #forbid_utf
236
237       Subsequent   patterns   automatically   have  the  PCRE2_NEVER_UTF  and
238       PCRE2_NEVER_UCP options set, which locks out the use of  the  PCRE2_UTF
239       and  PCRE2_UCP options and the use of (*UTF) and (*UCP) at the start of
240       patterns. This command also forces an error  if  a  subsequent  pattern
241       contains  any  occurrences  of \P, \p, or \X, which are still supported
242       when PCRE2_UTF is not set, but which require Unicode  property  support
243       to be included in the library.
244
245       This  is  a trigger guard that is used in test files to ensure that UTF
246       or Unicode property tests are not accidentally added to files that  are
247       used  when  Unicode  support  is  not  included in the library. Setting
248       PCRE2_NEVER_UTF and PCRE2_NEVER_UCP as a default can also  be  obtained
249       by  the  use  of #pattern; the difference is that #forbid_utf cannot be
250       unset, and the automatic options are not displayed in pattern  informa-
251       tion, to avoid cluttering up test output.
252
253         #load <filename>
254
255       This command is used to load a set of precompiled patterns from a file,
256       as described in the section entitled  "Saving  and  restoring  compiled
257       patterns" below.
258
259         #newline_default [<newline-list>]
260
261       When  PCRE2  is  built,  a default newline convention can be specified.
262       This determines which characters and/or character pairs are  recognized
263       as indicating a newline in a pattern or subject string. The default can
264       be overridden when a pattern is compiled. The standard test files  con-
265       tain  tests  of  various  newline  conventions, but the majority of the
266       tests expect a single  linefeed  to  be  recognized  as  a  newline  by
267       default. Without special action the tests would fail when PCRE2 is com-
268       piled with either CR or CRLF as the default newline.
269
       The #newline_default command specifies a list of newline types that are
       acceptable as the default. Each type must be  one  of  CR,  LF,  CRLF,
       ANYCRLF, or ANY (in upper or lower case), for example:
273
274         #newline_default LF Any anyCRLF
275
276       If the default newline is in the list, this command has no effect. Oth-
277       erwise,  except  when  testing  the  POSIX API, a newline modifier that
278       specifies the first newline convention in the list  (LF  in  the  above
279       example)  is  added to any pattern that does not already have a newline
280       modifier. If the newline list is empty, the feature is turned off. This
281       command is present in a number of the standard test input files.
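
       The decision that #newline_default makes can be sketched in Python as
       follows. This is a minimal model of the rule described above, not the
       program's own code; the function name and its arguments are
       assumptions made for this example, and the POSIX API exception is
       deliberately left out.

```python
VALID_NEWLINE_TYPES = {"CR", "LF", "CRLF", "ANYCRLF", "ANY"}

def newline_modifier_to_add(build_default, acceptable, pattern_has_newline=False):
    """Return the newline modifier to add to a pattern, or None.

    build_default       -- the newline default PCRE2 was built with
    acceptable          -- the types listed on the #newline_default line
    pattern_has_newline -- True if the pattern already has a newline modifier
    """
    types = [t.upper() for t in acceptable]      # upper or lower case is allowed
    for t in types:
        if t not in VALID_NEWLINE_TYPES:
            raise ValueError(f"unknown newline type: {t}")
    if not types:                                # empty list: feature turned off
        return None
    if build_default.upper() in types:           # default is acceptable: no effect
        return None
    if pattern_has_newline:                      # an explicit modifier wins
        return None
    return "newline=" + types[0].lower()         # first type in the list
```

       With the list from the example above, a library built with CR as its
       default would have newline=lf added to each pattern that lacks a
       newline modifier.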
282
283       When  the  POSIX  API  is  being tested there is no way to override the
284       default newline convention, though it is possible to  set  the  newline
285       convention  from  within  the  pattern. A warning is given if the posix
286       modifier is used when #newline_default would set a default for the non-
287       POSIX API.
288
289         #pattern <modifier-list>
290
291       This  command  sets  a default modifier list that applies to all subse-
292       quent patterns. Modifiers on a pattern can change these settings.
293
294         #perltest
295
296       The appearance of this line causes all subsequent modifier settings  to
297       be checked for compatibility with the perltest.sh script, which is used
298       to confirm that Perl gives the same results as PCRE2. Also, apart  from
299       comment  lines,  none of the other command lines are permitted, because
300       they and many of the modifiers are specific to  pcre2test,  and  should
301       not  be  used in test files that are also processed by perltest.sh. The
302       #perltest command helps detect tests that are accidentally put  in  the
303       wrong file.
304
305         #pop [<modifiers>]
306         #popcopy [<modifiers>]
307
308       These  commands  are used to manipulate the stack of compiled patterns,
309       as described in the section entitled  "Saving  and  restoring  compiled
310       patterns" below.
311
312         #save <filename>
313
314       This  command  is used to save a set of compiled patterns to a file, as
315       described in the section entitled "Saving and restoring  compiled  pat-
316       terns" below.
317
318         #subject <modifier-list>
319
320       This  command  sets  a default modifier list that applies to all subse-
321       quent subject lines. Modifiers on a subject line can change these  set-
322       tings.
323
324
325MODIFIER SYNTAX
326
327       Modifier lists are used with both pattern and subject lines. Items in a
328       list are separated by commas followed by optional white space. Trailing
329       whitespace  in  a modifier list is ignored. Some modifiers may be given
330       for both patterns and subject lines, whereas others are valid only  for
331       one  or  the  other.  Each  modifier  has  a  long  name,  for  example
332       "anchored", and some of them must be followed by an equals sign  and  a
333       value,  for  example,  "offset=12". Values cannot contain comma charac-
334       ters, but may contain spaces. Modifiers that do not take values may  be
335       preceded by a minus sign to turn off a previous setting.
336
337       A few of the more common modifiers can also be specified as single let-
338       ters, for example "i" for "caseless". In documentation,  following  the
339       Perl convention, these are written with a slash ("the /i modifier") for
340       clarity. Abbreviated modifiers must all be concatenated  in  the  first
341       item  of a modifier list. If the first item is not recognized as a long
342       modifier name, it is interpreted as a sequence of these  abbreviations.
343       For example:
344
345         /abc/ig,newline=cr,jit=3
346
347       This  is  a pattern line whose modifier list starts with two one-letter
348       modifiers (/i and /g). The lower-case  abbreviated  modifiers  are  the
349       same as used in Perl.
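
       The rules above can be sketched as a small Python parser. It is only
       an approximation for illustration: the function name is hypothetical,
       LONG_NAMES is a small subset of the real long modifier names, and
       names after the first item are not validated.

```python
# Illustrative subset of long modifier names; pcre2test knows many more.
LONG_NAMES = {
    "anchored", "caseless", "dotall", "extended", "multiline",
    "newline", "jit", "offset", "info", "utf", "hex",
}

def parse_modifier_list(text):
    """Split a modifier list into (abbreviations, settings).

    abbreviations -- single letters taken from the first item, if that item
                     is not a recognized long name
    settings      -- list of (name, value_or_None, enabled) tuples
    """
    items = [item.strip() for item in text.strip().split(",") if item.strip()]
    abbreviations = []
    settings = []
    for index, item in enumerate(items):
        enabled = not item.startswith("-")       # leading minus turns a setting off
        body = item if enabled else item[1:]
        name, sep, value = body.partition("=")
        name = name.strip()
        if index == 0 and name not in LONG_NAMES:
            abbreviations = list(item)           # e.g. "ig" -> ["i", "g"]
            continue
        settings.append((name, value if sep else None, enabled))
    return abbreviations, settings
```

       Applied to the modifier list of the example pattern line, this yields
       the abbreviations i and g followed by the settings newline=cr and
       jit=3.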
350
351
352PATTERN SYNTAX
353
354       A  pattern line must start with one of the following characters (common
355       symbols, excluding pattern meta-characters):
356
357         / ! " ' ` - = _ : ; , % & @ ~
358
359       This is interpreted as the pattern's delimiter.  A  regular  expression
360       may  be  continued  over several input lines, in which case the newline
361       characters are included within it. It is possible to include the delim-
362       iter within the pattern by escaping it with a backslash, for example
363
364         /abc\/def/
365
366       If  you do this, the escape and the delimiter form part of the pattern,
367       but since the delimiters are all non-alphanumeric, this does not affect
368       its  interpretation.  If  the terminating delimiter is immediately fol-
369       lowed by a backslash, for example,
370
371         /abc/\
372
373       then a backslash is added to the end of the pattern. This  is  done  to
374       provide  a  way of testing the error condition that arises if a pattern
375       finishes with a backslash, because
376
377         /abc\/
378
379       is interpreted as the first line of a pattern that starts with  "abc/",
380       causing  pcre2test to read the next line as a continuation of the regu-
381       lar expression.
382
383       A pattern can be followed by a modifier list (details below).
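
       The delimiter rules in this section can be sketched in Python as
       follows. The function name is hypothetical and the code only models
       the behaviour described here for a single input line; it is not taken
       from the pcre2test source.

```python
DELIMITERS = set("/!\"'`-=_:;,%&@~")

def split_pattern_line(line):
    """Return (pattern, modifier_text), or None if the pattern is unterminated
    on this line and would continue on the next input line."""
    if not line or line[0] not in DELIMITERS:
        raise ValueError("line does not start with a valid delimiter")
    delimiter = line[0]
    i = 1
    while i < len(line):
        if line[i] == "\\":
            i += 2                  # the escape and the escaped character both stay
            continue
        if line[i] == delimiter:
            break                   # found the terminating delimiter
        i += 1
    else:
        return None                 # no terminator: pattern continues
    pattern = line[1:i]
    rest = line[i + 1:]
    if rest.startswith("\\"):       # "/abc/\" adds a backslash to the pattern
        pattern += "\\"
        rest = rest[1:]
    return pattern, rest
```

       Note that "/abc\/" returns None here, which mirrors the continuation
       behaviour described above.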
384
385
386SUBJECT LINE SYNTAX
387
388       Before   each   subject   line   is   passed   to   pcre2_match()    or
389       pcre2_dfa_match(), leading and trailing white space is removed, and the
390       line is scanned for backslash escapes. The following provide a means of
391       encoding non-printing characters in a visible way:
392
         \a         alarm (BEL, \x07)
         \b         backspace (\x08)
         \e         escape (\x1b)
         \f         form feed (\x0c)
         \n         newline (\x0a)
         \r         carriage return (\x0d)
         \t         tab (\x09)
         \v         vertical tab (\x0b)
         \nnn       octal character (up to 3 octal digits); always
                      a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
         \o{dd...}  octal character (any number of octal digits)
         \xhh       hexadecimal byte (up to 2 hex digits)
         \x{hh...}  hexadecimal character (any number of hex digits)
406
407       The use of \x{hh...} is not dependent on the use of the utf modifier on
408       the pattern. It is recognized always. There may be any number of  hexa-
409       decimal  digits  inside  the  braces; invalid values provoke error mes-
410       sages.
411
412       Note that \xhh specifies one byte rather than one  character  in  UTF-8
413       mode;  this  makes it possible to construct invalid UTF-8 sequences for
414       testing purposes. On the other hand, \x{hh} is interpreted as  a  UTF-8
415       character  in UTF-8 mode, generating more than one byte if the value is
416       greater than 127.  When testing the 8-bit library not  in  UTF-8  mode,
417       \x{hh} generates one byte for values less than 256, and causes an error
418       for greater values.
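
       The difference between \xhh and \x{hh} for the 8-bit library can be
       illustrated with a short Python sketch. The function name and its
       arguments are assumptions made for this example, and Python's own
       UTF-8 encoder stands in for the conversion that pcre2test performs
       (it rejects surrogates and values above 0x10FFFF, which may differ
       from the program's own checks).

```python
def hex_escape_bytes(value, braced, utf8_mode):
    """Bytes produced in a subject line for the 8-bit library.

    braced=False models \\xhh; braced=True models \\x{hh...}.
    """
    if not braced:
        if value > 0xFF:
            raise ValueError("\\xhh takes at most two hex digits")
        return bytes([value])               # always exactly one byte
    if utf8_mode:
        return chr(value).encode("utf-8")   # one character, possibly several bytes
    if value > 0xFF:
        raise ValueError("value too large for a non-UTF 8-bit subject")
    return bytes([value])                   # one byte for values below 256
```

       For the value E9, the unbraced form gives the single byte E9, which
       is not valid UTF-8 on its own, whereas the braced form in UTF-8 mode
       gives the two bytes C3 A9.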
419
420       In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
421       possible to construct invalid UTF-16 sequences for testing purposes.
422
423       In  UTF-32  mode,  all  4- to 8-digit \x{...} values are accepted. This
424       makes it possible to construct invalid  UTF-32  sequences  for  testing
425       purposes.
426
427       There is a special backslash sequence that specifies replication of one
428       or more characters:
429
430         \[<characters>]{<count>}
431
432       This makes it possible to test long strings without having  to  provide
433       them as part of the file. For example:
434
435         \[abc]{4}
436
437       is  converted to "abcabcabcabc". This feature does not support nesting.
438       To include a closing square bracket in the characters, code it as \x5D.
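
       The replication syntax can be modelled with a regular expression
       substitution. The following Python sketch is only an approximation
       written for this example (the function name is hypothetical); it does
       not decode other escapes such as \x5D inside the brackets.

```python
import re

# A backslash, "[", any characters other than "]", then "]{<digits>}".
_REPLICATION = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")

def expand_replication(text):
    """Replace each \\[<characters>]{<count>} with the characters repeated."""
    return _REPLICATION.sub(lambda m: m.group(1) * int(m.group(2)), text)
```

       Because the character group stops at the first closing square
       bracket, the sketch shares the no-nesting restriction noted above.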
439
440       A backslash followed by an equals sign marks the  end  of  the  subject
441       string and the start of a modifier list. For example:
442
443         abc\=notbol,notempty
444
445       If  the  subject  string is empty and \= is followed by whitespace, the
446       line is treated as a comment line, and is not used  for  matching.  For
447       example:
448
449         \= This is a comment.
450         abc\= This is an invalid modifier list.
451
452       A  backslash  followed  by  any  other  non-alphanumeric character just
453       escapes that character. A backslash followed by anything else causes an
454       error.  However,  if the very last character in the line is a backslash
455       (and there is no modifier list), it is ignored. This  gives  a  way  of
456       passing  an  empty line as data, since a real empty line terminates the
457       data input.
458
459
460PATTERN MODIFIERS
461
462       There are several types of modifier that can appear in  pattern  lines.
463       Except where noted below, they may also be used in #pattern commands. A
464       pattern's modifier list can add to or override default  modifiers  that
465       were set by a previous #pattern command.
466
467   Setting compilation options
468
469       The  following modifiers set options for pcre2_compile(). The most com-
470       mon ones have single-letter abbreviations. See pcre2api for a  descrip-
471       tion of their effects.
472
473             allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
474             alt_bsux                  set PCRE2_ALT_BSUX
475             alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
476             alt_verbnames             set PCRE2_ALT_VERBNAMES
477             anchored                  set PCRE2_ANCHORED
478             auto_callout              set PCRE2_AUTO_CALLOUT
479         /i  caseless                  set PCRE2_CASELESS
480             dollar_endonly            set PCRE2_DOLLAR_ENDONLY
481         /s  dotall                    set PCRE2_DOTALL
482             dupnames                  set PCRE2_DUPNAMES
483         /x  extended                  set PCRE2_EXTENDED
484             firstline                 set PCRE2_FIRSTLINE
485             match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
486         /m  multiline                 set PCRE2_MULTILINE
487             never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
488             never_ucp                 set PCRE2_NEVER_UCP
489             never_utf                 set PCRE2_NEVER_UTF
490             no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
491             no_auto_possess           set PCRE2_NO_AUTO_POSSESS
492             no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
493             no_start_optimize         set PCRE2_NO_START_OPTIMIZE
494             no_utf_check              set PCRE2_NO_UTF_CHECK
495             ucp                       set PCRE2_UCP
496             ungreedy                  set PCRE2_UNGREEDY
497             use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
498             utf                       set PCRE2_UTF
499
500       As well as turning on the PCRE2_UTF option, the utf modifier causes all
501       non-printing characters in output  strings  to  be  printed  using  the
502       \x{hh...}  notation. Otherwise, those less than 0x100 are output in hex
503       without the curly brackets.
504
505   Setting compilation controls
506
507       The following modifiers  affect  the  compilation  process  or  request
508       information about the pattern:
509
510             bsr=[anycrlf|unicode]     specify \R handling
511         /B  bincode                   show binary code without lengths
512             callout_info              show callout information
513             debug                     same as info,fullbincode
514             fullbincode               show binary code with lengths
515         /I  info                      show info about compiled pattern
516             hex                       unquoted characters are hexadecimal
517             jit[=<number>]            use JIT
518             jitfast                   use JIT fast path
519             jitverify                 verify JIT use
520             locale=<name>             use this locale
521             max_pattern_length=<n>    set the maximum pattern length
522             memory                    show memory used
523             newline=<type>            set newline type
524             null_context              compile with a NULL context
525             parens_nest_limit=<n>     set maximum parentheses depth
526             posix                     use the POSIX API
527             posix_nosub               use the POSIX API with REG_NOSUB
528             push                      push compiled pattern onto the stack
529             pushcopy                  push a copy onto the stack
530             stackguard=<number>       test the stackguard feature
531             tables=[0|1|2]            select internal tables
532
533       The effects of these modifiers are described in the following sections.
534
535   Newline and \R handling
536
537       The  bsr modifier specifies what \R in a pattern should match. If it is
538       set to "anycrlf", \R matches CR, LF, or CRLF only.  If  it  is  set  to
       "unicode", \R matches any Unicode newline sequence. The  default  is
       specified  when  PCRE2  is  built;  unless  a  build-time setting says
       otherwise, that default is Unicode.
541
542       The newline modifier specifies which characters are to  be  interpreted
543       as newlines, both in the pattern and in subject lines. The type must be
544       one of CR, LF, CRLF, ANYCRLF, or ANY (in upper or lower case).
545
546   Information about a pattern
547
548       The debug modifier is a shorthand for info,fullbincode, requesting  all
549       available information.
550
551       The bincode modifier causes a representation of the compiled code to be
552       output after compilation. This information does not contain length  and
553       offset values, which ensures that the same output is generated for dif-
554       ferent internal link sizes and different code  unit  widths.  By  using
555       bincode,  the  same  regression tests can be used in different environ-
556       ments.
557
558       The fullbincode modifier, by contrast, does include length  and  offset
559       values.  This is used in a few special tests that run only for specific
560       code unit widths and link sizes, and is also useful for one-off tests.
561
562       The info modifier  requests  information  about  the  compiled  pattern
563       (whether  it  is anchored, has a fixed first character, and so on). The
564       information is obtained from the  pcre2_pattern_info()  function.  Here
565       are some typical examples:
566
567           re> /(?i)(^a|^b)/m,info
568         Capturing subpattern count = 1
569         Compile options: multiline
570         Overall options: caseless multiline
571         First code unit at start or follows newline
572         Subject length lower bound = 1
573
574           re> /(?i)abc/info
575         Capturing subpattern count = 0
576         Compile options: <none>
577         Overall options: caseless
578         First code unit = 'a' (caseless)
579         Last code unit = 'c' (caseless)
580         Subject length lower bound = 3
581
582       "Compile  options"  are those specified by modifiers; "overall options"
583       have added options that are taken or deduced from the pattern. If  both
584       sets  of  options are the same, just a single "options" line is output;
585       if there are no options, the line is  omitted.  "First  code  unit"  is
586       where  any  match must start; if there is more than one they are listed
587       as "starting code units". "Last code unit" is  the  last  literal  code
588       unit  that  must  be  present in any match. This is not necessarily the
589       last character. These lines are omitted if no starting or  ending  code
590       units are recorded.
591
592       The  callout_info  modifier requests information about all the callouts
593       in the pattern. A list of them is output at the end of any other infor-
594       mation that is requested. For each callout, either its number or string
595       is given, followed by the item that follows it in the pattern.
596
597   Passing a NULL context
598
599       Normally, pcre2test passes a context block to pcre2_compile().  If  the
600       null_context  modifier  is  set,  however,  NULL is passed. This is for
601       testing that pcre2_compile() behaves correctly in this  case  (it  uses
602       default values).
603
604   Specifying pattern characters in hexadecimal
605
606       The  hex  modifier specifies that the characters of the pattern, except
607       for substrings enclosed in single or double quotes, are  to  be  inter-
608       preted  as  pairs  of hexadecimal digits. This feature is provided as a
609       way of creating patterns that contain binary zeros and other non-print-
610       ing  characters.  White space is permitted between pairs of digits. For
611       example, this pattern contains three characters:
612
613         /ab 32 59/hex
614
615       Parts of such a pattern are taken literally  if  quoted.  This  pattern
616       contains  nine characters, only two of which are specified in hexadeci-
617       mal:
618
619         /ab "literal" 32/hex
620
621       Either single or double quotes may be used. There is no way of  includ-
622       ing the delimiter within a substring.
623
624       By  default,  pcre2test  passes  patterns as zero-terminated strings to
625       pcre2_compile(), giving the length as  PCRE2_ZERO_TERMINATED.  However,
626       for  patterns specified with the hex modifier, the actual length of the
627       pattern is passed.
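
          The decoding rules above can be sketched in Python. This  is  an
          illustration  only, not the pcre2test source: the function name
          decode_hex_pattern is hypothetical, and quoted text  is  assumed
          to map one character to one 8-bit code unit.

```python
import string


def decode_hex_pattern(text: str) -> bytes:
    """Decode a pattern body written for the hex modifier (illustrative sketch)."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # White space between pairs of digits is skipped.
            i += 1
        elif ch in "'\"":
            # A quoted substring is taken literally up to the same delimiter.
            end = text.find(ch, i + 1)
            if end < 0:
                raise ValueError("unterminated quoted substring")
            out += text[i + 1:end].encode("latin-1")
            i = end + 1
        else:
            # Anything else must be a pair of hexadecimal digits.
            pair = text[i:i + 2]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError(f"expected two hex digits at offset {i}")
            out.append(int(pair, 16))
            i += 2
    return bytes(out)


if __name__ == "__main__":
    print(len(decode_hex_pattern("ab 32 59")))
    print(len(decode_hex_pattern('ab "literal" 32')))
```

          Because the decoded pattern may contain binary zeros, its length
          has  to travel with it, which is why the actual length is passed
          for hex patterns.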
628
629   Generating long repetitive patterns
630
631       Some tests use long patterns that are very repetitive. Instead of  cre-
632       ating  a very long input line for such a pattern, you can use a special
633       repetition feature, similar to the  one  described  for  subject  lines
634       above.  If  the  expand  modifier is present on a pattern, parts of the
635       pattern that have the form
636
637         \[<characters>]{<count>}
638
639       are expanded before the pattern is passed to pcre2_compile(). For exam-
640       ple, \[AB]{6000} is expanded to "ABAB..." 6000 times. This construction
641       cannot be nested. An initial "\[" sequence is recognized only  if  "]{"
642       followed  by  decimal  digits and "}" is found later in the pattern. If
643       not, the characters remain in the pattern unaltered.
644
645       If part of an expanded pattern looks like an expansion, but  is  really
646       part of the actual pattern, unwanted expansion can be avoided by giving
647       two values in the quantifier. For example, \[AB]{6000,6000} is not rec-
648       ognized as an expansion item.
649
650       If  the  info modifier is set on an expanded pattern, the result of the
651       expansion is included in the information that is output.
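
          As a rough model of the expansion rule, the following Python
          sketch rewrites each \[<characters>]{<count>} item. The name
          expand_pattern is hypothetical, and the sketch assumes that the
          repeated characters do not themselves contain "]"; pcre2test's
          own scanner may treat such edge cases differently.

```python
import re

# A backslash and "[", the characters to repeat, then "]{", digits, and "}".
_EXPANSION_ITEM = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")


def expand_pattern(pattern: str) -> str:
    """Expand every \\[chars]{count} item; leave all other text unaltered."""
    return _EXPANSION_ITEM.sub(lambda m: m.group(1) * int(m.group(2)), pattern)


if __name__ == "__main__":
    print(expand_pattern(r"x\[AB]{3}y"))       # the item is expanded
    print(expand_pattern(r"\[AB]{6000,6000}"))  # two values: left alone
```

          An item with two values in the quantifier does not fit the
          "digits then }" shape, so the sketch leaves it untouched, which
          mirrors the escape route described above.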
652
653   JIT compilation
654
655       Just-in-time (JIT) compiling is a  heavyweight  optimization  that  can
656       greatly  speed  up pattern matching. See the pcre2jit documentation for
657       details. JIT compiling happens, optionally, after a  pattern  has  been
658       successfully  compiled into an internal form. The JIT compiler converts
659       this to optimized machine code. It needs to know whether the match-time
660       options PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used,
661       because different code is generated for the different  cases.  See  the
662       partial  modifier in "Subject Modifiers" below for details of how these
663       options are specified for each match attempt.
664
665       JIT compilation is requested by the /jit pattern  modifier,  which  may
666       optionally be followed by an equals sign and a number in the range 0 to
667       7.  The three bits that make up the number specify which of  the  three
668       JIT operating modes are to be compiled:
669
670         1  compile JIT code for non-partial matching
671         2  compile JIT code for soft partial matching
672         4  compile JIT code for hard partial matching
673
674       The possible values for the /jit modifier are therefore:
675
676         0  disable JIT
677         1  normal matching only
678         2  soft partial matching only
679         3  normal and soft partial matching
680         4  hard partial matching only
            5  normal and hard partial matching
681         6  soft and hard partial matching only
682         7  all three modes
683
684       If  no  number  is  given,  7 is assumed. The phrase "partial matching"
685       means a call to pcre2_match() with either the PCRE2_PARTIAL_SOFT or the
686       PCRE2_PARTIAL_HARD  option set. Note that such a call may return a com-
687       plete match; the options enable the possibility of a partial match, but
688       do  not  require it. Note also that if you request JIT compilation only
689       for partial matching (for example, /jit=2) but do not set  the  partial
690       modifier  on  a  subject line, that match will not use JIT code because
691       none was compiled for non-partial matching.
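
          The bit arithmetic can be checked with a short Python sketch.
          The helper names are hypothetical, and jit_code_available()
          models only the rule stated above (it ignores incompatible
          run-time options).

```python
# Bit values taken from the table of JIT operating modes.
JIT_MODE_BITS = {
    1: "non-partial",
    2: "soft partial",
    4: "hard partial",
}


def jit_modes(value=7):
    """List the operating modes compiled for a given /jit value."""
    if not 0 <= value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    return [name for bit, name in JIT_MODE_BITS.items() if value & bit]


def jit_code_available(value, partial=None):
    """True if JIT code exists for a match; partial is None, 'soft' or 'hard'."""
    wanted = {None: 1, "soft": 2, "hard": 4}[partial]
    return bool(value & wanted)


if __name__ == "__main__":
    for value in range(8):
        print(value, jit_modes(value) or ["JIT disabled"])
```

          For instance, jit_code_available(2) is false: with /jit=2 and no
          partial modifier on the subject line, no JIT code was compiled
          for the non-partial match that is actually run.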
692
693       If JIT compilation is successful, the compiled JIT code will  automati-
694       cally  be  used  when  an appropriate type of match is run, except when
695       incompatible run-time options are specified. For more details, see  the
696       pcre2jit  documentation. See also the jitstack modifier below for a way
697       of setting the size of the JIT stack.
698
699       If the jitfast modifier is specified, matching is done  using  the  JIT
700       "fast  path" interface, pcre2_jit_match(), which skips some of the san-
701       ity checks that are done by pcre2_match(), and of course does not  work
702       when  JIT  is not supported. If jitfast is specified without jit, jit=7
703       is assumed.
704
705       If the jitverify modifier is specified, information about the  compiled
706       pattern  shows  whether  JIT  compilation was or was not successful. If
707       jitverify is specified without jit, jit=7 is assumed. If  JIT  compila-
708       tion  is successful when jitverify is set, the text "(JIT)" is added to
709       the first output line after a match or non match when JIT-compiled code
710       was actually used in the match.
711
712   Setting a locale
713
714       The /locale modifier must specify the name of a locale, for example:
715
716         /pattern/locale=fr_FR
717
718       The given locale is set, pcre2_maketables() is called to build a set of
719       character tables for the locale, and this is then passed to  pcre2_com-
720       pile()  when compiling the regular expression. The same tables are used
721       when matching the following subject lines. The /locale modifier applies
722       only to the pattern on which it appears, but can be given in a #pattern
723       command if a default is needed. Setting a locale and alternate  charac-
724       ter tables are mutually exclusive.
725
726   Showing pattern memory
727
728       The  /memory  modifier  causes  the size in bytes of the memory used to
729       hold the compiled pattern to be output. This does not include the  size
730       of  the  pcre2_code  block; it is just the actual compiled data. If the
731       pattern is subsequently passed to the JIT compiler, the size of the JIT
732       compiled code is also output. Here is an example:
733
734           re> /a(b)c/jit,memory
735         Memory allocation (code space): 21
736         Memory allocation (JIT code): 1910
737
738
739   Limiting nested parentheses
740
741       The  parens_nest_limit  modifier  sets  a  limit on the depth of nested
742       parentheses in a pattern. Breaching  the  limit  causes  a  compilation
743       error.   The  default  for  the library is set when PCRE2 is built, but
744       pcre2test sets its own default of 220, which is  required  for  running
745       the standard test suite.
746
747   Limiting the pattern length
748
749       The  max_pattern_length  modifier  sets  a limit, in code units, to the
750       length of pattern that pcre2_compile() will accept. Breaching the limit
751       causes  a  compilation  error.  The  default  is  the  largest number a
752       PCRE2_SIZE variable can hold (essentially unlimited).
753
754   Using the POSIX wrapper API
755
756       The /posix and posix_nosub modifiers cause pcre2test to call PCRE2  via
757       the  POSIX  wrapper API rather than its native API. When posix_nosub is
758       used, the POSIX option REG_NOSUB is  passed  to  regcomp().  The  POSIX
759       wrapper  supports  only  the 8-bit library. Note that it does not imply
760       POSIX matching semantics; for more detail see the pcre2posix documenta-
761       tion.  The  following  pattern  modifiers set options for the regcomp()
762       function:
763
764         caseless           REG_ICASE
765         multiline          REG_NEWLINE
766         dotall             REG_DOTALL     )
767         ungreedy           REG_UNGREEDY   ) These options are not part of
768         ucp                REG_UCP        )   the POSIX standard
769         utf                REG_UTF8       )
770
771       The regerror_buffsize modifier specifies a size for  the  error  buffer
772       that  is  passed to regerror() in the event of a compilation error. For
773       example:
774
775         /abc/posix,regerror_buffsize=20
776
777       This provides a means of testing the behaviour of regerror()  when  the
778       buffer  is  too  small  for the error message. If this modifier has not
779       been set, a large buffer is used.
780
781       The aftertext and allaftertext  subject  modifiers  work  as  described
782       below.  All other modifiers are either ignored, with a warning message,
783       or cause an error.
784
785   Testing the stack guard feature
786
787       The /stackguard modifier is used to  test  the  use  of  pcre2_set_com-
788       pile_recursion_guard(),  a  function  that  is provided to enable stack
789       availability to be checked during compilation (see the  pcre2api  docu-
790       mentation  for  details).  If  the  number specified by the modifier is
791       greater than zero, pcre2_set_compile_recursion_guard() is called to set
792       up a callback from pcre2_compile() to a local function. The argument it
793       receives is the current nesting parenthesis depth; if this  is  greater
794       than the value given by the modifier, non-zero is returned, causing the
795       compilation to be aborted.
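
          The guard logic amounts to a depth comparison. The Python sketch
          below is a simplified model with hypothetical names: its toy
          "compiler" only counts parentheses (it ignores escapes and
          character classes) and calls the guard each time a group is
          opened.

```python
def make_guard(limit):
    """Return a guard that reports failure when the depth exceeds limit."""
    def guard(depth):
        # Non-zero means "stop compiling", as with the real guard function.
        return 1 if depth > limit else 0
    return guard


def compile_with_guard(pattern, guard):
    """Toy stand-in for compilation: False if the guard aborts it."""
    depth = 0
    for ch in pattern:
        if ch == "(":
            depth += 1
            if guard(depth) != 0:
                return False
        elif ch == ")":
            depth -= 1
    return True


if __name__ == "__main__":
    guard = make_guard(2)
    print(compile_with_guard("((a)(b))", guard))
    print(compile_with_guard("(((a)))", guard))
```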
796
797   Using alternative character tables
798
799       The value specified for the /tables modifier must be one of the  digits
800       0, 1, or 2. It causes a specific set of built-in character tables to be
801       passed to pcre2_compile(). This is used in the PCRE2 tests to check be-
802       haviour with different character tables. The digit specifies the tables
803       as follows:
804
805         0   do not pass any special character tables
806         1   the default ASCII tables, as distributed in
807               pcre2_chartables.c.dist
808         2   a set of tables defining ISO 8859 characters
809
810       In table 2, some characters whose codes are greater than 128 are  iden-
811       tified  as  letters,  digits,  spaces, etc. Setting alternate character
812       tables and a locale are mutually exclusive.
813
814   Setting certain match controls
815
816       The following modifiers are really subject modifiers, and are described
817       below.   However, they may be included in a pattern's modifier list, in
818       which case they are applied to every subject  line  that  is  processed
819       with that pattern. These modifiers do not affect the compilation
820       process.
821
822             aftertext                  show text after match
823             allaftertext               show text after captures
824             allcaptures                show all captures
825             allusedtext                show all consulted text
826         /g  global                     global matching
827             mark                       show mark values
828             replace=<string>           specify a replacement string
829             startchar                  show starting character when relevant
830             substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
831             substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
832             substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
833             substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
834
835       These modifiers may not appear in a #pattern command. If you want  them
836       as defaults, set them in a #subject command.
837
838   Saving a compiled pattern
839
840       When  a  pattern with the push modifier is successfully compiled, it is
841       pushed onto a stack of compiled patterns,  and  pcre2test  expects  the
842       next  line to contain a new pattern (or a command) instead of a subject
843       line. This facility is used when saving compiled patterns to a file, as
844       described  in  the section entitled "Saving and restoring compiled pat-
845       terns" below. If pushcopy is used instead of push, a copy of  the  com-
846       piled  pattern  is  stacked,  leaving the original as current, ready to
847       match the following input lines. This provides a  way  of  testing  the
848       pcre2_code_copy()  function.   The  push  and  pushcopy   modifiers are
849       incompatible with compilation modifiers such  as  global  that  act  at
850       match  time. Any that are specified are ignored (for the stacked copy),
851       with a warning message, except for replace, which causes an error. Note
852       that  jitverify, which is allowed, does not carry through to any subse-
853       quent matching that uses a stacked pattern.
854
855
856SUBJECT MODIFIERS
857
858       The modifiers that can appear in subject lines and the #subject command
859       are of two types.
860
861   Setting match options
862
863       The    following   modifiers   set   options   for   pcre2_match()   or
864       pcre2_dfa_match(). See pcre2api for a description of their effects.
865
866             anchored                  set PCRE2_ANCHORED
867             dfa_restart               set PCRE2_DFA_RESTART
868             dfa_shortest              set PCRE2_DFA_SHORTEST
869             no_jit                    set PCRE2_NO_JIT
870             no_utf_check              set PCRE2_NO_UTF_CHECK
871             notbol                    set PCRE2_NOTBOL
872             notempty                  set PCRE2_NOTEMPTY
873             notempty_atstart          set PCRE2_NOTEMPTY_ATSTART
874             noteol                    set PCRE2_NOTEOL
875             partial_hard (or ph)      set PCRE2_PARTIAL_HARD
876             partial_soft (or ps)      set PCRE2_PARTIAL_SOFT
877
878       The partial matching modifiers are provided with abbreviations  because
879       they appear frequently in tests.
880
881       If  the  /posix  modifier was present on the pattern, causing the POSIX
882       wrapper API to be used, the only option-setting modifiers that have any
883       effect   are   notbol,   notempty,   and  noteol,  causing  REG_NOTBOL,
884       REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to  regexec().
885       The other modifiers are ignored, with a warning message.
886
887   Setting match controls
888
889       The  following  modifiers  affect the matching process or request addi-
890       tional information. Some of them may also be  specified  on  a  pattern
891       line  (see  above), in which case they apply to every subject line that
892       is matched against that pattern.
893
894             aftertext                  show text after match
895             allaftertext               show text after captures
896             allcaptures                show all captures
897             allusedtext                show all consulted text (non-JIT only)
898             altglobal                  alternative global matching
899             callout_capture            show captures at callout time
900             callout_data=<n>           set a value to pass via callouts
901             callout_fail=<n>[:<m>]     control callout failure
902             callout_none               do not supply a callout function
903             copy=<number or name>      copy captured substring
904             dfa                        use pcre2_dfa_match()
905             find_limits                find match and recursion limits
906             get=<number or name>       extract captured substring
907             getall                     extract all captured substrings
908         /g  global                     global matching
909             jitstack=<n>               set size of JIT stack
910             mark                       show mark values
911             match_limit=<n>            set a match limit
912             memory                     show memory usage
913             null_context               match with a NULL context
914             offset=<n>                 set starting offset
915             offset_limit=<n>           set offset limit
916             ovector=<n>                set size of output vector
917             recursion_limit=<n>        set a recursion limit
918             replace=<string>           specify a replacement string
919             startchar                  show startchar when relevant
920             startoffset=<n>            same as offset=<n>
921             substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
922             substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
923             substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
924             substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
925             zero_terminate             pass the subject as zero-terminated
926
927       The effects of these modifiers are described in the following sections.
928       When  matching  via the POSIX wrapper API, the aftertext, allaftertext,
929       and ovector subject modifiers work as described below. All other  modi-
930       fiers are either ignored, with a warning message, or cause an error.
931
932   Showing more text
933
934       The  aftertext modifier requests that as well as outputting the part of
935       the subject string that matched the entire pattern, pcre2test should in
936       addition output the remainder of the subject string. This is useful for
937       tests where the subject contains multiple copies of the same substring.
938       The  allaftertext  modifier  requests the same action for captured sub-
939       strings as well as the main matched substring. In each case the remain-
940       der is output on the following line with a plus character following the
941       capture number.
942
943       The allusedtext modifier requests that all the text that was  consulted
944       during  a  successful pattern match by the interpreter should be shown.
945       This feature is not supported for JIT matching, and if  requested  with
946       JIT  it  is  ignored  (with  a  warning message). Setting this modifier
947       affects the output if there is a lookbehind at the start of a match, or
948       a  lookahead  at  the  end, or if \K is used in the pattern. Characters
949       that precede or follow the start and end of the actual match are  indi-
950       cated  in  the output by '<' or '>' characters underneath them. Here is
951       an example:
952
953           re> /(?<=pqr)abc(?=xyz)/
954         data> 123pqrabcxyz456\=allusedtext
955          0: pqrabcxyz
956             <<<   >>>
957
958       This shows that the matched string is "abc",  with  the  preceding  and
959       following  strings  "pqr"  and  "xyz"  having been consulted during the
960       match (when processing the assertions).
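
          The marker line in that output can be reproduced from four
          offsets. The Python sketch below uses hypothetical names and
          assumes one display column per character; the offsets for the
          example are read off the subject string by hand.

```python
def usedtext_markers(first_used, match_start, match_end, last_used):
    """Build the '<' / '>' marker line for consulted text around a match."""
    before = "<" * (match_start - first_used)   # consulted before the match
    inside = " " * (match_end - match_start)    # the matched text itself
    after = ">" * (last_used - match_end)       # consulted after the match
    return before + inside + after


if __name__ == "__main__":
    subject = "123pqrabcxyz456"
    # "pqr" starts at offset 3, "abc" spans 6..9, "xyz" ends at offset 12.
    print(" 0: " + subject[3:12])
    print("    " + usedtext_markers(3, 6, 9, 12))
```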
961
962       The startchar modifier requests that the  starting  character  for  the
963       match  be  indicated,  if  it  is different to the start of the matched
964       string. The only time when this occurs is when \K has been processed as
965       part of the match. In this situation, the output for the matched string
966       is displayed from the starting character  instead  of  from  the  match
967       point,  with  circumflex  characters  under the earlier characters. For
968       example:
969
970           re> /abc\Kxyz/
971         data> abcxyz\=startchar
972          0: abcxyz
973             ^^^
974
975       Unlike allusedtext, the startchar modifier can be used with JIT.   How-
976       ever, these two modifiers are mutually exclusive.
977
978   Showing the value of all capture groups
979
980       The allcaptures modifier requests that the values of all potential cap-
981       tured parentheses be output after a match. By default, only those up to
982       the highest one actually used in the match are output (corresponding to
983       the return code from pcre2_match()). Groups that did not take  part  in
984       the  match  are  output as "<unset>". This modifier is not relevant for
985       DFA matching (which does no capturing); it is ignored, with  a  warning
986       message, if present.
987
988   Testing callouts
989
990       A  callout function is supplied when pcre2test calls the library match-
991       ing functions, unless callout_none is specified. If callout_capture  is
992       set, the current captured groups are output when a callout occurs.
993
994       The  callout_fail modifier can be given one or two numbers. If there is
995       only one number, 1 is returned instead of 0 when a callout of that num-
996       ber  is  reached.  If two numbers are given, 1 is returned when callout
997       <n> is reached for the <m>th time. Note that callouts with string argu-
998       ments  are  always  given  the  number zero. See "Callouts" below for a
999       description of the output when a callout is taken.
1000
1001       The callout_data modifier can be given an unsigned or a  negative  num-
1002       ber.   This  is  set  as the "user data" that is passed to the matching
1003       function, and passed back when the callout  function  is  invoked.  Any
1004       value  other  than  zero  is  used as a return from pcre2test's callout
1005       function.
1006
1007   Finding all matches in a string
1008
1009       Searching for all possible matches within a subject can be requested by
1010       the  global or /altglobal modifier. After finding a match, the matching
1011       function is called again to search the remainder of  the  subject.  The
1012       difference  between  global  and  altglobal is that the former uses the
1013       start_offset argument to pcre2_match() or  pcre2_dfa_match()  to  start
1014       searching  at  a new point within the entire string (which is what Perl
1015       does), whereas the latter passes over a shortened subject. This makes a
1016       difference to the matching process if the pattern begins with a lookbe-
1017       hind assertion (including \b or \B).
1018
1019       If an empty string  is  matched,  the  next  match  is  done  with  the
1020       PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
1021       for another, non-empty, match at the same point in the subject. If this
1022       match  fails,  the  start  offset  is advanced, and the normal match is
1023       retried. This imitates the way Perl handles such cases when  using  the
1024       /g  modifier  or  the  split()  function. Normally, the start offset is
1025       advanced by one character, but if  the  newline  convention  recognizes
1026       CRLF  as  a newline, and the current character is CR followed by LF, an
1027       advance of two characters occurs.
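
           The advance rule in the last two sentences can be written as  a
           small Python sketch. The function name is hypothetical, and each
           element of the string is treated as one character; the sketch
           does not model multi-unit UTF characters.

```python
def next_start_offset(subject, offset, crlf_is_newline):
    """Offset at which to retry after an empty match found no alternative."""
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        # Step over CR and LF together so a match cannot start between them.
        return offset + 2
    return offset + 1


if __name__ == "__main__":
    print(next_start_offset("a\r\nb", 1, True))
    print(next_start_offset("a\r\nb", 1, False))
```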
1028
1029   Testing substring extraction functions
1030
1031       The copy  and  get  modifiers  can  be  used  to  test  the  pcre2_sub-
1032       string_copy_xxx() and pcre2_substring_get_xxx() functions.  They can be
1033       given more than once, and each can specify a group name or number,  for
1034       example:
1035
1036          abcd\=copy=1,copy=3,get=G1
1037
1038       If  the  #subject command is used to set default copy and/or get lists,
1039       these can be unset by specifying a negative number to cancel  all  num-
1040       bered groups and an empty name to cancel all named groups.
1041
1042       The  getall  modifier  tests pcre2_substring_list_get(), which extracts
1043       all captured substrings.
1044
1045       If the subject line is successfully matched, the  substrings  extracted
1046       by  the  convenience  functions  are  output  with C, G, or L after the
1047       string number instead of a colon. This is in  addition  to  the  normal
1048       full  list.  The string length (that is, the return from the extraction
1049       function) is given in parentheses after each substring, followed by the
1050       name when the extraction was by name.
1051
1052   Testing the substitution function
1053
1054       If  the  replace  modifier  is  set, the pcre2_substitute() function is
1055       called instead of one of the matching functions. Note that  replacement
1056       strings  cannot  contain commas, because a comma signifies the end of a
1057       modifier. This is not thought to be an issue in a test program.
1058
1059       Unlike subject strings, pcre2test does not process replacement  strings
1060       for  escape  sequences. In UTF mode, a replacement string is checked to
1061       see if it is a valid UTF-8 string. If so, it is correctly converted  to
1062       a  UTF  string of the appropriate code unit width. If it is not a valid
1063       UTF-8 string, the individual code units are copied directly. This  pro-
1064       vides a means of passing an invalid UTF-8 string for testing purposes.
1065
1066       The  following modifiers set options (in addition to the normal match
1067       options) for pcre2_substitute():
1068
1069         global                      PCRE2_SUBSTITUTE_GLOBAL
1070         substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
1071         substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
1072         substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
1073         substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
1074
1075
1076       After a successful substitution, the modified string  is  output,  pre-
1077       ceded  by the number of replacements. This may be zero if there were no
1078       matches. Here is a simple example of a substitution test:
1079
1080         /abc/replace=xxx
1081             =abc=abc=
1082          1: =xxx=abc=
1083             =abc=abc=\=global
1084          2: =xxx=xxx=
1085
1086       Subject and replacement strings should be kept relatively short  (fewer
1087       than  256 characters) for substitution tests, as fixed-size buffers are
1088       used. To make it easy to test for buffer overflow, if  the  replacement
1089       string  starts  with a number in square brackets, that number is passed
1090       to pcre2_substitute() as the  size  of  the  output  buffer,  with  the
1091       replacement  string  starting at the next character. Here is an example
1092       that tests the edge case:
1093
1094         /abc/
1095             123abc123\=replace=[10]XYZ
1096          1: 123XYZ123
1097             123abc123\=replace=[9]XYZ
1098         Failed: error -47: no more memory
1099
1100       The   default   action   of    pcre2_substitute()    is    to    return
1101       PCRE2_ERROR_NOMEMORY  when  the output buffer is too small. However, if
1102       the PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using  the  sub-
1103       stitute_overflow_length  modifier),  pcre2_substitute() continues to go
1104       through the motions of matching and substituting, in order  to  compute
1105       the size of buffer that is required. When this happens, pcre2test shows
1106       the required buffer length (which includes space for the trailing zero)
1107       as part of the error message. For example:
1108
1109         /abc/substitute_overflow_length
1110             123abc123\=replace=[9]XYZ
1111         Failed: error -47: no more memory: 10 code units are needed
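
           The arithmetic behind the two buffer examples can be checked
           with a Python sketch. It uses Python's re module as a stand-in
           for pcre2_substitute(), assumes one code unit per character,
           and uses a hypothetical function name.

```python
import re


def substitute_with_buffer(pattern, subject, replacement, buffer_size,
                           replace_all=False):
    """Return (result or None, code units needed including trailing zero)."""
    count = 0 if replace_all else 1   # re.sub: count=0 replaces every match
    result = re.sub(pattern, replacement, subject, count=count)
    needed = len(result) + 1          # one extra unit for the trailing zero
    if needed > buffer_size:
        return None, needed           # corresponds to the "no more memory" case
    return result, needed


if __name__ == "__main__":
    print(substitute_with_buffer("abc", "123abc123", "XYZ", 10))
    print(substitute_with_buffer("abc", "123abc123", "XYZ", 9))
```

           The result "123XYZ123" is nine code units long, so ten are
           needed once the trailing zero is counted; a buffer of nine is
           one short.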
1112
1113       A replacement string is ignored with POSIX and DFA matching. Specifying
1114       partial matching provokes an error return  ("bad  option  value")  from
1115       pcre2_substitute().
1116
1117   Setting the JIT stack size
1118
1119       The  jitstack modifier provides a way of setting the maximum stack size
1120       that is used by the just-in-time optimization code. It  is  ignored  if
1121       JIT optimization is not being used. The value is a number of kilobytes.
1122       Providing a stack that is larger than the default 32K is necessary only
1123       for very complicated patterns.
1124
1125   Setting match and recursion limits
1126
1127       The  match_limit and recursion_limit modifiers set the appropriate lim-
1128       its in the match context. These values are ignored when the find_limits
1129       modifier is specified.
1130
1131   Finding minimum limits
1132
1133       If  the  find_limits modifier is present, pcre2test calls pcre2_match()
1134       several times, setting  different  values  in  the  match  context  via
1135       pcre2_set_match_limit()  and pcre2_set_recursion_limit() until it finds
1136       the minimum values for each parameter that allow pcre2_match() to  com-
1137       plete without error.
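
           One way to search for such a minimum is to double a trial value
           until the match completes and then bisect. The Python sketch
           below shows that idea only; the strategy, the starting value of
           64, and the callable named succeeds (a stand-in for running the
           match with a given limit) are assumptions, not a description of
           the pcre2test source.

```python
def find_minimum_limit(succeeds, start=64, ceiling=2**32 - 1):
    """Smallest limit for which succeeds(limit) is true.

    Assumes succeeds() is monotonic: once a limit is large enough,
    every larger limit is large enough too.
    """
    low = -1          # nothing below zero needs testing
    probe = start
    # Phase 1: double the trial value until the match completes.
    while not succeeds(probe):
        if probe >= ceiling:
            raise ValueError("match does not complete even at the ceiling")
        low = probe
        probe = min(probe * 2, ceiling)
    high = probe
    # Phase 2: bisect between the last failure and the first success.
    while high - low > 1:
        mid = (low + high) // 2
        if succeeds(mid):
            high = mid
        else:
            low = mid
    return high


if __name__ == "__main__":
    # Pretend the match needs a limit of at least 1000 to complete.
    print(find_minimum_limit(lambda limit: limit >= 1000))
```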
1138
1139       If JIT is being used, only the match limit is relevant. If DFA matching
1140       is being used, neither limit is relevant, and this modifier is  ignored
1141       (with a warning message).
1142
1143       The  match_limit number is a measure of the amount of backtracking that
1144       takes place, and learning the minimum value  can  be  instructive.  For
1145       most  simple  matches, the number is quite small, but for patterns with
1146       very large numbers of matching possibilities, it can become large  very
1147       quickly    with    increasing    length    of   subject   string.   The
1148       recursion_limit number is a measure of how  much  stack  (or,  if
1149       PCRE2  is  compiled with NO_RECURSE, how much heap) memory is needed to
1150       complete the match attempt.
1151
1152   Showing MARK names
1153
1154
1155       The mark modifier causes the names from backtracking control verbs that
1156       are  returned from calls to pcre2_match() to be displayed. If a mark is
1157       returned for a match, non-match, or partial match, pcre2test shows  it.
1158       For  a  match, it is on a line by itself, tagged with "MK:". Otherwise,
1159       it is added to the non-match message.
1160
1161   Showing memory usage
1162
1163       The memory modifier causes pcre2test to log all memory  allocation  and
1164       freeing calls that occur during a match operation.
1165
1166   Setting a starting offset
1167
1168       The  offset  modifier  sets  an  offset  in the subject string at which
1169       matching starts. Its value is a number of code units, not characters.
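
           The difference between code units and characters matters in
           UTF mode. As an illustration, this Python sketch converts a
           character index into the code unit offset for each library
           width; the function name is hypothetical.

```python
def code_unit_offset(subject, char_index, width=8):
    """Code units that precede the character at char_index."""
    encoding = {8: "utf-8", 16: "utf-16-le", 32: "utf-32-le"}[width]
    prefix = subject[:char_index]
    # Encoded size in bytes divided by the size of one code unit.
    return len(prefix.encode(encoding)) // (width // 8)


if __name__ == "__main__":
    subject = "\u00e9=abc"   # e-acute takes two code units in UTF-8
    for width in (8, 16, 32):
        print(width, code_unit_offset(subject, 2, width))
```

           With the 8-bit library, starting the match at "abc" in this
           subject therefore needs offset=3, not offset=2.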
1170
1171   Setting an offset limit
1172
1173       The offset_limit modifier sets a limit for  unanchored  matches.  If  a
1174       match cannot be found starting at or before this offset in the subject,
1175       a "no match" return is given. The data value is a number of code units,
1176       not  characters. When this modifier is used, the use_offset_limit modi-
1177       fier must have been set for the pattern; if not, an error is generated.

   Setting the size of the output vector

       The ovector modifier applies only to  the  subject  line  in  which  it
       appears,  though  of  course  it can also be used to set a default in a
       #subject command. It specifies the number of pairs of offsets that  are
       available for storing matching information. The default is 15.

       A  value of zero is useful when testing the POSIX API because it causes
       regexec() to be called with a NULL capture vector. When not testing the
       POSIX  API,  a  value  of  zero  is used to cause pcre2_match_data_cre-
       ate_from_pattern() to be called, in order to create a  match  block  of
       exactly the right size for the pattern. (It is not possible to create a
       match block with a zero-length ovector; there is always  at  least  one
       pair of offsets.)

   Passing the subject as zero-terminated

       By default, the subject string is passed to a native API matching func-
       tion with its correct length. In order to test the facility for passing
       a  zero-terminated  string, the zero_terminate modifier is provided. It
       causes the length to be passed as PCRE2_ZERO_TERMINATED. (When matching
       via  the  POSIX  interface, this modifier has no effect, as there is no
       facility for passing a length.)

       When testing pcre2_substitute(), this modifier also has the  effect  of
       passing the replacement string as zero-terminated.

   Passing a NULL context

       Normally,   pcre2test   passes   a   context  block  to  pcre2_match(),
       pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
       set,  however,  NULL  is  passed. This is for testing that the matching
       functions behave correctly in this case (they use default values). This
       modifier  cannot  be used with the find_limits modifier or when testing
       the substitution function.


THE ALTERNATIVE MATCHING FUNCTION

       By default,  pcre2test  uses  the  standard  PCRE2  matching  function,
       pcre2_match() to match each subject line. PCRE2 also supports an alter-
       native matching function, pcre2_dfa_match(), which operates in  a  dif-
       ferent  way, and has some restrictions. The differences between the two
       functions are described in the pcre2matching documentation.

       If the dfa modifier is set, the alternative matching function is  used.
       This  function  finds all possible matches at a given point in the sub-
       ject. If, however, the dfa_shortest modifier is set,  processing  stops
       after  the  first  match is found. This is always the shortest possible
       match.


DEFAULT OUTPUT FROM pcre2test

       This section describes the output when the  normal  matching  function,
       pcre2_match(), is being used.

       When  a  match  succeeds,  pcre2test  outputs the list of captured sub-
       strings, starting with number 0 for the string that matched  the  whole
       pattern.    Otherwise,  it  outputs  "No  match"  when  the  return  is
       PCRE2_ERROR_NOMATCH, or "Partial  match:"  followed  by  the  partially
       matching  substring  when the return is PCRE2_ERROR_PARTIAL. (Note that
       this is the entire substring that  was  inspected  during  the  partial
       match;  it  may  include  characters before the actual match start if a
       lookbehind assertion, \K, \b, or \B was involved.)

       For any other return, pcre2test outputs the PCRE2 negative error number
       and  a  short  descriptive  phrase. If the error is a failed UTF string
       check, the code unit offset of the start of the  failing  character  is
       also output. Here is an example of an interactive pcre2test run.

         $ pcre2test
         PCRE2 version 9.00 2014-05-10

           re> /^abc(\d+)/
         data> abc123
          0: abc123
          1: 123
         data> xyz
         No match

       Unset capturing substrings that are not followed by one that is set are
       not shown by pcre2test unless the allcaptures modifier is specified. In
       the following example, there are two capturing substrings, but when the
       first data line is matched, the second, unset substring is  not  shown.
       An  "internal" unset substring is shown as "<unset>", as for the second
       data line.

           re> /(a)|(b)/
         data> a
          0: a
          1: a
         data> b
          0: b
          1: <unset>
          2: b
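
       The rule for displaying unset substrings can be expressed as  a  small
       formatting  routine.  The  following  Python sketch applies it to the
       groups returned by the standard re module. The name format_captures  is
       invented  for  this illustration, and showing every trailing group as
       "<unset>" when allcaptures is given is an assumption  about  the  exact
       output rather than something stated above.

```python
import re


def format_captures(match, allcaptures=False):
    """Format capture groups in the style of the example above.

    Trailing unset groups are dropped unless allcaptures is true; an
    unset group that precedes a set one is shown as "<unset>".
    """
    groups = [match.group(0), *match.groups()]
    last = len(groups) - 1
    if not allcaptures:
        # Skip unset groups that are not followed by a set group.
        while last > 0 and groups[last] is None:
            last -= 1
    lines = []
    for number in range(last + 1):
        value = groups[number]
        text = "<unset>" if value is None else value
        lines.append(f"{number:2d}: {text}")
    return lines


pattern = re.compile(r"(a)|(b)")
print("\n".join(format_captures(pattern.match("a"))))
print("\n".join(format_captures(pattern.match("b"))))
```

       For the subject "a" only substrings 0 and 1 are listed; for "b" the
       first group is internal and is therefore listed as "<unset>".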

       If the strings contain any non-printing characters, they are output  as
       \xhh  escapes  if  the  value is less than 256 and UTF mode is not set.
       Otherwise they are output as \x{hh...} escapes. See below for the defi-
       nition  of  non-printing characters. If the /aftertext modifier is set,
       the output for substring 0 is followed by the rest of the subject
       string, identified by "0+" like this:

           re> /cat/aftertext
         data> cataract
          0: cat
          0+ aract

       If  global  matching  is  requested, the results of successive matching
       attempts are output in sequence, like this:

           re> /\Bi(\w\w)/g
         data> Mississippi
          0: iss
          1: ss
          0: iss
          1: ss
          0: ipp
          1: pp

       "No match" is output only if the first match attempt fails. Here is  an
       example  of  a  failure  message (the offset 4 that is specified by the
       offset modifier is past the end of the subject string):

           re> /xyz/
         data> xyz\=offset=4
         Error -24 (bad offset value)

       Note that whereas patterns can be continued over several lines (a plain
       ">"  prompt  is used for continuations), subject lines may not. However,
       newlines can be included in a subject by means of the \n escape (or \r,
       \r\n, etc., depending on the newline sequence setting).


OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION

       When the alternative matching function, pcre2_dfa_match(), is used, the
       output consists of a list of all the matches that start  at  the  first
       point in the subject where there is at least one match. For example:

           re> /(tang|tangerine|tan)/
         data> yellow tangerine\=dfa
          0: tangerine
          1: tang
          2: tan

       Using  the normal matching function on this data finds only "tang". The
       longest matching string is always  given  first  (and  numbered  zero).
       After  a  PCRE2_ERROR_PARTIAL  return,  the output is "Partial match:",
       followed by the partially matching substring. Note  that  this  is  the
       entire  substring  that  was inspected during the partial match; it may
       include characters before the actual match start if a lookbehind asser-
       tion, \b, or \B was involved. (\K is not supported for DFA matching.)

       If global matching is requested, the search for further matches resumes
       at the end of the longest match. For example:

           re> /(tang|tangerine|tan)/g
         data> yellow tangerine and tangy sultana\=dfa
          0: tangerine
          1: tang
          2: tan
          0: tang
          1: tan
          0: tan

       The alternative matching function does not support  substring  capture,
       so  the  modifiers  that are concerned with captured substrings are not
       relevant.
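
       The shape of this output (all matches at the first matching point,
       longest first, with a global search resuming at the end of the  longest
       match)  can  be  reproduced  by  brute  force.  The  following  Python
       sketch does so with the standard re module. It is only a model of  the
       results,  not  of  the  DFA algorithm used by pcre2_dfa_match(), and
       both function names are hypothetical. Because it truncates the  subject
       with  the endpos argument, it is reliable only for patterns without
       anchors or lookaround assertions.

```python
import re


def all_matches_at_first_point(pattern, subject, start=0):
    """Return (begin, matches) for the first position with any match.

    matches lists every substring starting at begin that the whole
    pattern matches, longest first. Returns None if nothing matches.
    """
    compiled = re.compile(pattern)
    for begin in range(start, len(subject) + 1):
        ends = [end for end in range(len(subject), begin - 1, -1)
                if compiled.fullmatch(subject, begin, end)]
        if ends:
            return begin, [subject[begin:end] for end in ends]
    return None


def global_dfa_like(pattern, subject):
    """Repeat the search, resuming at the end of each longest match."""
    results, position = [], 0
    while position <= len(subject):
        found = all_matches_at_first_point(pattern, subject, position)
        if found is None:
            break
        begin, matches = found
        results.append(matches)
        longest_end = begin + len(matches[0])
        # Step past an empty match so that the loop always advances.
        position = longest_end if longest_end > begin else begin + 1
    return results


print(all_matches_at_first_point(r"(tang|tangerine|tan)", "yellow tangerine"))
print(global_dfa_like(r"(tang|tangerine|tan)",
                      "yellow tangerine and tangy sultana"))
```

       The first call reports the three matches at offset 7, and  the  second
       yields  the  same three groups of matches as the global example above.
       By contrast, re.search() with this pattern returns only "tang", as  the
       normal matching function does.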


RESTARTING AFTER A PARTIAL MATCH

       When the alternative matching function has given  the  PCRE2_ERROR_PAR-
       TIAL return, indicating that the subject partially matched the pattern,
       you can restart the match with additional subject data by means of  the
       dfa_restart modifier. For example:

           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
         data> 23ja\=P,dfa
         Partial match: 23ja
         data> n05\=dfa,dfa_restart
          0: n05

       For  further  information  about partial matching, see the pcre2partial
       documentation.


CALLOUTS

       If the pattern contains any callout requests, pcre2test's callout func-
       tion  is called during matching unless callout_none is specified.  This
       works with both matching functions.

       The callout function in pcre2test returns zero (carry on  matching)  by
       default,  but you can use a callout_fail modifier in a subject line (as
       described above) to change this and other parameters of the callout.

       Inserting callouts can be helpful when using pcre2test to check compli-
       cated  regular expressions. For further information about callouts, see
       the pcre2callout documentation.

       The output for callouts with numerical arguments and those with  string
       arguments is slightly different.

   Callouts with numerical arguments

       By default, the callout function displays the callout number, the start
       and current positions in the subject text at the callout time, and  the
       next pattern item to be tested. For example:

         --->pqrabcdef
           0    ^  ^     \d

       This  output  indicates  that  callout  number  0  occurred for a match
       attempt starting at the fourth character of the  subject  string,  when
       the  pointer  was  at  the seventh character, and when the next pattern
       item was \d. Just one circumflex is output if  the  start  and  current
       positions  are  the same, or if the current position precedes the start
       position, which can happen if the callout is in a lookbehind assertion.

       Callouts numbered 255 are assumed to be automatic callouts, inserted as
       a  result  of the /auto_callout pattern modifier. In this case, instead
       of showing the callout number, the offset in the pattern, preceded by a
       plus, is output. For example:

           re> /\d?[A-E]\*/auto_callout
         data> E*
         --->E*
          +0 ^      \d?
          +3 ^      [A-E]
          +8 ^^     \*
         +10 ^ ^
          0: E*

       If a pattern contains (*MARK) items, an additional line is output when-
       ever a change of latest mark is passed to  the  callout  function.  For
       example:

           re> /a(*MARK:X)bc/auto_callout
         data> abc
         --->abc
          +0 ^       a
          +1 ^^      (*MARK:X)
         +10 ^^      b
         Latest Mark: X
         +11 ^ ^     c
         +12 ^  ^
          0: abc

       The  mark  changes between matching "a" and "b", but stays the same for
       the rest of the match, so nothing more is output. If, as  a  result  of
       backtracking,  the  mark  reverts to being unset, the text "<unset>" is
       output.

   Callouts with string arguments

       The output for a callout with a string argument is similar, except that
       instead  of outputting a callout number before the position indicators,
       the callout string and its offset in  the  pattern  string  are  output
       before  the reflection of the subject string, and the subject string is
       reflected for each callout. For example:

           re> /^ab(?C'first')cd(?C"second")ef/
         data> abcdefg
         Callout (7): 'first'
         --->abcdefg
             ^ ^         c
         Callout (20): "second"
         --->abcdefg
             ^   ^       e
          0: abcdef


NON-PRINTING CHARACTERS

       When pcre2test is outputting text in the compiled version of a pattern,
       bytes  other  than 32-126 are always treated as non-printing characters
       and are therefore shown as hex escapes.

       When pcre2test is outputting text that is a matched part of  a  subject
       string,  it behaves in the same way, unless a different locale has been
       set for the pattern (using the /locale modifier).  In  this  case,  the
       isprint()  function  is  used  to distinguish printing and non-printing
       characters.
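
       The default escaping rule, combined with the \xhh  and  \x{hh...}  forms
       described  in  the section on default output, can be summarized by the
       following Python sketch. It is an illustration only:  the  name
       show_code_units  is  invented,  the  use  of lower-case hex digits is an
       assumption, and the locale-dependent isprint() case is not modelled.

```python
def show_code_units(units, utf=False):
    """Render code unit values the way the text above describes.

    Values 32-126 are shown as themselves. Other values below 256 become
    \\xhh when UTF mode is off; everything else becomes \\x{hh...}.
    """
    out = []
    for value in units:
        if 32 <= value <= 126:
            out.append(chr(value))
        elif value < 256 and not utf:
            out.append(f"\\x{value:02x}")
        else:
            out.append(f"\\x{{{value:02x}}}")
    return "".join(out)


print(show_code_units(b"ab\x00\xff"))         # ab\x00\xff
print(show_code_units([0x61, 0x100]))         # a\x{100}
print(show_code_units([0xe9], utf=True))      # \x{e9}
```

       Iterating over a bytes object yields integer values, so  8-bit  subject
       data can be passed directly.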


SAVING AND RESTORING COMPILED PATTERNS

       It is possible to save compiled patterns  on  disc  or  elsewhere,  and
       reload them later, subject to a number of restrictions. JIT data cannot
       be saved. The host on which the patterns are reloaded must  be  running
       the same version of PCRE2, with the same code unit width, and must also
       have the same endianness, pointer width  and  PCRE2_SIZE  type.  Before
       compiled  patterns  can be saved they must be serialized, that is, con-
       verted to a stream of bytes. A single byte stream may contain any  num-
       ber  of  compiled  patterns,  but  they must all use the same character
       tables. A single copy of the tables is included in the byte stream (its
       size is 1088 bytes).

       The  functions  whose  names  begin  with pcre2_serialize_ are used for
       serializing and de-serializing. They are described in the  pcre2serial-
       ize  documentation.  In  this  section  we  describe  the  features  of
       pcre2test that can be used to test these functions.

       When a pattern with the push modifier is successfully compiled,  it  is
       pushed  onto  a  stack  of compiled patterns, and pcre2test expects the
       next line to contain a new pattern (or command) instead  of  a  subject
       line.  By contrast, the pushcopy modifier causes a copy of the compiled
       pattern to be stacked, leaving the  original  available  for  immediate
       matching.  By  using  push and/or pushcopy, a number of patterns can be
       compiled and retained. These modifiers are incompatible with posix, and
       control  modifiers  that act at match time are ignored (with a message)
       for the stacked patterns. The jitverify modifier applies only  at  com-
       pile time.

       The command

         #save <filename>

       causes all the stacked patterns to be serialized and the result written
       to the named file. Afterwards, all the stacked patterns are freed.  The
       command

         #load <filename>

       reads  the  data in the file, and then arranges for it to be de-serial-
       ized, with the resulting compiled patterns added to the pattern  stack.
       The  pattern  on the top of the stack can be retrieved by the #pop com-
       mand, which must be followed by  lines  of  subjects  that  are  to  be
       matched  with  the pattern, terminated as usual by an empty line or end
       of file. This command may be followed by  a  modifier  list  containing
       only  control  modifiers that act after a pattern has been compiled. In
       particular,  hex,  posix,  posix_nosub,  push,  and  pushcopy  are  not
       allowed,  nor are any option-setting modifiers. The JIT modifiers are,
       however, permitted. Here is an example that saves and reloads two  pat-
       terns.

         /abc/push
         /xyz/push
         #save tempfile
         #load tempfile
         #pop info
         xyz

         #pop jit,bincode
         abc

       If  jitverify  is  used with #pop, it does not automatically imply jit,
       which is different behaviour from when it is used on a pattern.

       The #popcopy command is analogous to the pushcopy modifier in  that  it
       makes current a copy of the topmost stack pattern, leaving the original
       still on the stack.
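
       The order in which stacked patterns come back after a save and a  load
       can  be  modelled  with an ordinary last-in, first-out stack. The fol-
       lowing Python sketch is a toy model only: the  class  PatternStack  and
       its  methods  are hypothetical, and it stores pattern source strings
       with pickle instead of using the  pcre2_serialize_  functions,  so  it
       shows the stack discipline and not the real byte stream format.

```python
import os
import pickle
import re
import tempfile


class PatternStack:
    """Toy model of the push, #save, #load, #pop and #popcopy operations."""

    def __init__(self):
        self._stack = []

    def push(self, pattern):
        # Like the push modifier: compile and put on top of the stack.
        self._stack.append(re.compile(pattern))

    def save(self, filename):
        # Like #save: write every stacked pattern, then free the stack.
        with open(filename, "wb") as handle:
            pickle.dump([p.pattern for p in self._stack], handle)
        self._stack.clear()

    def load(self, filename):
        # Like #load: add the saved patterns back in their original order.
        with open(filename, "rb") as handle:
            self._stack.extend(re.compile(p) for p in pickle.load(handle))

    def pop(self):
        # Like #pop: remove and return the topmost pattern.
        return self._stack.pop()

    def popcopy(self):
        # Like #popcopy: return the topmost pattern but leave it stacked.
        # Compiled patterns are immutable here, so no real copy is needed.
        return self._stack[-1]

    def __len__(self):
        return len(self._stack)


stack = PatternStack()
stack.push("abc")
stack.push("xyz")
path = os.path.join(tempfile.mkdtemp(), "tempfile")
stack.save(path)
stack.load(path)
print(stack.pop().pattern)   # xyz
print(stack.pop().pattern)   # abc
```

       As in the example above, the pattern pushed last (xyz) is the first  to
       be popped after reloading.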


SEE ALSO

       pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit(3),  pcre2matching(3),
       pcre2partial(3), pcre2pattern(3), pcre2serialize(3).


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.


REVISION

       Last updated: 06 July 2016
       Copyright (c) 1997-2016 University of Cambridge.
