PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)


NAME
       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS
       pcre2grep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcre2grep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE2 regular expression
       library to support patterns that are compatible with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
       of pattern syntax, or pcre2pattern(3) for a full description of the
       syntax and semantics of the regular expressions that PCRE2 supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they are interpreted as
       part of the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are interpreted by the shell, and
       indeed quotes are required if a pattern contains white space or shell
       metacharacters.
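
       The difference between delimiters and shell quoting can be seen in a
       short script. This is only a sketch: it assumes a POSIX shell with
       pcre2grep on the PATH, and the file name and contents are invented for
       the illustration.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

tmp=$(mktemp -d)
printf 'see you on Thursday\nsee /Thursday/ in the docs\n' > "$tmp/motd"

# The slashes are matched literally, so only the second line is selected.
with_slashes=$(pcre2grep '/Thursday/' "$tmp/motd")

# The quotes are removed by the shell; pcre2grep sees the pattern "on Thursday".
with_space=$(pcre2grep 'on Thursday' "$tmp/motd")

printf '%s\n' "$with_slashes" "$with_space"
rm -rf "$tmp"
```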

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of -e,
       -f, or an argument pattern must be provided.

       If no files are specified, pcre2grep reads the standard input. The
       standard input can also be referenced by a name consisting of a single
       hyphen. For example:

         pcre2grep some-pattern file1 - file3

       By default, input files are searched line by line. Each line that
       matches a pattern is copied to the standard output, and if there is
       more than one file, the file name is output at the start of each line,
       followed by a colon. However, there are options that can change how
       pcre2grep behaves. For example, the -M option makes it possible to
       search for strings that span line boundaries. What defines a line
       boundary is controlled by the -N (--newline) option. The -h and -H
       options control whether or not file names are shown, and the -Z option
       changes the file name terminator to a zero byte.

       The amount of memory used for buffering files that are being scanned is
       controlled by parameters that can be set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size of buffer
       that is obtained at the start of processing. If an input file contains
       very long lines, a larger buffer may be needed; this is handled by
       automatically extending the buffer, up to the limit specified by
       --max-buffer-size. The default values for these parameters can be set
       when pcre2grep is built; if nothing is specified, the defaults are set
       to 20KiB and 1MiB respectively. An error occurs if a line is too long
       and the buffer can no longer be expanded.

       The block of memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the buffer
       size is too small, fewer than requested "before" and "after" lines may
       be output.

       Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
       greater. BUFSIZ is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to each line in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets,
       --line-offsets, or --output is used to output only the part of the line
       that matched (either shown literally, or as an offset), the behaviour
       is different. In this situation, all the patterns are applied to the
       line. If there is more than one match, the one that begins nearest to
       the start of the subject is processed; if there is more than one match
       at that position, the one with the longest matching substring is
       processed; if the matching substrings are equal, the first match found
       is processed.

       Scanning with all the patterns resumes immediately following the match,
       so that later matches on the same line can be found. Note, however,
       that an overlapping match that starts in the middle of another match
       will not be processed.

       The above behaviour was changed at release 10.41 to be more compatible
       with GNU grep. In earlier releases, pcre2grep did not recognize matches
       from later patterns that were earlier in the subject.
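
       The nearest-match rule can be tried with two -e patterns given in the
       "wrong" order. This is a hedged sketch, assuming a POSIX shell with
       pcre2grep on the PATH; the subject line is made up. Going by the rule
       above, release 10.41 or later should report "ab" and then "cd", while
       an earlier release is expected to report only "cd", so the output
       depends on the installed version.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

# 'cd' is given first, but 'ab' matches nearer the start of the line.
matches=$(printf 'abcdef\n' | pcre2grep --only-matching -e 'cd' -e 'ab')

printf '%s\n' "$matches"
```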

       Patterns that can match an empty string are accepted, but empty string
       matches are never recognized. An example is the pattern
       "(super)?(man)?", in which all components are optional. This pattern
       finds all occurrences of both "super" and "man"; the output differs
       from matching with "super|man" when only the matching substrings are
       being shown.
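
       The two patterns can be compared on a line that contains both words
       back to back. A minimal sketch, assuming a POSIX shell with pcre2grep
       on the PATH; the input line is invented, and -o is used as the short
       form of --only-matching.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

# All components optional: the longest non-empty match at offset 0 is taken.
optional=$(printf 'superman\n' | pcre2grep -o '(super)?(man)?')

# Alternation: each word is a separate match.
alternation=$(printf 'superman\n' | pcre2grep -o 'super|man')

printf 'optional:\n%s\nalternation:\n%s\n' "$optional" "$alternation"
```

       The first command shows the whole of "superman" as one match; the
       second shows "super" and "man" on separate lines.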

       If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library. The --locale
       option can be used to override this.


SUPPORT FOR COMPRESSED FILES

       Compile-time options for pcre2grep can set it up to use libz or libbz2
       for reading compressed files whose names end in .gz or .bz2,
       respectively. You can find out whether your pcre2grep binary has
       support for one or both of these file types by running it with the
       --help option. If the appropriate support is not present, all files are
       treated as plain text. The standard input is always so treated. If a
       file with a .gz or .bz2 extension is not in fact compressed, it is read
       as a plain text file. When input is from a compressed .gz or .bz2 file,
       the --line-buffered option is ignored.


BINARY FILES

       By default, a file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed specially.
       However, if the newline type is specified as NUL, that is, the line
       terminator is a binary zero, the test for a binary file is not applied.
       See the --binary-files option for a means of changing the way binary
       files are handled.


BINARY ZEROS IN PATTERNS

       Patterns passed from the command line are strings that are terminated
       by a binary zero, so cannot contain internal zeros. However, patterns
       that are read from a file via the -f option may contain binary zeros.


OPTIONS

       The order in which some of the options appear can affect the output.
       For example, both the -H and -l options affect the printing of file
       names. Whichever comes later in the command line will be the one that
       takes effect. Similarly, except where noted below, if an option is
       given twice, the later setting is used. Numerical values for options
       may be followed by K or M, to signify multiplication by 1024 or
       1024*1024 respectively.

       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and file
                 names that start with hyphens.
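
                 A pattern that starts with a hyphen can be passed like this.
                 The sketch assumes a POSIX shell with pcre2grep on the PATH;
                 the file name and contents are hypothetical.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

tmp=$(mktemp -d)
printf 'use -v to invert the match\nnothing to see here\n' > "$tmp/notes"

# After "--", the string "-v" is taken as the pattern, not as an option.
found=$(pcre2grep -- '-v' "$tmp/notes")

printf '%s\n' "$found"
rm -rf "$tmp"
```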

       -A number, --after-context=number
                 Output up to number lines of context after each matching
                 line. Fewer lines are output if the next match or the end of
                 the file is reached, or if the processing buffer size has
                 been set too small. If file names and/or line numbers are
                 being output, a hyphen separator is used instead of a colon
                 for the context lines (the -Z option can be used to change
                 the file name terminator to a zero byte). A line containing
                 "--" is output between each group of lines, unless they are
                 in fact contiguous in the input file. The value of number is
                 expected to be relatively small. When -c is used, -A is
                 ignored.
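
                 The separators are easiest to see with line numbers switched
                 on. A small sketch, assuming a POSIX shell with pcre2grep on
                 the PATH; the log file and its contents are invented.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

tmp=$(mktemp -d)
printf 'match one\ncontext a\nfiller\nfiller\nmatch two\ncontext b\n' > "$tmp/log"

# -n adds line numbers: ":" follows a matching line, "-" a context line.
shown=$(pcre2grep -n -A 1 'match' "$tmp/log")

printf '%s\n' "$shown"
rm -rf "$tmp"
```

                 The two groups are not contiguous in the file, so a "--" line
                 is printed between them.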

       -a, --text
                 Treat binary files as text. This is equivalent to
                 --binary-files=text.

       --allow-lookaround-bsk
                 PCRE2 now forbids the use of \K in lookarounds by default, in
                 line with Perl. This option causes pcre2grep to set the
                 PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK option, which enables this
                 somewhat dangerous usage.

       -B number, --before-context=number
                 Output up to number lines of context before each matching
                 line. Fewer lines are output if the previous match or the
                 start of the file is within number lines, or if the
                 processing buffer size has been set too small. If file names
                 and/or line numbers are being output, a hyphen separator is
                 used instead of a colon for the context lines (the -Z option
                 can be used to change the file name terminator to a zero
                 byte). A line containing "--" is output between each group of
                 lines, unless they are in fact contiguous in the input file.
                 The value of number is expected to be relatively small. When
                 -c is used, -B is ignored.

       --binary-files=word
                 Specify how binary files are to be processed. If the word is
                 "binary" (the default), pattern matching is performed on
                 binary files, but the only output is "Binary file <name>
                 matches" when a match succeeds. If the word is "text", which
                 is equivalent to the -a or --text option, binary files are
                 processed in the same way as any other file. In this case,
                 when a match succeeds, the output may be binary garbage,
                 which can have nasty effects if sent to a terminal. If the
                 word is "without-match", which is equivalent to the -I
                 option, binary files are not processed at all; they are
                 assumed not to be of interest and are skipped without causing
                 any output or affecting the return code.
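
                 The three settings can be compared on a file that contains a
                 zero byte. This is a sketch only, assuming a POSIX shell with
                 pcre2grep on the PATH; the file name "blob" is hypothetical.
                 The text mode is combined with -c so that no binary data is
                 written to the terminal.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

tmp=$(mktemp -d)
printf 'abc\000def\n' > "$tmp/blob"   # a zero byte within the first 1024 bytes

# Default ("binary"): only a one-line notice is printed.
default_out=$(pcre2grep 'abc' "$tmp/blob")

# "text" via -a, with -c to print a count rather than the raw line.
as_text_count=$(pcre2grep -a -c 'abc' "$tmp/blob")

# "without-match" via -I: the file is skipped, so nothing is printed.
skipped=$(pcre2grep -I 'abc' "$tmp/blob" || true)

printf 'default: %s\ntext count: %s\nskipped: [%s]\n' \
  "$default_out" "$as_text_count" "$skipped"
rm -rf "$tmp"
```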

       --buffer-size=number
                 Set the parameter that controls how much memory is obtained
                 at the start of processing for buffering files that are being
                 scanned. See also --max-buffer-size below.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.

       -c, --count
                 Do not output lines from the files that are being scanned;
                 instead output the number of lines that would have been
                 shown, either because they matched, or, if -v is set, because
                 they failed to match. By default, this count is exactly the
                 same as the number of lines that would have been output, but
                 if the -M (multiline) option is used (without -v), there may
                 be more suppressed lines than the count (that is, the number
                 of matches).

                 If no lines are selected, the number zero is output. If
                 several files are being scanned, a count is output for each
                 of them and the -t option can be used to cause a total to be
                 output at the end. However, if the --files-with-matches
                 option is also used, only those files whose counts are
                 greater than zero are listed. When -c is used, the -A, -B,
                 and -C options are ignored.
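
                 Counting across two files, with and without -l, might look
                 like this. A minimal sketch, assuming a POSIX shell with
                 pcre2grep on the PATH; a.txt and b.txt are made-up files.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

tmp=$(mktemp -d)
printf 'foo\nbar\nfoo again\n' > "$tmp/a.txt"
printf 'bar only\n' > "$tmp/b.txt"

# One count per file, including the file with no matching lines.
all_counts=$(cd "$tmp" && pcre2grep -c 'foo' a.txt b.txt)

# Adding -l (--files-with-matches) drops the files whose count is zero.
nonzero_counts=$(cd "$tmp" && pcre2grep -c -l 'foo' a.txt b.txt)

printf '%s\n---\n%s\n' "$all_counts" "$nonzero_counts"
rm -rf "$tmp"
```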

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto". If data is required, it must be given in the
                 same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 It is ignored if --file-offsets, --line-offsets, or --output
                 is set. By default, output is not coloured. The value for the
                 --colour option (which is optional, see above) may be
                 "never", "always", or "auto". In the latter case, colouring
                 happens only if the standard output is connected to a
                 terminal. More resources are used when colouring is enabled,
                 because pcre2grep has to search for all possible matches in a
                 line, not just one, in order to colour them all.

                 The colour that is used can be specified by setting one of
                 the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
                 order. If none of these are set, pcre2grep looks for
                 GREP_COLORS or GREP_COLOR (in that order). The value of the
                 variable should be a string of two numbers, separated by a
                 semicolon, except in the case of GREP_COLORS, which must
                 start with "ms=" or "mt=" followed by two semicolon-separated
                 colours, terminated by the end of the string or by a colon.
                 If GREP_COLORS does not start with "ms=" or "mt=" it is
                 ignored, and GREP_COLOR is checked.

                 If the string obtained from one of the above variables
                 contains any characters other than semicolon or digits, the
                 setting is ignored and the default colour is used. The string
                 is copied directly into the control string for setting colour
                 on a terminal, so it is your responsibility to ensure that
                 the values make sense. If no relevant environment variable is
                 set, the default is "1;31", which gives red.
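
                 Setting the colour for a single run could be sketched as
                 follows, assuming a POSIX shell with pcre2grep on the PATH.
                 The value "1;32" (bold green on most terminals) is only an
                 example, and od is used to make the control string visible
                 instead of letting the terminal interpret it.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

# --colour=always forces colouring even though the output is not a terminal.
coloured=$(printf 'a foo b\n' |
  PCRE2GREP_COLOUR='1;32' pcre2grep --colour=always 'foo')

# Dump the bytes so the escape sequence around "foo" can be inspected.
printf '%s\n' "$coloured" | od -c
```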

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed. Valid values are "read" (the default in
                 non-Windows environments, for compatibility with GNU grep),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path, the default in Windows environments). In the
                 "read" case, directories are read as if they were ordinary
                 files. In some operating systems the effect of reading a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       --depth-limit=number
                 See --match-limit below.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen. When -e is used, no argument pattern is
                 taken from the command line; all arguments are treated as
                 file names. There is no limit to the number of patterns. They
                 are applied to each line in the order in which they are
                 defined.

                 If -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file(s), independent
                 of the order in which these options are specified.

       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped without being processed. This applies to all files,
                 whether listed on the command line, obtained from
                 --file-list, or by scanning a directory. The pattern is a
                 PCRE2 regular expression, and is matched against the final
                 component of the file name, not the entire path. The -F, -w,
                 and -x options do not apply to this pattern. The option may
                 be given any number of times in order to specify multiple
                 patterns. If a file name matches both an --include and an
                 --exclude pattern, it is excluded. There is no short form for
                 this option.
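
                 Because the pattern is a regular expression and not a shell
                 glob, object files are excluded with '\.o$' rather than
                 '*.o'. A sketch under stated assumptions: a POSIX shell with
                 pcre2grep on the PATH, and a hypothetical source tree created
                 just for the demonstration. The output is sorted because the
                 order in which a directory is scanned is not specified here.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

tmp=$(mktemp -d)
mkdir -p "$tmp/src/sub"
printf 'TODO: tidy\n'  > "$tmp/src/main.c"
printf 'TODO: stale\n' > "$tmp/src/main.o"
printf 'TODO: test\n'  > "$tmp/src/sub/util.c"

# -r recurses into src; names whose final component ends in ".o" are skipped.
kept=$(cd "$tmp" && pcre2grep -r -l --exclude='\.o$' 'TODO' src | sort)

printf '%s\n' "$kept"
rm -rf "$tmp"
```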

       --exclude-from=filename
                 Treat each non-empty line of the file as the data for an
                 --exclude option. What constitutes a newline when reading the
                 file is the operating system's default. The --newline option
                 has no effect on this option. This option may be given more
                 than once in order to specify a number of files to read.

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being processed, whatever the setting of the --recursive
                 option. This applies to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times in order to specify more than one pattern. If a
                 directory matches both --include-dir and --exclude-dir, it is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret each data-matching pattern as a list of fixed
                 strings, separated by newlines, instead of as a regular
                 expression. What constitutes a newline for this purpose is
                 controlled by the --newline option. The -w (match as a word)
                 and -x (match whole line) options can be used with -F. They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present). This option applies only to the patterns that are
                 matched against the contents of files; it does not apply to
                 patterns specified by any of the --include or --exclude
                 options.
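
                 The effect of -F shows up as soon as the pattern contains a
                 metacharacter. A minimal sketch, assuming a POSIX shell with
                 pcre2grep on the PATH; the data file is invented.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

tmp=$(mktemp -d)
printf 'a.b\naxb\n' > "$tmp/data"

# With -F the dot is an ordinary character, so only "a.b" is selected.
literal=$(pcre2grep -F 'a.b' "$tmp/data")

# Without -F the dot matches any character, so both lines are selected.
regex=$(pcre2grep 'a.b' "$tmp/data")

printf 'fixed: %s\nregex:\n%s\n' "$literal" "$regex"
rm -rf "$tmp"
```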

       -f filename, --file=filename
                 Read patterns from the file, one per line. As is the case
                 with patterns on the command line, no delimiters should be
                 used. What constitutes a newline when reading the file is the
                 operating system's default interpretation of \n. The
                 --newline option has no effect on this option. Trailing white
                 space is removed from each line, and blank lines are ignored.
                 An empty file contains no patterns and therefore matches
                 nothing. Patterns read from a file in this way may contain
                 binary zeros, which are treated as ordinary data characters.

                 If this option is given more than once, all the specified
                 files are read. A data line is output if any of the patterns
                 match it. A file name can be given as "-" to refer to the
                 standard input. When -f is used, patterns specified on the
                 command line using -e may also be present; they are matched
                 before the file's patterns. However, no pattern is taken from
                 the command line; all arguments are treated as the names of
                 paths to be searched.
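
                 A pattern file combined with -e might be used as sketched
                 here, assuming a POSIX shell with pcre2grep on the PATH. The
                 files "patterns" and "data" are hypothetical; the first
                 pattern line deliberately ends in spaces and is followed by
                 a blank line, both of which should be disregarded according
                 to the description above.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

tmp=$(mktemp -d)
printf 'alpha   \n\nbeta\n' > "$tmp/patterns"   # trailing spaces, blank line
printf 'alpha\nbeta\ngamma\ndelta\n' > "$tmp/data"

# Every argument is a path here, because -f and -e supply the patterns.
selected=$(pcre2grep -f "$tmp/patterns" -e 'gamma' "$tmp/data")

printf '%s\n' "$selected"
rm -rf "$tmp"
```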

       --file-list=filename
                 Read a list of files and/or directories that are to be
                 scanned from the given file, one per line. What constitutes a
                 newline when reading the file is the operating system's
                 default. Trailing white space is removed from each line, and
                 blank lines are ignored. These paths are processed before any
                 that are listed on the command line. The file name can be
                 given as "-" to refer to the standard input. If --file and
                 --file-list are both specified as "-", patterns are read
                 first. This is useful only when the standard input is a
                 terminal, from which further lines (the list of files) can be
                 read after an end-of-file indication. If this option is given
                 more than once, all the specified files are read.

       --file-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as an offset from the start of the file and a
                 length, separated by a comma. In this mode, --colour has no
                 effect, and no context is shown. That is, the -A, -B, and -C
                 options are ignored. If there is more than one match in a
                 line, each of them is shown separately. This option is
                 mutually exclusive with --output, --line-offsets, and
                 --only-matching.

       -H, --with-filename
                 Force the inclusion of the file name at the start of output
                 lines when searching a single file. The file name is not
                 normally shown in this case. By default, for matching lines,
                 the file name is followed by a colon; for context lines, a
                 hyphen separator is used. The -Z option can be used to change
                 the terminator to a zero byte. If a line number is also being
                 output, it follows the file name. When the -M option causes a
                 pattern to match more than one line, only the first is
                 preceded by the file name. This option overrides any previous
                 -h, -l, or -L options.

       -h, --no-filename
                 Suppress the output of file names when searching multiple
                 files. File names are normally shown when multiple files are
                 searched. By default, for matching lines, the file name is
                 followed by a colon; for context lines, a hyphen separator is
                 used. The -Z option can be used to change the terminator to a
                 zero byte. If a line number is also being output, it follows
                 the file name. This option overrides any previous -H, -L, or
                 -l options.

       --heap-limit=number
                 See --match-limit below.

       --help    Output a help message, giving brief details of the command
                 options and file type support, and then exit. Anything else
                 on the command line is ignored.

       -I        Ignore binary files. This is equivalent to
                 --binary-files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 If any --include patterns are specified, the only files that
                 are processed are those whose names match one of the patterns
                 and do not match an --exclude pattern. This option does not
                 affect directories, but it applies to all files, whether
                 listed on the command line, obtained from --file-list, or by
                 scanning a directory. The pattern is a PCRE2 regular
                 expression, and is matched against the final component of the
                 file name, not the entire path. The -F, -w, and -x options do
                 not apply to this pattern. The option may be given any number
                 of times. If a file name matches both an --include and an
                 --exclude pattern, it is excluded. There is no short form for
                 this option.

       --include-from=filename
                 Treat each non-empty line of the file as the data for an
                 --include option. What constitutes a newline for this purpose
                 is the operating system's default. The --newline option has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If any --include-dir patterns are specified, the only
                 directories that are processed are those whose names match
                 one of the patterns and do not match an --exclude-dir
                 pattern. This applies to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times. If a directory matches both --include-dir and
                 --exclude-dir, it is excluded. There is no short form for
                 this option.

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line by default, but if the -Z option is set, they
                 are separated by zero bytes instead of newlines. This option
                 overrides any previous -H, -h, or -l options.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line,
                 but if the -Z option is set, they are separated by zero bytes
                 instead of newlines. Searching normally stops as soon as a
                 matching line is found in a file. However, if the -c (count)
                 option is also used, matching continues in order to obtain
                 the correct count, and those files that have at least one
                 match are listed along with their counts. Using this option
                 with -c is a way of suppressing the listing of files with no
                 matches that occurs with -c on its own. This option overrides
                 any previous -H, -h, or -L options.
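
                 The -l and -L options partition the input files between them,
                 as this sketch shows. It assumes a POSIX shell with pcre2grep
                 on the PATH; hit.txt and miss.txt are made-up files.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

tmp=$(mktemp -d)
printf 'foo\n' > "$tmp/hit.txt"
printf 'bar\n' > "$tmp/miss.txt"

# -l lists the files that contain at least one matching line.
with_match=$(cd "$tmp" && pcre2grep -l 'foo' hit.txt miss.txt)

# -L lists the files that contain none.
without_match=$(cd "$tmp" && pcre2grep -L 'foo' hit.txt miss.txt)

printf 'with: %s\nwithout: %s\n' "$with_match" "$without_match"
rm -rf "$tmp"
```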

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.
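
                 When input arrives on a pipe, a label makes the output easier
                 to attribute. A minimal sketch, assuming a POSIX shell with
                 pcre2grep on the PATH; the label "build-log" is an arbitrary
                 example, and -H is used to force the name to be shown.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

# Without --label the standard input is shown under its default name.
default_name=$(printf 'foo\n' | pcre2grep -H 'foo')

# With --label the supplied name is shown instead.
labelled=$(printf 'foo\n' | pcre2grep -H --label=build-log 'foo')

printf '%s\n%s\n' "$default_name" "$labelled"
```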

       --line-buffered
                 When this option is given, non-compressed input is read and
                 processed line by line, and the output is flushed after each
                 write. By default, input is read in large chunks, unless
                 pcre2grep can determine that it is reading from a terminal,
                 which is currently possible only in Unix-like environments or
                 Windows. Output to a terminal is normally automatically
                 flushed by the operating system. This option can be useful
                 when the input or output is attached to a pipe and you do not
                 want pcre2grep to buffer up large amounts of data. However,
                 its use will affect performance, and the -M (multiline)
                 option ceases to work. When input is from a compressed .gz or
                 .bz2 file, --line-buffered is ignored.

       --line-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as a line number, the offset from the start of the
                 line, and a length. The line number is terminated by a colon
                 (as usual; see the -n option), and the offset and length are
                 separated by a comma. In this mode, --colour has no effect,
                 and no context is shown. That is, the -A, -B, and -C options
                 are ignored. If there is more than one match in a line, each
                 of them is shown separately. This option is mutually
                 exclusive with --output, --file-offsets, and --only-matching.
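
                 The two offset formats, --line-offsets and the --file-offsets
                 option described earlier, can be compared on the same input.
                 A sketch only, assuming a POSIX shell with pcre2grep on the
                 PATH; the data file is invented, and its first line occupies
                 12 bytes including the newline.

```shell
command -v pcre2grep >/dev/null 2>&1 || exit 0  # skip when pcre2grep is absent

tmp=$(mktemp -d)
printf 'hello world\nsay hello\n' > "$tmp/data"

# Offset from the start of the file, then the match length.
file_offsets=$(pcre2grep --file-offsets 'hello' "$tmp/data")

# Line number, then offset within that line, then the match length.
line_offsets=$(pcre2grep --line-offsets 'hello' "$tmp/data")

printf 'file offsets:\n%s\nline offsets:\n%s\n' "$file_offsets" "$line_offsets"
rm -rf "$tmp"
```

                 The second "hello" starts 4 bytes into line 2, which is byte
                 12 + 4 = 16 of the file.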

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE2
                 library's default (usually the "C" locale) is used. There is
                 no short form for this option.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is set, the PCRE2 library is called in "multiline" mode. This
                 allows a matched string to extend past the end of a line and
                 continue on one or more subsequent lines. Patterns used with
                 -M may usefully contain literal newline characters and
                 internal occurrences of ^ and $ characters. The output for a
                 successful match may consist of more than one line. The first
                 line is the line in which the match started, and the last
                 line is the line in which the match ended. If the matched
                 string ends with a newline sequence, the output ends at the
                 end of that line. If -v is set, none of the lines in a
                 multi-line match are output. Once a match has been handled,
                 scanning restarts at the beginning of the line after the one
                 in which the match ended.

                 The newline sequence that separates multiple lines must be
                 matched as part of the pattern. For example, to find the
                 phrase "regular expression" in a file where "regular" might
                 be at the end of a line and "expression" at the start of the
                 next line, you could use this command:

                   pcre2grep -M 'regular\s+expression' <file>

                 The \s escape sequence matches any white space character, in-
544                 cluding newlines, and is followed by + so as to match  trail-
545                 ing  white  space  on the first line as well as possibly han-
546                 dling a two-character newline sequence.
547
548                 There is a limit to the number of lines that can be  matched,
549                 imposed  by  the way that pcre2grep buffers the input file as
550                 it scans it. With a  sufficiently  large  processing  buffer,
551                 this should not be a problem, but the -M option does not work
552                 when input is read line by line (see --line-buffered).
553
554       -m number, --max-count=number
555                 Stop processing after finding number matching lines, or  non-
556                 matching  lines if -v is also set. Any trailing context lines
557                 are output after the final match.  In  multiline  mode,  each
558                 multiline  match counts as just one line for this purpose. If
559                 this limit is reached when reading the standard input from  a
560                 regular file, the file is left positioned just after the last
561                 matching line.  If -c is also set, the count that  is  output
562                 is  never  greater  than number. This option has no effect if
563                 used with -L, -l, or -q, or when just checking for a match in
564                 a binary file.
565
566       --match-limit=number
567                 Processing  some  regular expression patterns may take a very
568                 long time to search for all possible matching strings. Others
569                 may  require  a  very large amount of memory. There are three
570                 options that set resource limits for matching.
571
572                 The --match-limit option provides a means of limiting comput-
573                 ing  resource usage when processing patterns that are not go-
574                 ing to match, but which have a very large number of possibil-
575                 ities in their search trees. The classic example is a pattern
576                 that uses nested unlimited repeats. Internally, PCRE2  has  a
577                 counter  that  is  incremented each time around its main pro-
578                 cessing loop. If the value set by --match-limit  is  reached,
579                 an error occurs.
580
581                 The  --heap-limit  option specifies, as a number of kibibytes
582                 (units of 1024 bytes), the maximum amount of heap memory that
583                 may be used for matching.
584
585                 The  --depth-limit  option  limits  the depth of nested back-
586                 tracking points, which indirectly limits the amount of memory
587                 that is used. The amount of memory needed for each backtrack-
588                 ing point depends on the number of capturing  parentheses  in
589                 the pattern, so the amount of memory that is used before this
590                 limit acts varies from pattern to pattern. This limit  is  of
591                 use only if it is set smaller than --match-limit.
592
593                 There  are no short forms for these options. The default lim-
594                 its can be set when the PCRE2 library is  compiled;  if  they
595                 are  not specified, the defaults are very large and so effec-
596                 tively unlimited.
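
The effect of a small --match-limit can be observed with a pattern that uses nested unlimited repeats. The sketch below is only an illustration: the limit of 1000, the 40-character input, and the variable names are arbitrary choices, and the exact wording of the error report is not assumed.

```shell
# A line of a's with no digit gives (a+)*\d a very large search tree.
line='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'

# With a low limit the search is abandoned; the error report goes to the
# standard error stream, which is captured here. The exit status is
# non-zero because nothing matched, hence the "|| true".
report=$(printf '%s\n' "$line" |
  pcre2grep --match-limit=1000 '(a+)*\d' 2>&1 >/dev/null) || true

printf '%s\n' "$report"
```

Without the limit, the same command may run for a very long time before reporting that the line does not match.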
597
598       --max-buffer-size=number
599                 This limits the expansion of  the  processing  buffer,  whose
600                 initial  size can be set by --buffer-size. The maximum buffer
601                 size is silently forced to be no smaller  than  the  starting
602                 buffer size.
603
604       -N newline-type, --newline=newline-type
605                 Six different conventions for indicating the ends of lines in
606                 scanned files are supported. For example:
607
608                   pcre2grep -N CRLF 'some pattern' <file>
609
610                 The newline type may be specified in upper, lower,  or  mixed
611                 case.  If the newline type is NUL, lines are separated by bi-
612                 nary zero characters. The other types are the  single-charac-
613                 ter  sequences  CR  (carriage  return) and LF (linefeed), the
614                 two-character sequence CRLF, an "anycrlf" type, which  recog-
615                 nizes  any  of  the preceding three types, and an "any" type,
616                 for which any Unicode line ending sequence is assumed to  end
617                 a  line.  The Unicode sequences are the three just mentioned,
618                 plus VT (vertical tab, U+000B), FF (form feed,  U+000C),  NEL
619                 (next  line,  U+0085),  LS  (line  separator, U+2028), and PS
620                 (paragraph separator, U+2029).
621
622                 When the PCRE2 library is built, a  default  line-ending  se-
623                 quence  is specified.  This is normally the standard sequence
624                 for the operating system. Unless otherwise specified by  this
625                 option, pcre2grep uses the library's default.
626
627                 This  option makes it possible to use pcre2grep to scan files
628                 that have come from other environments without having to mod-
629                 ify  their  line  endings.  If the data that is being scanned
630                 does not agree  with  the  convention  set  by  this  option,
631                 pcre2grep  may  behave in strange ways. Note that this option
632                 does not apply to files specified by the -f,  --exclude-from,
633                 or  --include-from options, which are expected to use the op-
634                 erating system's standard newline sequence.
635
636       -n, --line-number
637                 Precede each output line by its line number in the file, fol-
638                 lowed  by  a colon for matching lines or a hyphen for context
639                 lines. If the file name is also being output, it precedes the
640                 line  number.  When  the  -M option causes a pattern to match
641                 more than one line, only the first is preceded  by  its  line
642                 number. This option is forced if --line-offsets is used.
643
644       --no-jit  If  the  PCRE2 library is built with support for just-in-time
645                 compiling (which speeds up matching), pcre2grep automatically
646                 makes use of this, unless it was explicitly disabled at build
647                 time. This option can be used to disable the use  of  JIT  at
648                 run  time. It is provided for testing and working round prob-
649                 lems.  It should never be needed in normal use.
650
651       -O text, --output=text
652                 When there is a match, instead of outputting  the  line  that
653                 matched,  output just the text specified in this option, fol-
654                 lowed by an operating-system standard newline. In this  mode,
655                 --colour  has  no  effect, and no context is shown.  That is,
656                 the -A, -B, and -C options are ignored. The --newline  option
657                 has  no  effect  on  this option, which is mutually exclusive
658                 with  --only-matching,  --file-offsets,  and  --line-offsets.
659                 However,  like  --only-matching,  if  there  is more than one
660                 match in a line, each of them causes a line of output.
661
662                 Escape sequences starting with a dollar character may be used
663                 to insert the contents of the matched part of the line and/or
664                 captured substrings into the text.
665
666                 $<digits> or ${<digits>} is replaced  by  the  captured  sub-
667                 string  of  the  given  decimal  number; zero substitutes the
668                 whole match. If the number is greater than the number of cap-
669                 turing  substrings,  or if the capture is unset, the replace-
670                 ment is empty.
671
672                 $a is replaced by bell; $b by backspace; $e by escape; $f  by
673                 form  feed;  $n by newline; $r by carriage return; $t by tab;
674                 $v by vertical tab.
675
676                 $o<digits> or $o{<digits>} is replaced by the character whose
677                 code  point  is the given octal number. In the first form, up
678                 to three octal digits are processed.  When  more  digits  are
679                 needed  in Unicode mode to specify a wide character, the sec-
680                 ond form must be used.
681
682                 $x<digits> or $x{<digits>} is replaced by the character  rep-
683                 resented  by the given hexadecimal number. In the first form,
684                 up to two hexadecimal digits are processed. When more  digits
685                 are  needed  in Unicode mode to specify a wide character, the
686                 second form must be used.
687
688                 Any other character is substituted by itself. In  particular,
689                 $$ is replaced by a single dollar.
690
691       -o, --only-matching
692                 Show only the part of the line that matched a pattern instead
693                 of the whole line. In this mode, no context  is  shown.  That
694                 is,  the -A, -B, and -C options are ignored. If there is more
695                 than one match in a line, each of them is  shown  separately,
696                 on  a separate line of output. If -o is combined with -v (in-
697                 vert the sense of the match to find non-matching  lines),  no
698                 output  is  generated,  but  the return code is set appropri-
699                 ately. If the matched portion of the line is  empty,  nothing
700                 is  output  unless  the  file  name  or line number  is being
701                 printed, in which case they are shown on an  otherwise  empty
702                 line.  This  option  is  mutually  exclusive  with  --output,
703                 --file-offsets and --line-offsets.
704
705       -onumber, --only-matching=number
706                 Show only the part of the line  that  matched  the  capturing
707                 parentheses of the given number. Up to 50 capturing parenthe-
708                 ses are supported by default. This limit can be  changed  via
709                 the  --om-capture option. A pattern may contain any number of
710                 capturing parentheses, but only those whose number is  within
711                 the  limit can be accessed by -o. An error occurs if the num-
712                 ber specified by -o is greater than the limit.
713
714                 -o0 is the same as -o without a number. Because these options
715                 can  be given without an argument (see above), if an argument
716                 is present, it must be given in the same shell item, for  ex-
717                 ample,  -o3  or --only-matching=2. The comments given for the
718                 non-argument case above also apply to  this  option.  If  the
719                 specified  capturing parentheses do not exist in the pattern,
720                 or were not set in the match, nothing is  output  unless  the
721                 file name or line number is being output.
722
723                 If  this  option is given multiple times, multiple substrings
724                 are output for each match,  in  the  order  the  options  are
725                 given,  and  all on one line. For example, -o3 -o1 -o3 causes
726                 the substrings matched by capturing parentheses 3 and  1  and
727                 then  3 again to be output. By default, there is no separator
728                 (but see the --om-separator option below).
729
730       --om-capture=number
731                 Set the number of capturing parentheses that can be  accessed
732                 by -o. The default is 50.
733
734       --om-separator=text
735                 Specify  a  separating string for multiple occurrences of -o.
736                 The default is an empty string. Separating strings are  never
737                 coloured.
738
739       -q, --quiet
740                 Work quietly, that is, display nothing except error messages.
741                 The exit status indicates whether or  not  any  matches  were
742                 found.
743
744       -r, --recursive
745                 If  any given path is a directory, recursively scan the files
746                 it contains, taking note of any --include and --exclude  set-
747                 tings.  By  default, a directory is read as a normal file; in
748                 some operating systems this gives an  immediate  end-of-file.
749                 This  option is a shorthand for setting the -d option to "re-
750                 curse".
751
752       --recursion-limit=number
753                 This is an obsolete synonym for --depth-limit.  See  --match-
754                 limit above for details.
755
756       -s, --no-messages
757                 Suppress  error  messages  about  non-existent  or unreadable
758                 files. Such files are quietly skipped.  However,  the  return
759                 code is still 2, even if matches were found in other files.
760
761       -t, --total-count
762                 This  option  is  useful when scanning more than one file. If
763                 used on its own, -t suppresses all output except for a  grand
764                 total  number  of matching lines (or non-matching lines if -v
765                 is used) in all the files. If -t is used with -c, a grand to-
766                 tal  is  output  except  when the previous output is just one
767                 line. In other words, it is not output when just  one  file's
768                 count  is  listed.  If file names are being output, the grand
769                 total is preceded by "TOTAL:". Otherwise, it appears as  just
770                 another  number.  The  -t option is ignored when used with -L
771                 (list files without matches), because the grand  total  would
772                 always be zero.
773
774       -u, --utf Operate in UTF-8 mode. This option is available only if PCRE2
775                 has been compiled with UTF-8 support. All patterns (including
776                 those  for any --exclude and --include options) and all lines
777                 that are scanned must be valid strings of  UTF-8  characters.
778                 If an invalid UTF-8 string is encountered, an error occurs.
779
780       -U, --utf-allow-invalid
781                 As  --utf,  but in addition subject lines may contain invalid
782                 UTF-8 code unit sequences. These can never form part  of  any
783                 pattern  match.  Patterns  themselves, however, must still be
784                 valid UTF-8 strings. This facility allows valid UTF-8 strings
785                 to be sought within arbitrary byte sequences in executable or
786                 other binary files. For more details about matching  in  non-
787                 valid UTF-8 strings, see the pcre2unicode(3) documentation.
788
789       -V, --version
790                 Write  the version numbers of pcre2grep and the PCRE2 library
791                 to the standard output and then exit. Anything  else  on  the
792                 command line is ignored.
793
794       -v, --invert-match
795                 Invert  the  sense  of  the match, so that lines which do not
796                 match any of the patterns are the ones that are  found.  When
797                 this  option  is  set,  options  such  as --only-matching and
798                 --output, which specify parts of a match that are to be  out-
799                 put, are ignored.
800
801       -w, --word-regex, --word-regexp
802                 Force the patterns only to match "words". That is, there must
803                 be a word boundary at the  start  and  end  of  each  matched
804                 string.  This is equivalent to having "\b(?:" at the start of
805                 each pattern, and ")\b" at the end. This option applies  only
806                 to  the  patterns  that  are  matched against the contents of
807                 files; it does not apply to patterns specified by any of  the
808                 --include or --exclude options.
809
810       -x, --line-regex, --line-regexp
811                 Force  the  patterns to start matching only at the beginnings
812                 of lines, and in  addition,  require  them  to  match  entire
813                 lines. In multiline mode the match may be more than one line.
814                 This is equivalent to having "^(?:" at the start of each pat-
815                 tern  and  ")$"  at  the end. This option applies only to the
816                 patterns that are matched against the contents of  files;  it
817                 does  not apply to patterns specified by any of the --include
818                 or --exclude options.
819
820       -Z, --null
821                 Terminate file names in the regular output with  a zero  byte
822                 (the  NUL  character)  instead of what would normally appear.
823                 This is useful when file  names  contain  unusual  characters
824                 such  as  colons,  hyphens, or even newlines. The option does
825                 not apply to file names in error messages.
826
827
828ENVIRONMENT VARIABLES
829
830       The environment variables LC_ALL and LC_CTYPE are examined, in that or-
831       der, for a locale. The first one that is set is used. This can be over-
832       ridden by the --locale option. If no locale is set, the PCRE2 library's
833       default (usually the "C" locale) is used.
834
835
836NEWLINES
837
838       The  -N  (--newline) option allows pcre2grep to scan files with newline
839       conventions that differ from the default. This option affects only  the
840       way  scanned files are processed. It does not affect the interpretation
841       of files specified by the -f,  --file-list,  --exclude-from,  or  --in-
842       clude-from options.
843
844       Any  parts  of the scanned input files that are written to the standard
845       output are copied with whatever newline sequences they have in the  in-
846       put.  However,  if  the final line of a file is output, and it does not
847       end with a newline sequence, a newline sequence is added. If  the  new-
848       line  setting  is  CR, LF, CRLF or NUL, that line ending is output; for
849       the other settings (ANYCRLF or ANY) a single LF (linefeed) is used.
850
851       The newline setting does not affect the way in which  pcre2grep  writes
852       newlines  in  informational  messages  to the standard output and error
853       streams.  Under Windows, the standard output is set to  be  binary,  so
854       that  "\r\n" at the ends of output lines that are copied from the input
855       is not converted to "\r\r\n" by the C I/O library. This means that  any
856       messages  written  to the standard output must end with "\r\n". For all
857       other operating systems, and for all messages  to  the  standard  error
858       stream, "\n" is used.
859
860
861OPTIONS COMPATIBILITY
862
863       Many of the short and long forms of pcre2grep's options are the same as
864       in the GNU grep program. Any long option of the form --xxx-regexp  (GNU
865       terminology) is also available as --xxx-regex (PCRE2 terminology). How-
866       ever, the  --depth-limit,  --file-list,  --file-offsets,  --heap-limit,
867       --include-dir,  --line-offsets,  --locale,  --match-limit, -M, --multi-
868       line, -N, --newline,  --om-separator,  --output,  -u,  --utf,  -U,  and
869       --utf-allow-invalid options are specific to pcre2grep, as is the use of
870       the --only-matching option with a capturing parentheses number.
871
872       Although most of the common options work the same way, a few  are  dif-
873       ferent  in pcre2grep. For example, the --include option's argument is a
874       glob for GNU grep, but a regular expression for pcre2grep. If both  the
875       -c  and  -l  options are given, GNU grep lists only file names, without
876       counts, but pcre2grep gives the counts as well.
877
878
879OPTIONS WITH DATA
880
881       There are four different ways in which an option with data can be spec-
882       ified.   If  a  short  form option is used, the data may follow immedi-
883       ately, or (with one exception) in the next command line item. For exam-
884       ple:
885
886         -f/some/file
887         -f /some/file
888
889       The  exception is the -o option, which may appear with or without data.
890       Because of this, if data is present, it must follow immediately in  the
891       same item, for example -o3.
892
893       If  a long form option is used, the data may appear in the same command
894       line item, separated by an equals character, or (with  two  exceptions)
895       it may appear in the next command line item. For example:
896
897         --file=/some/file
898         --file /some/file
899
900       Note,  however, that if you want to supply a file name beginning with ~
901       as data in a shell command, and have the shell expand ~ to a  home  di-
902       rectory,  you  must separate the file name from the option, because the
903       shell does not treat ~ specially unless it is at the start of an item.
904
905       The exceptions to the above are the --colour (or --color)  and  --only-
906       matching  options,  for which the data is optional. If one of these op-
907       tions does have data, it must be given in  the  first  form,  using  an
908       equals character. Otherwise pcre2grep will assume that it has no data.
909
910
911USING PCRE2'S CALLOUT FACILITY
912
913       pcre2grep  has,  by  default,  support for calling external programs or
914       scripts or echoing specific strings during matching by  making  use  of
915       PCRE2's  callout  facility.  However, this support can be completely or
916       partially disabled when pcre2grep is built. You can  find  out  whether
917       your  binary has support for callouts by running it with the --help op-
918       tion. If callout support is completely disabled, all callouts  in  pat-
919       terns are ignored by pcre2grep.  If the facility is partially disabled,
920       calling external programs is not supported, and callouts  that  request
921       it are ignored.
922
923       A  callout  in a PCRE2 pattern is of the form (?C<arg>) where the argu-
924       ment is either a number or a quoted string (see the pcre2callout  docu-
925       mentation  for  details).  Numbered  callouts are ignored by pcre2grep;
926       only callouts with string arguments are useful.
927
928   Echoing a specific string
929
930       Starting the callout string with a pipe character  invokes  an  echoing
931       facility that avoids calling an external program or script. This facil-
932       ity is always available, provided that  callouts  were  not  completely
933       disabled  when  pcre2grep  was built. The rest of the callout string is
934       processed as a zero-terminated string, which means it should  not  con-
935       tain  any  internal  binary  zeros. It is written to the output, having
936       first been passed through the same escape processing as text  from  the
937       --output  (-O) option (see above). However, $0 cannot be used to insert
938       a matched substring because the match is still  in  progress.  Instead,
939       the  single  character '0' is inserted. Any syntax error  in the string
940       (for example, a dollar not followed by another  character)  causes  the
941       callout  to be ignored. No terminator is added to the output string, so
942       if you want a newline, you must include it explicitly using the  escape
943       $n. For example:
944
945         pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file>
946
947       Matching  continues normally after the string is output. If you want to
948       see only the callout output but not any output from  an  actual  match,
949       you should end the pattern with (*FAIL).
950
951   Calling external programs or scripts
952
953       This facility can be independently disabled when pcre2grep is built. It
954       is supported for Windows, where a call to _spawnvp() is used, for  VMS,
955       where  lib$spawn()  is  used,  and  for any Unix-like environment where
956       fork() and execv() are available.
957
958       If the callout string does not start with a pipe (vertical bar) charac-
959       ter,  it  is parsed into a list of substrings separated by pipe charac-
960       ters. The first substring must be an executable name, with the  follow-
961       ing substrings specifying arguments:
962
963         executable_name|arg1|arg2|...
964
965       Any  substring  (including  the executable name) may contain escape se-
966       quences started by a dollar character. These are the same  as  for  the
967       --output (-O) option documented above, except that $0 cannot insert the
968       matched string because the match is still  in  progress.  Instead,  the
969       character '0' is inserted. If you need a literal dollar or pipe charac-
970       ter in any substring, use $$ or $| respectively. Here is an example:
971
972         echo -e "abcde\n12345" | pcre2grep \
973           '(?x)(.)(..(.))
974           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
975
976         Output:
977
978           Arg1: [a] [bcd] [d] Arg2: |a| ()
979           abcde
980           Arg1: [1] [234] [4] Arg2: |1| ()
981           12345
982
983       The parameters for the system call that is used to run the  program  or
984       script are zero-terminated strings. This means that binary zero charac-
985       ters in the callout argument will cause premature termination of  their
986       substrings,  and  therefore should not be present. Any  syntax error in
987       the string (for example, a dollar not followed  by  another  character)
988       causes the callout to be ignored.  If running the program fails for any
989       reason (including the non-existence of the executable), a local  match-
990       ing failure occurs and the matcher backtracks in the normal way.
991
992
993MATCHING ERRORS
994
995       It  is  possible  to supply a regular expression that takes a very long
996       time to fail to match certain lines.  Such  patterns  normally  involve
997       nested  indefinite repeats, for example: (a+)*\d when matched against a
998       line of a's with no final digit. The PCRE2 matching function has a  re-
999       source  limit  that  causes it to abort in these circumstances. If this
1000       happens, pcre2grep outputs an error message and the  line  that  caused
1001       the  problem  to  the  standard error stream. If there are more than 20
1002       such errors, pcre2grep gives up.
1003
1004       The --match-limit option of pcre2grep can be used to  set  the  overall
1005       resource  limit.  There are also other limits that affect the amount of
1006       memory used during matching; see the  discussion  of  --heap-limit  and
1007       --depth-limit above.
1008
1009
1010DIAGNOSTICS
1011
1012       Exit status is 0 if any matches were found, 1 if no matches were found,
1013       and 2 for syntax errors, overlong lines, non-existent  or  inaccessible
1014       files  (even if matches were found in other files) or too many matching
1015       errors. Using the -s option to suppress error messages about inaccessi-
1016       ble files does not affect the return code.
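
A script can distinguish the three outcomes by inspecting the exit status. The helper below is a hypothetical sketch, not part of pcre2grep; the function name search_status and the temporary file are assumptions made for the illustration.

```shell
# Hypothetical helper: report how a search ended, based on exit status.
search_status() {
  status=0
  pcre2grep -q "$1" "$2" 2>/dev/null || status=$?
  case $status in
    0) echo "match" ;;
    1) echo "no match" ;;
    *) echo "error" ;;
  esac
}

tmp=$(mktemp)
printf 'hello world\n' > "$tmp"

search_status 'world' "$tmp"                  # pattern is present
search_status 'absent' "$tmp"                 # pattern is not present
search_status 'world' "$tmp.does-not-exist"   # file cannot be opened

rm -f "$tmp"
```

Because a missing file yields status 2 even when other files contain matches, a script that scans several files should not treat every non-zero status as "no match".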
1017
1018       When   run  under  VMS,  the  return  code  is  placed  in  the  symbol
1019       PCRE2GREP_RC because VMS  does  not  distinguish  between  exit(0)  and
1020       exit(1).
1021
1022
1023SEE ALSO
1024
1025       pcre2pattern(3), pcre2syntax(3), pcre2callout(3), pcre2unicode(3).
1026
1027
1028AUTHOR
1029
1030       Philip Hazel
1031       Retired from University Computing Service
1032       Cambridge, England.
1033
1034
1035REVISION
1036
1037       Last updated: 21 November 2022
1038       Copyright (c) 1997-2022 University of Cambridge.
1039