PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)

NAME
       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS
       pcre2grep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcre2grep searches files for character patterns, in the same way as
       other grep commands do, but it uses the PCRE2 regular expression
       library to support patterns that are compatible with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference
       summary of pattern syntax, or pcre2pattern(3) for a full description
       of the syntax and semantics of the regular expressions that PCRE2
       supports.

       Patterns, whether supplied on the command line or in a separate file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a
       pattern with slashes, as is common in Perl scripts), they are
       interpreted as part of the pattern. Quotes can of course be used to
       delimit patterns on the command line because they are interpreted by
       the shell, and indeed quotes are required if a pattern contains white
       space or shell metacharacters.

       The first argument that follows any option settings is treated as the
       single pattern to be matched when neither -e nor -f is present.
       Conversely, when one or both of these options are used to specify
       patterns, all arguments are treated as path names. At least one of
       -e, -f, or an argument pattern must be provided.
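
       The following minimal Python sketch illustrates this rule. It is a
       model, not pcre2grep's own code. The function split_arguments() is
       hypothetical and assumes that option parsing has already happened.

```python
from typing import List, Optional, Tuple


def split_arguments(args: List[str],
                    have_e_or_f: bool) -> Tuple[Optional[str], List[str]]:
    """Hypothetical model: return (pattern, paths) for non-option arguments.

    The pattern is None when -e and/or -f have supplied the patterns.
    """
    if have_e_or_f:
        # With -e or -f, every remaining argument is a path name.
        return None, list(args)
    if not args:
        raise ValueError("at least one of -e, -f, or a pattern is required")
    # Otherwise the first argument is the single pattern.
    return args[0], list(args[1:])


print(split_arguments(["Thursday", "/etc/motd"], False))
# ('Thursday', ['/etc/motd'])
```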

       If no files are specified, pcre2grep reads the standard input. The
       standard input can also be referenced by a name consisting of a
       single hyphen. For example:

         pcre2grep some-pattern file1 - file3

       Input files are searched line by line. By default, each line that
       matches a pattern is copied to the standard output, and if there is
       more than one file, the file name is output at the start of each
       line, followed by a colon. However, there are options that can change
       how pcre2grep behaves. In particular, the -M option makes it possible
       to search for strings that span line boundaries. What defines a line
       boundary is controlled by the -N (--newline) option.

       The amount of memory used for buffering files that are being scanned
       is controlled by parameters that can be set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size of buffer
       that is obtained at the start of processing. If an input file
       contains very long lines, a larger buffer may be needed; this is
       handled by automatically extending the buffer, up to the limit
       specified by --max-buffer-size. The default values for these
       parameters can be set when pcre2grep is built; if nothing is
       specified, the defaults are set to 20KiB and 1MiB respectively. An
       error occurs if a line is too long and the buffer can no longer be
       expanded.

       The block of memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the
       buffer size is too small, fewer than requested "before" and "after"
       lines may be output.
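
       The sizing rules above can be sketched in Python as shown below. This
       is a hypothetical model: buffer_for_line() is an illustrative name.
       The doubling growth step is an assumption, because only the starting
       size, the upper limit, and the factor of three are documented here.

```python
DEFAULT_BUFFER_SIZE = 20 * 1024        # 20KiB, the documented default
DEFAULT_MAX_BUFFER_SIZE = 1024 * 1024  # 1MiB, the documented default


def buffer_for_line(line_length: int,
                    buffer_size: int = DEFAULT_BUFFER_SIZE,
                    max_buffer_size: int = DEFAULT_MAX_BUFFER_SIZE):
    """Hypothetical model: return (buffer size, total memory) for one line.

    ASSUMPTION: the buffer doubles each time it is extended, capped at
    max_buffer_size; pcre2grep's actual growth step may differ.
    """
    size = buffer_size
    while size < line_length:
        if size >= max_buffer_size:
            raise ValueError("line too long: buffer cannot be expanded")
        size = min(size * 2, max_buffer_size)
    # Three "buffer sizes" are allocated, for "before" and "after" lines.
    return size, 3 * size


print(buffer_for_line(100))        # (20480, 61440)
print(buffer_for_line(50 * 1024))  # (81920, 245760)
```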

       Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever is the
       greater. BUFSIZ is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is
       applied to each line in the order in which they are defined, except
       that all the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further
       patterns are considered. However, if --colour (or --color) is used to
       colour the matching substrings, or if --only-matching,
       --file-offsets, or --line-offsets is used to output only the part of
       the line that matched (either shown literally, or as an offset),
       scanning resumes immediately following the match, so that further
       matches on the same line can be found. If there are multiple
       patterns, they are all tried on the remainder of the line, but
       patterns that follow the one that matched are not tried on the
       earlier matched part of the line.

       This behaviour means that the order in which multiple patterns are
       specified can affect the output when one of the above options is
       used. This is no longer the same behaviour as GNU grep, which now
       manages to display earlier matches for later patterns (as long as
       there is no overlap).
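
       The following Python sketch models the scanning order described in
       the two paragraphs above. Python's re module stands in for PCRE2. The
       function names are hypothetical, and the model is a simplification,
       not pcre2grep's implementation.

```python
import re
from typing import List


def first_nonempty(rx, line, pos):
    """First non-empty match of rx in line at or after pos, or None."""
    for m in rx.finditer(line, pos):
        if m.end() > m.start():
            return m
    return None


def only_matching(line: str, patterns: List[str]) -> List[str]:
    """Hypothetical model of -o with several patterns.

    At each restart point the patterns are tried in order; the first one
    that matches anywhere in the rest of the line wins, and scanning then
    resumes just after its match.
    """
    compiled = [re.compile(p) for p in patterns]
    found, pos = [], 0
    while True:
        for rx in compiled:
            m = first_nonempty(rx, line, pos)
            if m:
                found.append(m.group())
                pos = m.end()
                break
        else:
            return found  # no pattern matches the remainder of the line


print(only_matching("YX", ["X", "Y"]))  # ['X']
print(only_matching("YX", ["X|Y"]))     # ['Y', 'X']
```

       When the patterns are given separately, Y is never shown. X is the
       first pattern and is found later on the line, and scanning then
       resumes after it.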

       Patterns that can match an empty string are accepted, but empty
       string matches are never recognized. An example is the pattern
       "(super)?(man)?", in which all components are optional. This pattern
       finds all occurrences of both "super" and "man"; the output differs
       from matching with "super|man" when only the matching substrings are
       being shown.
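
       The difference can be sketched in Python, with the re module standing
       in for PCRE2. Dropping empty matches approximates the rule that they
       are never recognized. shown_matches() is a hypothetical name.

```python
import re
from typing import List


def shown_matches(pattern: str, line: str) -> List[str]:
    """Substrings -o would show; empty matches are never reported."""
    return [m.group() for m in re.finditer(pattern, line) if m.group()]


print(shown_matches(r"(super)?(man)?", "superman"))  # ['superman']
print(shown_matches(r"super|man", "superman"))       # ['super', 'man']
```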

       If the LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library. The
       --locale option can be used to override this.


SUPPORT FOR COMPRESSED FILES

       It is possible to compile pcre2grep so that it uses libz or libbz2 to
       read compressed files whose names end in .gz or .bz2, respectively.
       You can find out whether your pcre2grep binary has support for one or
       both of these file types by running it with the --help option. If the
       appropriate support is not present, all files are treated as plain
       text. The standard input is always so treated. When input is from a
       compressed .gz or .bz2 file, the --line-buffered option is ignored.
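
       The suffix-based choice of reader can be sketched with Python's gzip
       and bz2 modules in place of libz and libbz2. open_input() and its
       have_libz and have_libbz2 flags are hypothetical. The flags stand for
       whether support was compiled in.

```python
import bz2
import gzip
import os
import tempfile


def open_input(path: str, have_libz: bool = True, have_libbz2: bool = True):
    """Hypothetical model: choose a text reader from the file name's suffix.

    Without the relevant support, the file is read as plain text.
    """
    if have_libz and path.endswith(".gz"):
        return gzip.open(path, "rt")
    if have_libbz2 and path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path, "r")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt.gz")
    with gzip.open(path, "wt") as f:
        f.write("Thursday\nFriday\n")
    with open_input(path) as f:
        print([line for line in f if "Thursday" in line])  # ['Thursday\n']
```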


BINARY FILES

       By default, a file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed
       specially. However, if the newline type is specified as NUL, that is,
       the line terminator is a binary zero, the test for a binary file is
       not applied. See the --binary-files option for a means of changing
       the way binary files are handled.
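
       The default binary-file test amounts to the following check. This is
       a hypothetical Python model, and is_binary() is an illustrative name.

```python
def is_binary(data: bytes, newline_is_nul: bool = False) -> bool:
    """Hypothetical model of the default binary-file test."""
    if newline_is_nul:
        return False  # NUL is the line terminator, so no test is applied
    # Only the first 1024 bytes are examined.
    return b"\x00" in data[:1024]


print(is_binary(b"text\x00more"))          # True
print(is_binary(b"a" * 1024 + b"\x00"))    # False: zero is at offset 1024
print(is_binary(b"one\x00two\x00", True))  # False
```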


BINARY ZEROS IN PATTERNS

       Patterns passed from the command line are strings that are terminated
       by a binary zero, so they cannot contain internal zeros. However,
       patterns that are read from a file via the -f option may contain
       binary zeros.


OPTIONS

       The order in which some of the options appear can affect the output.
       For example, both the -H and -l options affect the printing of file
       names. Whichever comes later in the command line will be the one that
       takes effect. Similarly, except where noted below, if an option is
       given twice, the later setting is used. Numerical values for options
       may be followed by K or M, to signify multiplication by 1024 or
       1024*1024 respectively.
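
       The suffix rule for numerical values amounts to the following. This
       is a hypothetical Python helper, and it handles only the uppercase K
       and M suffixes mentioned here.

```python
def parse_number(text: str) -> int:
    """Hypothetical model of a numerical option value with a K or M suffix."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


print(parse_number("20K"))  # 20480
print(parse_number("1M"))   # 1048576
print(parse_number("5"))    # 5
```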

       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is not an
                 option. This allows for the processing of patterns and file
                 names that start with hyphens.

       -A number, --after-context=number
                 Output up to number lines of context after each matching
                 line. Fewer lines are output if the next match or the end of
                 the file is reached, or if the processing buffer size has
                 been set too small. If file names and/or line numbers are
                 being output, a hyphen separator is used instead of a colon
                 for the context lines. A line containing "--" is output
                 between each group of lines, unless they are in fact
                 contiguous in the input file. The value of number is
                 expected to be relatively small. When -c is used, -A is
                 ignored.

       -a, --text
                 Treat binary files as text. This is equivalent to
                 --binary-files=text.

       -B number, --before-context=number
                 Output up to number lines of context before each matching
                 line. Fewer lines are output if the previous match or the
                 start of the file is within number lines, or if the
                 processing buffer size has been set too small. If file names
                 and/or line numbers are being output, a hyphen separator is
                 used instead of a colon for the context lines. A line
                 containing "--" is output between each group of lines,
                 unless they are in fact contiguous in the input file. The
                 value of number is expected to be relatively small. When -c
                 is used, -B is ignored.

       --binary-files=word
                 Specify how binary files are to be processed. If the word is
                 "binary" (the default), pattern matching is performed on
                 binary files, but the only output is "Binary file <name>
                 matches" when a match succeeds. If the word is "text", which
                 is equivalent to the -a or --text option, binary files are
                 processed in the same way as any other file. In this case,
                 when a match succeeds, the output may be binary garbage,
                 which can have nasty effects if sent to a terminal. If the
                 word is "without-match", which is equivalent to the -I
                 option, binary files are not processed at all; they are
                 assumed not to be of interest and are skipped without
                 causing any output or affecting the return code.

       --buffer-size=number
                 Set the parameter that controls how much memory is obtained
                 at the start of processing for buffering files that are being
                 scanned. See also --max-buffer-size below.

       -C number, --context=number
                 Output number lines of context both before and after each
                 matching line. This is equivalent to setting both -A and -B
                 to the same value.
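
                 Below is a hypothetical Python model of how context lines
                 and group separators combine with -n. It uses Python's re
                 in place of PCRE2 and ignores the buffer-size limit, so it
                 always outputs all the context lines that are available.

```python
import re
from typing import List


def with_context(lines: List[str], pattern: str,
                 before: int = 0, after: int = 0) -> List[str]:
    """Hypothetical model of -n combined with -B and -A (-C sets both)."""
    rx = re.compile(pattern)
    matches = {i for i, line in enumerate(lines) if rx.search(line)}
    shown = set()
    for i in matches:
        first = max(0, i - before)
        last = min(len(lines) - 1, i + after)
        shown.update(range(first, last + 1))
    out, previous = [], None
    for i in sorted(shown):
        # "--" separates groups that are not contiguous in the input
        # (assumed to be printed only when context was requested).
        if (before or after) and previous is not None and i != previous + 1:
            out.append("--")
        sep = ":" if i in matches else "-"  # context lines use a hyphen
        out.append(f"{i + 1}{sep}{lines[i]}")
        previous = i
    return out


text = ["one", "two", "match", "three", "four", "five", "match", "six"]
for row in with_context(text, "match", before=1, after=1):
    print(row)
# 2-two
# 3:match
# 4-three
# --
# 6-five
# 7:match
# 8-six
```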

       -c, --count
                 Do not output lines from the files that are being scanned;
                 instead output the number of lines that would have been
                 shown, either because they matched, or, if -v is set,
                 because they failed to match. By default, this count is
                 exactly the same as the number of lines that would have been
                 output, but if the -M (multiline) option is used (without
                 -v), there may be more suppressed lines than the count (that
                 is, the number of matches).

                 If no lines are selected, the number zero is output. If
                 several files are being scanned, a count is output for each
                 of them and the -t option can be used to cause a total to be
                 output at the end. However, if the --files-with-matches
                 option is also used, only those files whose counts are
                 greater than zero are listed. When -c is used, the -A, -B,
                 and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto". If data is required, it must be given in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By default, the output is not coloured. The value (which is
                 optional, see above) may be "never", "always", or "auto". In
                 the latter case, colouring happens only if the standard
                 output is connected to a terminal. More resources are used
                 when colouring is enabled, because pcre2grep has to search
                 for all possible matches in a line, not just one, in order to
                 colour them all.

                 The colour that is used can be specified by setting one of
                 the environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
                 order. If none of these are set, pcre2grep looks for
                 GREP_COLORS or GREP_COLOR (in that order). The value of the
                 variable should be a string of two numbers, separated by a
                 semicolon, except in the case of GREP_COLORS, which must
                 start with "ms=" or "mt=" followed by two semicolon-separated
                 colours, terminated by the end of the string or by a colon.
                 If GREP_COLORS does not start with "ms=" or "mt=" it is
                 ignored, and GREP_COLOR is checked.

                 If the string obtained from one of the above variables
                 contains any characters other than semicolon or digits, the
                 setting is ignored and the default colour is used. The string
                 is copied directly into the control string for setting colour
                 on a terminal, so it is your responsibility to ensure that
                 the values make sense. If no relevant environment variable is
                 set, the default is "1;31", which gives red.
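
                 The lookup order can be expressed as a short Python
                 function. match_colour() is hypothetical. It assumes that a
                 variable counts as set whenever it is present in the
                 environment, even if its value is empty.

```python
import re
from typing import Mapping

DEFAULT_COLOUR = "1;31"  # red


def match_colour(env: Mapping[str, str]) -> str:
    """Hypothetical model of how the match colour is chosen."""
    value = None
    for name in ("PCRE2GREP_COLOUR", "PCRE2GREP_COLOR",
                 "PCREGREP_COLOUR", "PCREGREP_COLOR"):
        if name in env:
            value = env[name]
            break
    if value is None and "GREP_COLORS" in env:
        # Only an "ms=" or "mt=" prefix is recognized; the value runs
        # to the end of the string or to the first colon.
        m = re.match(r"m[st]=([^:]*)", env["GREP_COLORS"])
        if m:
            value = m.group(1)
    if value is None and "GREP_COLOR" in env:
        value = env["GREP_COLOR"]
    # Anything other than digits and semicolons means "use the default".
    if value is None or not re.fullmatch(r"[0-9;]*", value):
        return DEFAULT_COLOUR
    return value


print(match_colour({"GREP_COLORS": "mt=01;32:fn=35"}))  # 01;32
print(match_colour({"PCRE2GREP_COLOUR": "red"}))         # 1;31
```

                 In practice, os.environ can be passed as the mapping.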

       -D action, --devices=action
                 If an input path is not a regular file or a directory,
                 "action" specifies how it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to be processed. Valid values are "read" (the default in
                 non-Windows environments, for compatibility with GNU grep),
                 "recurse" (equivalent to the -r option), or "skip" (silently
                 skip the path, the default in Windows environments). In the
                 "read" case, directories are read as if they were ordinary
                 files. In some operating systems the effect of reading a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       --depth-limit=number
                 See --match-limit below.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used
                 multiple times in order to specify several patterns. It can
                 also be used as a way of specifying a single pattern that
                 starts with a hyphen. When -e is used, no argument pattern is
                 taken from the command line; all arguments are treated as
                 file names. There is no limit to the number of patterns. They
                 are applied to each line in the order in which they are
                 defined until one matches.

                 If -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file(s), independent
                 of the order in which these options are specified. Note that
                 multiple use of -e is not the same as a single pattern with
                 alternatives. For example, X|Y finds the first character in a
                 line that is X or Y, whereas if the two patterns are given
                 separately, with X first, pcre2grep finds X if it is present,
                 even if it follows Y in the line. It finds Y only if there is
                 no X in the line. This matters only if you are using -o or
                 --colo(u)r to show the part(s) of the line that matched.

       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped without being processed. This applies to all files,
                 whether listed on the command line, obtained from
                 --file-list, or by scanning a directory. The pattern is a
                 PCRE2 regular expression, and is matched against the final
                 component of the file name, not the entire path. The -F, -w,
                 and -x options do not apply to this pattern. The option may
                 be given any number of times in order to specify multiple
                 patterns. If a file name matches both an --include and an
                 --exclude pattern, it is excluded. There is no short form for
                 this option.

       --exclude-from=filename
                 Treat each non-empty line of the file as the data for an
                 --exclude option. What constitutes a newline when reading the
                 file is the operating system's default. The --newline option
                 has no effect on this option. This option may be given more
                 than once in order to specify a number of files to read.

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being processed, whatever the setting of the --recursive
                 option. This applies to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times in order to specify more than one pattern. If a
                 directory matches both --include-dir and --exclude-dir, it is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret each data-matching pattern as a list of fixed
                 strings, separated by newlines, instead of as a regular
                 expression. What constitutes a newline for this purpose is
                 controlled by the --newline option. The -w (match as a word)
                 and -x (match whole line) options can be used with -F. They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present). This option applies only to the patterns that are
                 matched against the contents of files; it does not apply to
                 patterns specified by any of the --include or --exclude
                 options.
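
                 The effect of -F can be modelled by escaping each
                 newline-separated piece and joining the pieces as
                 alternatives. This Python sketch is hypothetical and uses
                 re in place of PCRE2. The lookaround form used for -w and
                 the precedence of -x over -w are assumptions.

```python
import re


def fixed_strings_regex(data: str, word: bool = False,
                        whole_line: bool = False, newline: str = "\n") -> str:
    """Hypothetical model of -F: each newline-separated piece is literal.

    ASSUMPTIONS: -w is expressed with lookarounds for word characters,
    and -x takes precedence if both -w and -x are given.
    """
    pieces = [re.escape(s) for s in data.split(newline)]
    if whole_line:
        pieces = [f"^(?:{p})$" for p in pieces]
    elif word:
        pieces = [rf"(?<!\w)(?:{p})(?!\w)" for p in pieces]
    return "|".join(pieces)


rx = re.compile(fixed_strings_regex("a.b\nc+d"))
print(bool(rx.search("xx a.b yy")), bool(rx.search("axb")))  # True False
```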

       -f filename, --file=filename
                 Read patterns from the file, one per line, and match them
                 against each line of input. As is the case with patterns on
                 the command line, no delimiters should be used. What
                 constitutes a newline when reading the file is the operating
                 system's default interpretation of \n. The --newline option
                 has no effect on this option. Trailing white space is removed
                 from each line, and blank lines are ignored. An empty file
                 contains no patterns and therefore matches nothing. Patterns
                 read from a file in this way may contain binary zeros, which
                 are treated as ordinary data characters. See also the
                 comments about multiple patterns versus a single pattern with
                 alternatives in the description of -e above.

                 If this option is given more than once, all the specified
                 files are read. A data line is output if any of the patterns
                 match it. A file name can be given as "-" to refer to the
                 standard input. When -f is used, patterns specified on the
                 command line using -e may also be present; they are tested
                 before the file's patterns. However, no other pattern is
                 taken from the command line; all arguments are treated as the
                 names of paths to be searched.
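
                 Here is a hypothetical Python model of how the pattern list
                 is assembled from -e and -f. The function names are
                 illustrative, and the file contents are passed in as lists
                 of lines.

```python
from typing import Iterable, List


def read_pattern_file(lines: Iterable[str]) -> List[str]:
    """Hypothetical model of -f: strip trailing white space, skip blanks."""
    patterns = []
    for line in lines:
        line = line.rstrip()
        if line:
            patterns.append(line)
    return patterns


def all_patterns(e_patterns: List[str],
                 f_files: List[Iterable[str]]) -> List[str]:
    """-e patterns come first, then each -f file's patterns in turn."""
    result = list(e_patterns)
    for lines in f_files:
        result.extend(read_pattern_file(lines))
    return result


print(all_patterns(["X"], [["foo  \n", "\n", "   \n", "  bar\n"]]))
# ['X', 'foo', '  bar']
```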

       --file-list=filename
                 Read a list of files and/or directories that are to be
                 scanned from the given file, one per line. What constitutes a
                 newline when reading the file is the operating system's
                 default. Trailing white space is removed from each line, and
                 blank lines are ignored. These paths are processed before any
                 that are listed on the command line. The file name can be
                 given as "-" to refer to the standard input. If --file and
                 --file-list are both specified as "-", patterns are read
                 first. This is useful only when the standard input is a
                 terminal, from which further lines (the list of files) can be
                 read after an end-of-file indication. If this option is given
                 more than once, all the specified files are read.

       --file-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as an offset from the start of the file and a
                 length, separated by a comma. In this mode, no context is
                 shown. That is, the -A, -B, and -C options are ignored. If
                 there is more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with --output,
                 --line-offsets, and --only-matching.

       -H, --with-filename
                 Force the inclusion of the file name at the start of output
                 lines when searching a single file. By default, the file name
                 is not shown in this case. For matching lines, the file name
                 is followed by a colon; for context lines, a hyphen separator
                 is used. If a line number is also being output, it follows
                 the file name. When the -M option causes a pattern to match
                 more than one line, only the first is preceded by the file
                 name. This option overrides any previous -h, -l, or -L
                 options.

       -h, --no-filename
                 Suppress the output of file names when searching multiple
                 files. By default, file names are shown when multiple files
                 are searched. For matching lines, the file name is followed
                 by a colon; for context lines, a hyphen separator is used. If
                 a line number is also being output, it follows the file name.
                 This option overrides any previous -H, -L, or -l options.

       --heap-limit=number
                 See --match-limit below.

       --help    Output a help message, giving brief details of the command
                 options and file type support, and then exit. Anything else
                 on the command line is ignored.

       -I        Ignore binary files. This is equivalent to
                 --binary-files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 If any --include patterns are specified, the only files that
                 are processed are those whose names match one of the patterns
                 and do not match an --exclude pattern. This option does not
                 affect directories, but it applies to all files, whether
                 listed on the command line, obtained from --file-list, or by
                 scanning a directory. The pattern is a PCRE2 regular
                 expression, and is matched against the final component of
                 the file name, not the entire path. The -F, -w, and -x
                 options do not apply to this pattern. The option may be given
                 any number of times. If a file name matches both an --include
                 and an --exclude pattern, it is excluded. There is no short
                 form for this option.

       --include-from=filename
                 Treat each non-empty line of the file as the data for an
                 --include option. What constitutes a newline for this purpose
                 is the operating system's default. The --newline option has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If any --include-dir patterns are specified, the only
                 directories that are processed are those whose names match
                 one of the patterns and do not match an --exclude-dir
                 pattern. This applies to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of the directory
                 name, not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number of
                 times. If a directory matches both --include-dir and
                 --exclude-dir, it is excluded. There is no short form for
                 this option.
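
                 The combined include and exclude rules for a single path can
                 be sketched as follows. is_selected() is hypothetical, and
                 Python's unanchored re.search stands in for PCRE2 matching.
                 The same logic is assumed to apply to both the file and the
                 directory variants.

```python
import os
import re
from typing import Sequence


def is_selected(path: str, includes: Sequence[str] = (),
                excludes: Sequence[str] = ()) -> bool:
    """Hypothetical model of --include/--exclude (files) or
    --include-dir/--exclude-dir (directories)."""
    name = os.path.basename(path)  # only the final component is matched
    if any(re.search(p, name) for p in excludes):
        return False  # an exclusion always wins
    if includes:
        return any(re.search(p, name) for p in includes)
    return True


print(is_selected("src/main.c", includes=[r"\.c$"]))                     # True
print(is_selected("src/test.c", includes=[r"\.c$"], excludes=["^test"]))  # False
print(is_selected("test/main.c", excludes=["^test"]))                     # True
```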

       -L, --files-without-match
                 Instead of outputting lines from the files, just output the
                 names of the files that do not contain any lines that would
                 have been output. Each file name is output once, on a
                 separate line. This option overrides any previous -H, -h, or
                 -l options.

       -l, --files-with-matches
                 Instead of outputting lines from the files, just output the
                 names of the files containing lines that would have been
                 output. Each file name is output once, on a separate line.
                 Searching normally stops as soon as a matching line is found
                 in a file. However, if the -c (count) option is also used,
                 matching continues in order to obtain the correct count, and
                 those files that have at least one match are listed along
                 with their counts. Using this option with -c is a way of
                 suppressing the listing of files with no matches that occurs
                 with -c on its own. This option overrides any previous -H,
                 -h, or -L options.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-buffered
                 When this option is given, non-compressed input is read and
                 processed line by line, and the output is flushed after each
                 write. By default, input is read in large chunks, unless
                 pcre2grep can determine that it is reading from a terminal,
                 which is currently possible only in Unix-like environments or
                 Windows. Output to a terminal is normally automatically
                 flushed by the operating system. This option can be useful
                 when the input or output is attached to a pipe and you do not
                 want pcre2grep to buffer up large amounts of data. However,
                 its use will affect performance, and the -M (multiline)
                 option ceases to work. When input is from a compressed .gz or
                 .bz2 file, --line-buffered is ignored.

       --line-offsets
                 Instead of showing lines or parts of lines that match, show
                 each match as a line number, the offset from the start of the
                 line, and a length. The line number is terminated by a colon
                 (as usual; see the -n option), and the offset and length are
                 separated by a comma. In this mode, no context is shown.
                 That is, the -A, -B, and -C options are ignored. If there is
                 more than one match in a line, each of them is shown
                 separately. This option is mutually exclusive with --output,
                 --file-offsets, and --only-matching.
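
                 Both offset formats (--file-offsets and --line-offsets) can
                 be modelled on raw bytes, as in the sketch below.
                 match_offsets() is hypothetical and assumes LF line
                 endings. Python's re stands in for PCRE2.

```python
import re
from typing import List


def match_offsets(data: bytes, pattern: bytes, per_line: bool) -> List[str]:
    """Hypothetical model of --file-offsets (per_line=False) and
    --line-offsets (per_line=True), assuming LF line endings."""
    rx = re.compile(pattern)
    out, line_start = [], 0
    for number, line in enumerate(data.split(b"\n"), start=1):
        for m in rx.finditer(line):
            if m.end() == m.start():
                continue  # empty matches are never reported
            length = m.end() - m.start()
            if per_line:
                out.append(f"{number}:{m.start()},{length}")
            else:
                out.append(f"{line_start + m.start()},{length}")
        line_start += len(line) + 1  # skip the terminating newline
    return out


data = b"foo bar\nbaz foo foo\n"
print(match_offsets(data, b"foo", per_line=False))  # ['0,3', '12,3', '16,3']
print(match_offsets(data, b"foo", per_line=True))   # ['1:0,3', '2:4,3', '2:8,3']
```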

       --locale=locale-name
                 This option specifies a locale to be used for pattern
                 matching. It overrides the value in the LC_ALL or LC_CTYPE
                 environment variables. If no locale is specified, the PCRE2
                 library's default (usually the "C" locale) is used. There is
                 no short form for this option.
504
505       -M, --multiline
506                 Allow  patterns to match more than one line. When this option
507                 is set, the PCRE2 library is called in "multiline" mode. This
508                 allows  a matched string to extend past the end of a line and
509                 continue on one or more subsequent lines. Patterns used  with
510                 -M may usefully contain literal newline characters and inter-
511                 nal occurrences of ^ and $ characters. The output for a  suc-
512                 cessful  match  may  consist of more than one line. The first
513                 line is the line in which the match  started,  and  the  last
514                 line  is  the  line  in which the match ended. If the matched
515                 string ends with a newline sequence, the output ends  at  the
516                 end  of  that  line.   If  -v  is set, none of the lines in a
517                 multi-line match are output. Once a match has  been  handled,
518                 scanning  restarts at the beginning of the line after the one
519                 in which the match ended.
520
521                 The newline sequence that separates multiple  lines  must  be
522                 matched  as  part  of  the  pattern. For example, to find the
523                 phrase "regular expression" in a file where  "regular"  might
524                 be  at the end of a line and "expression" at the start of the
525                 next line, you could use this command:
526
527                   pcre2grep -M 'regular\s+expression' <file>
528
529                 The \s escape sequence matches any white space character, in-
530                 cluding  newlines, and is followed by + so as to match trail-
531                 ing white space on the first line as well  as  possibly  han-
532                 dling a two-character newline sequence.

                 There  is a limit to the number of lines that can be matched,
                 imposed by the way that pcre2grep buffers the input  file  as
                 it  scans  it.  With  a sufficiently large processing buffer,
                 this should not be a problem, but the -M option does not work
                 when input is read line by line (see --line-buffered).
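
                 The line-selection rules above can be sketched in Python.
                 This is a rough model under stated assumptions. Python's re
                 stands in for PCRE2 (its \s also matches newlines), lines end
                 in LF, and the whole input is held in one string, so the
                 buffer limit just described does not arise. The name
                 multiline_blocks is hypothetical.

```python
import re


def multiline_blocks(pattern, text):
    """Rough model of -M output: the run of whole lines covered by each match.

    Python's re stands in for PCRE2, lines are assumed to end in LF, and the
    whole text is searched at once (real pcre2grep works through a buffer).
    """
    regex = re.compile(pattern)
    blocks = []
    pos = 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        # Output starts at the beginning of the line in which the match started.
        first = text.rfind("\n", 0, m.start()) + 1
        # If the match ends with a newline, output stops at the end of that line.
        tail = m.end() - 1 if m.end() > m.start() and text[m.end() - 1] == "\n" else m.end()
        last = text.find("\n", tail)
        if last == -1:
            last = len(text)
        blocks.append(text[first:last])
        # Scanning restarts at the start of the line after the one where the match ended.
        pos = last + 1
    return blocks


sample = "we like regular\nexpression syntax\nother text"
print(multiline_blocks(r"regular\s+expression", sample))
```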

       -m number, --max-count=number
                 Stop  processing after finding number matching lines, or non-
                 matching lines if -v is also set. Any trailing context  lines
                 are  output  after  the  final match. In multiline mode, each
                 multiline match counts as just one line for this purpose.  If
                 this  limit is reached when reading the standard input from a
                 regular file, the file is left positioned just after the last
                 matching  line.   If -c is also set, the count that is output
                 is never greater than number. This option has  no  effect  if
                 used with -L, -l, or -q, or when just checking for a match in
                 a binary file.

       --match-limit=number
                 Processing some regular expression patterns may take  a  very
                 long time to search for all possible matching strings. Others
                 may require a very large amount of memory.  There  are  three
                 options that set resource limits for matching.

                 The --match-limit option provides a means of limiting comput-
                 ing resource usage when processing patterns that are not  go-
                 ing to match, but which have a very large number of possibil-
                 ities in their search trees. The classic example is a pattern
                 that  uses  nested unlimited repeats. Internally, PCRE2 has a
                 counter that is incremented each time around  its  main  pro-
                 cessing  loop.  If the value set by --match-limit is reached,
                 an error occurs.

                 The --heap-limit option specifies, as a number  of  kibibytes
                 (units  of 1024 bytes), the amount of heap memory that may be
                 used for matching. Heap memory is needed only if matching the
                 pattern  requires a significant number of nested backtracking
                 points to be remembered. This parameter can be set to zero to
                 forbid the use of heap memory altogether.

                 The  --depth-limit  option  limits  the depth of nested back-
                 tracking points, which indirectly limits the amount of memory
                 that is used. The amount of memory needed for each backtrack-
                 ing point depends on the number of capturing  parentheses  in
                 the pattern, so the amount of memory that is used before this
                 limit acts varies from pattern to pattern. This limit  is  of
                 use only if it is set smaller than --match-limit.

                 There  are no short forms for these options. The default lim-
                 its can be set when the PCRE2 library is  compiled;  if  they
                 are  not specified, the defaults are very large and so effec-
                 tively unlimited.
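
                 Nested unlimited repeats are the classic problem case. The
                 Python sketch below shows why by counting the attempts a
                 naive backtracking search makes for (a+)*\d against a line
                 of n a's with no final digit, the pattern used as an example
                 under MATCHING ERRORS. It is a toy model, not PCRE2's
                 algorithm, and its counter only loosely corresponds to the
                 one that --match-limit controls. In this model the count is
                 2**n, so each extra a doubles the work. The names are
                 hypothetical.

```python
class MatchLimitExceeded(Exception):
    """Raised when the toy step counter passes its limit."""


def nested_repeat_steps(n, match_limit):
    """Toy model of why (a+)*\\d is expensive on a run of n a's with no digit.

    This is not PCRE2's matcher: it simply tries every way of splitting the
    a's into (a+) pieces, counting one step per attempt, and gives up (like
    --match-limit) once the count exceeds match_limit. Returns the step count,
    or None if the limit was hit.
    """
    steps = 0

    def try_from(pos):
        nonlocal steps
        steps += 1
        if steps > match_limit:
            raise MatchLimitExceeded(steps)
        # The subject has no digit, so \d fails wherever it is tried and every
        # split is explored before the overall match fails.
        for end in range(pos + 1, n + 1):
            if try_from(end):
                return True
        return False

    try:
        try_from(0)
    except MatchLimitExceeded:
        return None
    return steps


for n in (5, 10, 15):
    print(n, nested_repeat_steps(n, match_limit=10_000))
```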

       --max-buffer-size=number
                 This limits the expansion of  the  processing  buffer,  whose
                 initial  size can be set by --buffer-size. The maximum buffer
                 size is silently forced to be no smaller  than  the  starting
                 buffer size.

       -N newline-type, --newline=newline-type
                 Six different conventions for indicating the ends of lines in
                 scanned files are supported. For example:

                   pcre2grep -N CRLF 'some pattern' <file>

                 The newline type may be specified in upper, lower,  or  mixed
                 case.  If the newline type is NUL, lines are separated by bi-
                 nary zero characters. The other types are the  single-charac-
                 ter  sequences  CR  (carriage  return) and LF (linefeed), the
                 two-character sequence CRLF, an "anycrlf" type, which  recog-
                 nizes  any  of  the preceding three types, and an "any" type,
                 for which any Unicode line ending sequence is assumed to  end
                 a  line.  The Unicode sequences are the three just mentioned,
                 plus VT (vertical tab, U+000B), FF (form feed,  U+000C),  NEL
                 (next  line,  U+0085),  LS  (line  separator, U+2028), and PS
                 (paragraph separator, U+2029).

                 When the PCRE2 library is built, a  default  line-ending  se-
                 quence  is specified.  This is normally the standard sequence
                 for the operating system. Unless otherwise specified by  this
                 option, pcre2grep uses the library's default.

                 This  option makes it possible to use pcre2grep to scan files
                 that have come from other environments without having to mod-
                 ify  their  line  endings.  If the data that is being scanned
                 does not agree  with  the  convention  set  by  this  option,
                 pcre2grep  may  behave in strange ways. Note that this option
                 does not apply to files specified by the -f, --file-list,
                 --exclude-from, or --include-from options, which are expected
                 to use the operating system's standard newline sequence.
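
                 The six conventions can be modelled as line separators in
                 Python. This sketch is an illustration only and is not how
                 pcre2grep splits its buffer. It shows one kind of "strange"
                 behaviour: scanning CRLF data with -N LF leaves a stray CR
                 at the end of each line, so in this model a pattern such as
                 e$ would not match "one\r". NEWLINE_PATTERNS and split_lines
                 are hypothetical names.

```python
import re

# Separator patterns for each -N newline type, following the list above.
# CRLF is listed first in the alternations so that it counts as one ending.
NEWLINE_PATTERNS = {
    "cr": "\r",
    "lf": "\n",
    "crlf": "\r\n",
    "nul": "\x00",
    "anycrlf": "\r\n|\r|\n",
    "any": "\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]",
}


def split_lines(data, newline_type):
    """Split data into lines using one of the six -N conventions.

    A terminator at the very end of data produces a trailing empty string.
    """
    # Newline type names are accepted in any case, as with -N.
    return re.split(NEWLINE_PATTERNS[newline_type.lower()], data)


print(split_lines("one\r\ntwo\rthree\nfour", "ANYCRLF"))  # ['one', 'two', 'three', 'four']
print(split_lines("one\r\ntwo", "lf"))                    # ['one\r', 'two']
```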

       -n, --line-number
                 Precede each output line by its line number in the file, fol-
                 lowed  by  a colon for matching lines or a hyphen for context
                 lines. If the file name is also being output, it precedes the
                 line  number.  When  the  -M option causes a pattern to match
                 more than one line, only the first is preceded  by  its  line
                 number. This option is forced if --line-offsets is used.

       --no-jit  If  the  PCRE2 library is built with support for just-in-time
                 compiling (which speeds up matching), pcre2grep automatically
                 makes use of this, unless it was explicitly disabled at build
                 time. This option can be used to disable the use  of  JIT  at
                 run  time. It is provided for testing and working round prob-
                 lems.  It should never be needed in normal use.

       -O text, --output=text
                 When there is a match, instead of outputting  the  line  that
                 matched,  output just the text specified in this option, fol-
                 lowed by an operating-system standard newline. In this  mode,
                 no  context is shown. That is, the -A, -B, and -C options are
                 ignored. The --newline option has no effect on this option,
                 which is mutually exclusive with --only-matching, --file-off-
                 sets, and --line-offsets. However, like  --only-matching,  if
                 there is more than one match in a line, each of them causes a
                 line of output.

                 Escape sequences starting with a dollar character may be used
                 to insert the contents of the matched part of the line and/or
                 captured substrings into the text.

                 $<digits> or ${<digits>} is replaced  by  the  captured  sub-
                 string  of  the  given  decimal  number; zero substitutes the
                 whole match. If the number is greater than the number of cap-
                 turing  substrings,  or if the capture is unset, the replace-
                 ment is empty.

                 $a is replaced by bell; $b by backspace; $e by escape; $f  by
                 form  feed;  $n by newline; $r by carriage return; $t by tab;
                 $v by vertical tab.

                 $o<digits> or $o{<digits>} is replaced by the character whose
                 code  point  is the given octal number. In the first form, up
                 to three octal digits are processed.  When  more  digits  are
                 needed  in Unicode mode to specify a wide character, the sec-
                 ond form must be used.

                 $x<digits> or $x{<digits>} is replaced by the character  rep-
                 resented  by the given hexadecimal number. In the first form,
                 up to two hexadecimal digits are processed. When more  digits
                 are  needed  in Unicode mode to specify a wide character, the
                 second form must be used.

                 Any other character is substituted by itself. In  particular,
                 $$ is replaced by a single dollar.
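
                 The escape rules above are compact enough to model directly.
                 The Python sketch below is an illustration, not pcre2grep's
                 code. Python's re.Match stands in for PCRE2's match data,
                 and several behaviours are assumptions. Unbraced $<digits>
                 is taken to consume every following decimal digit. A
                 trailing dollar, an unclosed brace, or $o/$x with no digits
                 raises ValueError. The names expand_output and _read_number
                 are hypothetical.

```python
import re

# Single-letter escapes listed above.
SIMPLE_ESCAPES = {"a": "\a", "b": "\b", "e": "\x1b", "f": "\f",
                  "n": "\n", "r": "\r", "t": "\t", "v": "\v"}
DEC, OCT, HEX = "0123456789", "01234567", "0123456789abcdefABCDEF"


def _read_number(text, i, digits, max_len):
    """Read a number at text[i:], either braced ({...}) or up to max_len digits."""
    if i < len(text) and text[i] == "{":
        close = text.find("}", i)
        if close == -1:
            raise ValueError("missing closing brace")
        number, i = text[i + 1:close], close + 1
    else:
        j = i
        while j < len(text) and text[j] in digits and (max_len is None or j - i < max_len):
            j += 1
        number, i = text[i:j], j
    if not number or any(ch not in digits for ch in number):
        raise ValueError("malformed number in escape")
    return number, i


def expand_output(template, m):
    """Expand -O style $ escapes in template using the re.Match object m."""
    out, i = [], 0
    while i < len(template):
        ch = template[i]
        i += 1
        if ch != "$":
            out.append(ch)
            continue
        if i >= len(template):
            raise ValueError("dollar at end of template")
        ch = template[i]
        if ch in DEC or ch == "{":
            number, i = _read_number(template, i, DEC, None)
            group = int(number)
            # Out-of-range or unset captures expand to an empty string.
            value = m.group(group) if group <= m.re.groups else None
            out.append(value or "")
        elif ch == "o":
            number, i = _read_number(template, i + 1, OCT, 3)
            out.append(chr(int(number, 8)))
        elif ch == "x":
            number, i = _read_number(template, i + 1, HEX, 2)
            out.append(chr(int(number, 16)))
        else:
            # Named escapes map to control characters; anything else stands
            # for itself, so "$$" yields a single "$".
            out.append(SIMPLE_ESCAPES.get(ch, ch))
            i += 1
    return "".join(out)


match = re.search(r"(\w+)@(\w+)(x)?", "mail bob@example now")
print(repr(expand_output("$2/$1 [${0}] <$3> $$ $x41$o{102}$t.", match)))
```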

       -o, --only-matching
                 Show only the part of the line that matched a pattern instead
                 of the whole line. In this mode, no context  is  shown.  That
                 is,  the -A, -B, and -C options are ignored. If there is more
                 than one match in a line, each of them is  shown  separately,
                 on  a separate line of output. If -o is combined with -v (in-
                 vert the sense of the match to find non-matching  lines),  no
                 output  is  generated,  but  the return code is set appropri-
                 ately. If the matched portion of the line is  empty,  nothing
                 is  output  unless  the  file  name  or line number is being
                 printed, in which case they are shown on an  otherwise  empty
                 line.  This  option  is  mutually  exclusive  with  --output,
                 --file-offsets and --line-offsets.

       -onumber, --only-matching=number
                 Show only the part of the line  that  matched  the  capturing
                 parentheses of the given number. Up to 50 capturing parenthe-
                 ses are supported by default. This limit can be  changed  via
                 the  --om-capture option. A pattern may contain any number of
                 capturing parentheses, but only those whose number is  within
                 the  limit can be accessed by -o. An error occurs if the num-
                 ber specified by -o is greater than the limit.

                 -o0 is the same as -o without a number. Because these options
                 can  be given without an argument (see above), if an argument
                 is present, it must be given in the same shell item, for  ex-
                 ample,  -o3  or --only-matching=2. The comments given for the
                 non-argument case above also apply to  this  option.  If  the
                 specified  capturing parentheses do not exist in the pattern,
                 or were not set in the match, nothing is  output  unless  the
                 file name or line number is being output.

                 If  this  option is given multiple times, multiple substrings
                 are output for each match,  in  the  order  the  options  are
                 given,  and  all on one line. For example, -o3 -o1 -o3 causes
                 the substrings matched by capturing parentheses 3 and  1  and
                 then  3 again to be output. By default, there is no separator
                 (but see the --om-separator option below).

       --om-capture=number
                 Set the number of capturing parentheses that can be  accessed
                 by -o. The default is 50.

       --om-separator=text
                 Specify  a  separating string for multiple occurrences of -o.
                 The default is an empty string. Separating strings are  never
                 coloured.
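
                 The interaction of repeated -oN options with --om-capture
                 and --om-separator can be sketched in Python. This is a
                 simplified model. Python's re stands in for PCRE2, and
                 dropping a match whose selected text is entirely empty is
                 only an approximation of the "nothing is output" rule. The
                 function only_matching is hypothetical.

```python
import re


def only_matching(pattern, line, groups=(0,), separator="", om_capture=50):
    """Model of repeated -oN options combined with --om-separator.

    Each match yields one output line built from the selected captures, in the
    order the -o options were given. Python's re stands in for PCRE2, and
    dropping matches whose selected text is entirely empty is a simplification.
    """
    for g in groups:
        if g > om_capture:
            raise ValueError(f"-o{g} is greater than the --om-capture limit ({om_capture})")
    output = []
    for m in re.finditer(pattern, line):
        # Captures that do not exist or were not set contribute nothing.
        parts = [(m.group(g) if g <= m.re.groups else None) or "" for g in groups]
        if any(parts):
            output.append(separator.join(parts))
    return output


print(only_matching(r"(\d+)-(\d+)-(\d+)", "2020-10-04 and 1999-01-02",
                    groups=(3, 1, 3), separator=":"))
# ['04:2020:04', '02:1999:02']
```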

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
                 The exit status indicates whether or  not  any  matches  were
                 found.

       -r, --recursive
                 If  any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude  set-
                 tings.  By  default, a directory is read as a normal file; in
                 some operating systems this gives an  immediate  end-of-file.
                 This  option is a shorthand for setting the -d option to "re-
                 curse".

       --recursion-limit=number
                 This is an obsolete synonym for --depth-limit.  See  --match-
                 limit above for details.

       -s, --no-messages
                 Suppress  error  messages  about  non-existent  or unreadable
                 files. Such files are quietly skipped.  However,  the  return
                 code is still 2, even if matches were found in other files.

       -t, --total-count
                 This  option  is  useful when scanning more than one file. If
                 used on its own, -t suppresses all output except for a  grand
                 total  number  of matching lines (or non-matching lines if -v
                 is used) in all the files. If -t is used with -c, a grand to-
                 tal  is  output  except  when the previous output is just one
                 line. In other words, it is not output when just  one  file's
                 count  is  listed.  If file names are being output, the grand
                 total is preceded by "TOTAL:". Otherwise, it appears as  just
                 another  number.  The  -t option is ignored when used with -L
                 (list files without matches), because the grand  total  would
                 always be zero.

       -u, --utf Operate in UTF-8 mode. This option is available only if PCRE2
                 has been compiled with UTF-8 support. All patterns (including
                 those  for any --exclude and --include options) and all lines
                 that are scanned must be valid strings of  UTF-8  characters.
                 If an invalid UTF-8 string is encountered, an error occurs.

       -U, --utf-allow-invalid
                 As  --utf,  but in addition subject lines may contain invalid
                 UTF-8 code unit sequences. These can never form part  of  any
                 pattern  match.  Patterns  themselves, however, must still be
                 valid UTF-8 strings. This facility allows valid UTF-8 strings
                 to be sought within arbitrary byte sequences in executable or
                 other binary files. For more details about matching  in  non-
                 valid UTF-8 strings, see the pcre2unicode(3) documentation.

       -V, --version
                 Write  the version numbers of pcre2grep and the PCRE2 library
                 to the standard output and then exit. Anything  else  on  the
                 command line is ignored.

       -v, --invert-match
                 Invert  the  sense  of  the match, so that lines which do not
                 match any of the patterns are the ones that are  found.  When
                 this  option  is  set,  options  such  as --only-matching and
                 --output, which specify parts of a match that are to be  out-
                 put, are ignored.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only "words". That is, there must
                 be a word boundary at the  start  and  end  of  each  matched
                 string.  This is equivalent to having "\b(?:" at the start of
                 each pattern, and ")\b" at the end. This option applies  only
                 to  the  patterns  that  are  matched against the contents of
                 files; it does not apply to patterns specified by any of  the
                 --include or --exclude options.

       -x, --line-regex, --line-regexp
                 Force  the  patterns to start matching only at the beginnings
                 of lines, and in  addition,  require  them  to  match  entire
                 lines. In multiline mode the match may be more than one line.
                 This is equivalent to having "^(?:" at the start of each pat-
                 tern  and  ")$"  at  the end. This option applies only to the
                 patterns that are matched against the contents of  files;  it
                 does  not apply to patterns specified by any of the --include
                 or --exclude options.
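
                 The rewrites performed by -w and -x can be reproduced with
                 Python's re module, which accepts the same \b, ^, $ and (?:)
                 constructs for simple patterns like these. The sketch shows
                 why the wrappers use a non-capturing group. Without it, an
                 alternation escapes the anchors. The function wrap_pattern
                 and its mode argument are hypothetical, and using -w and -x
                 together is not modelled.

```python
import re


def wrap_pattern(pattern, mode):
    """Apply the -w ("word") or -x ("line") rewrite described above."""
    if mode == "word":
        return r"\b(?:" + pattern + r")\b"
    if mode == "line":
        return "^(?:" + pattern + ")$"
    return pattern


# The non-capturing group keeps an alternation inside the anchors:
print(bool(re.search(wrap_pattern("cat|dog", "line"), "hotdog")))  # False
print(bool(re.search("^cat|dog$", "hotdog")))                       # True: naive anchoring
print(bool(re.search(wrap_pattern("cat", "word"), "concatenate")))  # False
print(bool(re.search(wrap_pattern("cat", "word"), "a cat sat")))    # True
```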


ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that or-
       der, for a locale. The first one that is set is used. This can be over-
       ridden by the --locale option. If no locale is set, the PCRE2 library's
       default (usually the "C" locale) is used.
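
       The lookup order can be summarised as a small Python function. This
       is a sketch of the precedence only. It does not call setlocale(), and
       treating an empty variable as unset is an assumption. The name
       choose_locale is hypothetical.

```python
import os


def choose_locale(cli_locale=None, environ=None):
    """Pick a locale following the precedence described above."""
    env = os.environ if environ is None else environ
    if cli_locale is not None:          # --locale overrides the environment
        return cli_locale
    for name in ("LC_ALL", "LC_CTYPE"):  # examined in this order
        value = env.get(name)
        if value:                        # treating empty strings as unset is an assumption
            return value
    return None  # None: fall back to the PCRE2 library default (usually "C")


print(choose_locale(environ={"LC_ALL": "de_DE.UTF-8", "LC_CTYPE": "fr_FR.UTF-8"}))  # de_DE.UTF-8
```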


NEWLINES

       The  -N  (--newline) option allows pcre2grep to scan files with newline
       conventions that differ from the default. This option affects only  the
       way  scanned files are processed. It does not affect the interpretation
       of files specified by the -f,  --file-list,  --exclude-from,  or  --in-
       clude-from options.

       Any  parts  of the scanned input files that are written to the standard
       output are copied with whatever newline sequences they have in the  in-
       put.  However,  if  the final line of a file is output, and it does not
       end with a newline sequence, a newline sequence is added. If  the  new-
       line  setting  is  CR, LF, CRLF or NUL, that line ending is output; for
       the other settings (ANYCRLF or ANY) a single LF (linefeed) is used.
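
       The rule for terminating the final output line can be expressed as a
       short Python function. It is a sketch of the rule as stated, not
       pcre2grep's code. ENDINGS and finish_final_line are hypothetical
       names.

```python
# Line endings recognised by each -N setting, from the -N description.
ENDINGS = {
    "cr": ("\r",), "lf": ("\n",), "crlf": ("\r\n",), "nul": ("\x00",),
    "anycrlf": ("\r\n", "\r", "\n"),
    "any": ("\r\n", "\r", "\n", "\x0b", "\x0c", "\x85", "\u2028", "\u2029"),
}


def finish_final_line(line, newline_type):
    """Return the final output line, adding a line ending if it lacks one."""
    kind = newline_type.lower()
    if line.endswith(ENDINGS[kind]):   # existing endings are copied unchanged
        return line
    if kind in ("cr", "lf", "crlf", "nul"):
        return line + ENDINGS[kind][0]  # the configured ending itself
    return line + "\n"                  # ANYCRLF and ANY add a single LF


print(repr(finish_final_line("last", "CRLF")))  # 'last\r\n'
```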

       The newline setting does not affect the way in which  pcre2grep  writes
       newlines  in  informational  messages  to the standard output and error
       streams.  Under Windows, the standard output is set to  be  binary,  so
       that  "\r\n" at the ends of output lines that are copied from the input
       is not converted to "\r\r\n" by the C I/O library. This means that  any
       messages  written  to the standard output must end with "\r\n". For all
       other operating systems, and for all messages  to  the  standard  error
       stream, "\n" is used.


OPTIONS COMPATIBILITY

       Many of the short and long forms of pcre2grep's options are the same as
       in the GNU grep program. Any long option of the form --xxx-regexp  (GNU
       terminology) is also available as --xxx-regex (PCRE2 terminology). How-
       ever, the  --depth-limit,  --file-list,  --file-offsets,  --heap-limit,
       --include-dir,  --line-offsets,  --locale,  --match-limit, -M, --multi-
       line, -N, --newline,  --om-separator,  --output,  -u,  --utf,  -U,  and
       --utf-allow-invalid options are specific to pcre2grep, as is the use of
       the --only-matching option with a capturing parentheses number.

       Although most of the common options work the same way, a few  are  dif-
       ferent  in pcre2grep. For example, the --include option's argument is a
       glob for GNU grep, but a regular expression for pcre2grep. If both  the
       -c  and  -l  options are given, GNU grep lists only file names, without
       counts, but pcre2grep gives the counts as well.


OPTIONS WITH DATA

       There are four different ways in which an option with data can be spec-
       ified.   If  a  short  form option is used, the data may follow immedi-
       ately, or (with one exception) in the next command line item. For exam-
       ple:

         -f/some/file
         -f /some/file

       The  exception is the -o option, which may appear with or without data.
       Because of this, if data is present, it must follow immediately in  the
       same item, for example -o3.

       If  a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with  two  exceptions)
       it may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note,  however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a  home  di-
       rectory,  you  must separate the file name from the option, because the
       shell does not treat ~ specially unless it is at the start of an item.

       The exceptions to the above are the --colour (or --color)  and  --only-
       matching  options,  for which the data is optional. If one of these op-
       tions does have data, it must be given in  the  first  form,  using  an
       equals character. Otherwise pcre2grep will assume that it has no data.
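
       The placement rules can be condensed into a small Python parser for a
       single data-taking option. This is a hypothetical sketch of the rules
       as described, not pcre2grep's command-line parser. Options without
       data, validation, and a missing next item are out of scope, and the
       names OPTIONAL_DATA and parse_option are illustrative.

```python
# Options whose data is optional; their data must be attached (-o3, --colour=always).
OPTIONAL_DATA = {"-o", "--only-matching", "--colour", "--color"}


def parse_option(args, i):
    """Return (name, data, next_index) for a data-taking option at args[i].

    Only the placement rules above are modelled: flag options without data,
    validation, and error handling (e.g. a missing next item) are out of scope.
    """
    arg = args[i]
    if arg.startswith("--"):
        name, eq, data = arg.partition("=")
        if eq:
            return name, data, i + 1            # --file=/some/file
        if name in OPTIONAL_DATA:
            return name, None, i + 1            # bare --only-matching: no data
        return name, args[i + 1], i + 2         # --file /some/file
    name, data = arg[:2], arg[2:]
    if data:
        return name, data, i + 1                # -f/some/file or -o3
    if name in OPTIONAL_DATA:
        return name, None, i + 1                # bare -o: no data
    return name, args[i + 1], i + 2             # -f /some/file


print(parse_option(["-f", "/some/file"], 0))   # ('-f', '/some/file', 2)
print(parse_option(["-o", "3"], 0))            # ('-o', None, 1): "3" is not -o's data
```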


USING PCRE2'S CALLOUT FACILITY

       pcre2grep  has,  by  default,  support for calling external programs or
       scripts or echoing specific strings during matching by  making  use  of
       PCRE2's  callout  facility.  However, this support can be completely or
       partially disabled when pcre2grep is built. You can  find  out  whether
       your  binary has support for callouts by running it with the --help op-
       tion. If callout support is completely disabled, all callouts  in  pat-
       terns are ignored by pcre2grep.  If the facility is partially disabled,
       calling external programs is not supported, and callouts  that  request
       it are ignored.

       A  callout  in a PCRE2 pattern is of the form (?C<arg>) where the argu-
       ment is either a number or a quoted string (see the pcre2callout  docu-
       mentation  for  details).  Numbered  callouts are ignored by pcre2grep;
       only callouts with string arguments are useful.

   Echoing a specific string

       Starting the callout string with a pipe character  invokes  an  echoing
       facility that avoids calling an external program or script. This facil-
       ity is always available, provided that  callouts  were  not  completely
       disabled  when  pcre2grep  was built. The rest of the callout string is
       processed as a zero-terminated string, which means it should  not  con-
       tain  any  internal  binary  zeros. It is written to the output, having
       first been passed through the same escape processing as text  from  the
       --output  (-O) option (see above). However, $0 cannot be used to insert
       a matched substring because the match is still  in  progress.  Instead,
       the  single  character '0' is inserted. Any syntax errors in the string
       (for example, a dollar not followed by another  character)  cause  the
       callout  to be ignored. No terminator is added to the output string, so
       if you want a newline, you must include it explicitly using the  escape
       $n. For example:

         pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")' <some file>

       Matching  continues normally after the string is output. If you want to
       see only the callout output but not any output from  an  actual  match,
       you should end the pattern with (*FAIL).

   Calling external programs or scripts

       This facility can be independently disabled when pcre2grep is built. It
       is supported for Windows, where a call to _spawnvp() is used, for  VMS,
       where  lib$spawn()  is  used,  and  for any Unix-like environment where
       fork() and execv() are available.

       If the callout string does not start with a pipe (vertical bar) charac-
       ter,  it  is parsed into a list of substrings separated by pipe charac-
       ters. The first substring must be an executable name, with the  follow-
       ing substrings specifying arguments:

         executable_name|arg1|arg2|...

       Any  substring  (including  the executable name) may contain escape se-
       quences started by a dollar character. These are the same  as  for  the
       --output (-O) option documented above, except that $0 cannot insert the
       matched string because the match is still  in  progress.  Instead,  the
       character '0' is inserted. If you need a literal dollar or pipe charac-
       ter in any substring, use $$ or $| respectively. Here is an example:

         echo -e "abcde\n12345" | pcre2grep \
           '(?x)(.)(..(.))
           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -

         Output:

           Arg1: [a] [bcd] [d] Arg2: |a| ()
           abcde
           Arg1: [1] [234] [4] Arg2: |1| ()
           12345

       The parameters for the system call that is used to run the  program  or
       script are zero-terminated strings. This means that binary zero charac-
       ters in the callout argument will cause premature termination of  their
       substrings,  and  therefore should not be present. Any syntax errors in
       the string (for example, a dollar not followed  by  another  character)
       cause the callout to be ignored.  If running the program fails for any
       reason (including the non-existence of the executable), a local  match-
       ing failure occurs and the matcher backtracks in the normal way.
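
       The splitting of an external-program callout string into arguments
       can be sketched in Python and checked against the abcde example above.
       This is a simplified model, not pcre2grep's parser. Only $$, $|,
       $<digits> and ${<digits>} are handled, and any other character after
       a dollar is simply kept. The capture values are passed in as a list
       because Python's re has no callouts. The function name callout_argv
       is hypothetical.

```python
def callout_argv(callout, captures):
    """Split an external-program callout string into an argument list.

    captures[k] is the text of capture k at the time of the callout, or None if
    it is unset. Only $$, $|, $<digits> and ${<digits>} are handled here; the
    other -O escapes are not modelled.
    """
    argv, current, i = [], [], 0
    while i < len(callout):
        ch = callout[i]
        if ch == "|":                       # an unescaped pipe ends an argument
            argv.append("".join(current))
            current, i = [], i + 1
            continue
        if ch != "$":
            current.append(ch)
            i += 1
            continue
        if i + 1 >= len(callout):
            raise ValueError("dollar at end of callout string")
        nxt = callout[i + 1]
        if nxt == "{":
            close = callout.find("}", i)
            if close == -1:
                raise ValueError("missing closing brace")
            number, i = callout[i + 2:close], close + 1
        elif nxt in "0123456789":
            j = i + 1
            while j < len(callout) and callout[j] in "0123456789":
                j += 1
            number, i = callout[i + 1:j], j
        else:
            current.append(nxt)             # "$|" -> "|", "$$" -> "$"
            i += 2
            continue
        group = int(number)                 # ValueError for "${}" or "${x}"
        if group == 0:
            current.append("0")             # the match is still in progress
        elif group < len(captures) and captures[group] is not None:
            current.append(captures[group])
    argv.append("".join(current))
    return argv


# Captures at the time of the callout for the line "abcde" in the example
# above; group 4, the () after the callout, has not been reached and is unset.
print(callout_argv("/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)",
                   [None, "a", "bcd", "d", None]))
# ['/bin/echo', 'Arg1: [a] [bcd] [d]', 'Arg2: |a| ()']
```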


MATCHING ERRORS

       It  is  possible  to supply a regular expression that takes a very long
       time to fail to match certain lines.  Such  patterns  normally  involve
       nested  indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE2 matching function has a  re-
       source  limit  that  causes it to abort in these circumstances. If this
       happens, pcre2grep outputs an error message and the line  that  caused
       the  problem  to  the  standard error stream. If there are more than 20
       such errors, pcre2grep gives up.

       The --match-limit option of pcre2grep can be used to  set  the  overall
       resource  limit.  There are also other limits that affect the amount of
       memory used during matching; see the  discussion  of  --heap-limit  and
       --depth-limit above.


DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
       and 2 for syntax errors, overlong lines, non-existent  or  inaccessible
       files  (even if matches were found in other files) or too many matching
       errors. Using the -s option to suppress error messages about inaccessi-
       ble files does not affect the return code.
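
       Status 2 can be returned even when matches were found, so a script
       that wraps pcre2grep should distinguish all three values rather than
       only testing for zero. The Python sketch below shows one way to do
       this. It assumes a pcre2grep binary on PATH and uses only the -q
       option documented above. describe_status and quiet_search are
       hypothetical helper names, and the VMS symbol PCRE2GREP_RC is not
       handled.

```python
import subprocess

# Exit statuses documented above.
STATUS_MEANINGS = {
    0: "at least one match was found",
    1: "no matches were found",
    2: "error: syntax error, overlong line, missing or unreadable file, "
       "or too many matching errors",
}


def describe_status(code):
    """Translate a pcre2grep exit status into its documented meaning."""
    return STATUS_MEANINGS.get(code, f"unexpected exit status {code}")


def quiet_search(pattern, *paths):
    """Run pcre2grep -q (assumed to be on PATH) and describe its exit status."""
    result = subprocess.run(["pcre2grep", "-q", pattern, *paths])
    return describe_status(result.returncode)
```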

       When   run  under  VMS,  the  return  code  is  placed  in  the  symbol
       PCRE2GREP_RC because VMS  does  not  distinguish  between  exit(0)  and
       exit(1).


SEE ALSO

       pcre2pattern(3), pcre2syntax(3), pcre2callout(3), pcre2unicode(3).


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.


REVISION

       Last updated: 04 October 2020
       Copyright (c) 1997-2020 University of Cambridge.
