PCRE2GREP(1)                General Commands Manual               PCRE2GREP(1)



NAME
       pcre2grep - a grep with Perl-compatible regular expressions.

SYNOPSIS
       pcre2grep [options] [long options] [pattern] [path1 path2 ...]


DESCRIPTION

       pcre2grep  searches  files  for  character patterns, in the same way as
       other grep commands do,  but  it  uses  the  PCRE2  regular  expression
       library  to  support  patterns  that  are  compatible  with the regular
       expressions of Perl 5. See pcre2syntax(3) for a quick-reference summary
       of  pattern  syntax,  or  pcre2pattern(3) for a full description of the
       syntax and semantics of the regular expressions that PCRE2 supports.

       Patterns, whether supplied on the command line or in a  separate  file,
       are given without delimiters. For example:

         pcre2grep Thursday /etc/motd

       If you attempt to use delimiters (for example, by surrounding a pattern
       with slashes, as is common in Perl scripts), they  are  interpreted  as
       part  of  the pattern. Quotes can of course be used to delimit patterns
       on the command line because they are  interpreted  by  the  shell,  and
       indeed  quotes  are required if a pattern contains white space or shell
       metacharacters.

       The first argument that follows any option settings is treated  as  the
       single  pattern  to be matched when neither -e nor -f is present.  Con-
       versely, when one or both of these options are  used  to  specify  pat-
       terns, all arguments are treated as path names. At least one of -e, -f,
       or an argument pattern must be provided.

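       The rule above for telling the pattern apart from the path names can be
       modelled in a few lines of Python. This is only an illustrative sketch,
       not pcre2grep's own code; the name split_arguments and  its  parameters
       are invented for this example.

```python
def split_arguments(args, have_e_or_f):
    """Split the non-option arguments into (pattern, paths).

    args         -- arguments left over once all options are removed
    have_e_or_f  -- True if -e or -f has already supplied a pattern
    (Both names are hypothetical; they exist only for this sketch.)
    """
    if have_e_or_f:
        # With -e or -f present, every remaining argument is a path name.
        return None, list(args)
    if not args:
        # Neither -e, -f, nor an argument pattern was given.
        raise ValueError("no pattern: use -e, -f, or an argument pattern")
    # Otherwise the first argument is the single pattern; the rest are paths.
    return args[0], list(args[1:])


print(split_arguments(["Thursday", "/etc/motd"], have_e_or_f=False))
print(split_arguments(["file1", "-", "file3"], have_e_or_f=True))
```

       An empty list of paths corresponds to reading the standard input.
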
       If no files are specified, pcre2grep  reads  the  standard  input.  The
       standard  input can also be referenced by a name consisting of a single
       hyphen.  For example:

         pcre2grep some-pattern file1 - file3

       Input files are searched line by  line.  By  default,  each  line  that
       matches  a  pattern  is  copied to the standard output, and if there is
       more than one file, the file name is output at the start of each  line,
       followed  by  a  colon.  However, there are options that can change how
       pcre2grep behaves. In particular, the -M option makes  it  possible  to
       search  for  strings  that  span  line  boundaries. What defines a line
       boundary is controlled by the -N (--newline) option.

       The amount of memory used for buffering files that are being scanned is
       controlled  by  parameters  that  can  be  set by the --buffer-size and
       --max-buffer-size options. The first of these sets the size  of  buffer
       that  is obtained at the start of processing. If an input file contains
       very long lines, a larger buffer may be  needed;  this  is  handled  by
       automatically extending the buffer, up to the limit specified by --max-
       buffer-size. The default values for these parameters can  be  set  when
       pcre2grep  is  built;  if nothing is specified, the defaults are set to
       20KiB and 1MiB respectively. An error occurs if a line is too long  and
       the buffer can no longer be expanded.

       The  block  of  memory that is actually used is three times the "buffer
       size", to allow for buffering "before" and "after" lines. If the buffer
       size  is too small, fewer than requested "before" and "after" lines may
       be output.

       Patterns can be no longer than 8KiB or BUFSIZ bytes, whichever  is  the
       greater.   BUFSIZ  is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
       to  each  line  in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

       By default, as soon as one pattern matches a line, no further  patterns
       are considered. However, if --colour (or --color) is used to colour the
       matching substrings, or if --only-matching, --file-offsets, or  --line-
       offsets  is  used  to  output  only  the  part of the line that matched
       (either shown literally, or as an offset), scanning resumes immediately
       following  the  match,  so that further matches on the same line can be
       found. If there are multiple  patterns,  they  are  all  tried  on  the
       remainder  of  the  line, but patterns that follow the one that matched
       are not tried on the earlier part of the line.

       This behaviour means that the order  in  which  multiple  patterns  are
       specified  can affect the output when one of the above options is used.
       This is no longer the same behaviour as GNU grep, which now manages  to
       display  earlier  matches  for  later  patterns (as long as there is no
       overlap).

       Patterns that can match an empty string are accepted, but empty  string
       matches   are   never   recognized.   An   example   is   the   pattern
       "(super)?(man)?", in which all components are  optional.  This  pattern
       finds  all  occurrences  of  both "super" and "man"; the output differs
       from matching with "super|man" when only the  matching  substrings  are
       being shown.

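       The difference between the two patterns above can be seen with a  small
       Python  model.  It is a sketch only: Python's re module stands in for
       PCRE2, and the helper nonempty_matches is a made-up name that  imitates
       "show only the matching substrings, never an empty one".

```python
import re


def nonempty_matches(pattern, line):
    """Return every non-empty substring of line that pattern matches.

    Hypothetical helper: empty matches are dropped, mirroring the
    statement that empty string matches are never recognized.
    """
    return [m.group(0) for m in re.finditer(pattern, line) if m.group(0)]


line = "superman and a man"
print(nonempty_matches(r"(super)?(man)?", line))  # ['superman', 'man']
print(nonempty_matches(r"super|man", line))       # ['super', 'man', 'man']
```

       Both patterns select the same lines; only the substrings reported  for
       "superman" differ (one combined match versus two separate ones).
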
       If  the  LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library.  The --locale
       option can be used to override this.


SUPPORT FOR COMPRESSED FILES

       It  is  possible to compile pcre2grep so that it uses libz or libbz2 to
       read compressed files whose names end in .gz or .bz2, respectively. You
       can  find out whether your pcre2grep binary has support for one or both
       of these file types by running it with the --help option. If the appro-
       priate support is not present, all files are treated as plain text. The
       standard input is always so treated. When input is  from  a  compressed
       .gz or .bz2 file, the --line-buffered option is ignored.


BINARY FILES

       By  default,  a  file that contains a binary zero byte within the first
       1024 bytes is identified as a binary file, and is processed  specially.
       (GNU grep identifies binary files in this manner.) However, if the new-
       line type is specified as "nul", that is,  the  line  terminator  is  a
       binary  zero,  the  test  for  a  binary  file  is not applied. See the
       --binary-files option for a means of changing the way binary files  are
       handled.

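       The binary-file test described above amounts to a  very  small  check.
       The  following Python sketch models it; looks_binary and its parameter
       names are assumptions made for illustration and do not come from  the
       pcre2grep source.

```python
def looks_binary(data, newline_is_nul=False, probe=1024):
    """Model of the documented test on the start of a file.

    data            -- the file contents as bytes
    newline_is_nul  -- True when the newline type is "nul"
    probe           -- how many leading bytes are inspected (1024 per the text)
    """
    if newline_is_nul:
        # With a zero byte as line terminator the test is not applied.
        return False
    return b"\x00" in data[:probe]


print(looks_binary(b"plain text\n"))            # False
print(looks_binary(b"ELF\x00\x01\x02"))         # True
print(looks_binary(b"a" * 1024 + b"\x00"))      # False: zero byte is past the probe
```
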

BINARY ZEROS IN PATTERNS

       Patterns  passed  from the command line are strings that are terminated
       by a binary zero, so cannot contain internal zeros.  However,  patterns
       that are read from a file via the -f option may contain binary zeros.


OPTIONS

       The  order  in  which some of the options appear can affect the output.
       For example, both the -H and -l options affect  the  printing  of  file
       names.  Whichever  comes later in the command line will be the one that
       takes effect. Similarly, except where noted  below,  if  an  option  is
       given  twice,  the  later setting is used. Numerical values for options
       may be followed by K  or  M,  to  signify  multiplication  by  1024  or
       1024*1024 respectively.

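       As  a  worked example of the K and M suffixes, the Python sketch below
       converts an option value to a plain number. parse_number is a hypothet-
       ical  helper;  only  the  upper-case suffixes named in the text are han-
       dled, and how pcre2grep treats anything else is not covered here.

```python
def parse_number(text):
    """Convert an option value such as "20K" or "1M" to an integer."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        # Strip the suffix and scale the remaining digits.
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


print(parse_number("20K"))   # 20480
print(parse_number("1M"))    # 1048576
print(parse_number("100"))   # 100
```

       For instance, --buffer-size=20K and --buffer-size=20480 would name  the
       same value under this reading.
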
       --        This terminates the list of options. It is useful if the next
                 item on the command line starts with a hyphen but is  not  an
                 option.  This  allows for the processing of patterns and file
                 names that start with hyphens.

       -A number, --after-context=number
                 Output up to number lines  of  context  after  each  matching
                 line.  Fewer lines are output if the next match or the end of
                 the file is reached, or if the  processing  buffer  size  has
                 been  set  too  small.  If file names and/or line numbers are
                 being output, a hyphen separator is used instead of  a  colon
                 for  the  context  lines.  A  line  containing "--" is output
                 between each group of lines, unless they are in fact contigu-
                 ous  in the input file. The value of number is expected to be
                 relatively small. When -c is used, -A is ignored.

       -a, --text
                 Treat binary files as text. This is equivalent  to  --binary-
                 files=text.

       -B number, --before-context=number
                 Output  up  to  number  lines of context before each matching
                 line. Fewer lines are output if the  previous  match  or  the
                 start  of the file is within number lines, or if the process-
                 ing buffer size has been set too small. If file names  and/or
                 line  numbers  are  being  output, a hyphen separator is used
                 instead of a colon for the context lines. A  line  containing
                 "--"  is  output between each group of lines, unless they are
                 in fact contiguous in the input file. The value of number  is
                 expected  to  be  relatively  small.  When  -c is used, -B is
                 ignored.

       --binary-files=word
                 Specify how binary files are to be processed. If the word  is
                 "binary"  (the  default),  pattern  matching  is performed on
                 binary files, but the only  output  is  "Binary  file  <name>
                 matches"  when a match succeeds. If the word is "text", which
                 is equivalent to the -a or --text option,  binary  files  are
                 processed  in  the  same way as any other file. In this case,
                 when a match succeeds, the  output  may  be  binary  garbage,
                 which  can  have  nasty effects if sent to a terminal. If the
                 word is  "without-match",  which  is  equivalent  to  the  -I
                 option,  binary  files  are  not  processed  at all; they are
                 assumed not to be of interest and are skipped without causing
                 any output or affecting the return code.

       --buffer-size=number
                 Set  the  parameter that controls how much memory is obtained
                 at the start of processing for buffering files that are being
                 scanned. See also --max-buffer-size below.

       -C number, --context=number
                 Output  number  lines  of  context both before and after each
                 matching line.  This is equivalent to setting both -A and  -B
                 to the same value.

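       The layout of context output described under -A, -B, and -C  (a  colon
       after  the  line  number  of a matching line, a hyphen for a context
       line, and "--" between groups that are not contiguous) can  be  sketched
       in  Python.  This  models the description only, with line numbers shown
       as if -n were in use; grep_with_context is an invented name and Python's
       re module stands in for PCRE2.

```python
import re


def grep_with_context(lines, pattern, before=0, after=0):
    """Return output lines in the style of -n with -B before and -A after."""
    regex = re.compile(pattern)
    hits = {i for i, text in enumerate(lines) if regex.search(text)}
    # Every matching line is shown together with its context window.
    shown = set()
    for i in hits:
        first = max(0, i - before)
        last = min(len(lines) - 1, i + after)
        shown.update(range(first, last + 1))
    out = []
    previous = None
    for i in sorted(shown):
        # "--" separates groups that are not contiguous in the input.
        if (before or after) and previous is not None and i != previous + 1:
            out.append("--")
        separator = ":" if i in hits else "-"
        out.append(f"{i + 1}{separator}{lines[i]}")
        previous = i
    return out


sample = ["one", "two match", "three", "four", "five", "six match", "seven"]
for text in grep_with_context(sample, "match", before=1, after=1):
    print(text)
```

       With one line of context on each side, the sample produces two  groups
       (lines 1 to 3 and lines 5 to 7) separated by a single "--" line.
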
       -c, --count
                 Do  not  output  lines from the files that are being scanned;
                 instead output the number  of  lines  that  would  have  been
                 shown, either because they matched, or, if -v is set, because
                 they failed to match. By default, this count is  exactly  the
                 same  as the number of lines that would have been output, but
                 if the -M (multiline) option is used (without -v), there  may
                 be  more suppressed lines than the count (that is, the number
                 of matches).

                 If no lines are selected, the number zero is output. If  sev-
                 eral  files  are  being  scanned, a count is output for each
                 of them and the -t option can be used to cause a total to  be
                 output  at  the  end.  However,  if  the --files-with-matches
                 option is also  used,  only  those  files  whose  counts  are
                 greater  than  zero  are listed. When -c is used, the -A, -B,
                 and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
                 "--colour=auto".   If  data  is required, it must be given in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
                 By default, the output is not coloured. The value  (which  is
                 optional,  see above) may be "never", "always", or "auto". In
                 the latter case, colouring happens only if the standard  out-
                 put  is connected to a terminal. More resources are used when
                 colouring is enabled, because pcre2grep has to search for all
                 possible  matches in a line, not just one, in order to colour
                 them all.

                 The colour that is used can be specified by  setting  one  of
                 the  environment variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR,
                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
                 order.  If  none  of  these  are  set,  pcre2grep  looks  for
                 GREP_COLORS or GREP_COLOR (in that order). The value  of  the
                 variable  should  be  a string of two numbers, separated by a
                 semicolon, except in the  case  of  GREP_COLORS,  which  must
                 start with "ms=" or "mt=" followed by two semicolon-separated
                 colours, terminated by the end of the string or by  a  colon.
                 If  GREP_COLORS  does  not  start  with  "ms=" or "mt=" it is
                 ignored, and GREP_COLOR is checked.

                 If the string obtained from one of the above  variables  con-
                 tains any characters other than semicolon or digits, the set-
                 ting is ignored and the default colour is used. The string is
                 copied directly into the control string for setting colour on
                 a terminal, so it is your responsibility to ensure  that  the
                 values  make  sense.  If  no relevant environment variable is
                 set, the default is "1;31", which gives red.

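       The lookup order for the colour setting can  be  summarised  with  the
       Python  sketch  below.  It follows the wording above and is not taken
       from the pcre2grep source; resolve_colour is an invented name, and the
       treatment  of  an  empty  value (falling back to the default) is an
       assumption, since the text does not cover that case.

```python
import re

DEFAULT_COLOUR = "1;31"  # red, the documented default
PCRE_VARS = ("PCRE2GREP_COLOUR", "PCRE2GREP_COLOR",
             "PCREGREP_COLOUR", "PCREGREP_COLOR")


def resolve_colour(env):
    """Pick the colour string from a dict of environment variables."""
    value = None
    for name in PCRE_VARS:            # checked in this order
        if name in env:
            value = env[name]
            break
    if value is None:
        grep_colors = env.get("GREP_COLORS")
        if grep_colors is not None and grep_colors.startswith(("ms=", "mt=")):
            # Colours end at the end of the string or at the first colon.
            value = grep_colors[3:].split(":", 1)[0]
        else:
            # GREP_COLORS is absent or ignored, so GREP_COLOR is checked.
            value = env.get("GREP_COLOR")
    # Anything other than digits and semicolons means the setting is ignored.
    # Assumption: an empty value is also treated as "use the default".
    if not value or not re.fullmatch(r"[0-9;]+", value):
        return DEFAULT_COLOUR
    return value


print(resolve_colour({}))                                  # 1;31
print(resolve_colour({"GREP_COLORS": "mt=01;34:fn=35"}))   # 01;34
print(resolve_colour({"PCRE2GREP_COLOUR": "red"}))         # 1;31
```
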
       -D action, --devices=action
                 If an input path is  not  a  regular  file  or  a  directory,
                 "action"  specifies  how  it is to be processed. Valid values
                 are "read" (the default) or "skip" (silently skip the path).

       -d action, --directories=action
                 If an input path is a directory, "action" specifies how it is
                 to  be  processed.   Valid  values are "read" (the default in
                 non-Windows environments, for compatibility with  GNU  grep),
                 "recurse"  (equivalent to the -r option), or "skip" (silently
                 skip the path, the default in Windows environments).  In  the
                 "read"  case,  directories  are read as if they were ordinary
                 files. In some operating systems  the  effect  of  reading  a
                 directory like this is an immediate end-of-file; in others it
                 may provoke an error.

       --depth-limit=number
                 See --match-limit below.

       -e pattern, --regex=pattern, --regexp=pattern
                 Specify a pattern to be matched. This option can be used mul-
                 tiple times in order to specify several patterns. It can also
                 be used as a way of specifying a single pattern  that  starts
                 with  a hyphen. When -e is used, no argument pattern is taken
                 from the command line; all  arguments  are  treated  as  file
                 names.  There is no limit to the number of patterns. They are
                 applied to each line in the order in which they  are  defined
                 until one matches.

                 If  -f is used with -e, the command line patterns are matched
                 first, followed by the patterns from the file(s), independent
                 of  the order in which these options are specified. Note that
                 multiple use of -e is not the same as a single  pattern  with
                 alternatives. For example, X|Y finds the first character in a
                 line that is X or Y, whereas if the two  patterns  are  given
                 separately, with X first, pcre2grep finds X if it is present,
                 even if it follows Y in the line. It finds Y only if there is
                 no  X  in  the line. This matters only if you are using -o or
                 --colo(u)r to show the part(s) of the line that matched.

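       The X|Y example can be traced with a short Python model of the scanning
       rule given in the DESCRIPTION section: at each position  the  patterns
       are  tried in order, the first one that matches somewhere in the rest
       of the line wins, and scanning resumes just after that match. This  is
       a  sketch  of  the  documented  behaviour, not pcre2grep's code; only_
       matching is an invented name and Python's re module stands in for PCRE2.

```python
import re


def only_matching(patterns, line):
    """Model of -o with several patterns, tried in the order given."""
    found = []
    pos = 0
    while pos <= len(line):
        for pattern in patterns:
            m = re.compile(pattern).search(line, pos)
            if m and m.group(0):      # empty matches are never recognized
                found.append(m.group(0))
                pos = m.end()         # resume immediately after the match
                break
        else:
            break                     # no pattern matched the remainder
    return found


line = "Y then X"
print(only_matching(["X|Y"], line))     # ['Y', 'X']
print(only_matching(["X", "Y"], line))  # ['X']
print(only_matching(["Y", "X"], line))  # ['Y', 'X']
```

       In the second call X is found first even though it follows  Y,  and  Y
       is then never tried on the earlier part of the line.
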
       --exclude=pattern
                 Files (but not directories) whose names match the pattern are
                 skipped  without  being processed. This applies to all files,
                 whether listed on the command  line,  obtained  from  --file-
                 list, or by scanning a directory. The pattern is a PCRE2 reg-
                 ular expression, and is matched against the  final  component
                 of  the  file  name,  not the entire path. The -F, -w, and -x
                 options do not apply to this pattern. The option may be given
                 any number of times in order to specify multiple patterns. If
                 a file name matches both an --include and an  --exclude  pat-
                 tern, it is excluded. There is no short form for this option.

       --exclude-from=filename
                 Treat  each  non-empty  line  of  the file as the data for an
                 --exclude option. What constitutes a newline when reading the
                 file  is the operating system's default. The --newline option
                 has no effect on this option. This option may be  given  more
                 than once in order to specify a number of files to read.

       --exclude-dir=pattern
                 Directories whose names match the pattern are skipped without
                 being processed, whatever  the  setting  of  the  --recursive
                 option.  This  applies  to all directories, whether listed on
                 the command line, obtained from --file-list, or by scanning a
                 parent  directory. The pattern is a PCRE2 regular expression,
                 and is matched against the final component of  the  directory
                 name,  not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number  of
                 times  in order to specify more than one pattern. If a direc-
                 tory matches both  --include-dir  and  --exclude-dir,  it  is
                 excluded. There is no short form for this option.

       -F, --fixed-strings
                 Interpret  each  data-matching  pattern  as  a  list of fixed
                 strings, separated by  newlines,  instead  of  as  a  regular
                 expression.  What  constitutes  a newline for this purpose is
                 controlled by the --newline option. The -w (match as a  word)
                 and  -x (match whole line) options can be used with -F.  They
                 apply to each of the fixed strings. A line is selected if any
                 of the fixed strings are found in it (subject to -w or -x, if
                 present). This option applies only to the patterns  that  are
                 matched  against  the contents of files; it does not apply to
                 patterns specified by  any  of  the  --include  or  --exclude
                 options.

       -f filename, --file=filename
                 Read  patterns  from  the  file, one per line, and match them
                 against each line of input. As is the case with  patterns  on
                 the  command line, no delimiters should be used. What consti-
                 tutes a newline when reading the file is the  operating  sys-
                 tem's  default interpretation of \n. The --newline option has
                 no effect on this option. Trailing  white  space  is  removed
                 from  each  line,  and blank lines are ignored. An empty file
                 contains no patterns and therefore matches nothing.  Patterns
                 read  from a file in this way may contain binary zeros, which
                 are treated as ordinary data characters. See  also  the  com-
                 ments  about  multiple  patterns versus a single pattern with
                 alternatives in the description of -e above.

                 If this option is given more than  once,  all  the  specified
                 files  are read. A data line is output if any of the patterns
                 match it. A file name can be given as "-"  to  refer  to  the
                 standard  input.  When  -f is used, patterns specified on the
                 command line using -e may also be present;  they  are  tested
                 before  the  file's  patterns.  However,  no other pattern is
                 taken from the command line; all arguments are treated as the
                 names of paths to be searched.

       --file-list=filename
                 Read  a  list  of  files  and/or  directories  that are to be
                 scanned from the given file, one per line. What constitutes a
                 newline  when  reading  the  file  is  the operating system's
                 default. Trailing white space is removed from each line,  and
                 blank lines are ignored. These paths are processed before any
                 that are listed on the command line. The  file  name  can  be
                 given  as  "-"  to refer to the standard input. If --file and
                 --file-list are both specified  as  "-",  patterns  are  read
                 first.  This is useful only when the standard input is a ter-
                 minal, from which further lines (the list of  files)  can  be
                 read after an end-of-file indication. If this option is given
                 more than once, all the specified files are read.

       --file-offsets
                 Instead of showing lines or parts of lines that  match,  show
                 each  match  as  an  offset  from the start of the file and a
                 length, separated by a comma. In this  mode,  no  context  is
                 shown.  That  is,  the -A, -B, and -C options are ignored. If
                 there is more than one match in a line, each of them is shown
                 separately.  This option is mutually exclusive with --output,
                 --line-offsets, and --only-matching.

       -H, --with-filename
                 Force the inclusion of the file name at the start  of  output
                 lines when searching a single file. By default, the file name
                 is not shown in this case.  For matching lines, the file name
                 is followed by a colon; for context lines, a hyphen separator
                 is used. If a line number is also being  output,  it  follows
                 the  file  name. When the -M option causes a pattern to match
                 more than one line, only the first is preceded  by  the  file
                 name.  This  option  overrides  any  previous  -h,  -l, or -L
                 options.

       -h, --no-filename
                 Suppress the output file names when searching multiple files.
                 By  default,  file  names  are  shown when multiple files are
                 searched. For matching lines, the file name is followed by  a
                 colon;  for  context lines, a hyphen separator is used.  If a
                 line number is also being output, it follows the  file  name.
                 This option overrides any previous -H, -L, or -l options.

       --heap-limit=number
                 See --match-limit below.

       --help    Output  a  help  message, giving brief details of the command
                 options and file type support, and then exit.  Anything  else
                 on the command line is ignored.

       -I        Ignore   binary   files.  This  is  equivalent  to  --binary-
                 files=without-match.

       -i, --ignore-case
                 Ignore upper/lower case distinctions during comparisons.

       --include=pattern
                 If any --include patterns are specified, the only files  that
                 are  processed  are those that match one of the patterns (and
                 do not match an --exclude  pattern).  This  option  does  not
                 affect  directories,  but  it  applies  to all files, whether
                 listed on the command line, obtained from --file-list, or  by
                 scanning  a directory. The pattern is a PCRE2 regular expres-
                 sion, and is matched against the final component of the  file
                 name,  not the entire path. The -F, -w, and -x options do not
                 apply to this pattern. The option may be given any number  of
                 times.  If  a  file  name  matches  both  an --include and an
                 --exclude pattern, it is excluded.  There is  no  short  form
                 for this option.

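       The way --include and --exclude combine can be written as a small deci-
       sion  function.  The  Python  sketch below follows the wording of the
       two option descriptions; file_is_processed is an invented name, Python's
       re  module stands in for PCRE2, and the use of an unanchored search on
       the final path component is an assumption.

```python
import os.path
import re


def file_is_processed(path, includes=(), excludes=()):
    """Decide whether a file would be scanned, per --include/--exclude."""
    # Patterns are matched against the final component, not the whole path.
    name = os.path.basename(path)
    if any(re.search(p, name) for p in excludes):
        return False                  # exclusion wins over inclusion
    if includes:
        # With any --include pattern, only matching files are processed.
        return any(re.search(p, name) for p in includes)
    return True


print(file_is_processed("src/main.c", includes=[r"\.c$"]))          # True
print(file_is_processed("src/main.h", includes=[r"\.c$"]))          # False
print(file_is_processed("src/test_main.c", includes=[r"\.c$"],
                        excludes=[r"^test_"]))                      # False
print(file_is_processed("test_dir/main.c", excludes=[r"^test_"]))   # True
```

       The last call returns True because "test_" occurs only in  the  direc-
       tory part of the path, which these two options do not examine.
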
       --include-from=filename
                 Treat  each  non-empty  line  of  the file as the data for an
                 --include option. What constitutes a newline for this purpose
                 is  the  operating system's default. The --newline option has
                 no effect on this option. This option may be given any number
                 of times; all the files are read.

       --include-dir=pattern
                 If  any --include-dir patterns are specified, the only direc-
                 tories that are processed are those that  match  one  of  the
                 patterns  (and  do  not match an --exclude-dir pattern). This
                 applies to all directories, whether  listed  on  the  command
                 line,  obtained  from  --file-list,  or  by scanning a parent
                 directory. The pattern is a PCRE2 regular expression, and  is
                 matched  against  the  final component of the directory name,
                 not the entire path. The -F, -w, and -x options do not  apply
                 to this pattern. The option may be given any number of times.
                 If a directory matches both --include-dir and  --exclude-dir,
                 it is excluded. There is no short form for this option.

       -L, --files-without-match
                 Instead  of  outputting lines from the files, just output the
                 names of the files that do not contain any lines  that  would
                 have  been  output. Each file name is output once, on a sepa-
                 rate line. This option overrides any previous -H, -h,  or  -l
                 options.

       -l, --files-with-matches
                 Instead  of  outputting lines from the files, just output the
                 names of the files containing lines that would have been out-
                 put.  Each  file  name  is  output  once, on a separate line.
                 Searching normally stops as soon as a matching line is  found
                 in  a  file.  However, if the -c (count) option is also used,
                 matching continues in order to obtain the correct count,  and
                 those  files  that  have  at least one match are listed along
                 with their counts. Using this option with -c is a way of sup-
                 pressing  the  listing  of files with no matches. This option
                 overrides any previous -H, -h, or -L options.

       --label=name
                 This option supplies a name to be used for the standard input
                 when file names are being output. If not supplied, "(standard
                 input)" is used. There is no short form for this option.

       --line-buffered
                 When this option is given, non-compressed input is  read  and
                 processed  line by line, and the output is flushed after each
                 write. By default, input is  read  in  large  chunks,  unless
                 pcre2grep  can  determine  that it is reading from a terminal
                 (which is currently possible only in  Unix-like  environments
                 or  Windows).  Output  to  terminal is normally automatically
                 flushed by the operating system. This option  can  be  useful
                 when the input or output is attached to a pipe and you do not
                 want pcre2grep to buffer up large amounts of data.   However,
                 its  use  will  affect  performance,  and  the -M (multiline)
                 option ceases to work. When input is from a compressed .gz or
                 .bz2 file, --line-buffered is ignored.

       --line-offsets
                 Instead  of  showing lines or parts of lines that match, show
                 each match as a line number, the offset from the start of the
                 line,  and a length. The line number is terminated by a colon
                 (as usual; see the -n option), and the offset and length  are
                 separated  by  a  comma.  In  this mode, no context is shown.
                 That is, the -A, -B, and -C options are ignored. If there  is
                 more  than  one  match in a line, each of them is shown sepa-
                 rately. This option  is  mutually  exclusive  with  --output,
                 --file-offsets, and --only-matching.

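       The two offset formats (--file-offsets and --line-offsets) can be  com-
       pared  with  a  Python  sketch. It assumes that offsets are counted in
       bytes starting from zero and that line numbers start from  one;  those
       details,  like  the function names, are assumptions of this example,
       and Python's re module stands in for PCRE2.

```python
import re


def line_offsets(data, pattern):
    """Format matches as 'line:offset,length' (offset within the line)."""
    out = []
    for number, line in enumerate(data.splitlines(), start=1):
        for m in re.finditer(pattern, line):
            if m.group(0):
                out.append(f"{number}:{m.start()},{m.end() - m.start()}")
    return out


def file_offsets(data, pattern):
    """Format matches as 'offset,length' (offset from the start of the file)."""
    out = []
    base = 0                          # offset of the current line in the file
    for line in data.splitlines(keepends=True):
        for m in re.finditer(pattern, line):
            if m.group(0):
                out.append(f"{base + m.start()},{m.end() - m.start()}")
        base += len(line)
    return out


data = b"abc\nxabc abc\n"
print(line_offsets(data, rb"abc"))   # ['1:0,3', '2:1,3', '2:5,3']
print(file_offsets(data, rb"abc"))   # ['0,3', '5,3', '9,3']
```

       Each match on a line is reported separately in both formats, as stated
       in the option descriptions.
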
498       --locale=locale-name
499                 This  option specifies a locale to be used for pattern match-
500                 ing. It overrides the value in the LC_ALL or  LC_CTYPE  envi-
501                 ronment  variables.  If  no  locale  is  specified, the PCRE2
502                 library's default (usually the "C" locale) is used. There  is
503                 no short form for this option.

       --match-limit=number
                 Processing some regular expression patterns may take a very
                 long time to search for all possible matching strings. Others
                 may require a very large amount of memory. There are three
                 options that set resource limits for matching.

                 The --match-limit option provides a means of limiting comput-
                 ing resource usage when processing patterns that are not
                 going to match, but which have a very large number of possi-
                 bilities in their search trees. The classic example is a pat-
                 tern that uses nested unlimited repeats. Internally, PCRE2
                 has a counter that is incremented each time around its main
                 processing loop. If the value set by --match-limit is
                 reached, an error occurs.

                 The --heap-limit option specifies, as a number of kibibytes
                 (units of 1024 bytes), the amount of heap memory that may be
                 used for matching. Heap memory is needed only if matching the
                 pattern requires a significant number of nested backtracking
                 points to be remembered. This parameter can be set to zero to
                 forbid the use of heap memory altogether.

                 The --depth-limit option limits the depth of nested back-
                 tracking points, which indirectly limits the amount of memory
                 that is used. The amount of memory needed for each backtrack-
                 ing point depends on the number of capturing parentheses in
                 the pattern, so the amount of memory that is used before this
                 limit acts varies from pattern to pattern. This limit is of
                 use only if it is set smaller than --match-limit.

                 There are no short forms for these options. The default lim-
                 its can be set when the PCRE2 library is compiled; if they
                 are not specified, the defaults are very large and so effec-
                 tively unlimited.
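
                 The following sketch shows one way to see --match-limit act
                 on a pattern with nested unlimited repeats. The 30-character
                 input, the limit of 1000, and the variable names are arbi-
                 trary choices for illustration; --no-jit is added only so
                 that the interpretive matcher is used. The exact wording of
                 the diagnostic and the exact non-zero exit status may vary
                 between releases, so neither is shown here.

```shell
# A line of 30 a's with no final digit (invented test data).
line=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

# Capture only the standard error stream; discard standard output.
err=$(printf '%s\n' "$line" |
      pcre2grep --no-jit --match-limit=1000 '(a+)*\d' 2>&1 >/dev/null)
status=$?

printf 'exit status: %s\n' "$status"
printf 'diagnostic:  %s\n' "$err"
```

                 With the limit in place the command should return promptly
                 with a matching error instead of exploring every way of
                 dividing up the a's.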

       --max-buffer-size=number
                 This limits the expansion of the processing buffer, whose
                 initial size can be set by --buffer-size. The maximum buffer
                 size is silently forced to be no smaller than the starting
                 buffer size.

       -M, --multiline
                 Allow patterns to match more than one line. When this option
                 is set, the PCRE2 library is called in "multiline" mode. This
                 allows a matched string to extend past the end of a line and
                 continue on one or more subsequent lines. Patterns used with
                 -M may usefully contain literal newline characters and inter-
                 nal occurrences of ^ and $ characters. The output for a suc-
                 cessful match may consist of more than one line. The first
                 line is the line in which the match started, and the last
                 line is the line in which the match ended. If the matched
                 string ends with a newline sequence, the output ends at the
                 end of that line. If -v is set, none of the lines in a
                 multi-line match are output. Once a match has been handled,
                 scanning restarts at the beginning of the line after the one
                 in which the match ended.

                 The newline sequence that separates multiple lines must be
                 matched as part of the pattern. For example, to find the
                 phrase "regular expression" in a file where "regular" might
                 be at the end of a line and "expression" at the start of the
                 next line, you could use this command:

                   pcre2grep -M 'regular\s+expression' <file>

                 The \s escape sequence matches any white space character,
                 including newlines, and is followed by + so as to match
                 trailing white space on the first line as well as possibly
                 handling a two-character newline sequence.

                 There is a limit to the number of lines that can be matched,
                 imposed by the way that pcre2grep buffers the input file as
                 it scans it. With a sufficiently large processing buffer,
                 this should not be a problem, but the -M option does not work
                 when input is read line by line (see --line-buffered).
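
                 As a hedged sketch of the behaviour described above, the
                 command below runs the same pattern against three invented
                 input lines, adding -n to show how a two-line match is num-
                 bered. The sample text and the variable name "out" are
                 assumptions.

```shell
# "regular" ends the first line and "expression" starts the second.
out=$(printf 'a regular\nexpression here\nunrelated\n' |
      pcre2grep -M -n 'regular\s+expression')
printf '%s\n' "$out"
# Both lines of the match are shown; only the first carries a line number:
#   1:a regular
#   expression here
```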

       -N newline-type, --newline=newline-type
                 The PCRE2 library supports five different conventions for
                 indicating the ends of lines. They are the single-character
                 sequences CR (carriage return) and LF (linefeed), the two-
                 character sequence CRLF, an "anycrlf" convention, which rec-
                 ognizes any of the preceding three types, and an "any" con-
                 vention, in which any Unicode line ending sequence is assumed
                 to end a line. The Unicode sequences are the three just men-
                 tioned, plus VT (vertical tab, U+000B), FF (form feed,
                 U+000C), NEL (next line, U+0085), LS (line separator,
                 U+2028), and PS (paragraph separator, U+2029).

                 When the PCRE2 library is built, a default line-ending
                 sequence is specified. This is normally the standard
                 sequence for the operating system. Unless otherwise specified
                 by this option, pcre2grep uses the library's default. The
                 possible values for this option are CR, LF, CRLF, ANYCRLF, or
                 ANY. This makes it possible to use pcre2grep to scan files
                 that have come from other environments without having to mod-
                 ify their line endings. If the data that is being scanned
                 does not agree with the convention set by this option,
                 pcre2grep may behave in strange ways. Note that this option
                 does not apply to files specified by the -f, --exclude-from,
                 or --include-from options, which are expected to use the
                 operating system's standard newline sequence.
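
                 For instance, the sketch below counts lines that consist
                 solely of "two" in input that uses CRLF line endings. The
                 input data and the variable name "count" are invented for
                 the example.

```shell
# Two CRLF-terminated lines; -N CRLF makes $ match just before the CRLF.
count=$(printf 'one\r\ntwo\r\n' | pcre2grep -c -N CRLF '^two$')
printf '%s\n' "$count"
# Expected count: 1
```

                 Without -N CRLF, a build whose default newline is LF would
                 treat the carriage return as part of the line, and the same
                 anchored pattern would then not be expected to match.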

       -n, --line-number
                 Precede each output line by its line number in the file, fol-
                 lowed by a colon for matching lines or a hyphen for context
                 lines. If the file name is also being output, it precedes the
                 line number. When the -M option causes a pattern to match
                 more than one line, only the first is preceded by its line
                 number. This option is forced if --line-offsets is used.
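
                 A minimal sketch of the two separators, using invented input
                 and one line of trailing context (-A1); the variable name
                 "out" is an assumption:

```shell
# "alpha" matches; "beta" is shown only as context after the match.
out=$(printf 'alpha\nbeta\ngamma\n' | pcre2grep -n -A1 'alpha')
printf '%s\n' "$out"
# A colon follows the number of a matching line, a hyphen that of a
# context line:
#   1:alpha
#   2-beta
```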

       --no-jit  If the PCRE2 library is built with support for just-in-time
                 compiling (which speeds up matching), pcre2grep automatically
                 makes use of this, unless it was explicitly disabled at build
                 time. This option can be used to disable the use of JIT at
                 run time. It is provided for testing and working round prob-
                 lems. It should never be needed in normal use.

       -O text, --output=text
                 When there is a match, instead of outputting the whole line
                 that matched, output just the given text. This option is
                 mutually exclusive with --only-matching, --file-offsets, and
                 --line-offsets. Escape sequences starting with a dollar char-
                 acter may be used to insert the contents of the matched part
                 of the line and/or captured substrings into the text.

                 $<digits> or ${<digits>} is replaced by the captured sub-
                 string of the given decimal number; zero substitutes the
                 whole match. If the number is greater than the number of cap-
                 turing substrings, or if the capture is unset, the replace-
                 ment is empty.

                 $a is replaced by bell; $b by backspace; $e by escape; $f by
                 form feed; $n by newline; $r by carriage return; $t by tab;
                 $v by vertical tab.

                 $o<digits> is replaced by the character represented by the
                 given octal number; up to three digits are processed.

                 $x<digits> is replaced by the character represented by the
                 given hexadecimal number; up to two digits are processed.

                 Any other character is substituted by itself. In particular,
                 $$ is replaced by a single dollar.
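
                 The sketch below combines several of these escapes. The
                 "key=value" input and the variable name "out" are invented
                 for illustration; single quotes stop the shell from inter-
                 preting the dollar characters itself.

```shell
# $1 and $2 are the two captures, $0 is the whole match, $$ is a dollar.
out=$(printf 'key=value\n' |
      pcre2grep --output='$2 <- $1 ($0)$$' '(\w+)=(\w+)')
printf '%s\n' "$out"
# Expected output:
#   value <- key (key=value)$
```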

       -o, --only-matching
                 Show only the part of the line that matched a pattern instead
                 of the whole line. In this mode, no context is shown. That
                 is, the -A, -B, and -C options are ignored. If there is more
                 than one match in a line, each of them is shown separately,
                 on a separate line of output. If -o is combined with -v
                 (invert the sense of the match to find non-matching lines),
                 no output is generated, but the return code is set appropri-
                 ately. If the matched portion of the line is empty, nothing
                 is output unless the file name or line number is being
                 printed, in which case they are shown on an otherwise empty
                 line. This option is mutually exclusive with --output,
                 --file-offsets, and --line-offsets.
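
                 For example, assuming the invented input below (and "out"
                 as a helper variable), each run of digits is printed on its
                 own output line:

```shell
# The first line contains three separate matches; the second has none.
out=$(printf 'a1b22c333\nno digits\n' | pcre2grep -o '\d+')
printf '%s\n' "$out"
# Expected output:
#   1
#   22
#   333
```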

       -onumber, --only-matching=number
                 Show only the part of the line that matched the capturing
                 parentheses of the given number. Up to 32 capturing parenthe-
                 ses are supported, and -o0 is equivalent to -o without a num-
                 ber. Because these options can be given without an argument
                 (see above), if an argument is present, it must be given in
                 the same shell item, for example, -o3 or --only-matching=2.
                 The comments given for the non-argument case above also apply
                 to this option. If the specified capturing parentheses do not
                 exist in the pattern, or were not set in the match, nothing
                 is output unless the file name or line number is being output.

                 If this option is given multiple times, multiple substrings
                 are output for each match, in the order the options are
                 given, and all on one line. For example, -o3 -o1 -o3 causes
                 the substrings matched by capturing parentheses 3 and 1 and
                 then 3 again to be output. By default, there is no separator
                 (but see the next option).

       --om-separator=text
                 Specify a separating string for multiple occurrences of -o.
                 The default is an empty string. Separating strings are never
                 coloured.
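
                 A short sketch of multiple -o options used together with
                 --om-separator; the input name and the variable "out" are
                 invented for the example:

```shell
# Print capture 2, then capture 1, separated by a comma and a space.
out=$(printf 'John Smith\n' |
      pcre2grep -o2 -o1 --om-separator=', ' '(\w+) (\w+)')
printf '%s\n' "$out"
# Expected output:
#   Smith, John
```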

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
                 The exit status indicates whether or not any matches were
                 found.

       -r, --recursive
                 If any given path is a directory, recursively scan the files
                 it contains, taking note of any --include and --exclude set-
                 tings. By default, a directory is read as a normal file; in
                 some operating systems this gives an immediate end-of-file.
                 This option is a shorthand for setting the -d option to
                 "recurse".

       --recursion-limit=number
                 This is an obsolete synonym for --depth-limit. See
                 --match-limit above for details.

       -s, --no-messages
                 Suppress error messages about non-existent or unreadable
                 files. Such files are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

       -t, --total-count
                 This option is useful when scanning more than one file. If
                 used on its own, -t suppresses all output except for a grand
                 total number of matching lines (or non-matching lines if -v
                 is used) in all the files. If -t is used with -c, a grand
                 total is output except when the previous output is just one
                 line. In other words, it is not output when just one file's
                 count is listed. If file names are being output, the grand
                 total is preceded by "TOTAL:". Otherwise, it appears as just
                 another number. The -t option is ignored when used with -L
                 (list files without matches), because the grand total would
                 always be zero.

       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if PCRE2
                 has been compiled with UTF-8 support. All patterns (including
                 those for any --exclude and --include options) and all sub-
                 ject lines that are scanned must be valid strings of UTF-8
                 characters.

       -V, --version
                 Write the version numbers of pcre2grep and the PCRE2 library
                 to the standard output and then exit. Anything else on the
                 command line is ignored.

       -v, --invert-match
                 Invert the sense of the match, so that lines which do not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only "words". That is, there must
                 be a word boundary at the start and end of each matched
                 string. This is equivalent to having "\b(?:" at the start of
                 each pattern, and ")\b" at the end. This option applies only
                 to the patterns that are matched against the contents of
                 files; it does not apply to patterns specified by any of the
                 --include or --exclude options.

       -x, --line-regex, --line-regexp
                 Force the patterns to start matching only at the beginnings
                 of lines, and in addition, require them to match entire
                 lines. In multiline mode the match may be more than one line.
                 This is equivalent to having "^(?:" at the start of each pat-
                 tern and ")$" at the end. This option applies only to the
                 patterns that are matched against the contents of files; it
                 does not apply to patterns specified by any of the --include
                 or --exclude options.
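
                 The difference between -w and -x can be sketched with the
                 same pattern and the same invented three-line input; the
                 variable names are assumptions:

```shell
input='cat
concatenate
the cat sat'

# -w: "cat" must be a whole word somewhere in the line.
w=$(printf '%s\n' "$input" | pcre2grep -w 'cat')
# -x: "cat" must be the entire line.
x=$(printf '%s\n' "$input" | pcre2grep -x 'cat')

printf -- '-w:\n%s\n-x:\n%s\n' "$w" "$x"
# -w is expected to select "cat" and "the cat sat"; -x only "cat".
```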


ENVIRONMENT VARIABLES

       The environment variables LC_ALL and LC_CTYPE are examined, in that
       order, for a locale. The first one that is set is used. This can be
       overridden by the --locale option. If no locale is set, the PCRE2
       library's default (usually the "C" locale) is used.


NEWLINES

       The -N (--newline) option allows pcre2grep to scan files with different
       newline conventions from the default. Any parts of the input files that
       are written to the standard output are copied identically, with what-
       ever newline sequences they have in the input. However, the setting of
       this option affects only the way scanned files are processed. It does
       not affect the interpretation of files specified by the -f, --file-
       list, --exclude-from, or --include-from options, nor does it affect the
       way in which pcre2grep writes informational messages to the standard
       error and output streams. For these it uses the string "\n" to indicate
       newlines, relying on the C I/O library to convert this to an appropri-
       ate sequence.


OPTIONS COMPATIBILITY

       Many of the short and long forms of pcre2grep's options are the same as
       in the GNU grep program. Any long option of the form --xxx-regexp (GNU
       terminology) is also available as --xxx-regex (PCRE2 terminology). How-
       ever, the --depth-limit, --file-list, --file-offsets, --heap-limit,
       --include-dir, --line-offsets, --locale, --match-limit, -M, --multi-
       line, -N, --newline, --om-separator, --output, -u, and --utf-8 options
       are specific to pcre2grep, as is the use of the --only-matching option
       with a capturing parentheses number.

       Although most of the common options work the same way, a few are dif-
       ferent in pcre2grep. For example, the --include option's argument is a
       glob for GNU grep, but a regular expression for pcre2grep. If both the
       -c and -l options are given, GNU grep lists only file names, without
       counts, but pcre2grep gives the counts as well.


OPTIONS WITH DATA

       There are four different ways in which an option with data can be spec-
       ified. If a short form option is used, the data may follow immedi-
       ately, or (with one exception) in the next command line item. For exam-
       ple:

         -f/some/file
         -f /some/file

       The exception is the -o option, which may appear with or without data.
       Because of this, if data is present, it must follow immediately in the
       same item, for example -o3.

       If a long form option is used, the data may appear in the same command
       line item, separated by an equals character, or (with two exceptions)
       it may appear in the next command line item. For example:

         --file=/some/file
         --file /some/file

       Note, however, that if you want to supply a file name beginning with ~
       as data in a shell command, and have the shell expand ~ to a home
       directory, you must separate the file name from the option, because the
       shell does not treat ~ specially unless it is at the start of an item.

       The exceptions to the above are the --colour (or --color) and --only-
       matching options, for which the data is optional. If one of these
       options does have data, it must be given in the first form, using an
       equals character. Otherwise pcre2grep will assume that it has no data.
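
       As a sketch of the four forms, the commands below pass the same pattern
       through -e and its long form --regex in each of the ways described; all
       four are expected to print the line "abc". A final command shows the
       equals form that --only-matching requires when it has data. The input
       text and the variable names are invented for the example.

```shell
input='abc
xyz'

a=$(printf '%s\n' "$input" | pcre2grep -eabc)         # short, data attached
b=$(printf '%s\n' "$input" | pcre2grep -e abc)        # short, next item
c=$(printf '%s\n' "$input" | pcre2grep --regex=abc)   # long, equals form
d=$(printf '%s\n' "$input" | pcre2grep --regex abc)   # long, next item
printf '%s %s %s %s\n' "$a" "$b" "$c" "$d"

# Optional data must use the equals form: this selects capture 2 ("34").
e=$(printf '12-34\n' | pcre2grep --only-matching=2 '(\d+)-(\d+)')
printf '%s\n' "$e"
```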


USING PCRE2'S CALLOUT FACILITY

       pcre2grep has, by default, support for calling external programs or
       scripts or echoing specific strings during matching by making use of
       PCRE2's callout facility. However, this support can be disabled when
       pcre2grep is built. You can find out whether your binary has support
       for callouts by running it with the --help option. If the support is
       not enabled, all callouts in patterns are ignored by pcre2grep.

       A callout in a PCRE2 pattern is of the form (?C<arg>) where the argu-
       ment is either a number or a quoted string (see the pcre2callout docu-
       mentation for details). Numbered callouts are ignored by pcre2grep;
       only callouts with string arguments are useful.

   Calling external programs or scripts

       If the callout string does not start with a pipe (vertical bar) charac-
       ter, it is parsed into a list of substrings separated by pipe charac-
       ters. The first substring must be an executable name, with the follow-
       ing substrings specifying arguments:

         executable_name|arg1|arg2|...

       Any substring (including the executable name) may contain escape
       sequences started by a dollar character: $<digits> or ${<digits>} is
       replaced by the captured substring of the given decimal number, which
       must be greater than zero. If the number is greater than the number of
       capturing substrings, or if the capture is unset, the replacement is
       empty.

       Any other character is substituted by itself. In particular, $$ is
       replaced by a single dollar and $| is replaced by a pipe character.
       Here is an example:

         echo -e "abcde\n12345" | pcre2grep \
           '(?x)(.)(..(.))
           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -

         Output:

           Arg1: [a] [bcd] [d] Arg2: |a| ()
           abcde
           Arg1: [1] [234] [4] Arg2: |1| ()
           12345

       The parameters for the execv() system call that is used to run the pro-
       gram or script are zero-terminated strings. This means that binary zero
       characters in the callout argument will cause premature termination of
       their substrings, and therefore should not be present. Any syntax
       errors in the string (for example, a dollar not followed by another
       character) cause the callout to be ignored. If running the program
       fails for any reason (including the non-existence of the executable), a
       local matching failure occurs and the matcher backtracks in the normal
       way.

   Echoing a specific string

       If the callout string starts with a pipe (vertical bar) character, the
       rest of the string is written to the output, having been passed through
       the same escape processing as text from the --output option. This pro-
       vides a simple echoing facility that avoids calling an external program
       or script. No terminator is added to the string, so if you want a new-
       line, you must include it explicitly. Matching continues normally
       after the string is output. If you want to see only the callout output
       but not any output from an actual match, you should end the relevant
       pattern with (*FAIL).
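
       The following sketch adapts the earlier example to the echoing form. It
       assumes a pcre2grep binary built with callout support; the input line
       and the variable name "out" are invented. Because the pattern ends with
       (*FAIL), the matcher retries at later positions in the line, so the
       callout text may be written more than once per line.

```shell
# The leading | selects echoing; $n supplies the newline explicitly.
out=$(printf 'abcde\n' |
      pcre2grep '(.)(..(.))(?C"|[$1] [$2] [$3]$n")(*FAIL)')
printf '%s\n' "$out"
# The output is expected to include this line for the attempt that
# starts at the first character:
#   [a] [bcd] [d]
```

       No input line is printed, because the pattern as a whole never matches.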


MATCHING ERRORS

       It is possible to supply a regular expression that takes a very long
       time to fail to match certain lines. Such patterns normally involve
       nested indefinite repeats, for example: (a+)*\d when matched against a
       line of a's with no final digit. The PCRE2 matching function has a
       resource limit that causes it to abort in these circumstances. If this
       happens, pcre2grep outputs an error message and the line that caused
       the problem to the standard error stream. If there are more than 20
       such errors, pcre2grep gives up.

       The --match-limit option of pcre2grep can be used to set the overall
       resource limit. There are also other limits that affect the amount of
       memory used during matching; see the discussion of --heap-limit and
       --depth-limit above.


DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
       and 2 for syntax errors, overlong lines, non-existent or inaccessible
       files (even if matches were found in other files) or too many matching
       errors. Using the -s option to suppress error messages about inaccessi-
       ble files does not affect the return code.

       When run under VMS, the return code is placed in the symbol
       PCRE2GREP_RC because VMS does not distinguish between exit(0) and
       exit(1).
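
       A script can branch on these values. The sketch below collects the exit
       status for a match, a non-match, a pattern with a syntax error, and a
       missing file; the input text, the file path, and the variable names are
       invented for illustration.

```shell
printf 'abc\n' | pcre2grep -q 'abc';              found=$?
printf 'abc\n' | pcre2grep -q 'xyz';              missing=$?
printf 'abc\n' | pcre2grep -q '(' 2>/dev/null;    badpat=$?
pcre2grep -q -s 'abc' /nonexistent/demo-path;     nofile=$?

printf '%s %s %s %s\n' "$found" "$missing" "$badpat" "$nofile"
# Expected output: 0 1 2 2
```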


SEE ALSO

       pcre2pattern(3), pcre2syntax(3), pcre2callout(3).


AUTHOR

       Philip Hazel
       University Computing Service
       Cambridge, England.


REVISION

       Last updated: 24 February 2018
       Copyright (c) 1997-2018 University of Cambridge.