README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------
3
4PCRE2 is a re-working of the original PCRE library to provide an entirely new
5API. The latest release of PCRE2 is always available in three alternative
6formats from:
7
8  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.gz
9  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.bz2
10  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.zip
11
12There is a mailing list for discussion about the development of PCRE (both the
13original and new APIs) at pcre-dev@exim.org. You can access the archives and
14subscribe or manage your subscription here:
15
16   https://lists.exim.org/mailman/listinfo/pcre-dev
17
18Please read the NEWS file if you are upgrading from a previous release. The
19contents of this README file are:
20
21  The PCRE2 APIs
22  Documentation for PCRE2
23  Contributions by users of PCRE2
24  Building PCRE2 on non-Unix-like systems
25  Building PCRE2 without using autotools
26  Building PCRE2 using autotools
27  Retrieving configuration information
28  Shared libraries
29  Cross-compiling using autotools
30  Making new tarballs
31  Testing PCRE2
32  Character tables
33  File manifest
34
35
36The PCRE2 APIs
37--------------
38
39PCRE2 is written in C, and it has its own API. There are three sets of
40functions, one for the 8-bit library, which processes strings of bytes, one for
41the 16-bit library, which processes strings of 16-bit values, and one for the
4232-bit library, which processes strings of 32-bit values. There are no C++
43wrappers.
44
45The distribution does contain a set of C wrapper functions for the 8-bit
46library that are based on the POSIX regular expression API (see the pcre2posix
47man page). These can be found in a library called libpcre2-posix. Note that
48this just provides a POSIX calling interface to PCRE2; the regular expressions
49themselves still follow Perl syntax and semantics. The POSIX API is restricted,
50and does not give full access to all of PCRE2's facilities.
51
52The header file for the POSIX-style functions is called pcre2posix.h. The
53official POSIX name is regex.h, but I did not want to risk possible problems
54with existing files of that name by distributing it that way. To use PCRE2 with
55an existing program that uses the POSIX API, pcre2posix.h will have to be
56renamed or pointed at by a link.
57
58If you are using the POSIX interface to PCRE2 and there is already a POSIX
59regex library installed on your system, as well as worrying about the regex.h
60header file (as mentioned above), you must also take care when linking programs
61to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
62may pick up the POSIX functions of the same name from the other library.
63
64One way of avoiding this confusion is to compile PCRE2 with the addition of
65-Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
66compiler flags (CFLAGS if you are using "configure" -- see below). This has the
67effect of renaming the functions so that the names no longer clash. Of course,
68you have to do the same thing for your applications, or write them using the
69new names.
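
The renaming described above can be sketched as a small shell fragment. It
only assembles and prints the compiler flags; nothing is configured or built.
It assumes the four standard POSIX entry points (regcomp, regexec, regerror,
regfree) and follows the -Dregcomp=PCRE2regcomp naming pattern from the text.
The -O2 flag is an arbitrary illustration.

```shell
# Build one -D flag per POSIX function, following the
# -Dregcomp=PCRE2regcomp pattern (the PCRE2 prefix is taken from the text).
rename_flags=""
for fn in regcomp regexec regerror regfree; do
  rename_flags="$rename_flags -D$fn=PCRE2$fn"
done
rename_flags="${rename_flags# }"   # drop the leading space

# Print the command line instead of running it.
echo "CFLAGS='-O2 $rename_flags' ./configure"
```

The same flags would then have to be passed when compiling any application
that calls the POSIX functions by their usual names.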
70
71
72Documentation for PCRE2
73-----------------------
74
75If you install PCRE2 in the normal way on a Unix-like system, you will end up
76with a set of man pages whose names all start with "pcre2". The one that is
77just called "pcre2" lists all the others. In addition to these man pages, the
78PCRE2 documentation is supplied in two other forms:
79
80  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
81     doc/pcre2test.txt in the source distribution. The first of these is a
82     concatenation of the text forms of all the section 3 man pages except the
83     listing of pcre2demo.c and those that summarize individual functions. The
84     other two are the text forms of the section 1 man pages for the pcre2grep
85     and pcre2test commands. These text forms are provided for ease of scanning
86     with text editors or similar tools. They are installed in
87     <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
88     (defaulting to /usr/local).
89
90  2. A set of files containing all the documentation in HTML form, hyperlinked
91     in various ways, and rooted in a file called index.html, is distributed in
92     doc/html and installed in <prefix>/share/doc/pcre2/html.
93
94
95Building PCRE2 on non-Unix-like systems
96---------------------------------------
97
98For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
99your system supports the use of "configure" and "make" you may be able to build
100PCRE2 using autotools in the same way as for many Unix-like systems.
101
102PCRE2 can also be configured using CMake, which can be run in various ways
103(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
104NON-AUTOTOOLS-BUILD has information about CMake.
105
106PCRE2 has been compiled on many different operating systems. It should be
107straightforward to build PCRE2 on any system that has a Standard C compiler and
108library, because it uses only Standard C functions.
109
110
111Building PCRE2 without using autotools
112--------------------------------------
113
114The use of autotools (in particular, libtool) is problematic in some
115environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
116file for ways of building PCRE2 without using autotools.
117
118
119Building PCRE2 using autotools
120------------------------------
121
122The following instructions assume the use of the widely used "configure; make;
123make install" (autotools) process.
124
To build PCRE2 on a system that supports autotools, first run the "configure"
command from the PCRE2 distribution directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.
130
131Most commonly, people build PCRE2 within its own distribution directory, and in
132this case, on many systems, just running "./configure" is sufficient. However,
133the usual methods of changing standard defaults are available. For example:
134
135CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
136
137This command specifies that the C compiler should be run with the flags '-O2
138-Wall' instead of the default, and that "make install" should install PCRE2
139under /opt/local instead of the default /usr/local.
140
141If you want to build in a different directory, just run "configure" with that
142directory as current. For example, suppose you have unpacked the PCRE2 source
143into /source/pcre2/pcre2-xxx, but you want to build it in
144/build/pcre2/pcre2-xxx:
145
146cd /build/pcre2/pcre2-xxx
147/source/pcre2/pcre2-xxx/configure
148
149PCRE2 is written in C and is normally compiled as a C library. However, it is
150possible to build it as a C++ library, though the provided building apparatus
151does not have any features to support this.
152
153There are some optional features that can be included or omitted from the PCRE2
154library. They are also documented in the pcre2build man page.
155
156. By default, both shared and static libraries are built. You can change this
157  by adding one of these options to the "configure" command:
158
159  --disable-shared
160  --disable-static
161
  (See also "Shared libraries" below.)
163
164. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
165  the "configure" command, the 16-bit library is also built. If you add
166  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
167  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
168  to disable building the 8-bit library.
169
170. If you want to include support for just-in-time (JIT) compiling, which can
171  give large performance improvements on certain platforms, add --enable-jit to
172  the "configure" command. This support is available only for certain hardware
173  architectures. If you try to enable it on an unsupported architecture, there
174  will be a compile time error. If in doubt, use --enable-jit=auto, which
175  enables JIT only if the current hardware is supported.
176
177. If you are enabling JIT under SELinux you may also want to add
178  --enable-jit-sealloc, which enables the use of an execmem allocator in JIT
179  that is compatible with SELinux. This has no effect if JIT is not enabled.
180
181. If you do not want to make use of the default support for UTF-8 Unicode
182  character strings in the 8-bit library, UTF-16 Unicode character strings in
183  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
184  library, you can add --disable-unicode to the "configure" command. This
185  reduces the size of the libraries. It is not possible to configure one
186  library with Unicode support, and another without, in the same configuration.
187  It is also not possible to use --enable-ebcdic (see below) with Unicode
188  support, so if this option is set, you must also use --disable-unicode.
189
  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
  be either ASCII or UTF-8/16/32, even when running on EBCDIC platforms.
194
195  As well as supporting UTF strings, Unicode support includes support for the
196  \P, \p, and \X sequences that recognize Unicode character properties.
197  However, only the basic two-letter properties such as Lu are supported.
198  Escape sequences such as \d and \w in patterns do not by default make use of
199  Unicode properties, but can be made to do so by setting the PCRE2_UCP option
200  or starting a pattern with (*UCP).
201
202. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
203  of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
204  character as indicating the end of a line. Whatever you specify at build time
205  is the default; the caller of PCRE2 can change the selection at run time. The
206  default newline indicator is a single LF character (the Unix standard). You
207  can specify the default newline indicator by adding --enable-newline-is-cr,
208  --enable-newline-is-lf, --enable-newline-is-crlf,
209  --enable-newline-is-anycrlf, --enable-newline-is-any, or
210  --enable-newline-is-nul to the "configure" command, respectively.
211
212. By default, the sequence \R in a pattern matches any Unicode line ending
213  sequence. This is independent of the option specifying what PCRE2 considers
214  to be the end of a line (see above). However, the caller of PCRE2 can
215  restrict \R to match only CR, LF, or CRLF. You can make this the default by
216  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
217
218. In a pattern, the escape sequence \C matches a single code unit, even in a
219  UTF mode. This can be dangerous because it breaks up multi-code-unit
220  characters. You can build PCRE2 with the use of \C permanently locked out by
221  adding --enable-never-backslash-C (note the upper case C) to the "configure"
222  command. When \C is allowed by the library, individual applications can lock
223  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.
224
225. PCRE2 has a counter that limits the depth of nesting of parentheses in a
226  pattern. This limits the amount of system stack that a pattern uses when it
227  is compiled. The default is 250, but you can change it by setting, for
228  example,
229
230  --with-parens-nest-limit=500
231
232. PCRE2 has a counter that can be set to limit the amount of computing resource
233  it uses when matching a pattern. If the limit is exceeded during a match, the
234  match fails. The default is ten million. You can change the default by
235  setting, for example,
236
237  --with-match-limit=500000
238
239  on the "configure" command. This is just the default; individual calls to
240  pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
241  discussion in the pcre2api man page (search for pcre2_set_match_limit).
242
243. There is a separate counter that limits the depth of nested backtracking
244  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
245  matching process, which indirectly limits the amount of heap memory that is
246  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
247  counter also has a default of ten million, which is essentially "unlimited".
248  You can change the default by setting, for example,
249
250  --with-match-limit-depth=5000
251
252  There is more discussion in the pcre2api man page (search for
253  pcre2_set_depth_limit).
254
255. You can also set an explicit limit on the amount of heap memory used by
256  the pcre2_match() and pcre2_dfa_match() interpreters:
257
258  --with-heap-limit=500
259
260  The units are kibibytes (units of 1024 bytes). This limit does not apply when
261  the JIT optimization (which has its own memory control features) is used.
262  There is more discussion on the pcre2api man page (search for
263  pcre2_set_heap_limit).
264
265. In the 8-bit library, the default maximum compiled pattern size is around
266  64 kibibytes. You can increase this by adding --with-link-size=3 to the
267  "configure" command. PCRE2 then uses three bytes instead of two for offsets
268  to different parts of the compiled pattern. In the 16-bit library,
269  --with-link-size=3 is the same as --with-link-size=4, which (in both
270  libraries) uses four-byte offsets. Increasing the internal link size reduces
271  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
272  link size setting is ignored, as 4-byte offsets are always used.
273
274. For speed, PCRE2 uses four tables for manipulating and identifying characters
275  whose code point values are less than 256. By default, it uses a set of
276  tables for ASCII encoding that is part of the distribution. If you specify
277
278  --enable-rebuild-chartables
279
280  a program called dftables is compiled and run in the default C locale when
281  you obey "make". It builds a source file called pcre2_chartables.c. If you do
282  not specify this option, pcre2_chartables.c is created as a copy of
283  pcre2_chartables.c.dist. See "Character tables" below for further
284  information.
285
286. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
287  character code (as opposed to ASCII/Unicode) by specifying
288
289  --enable-ebcdic --disable-unicode
290
291  This automatically implies --enable-rebuild-chartables (see above). However,
292  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
293  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
294  which specifies that the code value for the EBCDIC NL character is 0x25
295  instead of the default 0x15.
296
297. If you specify --enable-debug, additional debugging code is included in the
298  build. This option is intended for use by the PCRE2 maintainers.
299
300. In environments where valgrind is installed, if you specify
301
302  --enable-valgrind
303
304  PCRE2 will use valgrind annotations to mark certain memory regions as
305  unaddressable. This allows it to detect invalid memory accesses, and is
306  mostly useful for debugging PCRE2 itself.
307
308. In environments where the gcc compiler is used and lcov version 1.6 or above
309  is installed, if you specify
310
311  --enable-coverage
312
313  the build process implements a code coverage report for the test suite. The
314  report is generated by running "make coverage". If ccache is installed on
315  your system, it must be disabled when building PCRE2 for coverage reporting.
316  You can do this by setting the environment variable CCACHE_DISABLE=1 before
317  running "make" to build PCRE2. There is more information about coverage
318  reporting in the "pcre2build" documentation.
319
320. When JIT support is enabled, pcre2grep automatically makes use of it, unless
321  you add --disable-pcre2grep-jit to the "configure" command.
322
323. There is support for calling external programs during matching in the
324  pcre2grep command, using PCRE2's callout facility with string arguments. This
325  support can be disabled by adding --disable-pcre2grep-callout to the
326  "configure" command.
327
328. The pcre2grep program currently supports only 8-bit data files, and so
329  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
330  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
331  specifying one or both of
332
333  --enable-pcre2grep-libz
334  --enable-pcre2grep-libbz2
335
336  Of course, the relevant libraries must be installed on your system.
337
338. The default starting size (in bytes) of the internal buffer used by pcre2grep
339  can be set by, for example:
340
341  --with-pcre2grep-bufsize=51200
342
343  The value must be a plain integer. The default is 20480. The amount of memory
344  used by pcre2grep is actually three times this number, to allow for "before"
345  and "after" lines. If very long lines are encountered, the buffer is
346  automatically enlarged, up to a fixed maximum size.
347
348. The default maximum size of pcre2grep's internal buffer can be set by, for
349  example:
350
351  --with-pcre2grep-max-bufsize=2097152
352
353  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
354  whichever is the larger.
355
356. It is possible to compile pcre2test so that it links with the libreadline
357  or libedit libraries, by specifying, respectively,
358
359  --enable-pcre2test-libreadline or --enable-pcre2test-libedit
360
361  If this is done, when pcre2test's input is from a terminal, it reads it using
362  the readline() function. This provides line-editing and history facilities.
363  Note that libreadline is GPL-licenced, so if you distribute a binary of
364  pcre2test linked in this way, there may be licensing issues. These can be
365  avoided by linking with libedit (which has a BSD licence) instead.
366
  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the to choose an appropriate library."
  If you get error messages about missing functions tgetstr, tgetent, tputs,
  tgetflag, or tgoto, this is the problem, and linking with the ncurses library
  should fix it.
378
. There is a special option called --enable-fuzz-support for use by people who
  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
  be built, but not installed. This contains a single function called
  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
  length of the string. When called, this function tries to compile the string
  as a pattern, and if that succeeds, to match it. This is done both with no
  options and with some random option bits that are generated from the string.
  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
  be created. This is normally run under valgrind or used when PCRE2 is
  compiled with address sanitizing enabled. It calls the fuzzing function and
  outputs information about what it is doing. The input strings are specified
  by arguments: if an argument starts with "=" the rest of it is a literal
  input string. Otherwise, it is assumed to be a file name, and the contents of
  the file are the test string.
394
395. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
396  which caused pcre2_match() to use individual blocks on the heap for
397  backtracking instead of recursive function calls (which use the stack). This
398  is now obsolete since pcre2_match() was refactored always to use the heap (in
399  a much more efficient way than before). This option is retained for backwards
400  compatibility, but has no effect other than to output a warning.
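
Several of the options described above can be combined on one "configure"
command line. The fragment below is only a sketch: it collects a few of the
options named in this section in a shell variable and prints the resulting
command rather than running it. The particular selection and the limit values
are illustrative, not recommendations.

```shell
# Collect options in one place so the final command line is easy to review.
opts="--enable-pcre2-16 --enable-pcre2-32"   # also build 16-bit and 32-bit libraries
opts="$opts --enable-jit=auto"               # JIT only where the hardware is supported
opts="$opts --with-match-limit=500000"       # lower the default match limit
opts="$opts --with-heap-limit=500"           # default heap limit, in kibibytes

# Print the command; to run it for real, execute it from the directory
# in which you want PCRE2 to be built.
echo "./configure $opts"
```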
401
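The arithmetic behind the two pcre2grep buffer options described above can be
worked through as follows. This is a sketch based only on the statements in
this section (memory used is three times the buffer size; the default maximum
is the larger of 1048576 and the starting size). The variable names are
hypothetical and do not come from the PCRE2 build system.

```shell
bufsize=51200      # value given to --with-pcre2grep-bufsize
max_bufsize=""     # empty means --with-pcre2grep-max-bufsize was not given

# Memory used at the starting size: three buffers' worth, to allow for
# "before" and "after" lines.
initial_memory=$((3 * bufsize))

# Default maximum: 1048576 or the starting size, whichever is larger.
if [ -z "$max_bufsize" ]; then
  if [ "$bufsize" -gt 1048576 ]; then
    max_bufsize=$bufsize
  else
    max_bufsize=1048576
  fi
fi

echo "initial memory: $initial_memory bytes"
echo "maximum buffer: $max_bufsize bytes"
```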
402The "configure" script builds the following files for the basic C library:
403
. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                         that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries
414
415Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
416tarballs under the names config.h.generic and pcre2.h.generic. These are
417provided for those who have to build PCRE2 without using "configure" or CMake.
418If you use "configure" or CMake, the .generic versions are not used.
419
420The "configure" script also creates config.status, which is an executable
421script that can be run to recreate the configuration, and config.log, which
422contains compiler output from tests that "configure" runs.
423
424Once "configure" has run, you can run "make". This builds whichever of the
425libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
426program called pcre2test. If you enabled JIT support with --enable-jit, another
427test program called pcre2_jit_test is built as well. If the 8-bit library is
428built, libpcre2-posix and the pcre2grep command are also built. Running
429"make" with the -j option may speed up compilation on multiprocessor systems.
430
431The command "make check" runs all the appropriate tests. Details of the PCRE2
432tests are given below in a separate section of this document. The -j option of
433"make" can also be used when running the tests.
434
435You can use "make install" to install PCRE2 into live directories on your
436system. The following are installed (file names are all relative to the
437<prefix> that is set when "configure" is run):
438
439  Commands (bin):
440    pcre2test
441    pcre2grep (if 8-bit support is enabled)
442    pcre2-config
443
444  Libraries (lib):
445    libpcre2-8      (if 8-bit support is enabled)
446    libpcre2-16     (if 16-bit support is enabled)
447    libpcre2-32     (if 32-bit support is enabled)
448    libpcre2-posix  (if 8-bit support is enabled)
449
450  Configuration information (lib/pkgconfig):
451    libpcre2-8.pc
452    libpcre2-16.pc
453    libpcre2-32.pc
454    libpcre2-posix.pc
455
456  Header files (include):
457    pcre2.h
458    pcre2posix.h
459
460  Man pages (share/man/man{1,3}):
461    pcre2grep.1
462    pcre2test.1
463    pcre2-config.1
464    pcre2.3
465    pcre2*.3 (lots more pages, all starting "pcre2")
466
467  HTML documentation (share/doc/pcre2/html):
468    index.html
469    *.html (lots more pages, hyperlinked from index.html)
470
471  Text file documentation (share/doc/pcre2):
472    AUTHORS
473    COPYING
474    ChangeLog
475    LICENCE
476    NEWS
477    README
478    pcre2.txt         (a concatenation of the man(3) pages)
479    pcre2test.txt     the pcre2test man page
480    pcre2grep.txt     the pcre2grep man page
481    pcre2-config.txt  the pcre2-config man page
482
483If you want to remove PCRE2 from your system, you can run "make uninstall".
484This removes all the files that "make install" installed. However, it does not
485remove any directories, because these are often shared with other programs.
486
487
488Retrieving configuration information
489------------------------------------
490
491Running "make install" installs the command pcre2-config, which can be used to
492recall information about the PCRE2 configuration and installation. For example:
493
494  pcre2-config --version
495
496prints the version number, and
497
498  pcre2-config --libs8
499
500outputs information about where the 8-bit library is installed. This command
501can be included in makefiles for programs that use PCRE2, saving the programmer
502from having to remember too many details. Run pcre2-config with no arguments to
503obtain a list of possible arguments.
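
As a sketch of how pcre2-config might be used when building a program, the
fragment below queries the compiler and linker settings and prints a compile
command. The file names myprog.c and myprog are hypothetical, and the fragment
assumes that pcre2-config accepts --cflags alongside the --libs8 argument
shown above (run it with no arguments to check what your installation
supports). If the command is not on the PATH, the fragment only reports that.

```shell
if command -v pcre2-config >/dev/null 2>&1; then
  found=yes
  pcre2_cflags=$(pcre2-config --cflags)   # compiler flags (assumed argument)
  pcre2_libs=$(pcre2-config --libs8)      # linker flags for the 8-bit library
  # Print the compile command rather than running it.
  echo "cc $pcre2_cflags -o myprog myprog.c $pcre2_libs"
else
  found=no
  echo "pcre2-config not found on PATH" >&2
fi
```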
504
505The pkg-config command is another system for saving and retrieving information
506about installed libraries. Instead of separate commands for each library, a
507single command is used. For example:
508
509  pkg-config --libs libpcre2-16
510
511The data is held in *.pc files that are installed in a directory called
512<prefix>/lib/pkgconfig.
513
514
515Shared libraries
516----------------
517
518The default distribution builds PCRE2 as shared libraries and static libraries,
519as long as the operating system supports shared libraries. Shared library
520support relies on the "libtool" script which is built as part of the
521"configure" process.
522
523The libtool script is used to compile and link both shared and static
524libraries. They are placed in a subdirectory called .libs when they are newly
525built. The programs pcre2test and pcre2grep are built to use these uninstalled
526libraries (by means of wrapper scripts in the case of shared libraries). When
527you use "make install" to install shared libraries, pcre2grep and pcre2test are
528automatically re-built to use the newly installed shared libraries before being
529installed themselves. However, the versions left in the build directory still
530use the uninstalled libraries.
531
532To build PCRE2 using static libraries only you must use --disable-shared when
533configuring it. For example:
534
535./configure --prefix=/usr/gnu --disable-shared
536
537Then run "make" in the usual way. Similarly, you can use --disable-static to
538build only shared libraries.
539
540
541Cross-compiling using autotools
542-------------------------------
543
544You can specify CC and CFLAGS in the normal way to the "configure" command, in
545order to cross-compile PCRE2 for some other host. However, you should NOT
546specify --enable-rebuild-chartables, because if you do, the dftables.c source
547file is compiled and run on the local host, in order to generate the inbuilt
548character tables (the pcre2_chartables.c file). This will probably not work,
549because dftables.c needs to be compiled with the local compiler, not the cross
550compiler.
551
552When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
553created by making a copy of pcre2_chartables.c.dist, which is a default set of
554tables that assumes ASCII code. Cross-compiling with the default tables should
555not be a problem.
556
557If you need to modify the character tables when cross-compiling, you should
558move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
559and run it on the local host to make a new version of pcre2_chartables.c.dist.
560Then when you cross-compile PCRE2 this new version of the tables will be used.
561
562
563Making new tarballs
564-------------------
565
566The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
567zip formats. The command "make distcheck" does the same, but then does a trial
568build of the new distribution to ensure that it works.
569
570If you have modified any of the man page sources in the doc directory, you
571should first run the PrepareRelease script before making a distribution. This
572script creates the .txt and HTML forms of the documentation from the man pages.
573
574
575Testing PCRE2
576-------------
577
578To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
579There is another script called RunGrepTest that tests the pcre2grep command.
580When JIT support is enabled, a third test program called pcre2_jit_test is
581built. Both the scripts and all the program tests are run if you obey "make
582check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
583
584The RunTest script runs the pcre2test test program (which is documented in its
585own man page) on each of the relevant testinput files in the testdata
586directory, and compares the output with the contents of the corresponding
587testoutput files. RunTest uses a file called testtry to hold the main output
588from pcre2test. Other files whose names begin with "test" are used as working
589files in some tests.
590
591Some tests are relevant only when certain build-time options were selected. For
592example, the tests for UTF-8/16/32 features are run only when Unicode support
593is available. RunTest outputs a comment when it skips a test.
594
595Many (but not all) of the tests that are not skipped are run twice if JIT
596support is available. On the second run, JIT compilation is forced. This
597testing can be suppressed by putting "nojit" on the RunTest command line.
598
599The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
600libraries that are enabled. If you want to run just one set of tests, call
601RunTest with either the -8, -16 or -32 option.
602
603If valgrind is installed, you can run the tests under it by putting "valgrind"
604on the RunTest command line. To run pcre2test on just one or more specific test
605files, give their numbers as arguments to RunTest, for example:
606
607  RunTest 2 7 11
608
609You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
610end), or a number preceded by ~ to exclude a test. For example:
611
  RunTest 3-15 ~10
613
614This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
615except test 13. Whatever order the arguments are in, the tests are always run
616in numerical order.
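
The selection rules above can be modelled with a short shell function. This is
not the code used by RunTest; select_tests and range are hypothetical helpers
written only to illustrate the rules, and the first argument stands in for the
highest test number, which the real script knows for itself. The sketch also
assumes that "all the tests" means 0 up to that number.

```shell
# Print the integers from $1 to $2 inclusive, one per line.
range() {
  i=$1
  while [ "$i" -le "$2" ]; do
    printf '%s\n' "$i"
    i=$((i + 1))
  done
}

# select_tests LAST ARG...  prints the selected test numbers in order.
select_tests() {
  last=$1; shift
  include=""; exclude=""
  for arg in "$@"; do
    case $arg in
      "~"*) exclude="$exclude ${arg#"~"}" ;;                          # ~N excludes N
      *-)   include="$include $(range "${arg%-}" "$last")" ;;         # N- is N to the end
      *-*)  include="$include $(range "${arg%-*}" "${arg#*-}")" ;;    # N-M
      *)    include="$include $arg" ;;                                # single test
    esac
  done
  # With only exclusions given, start from the full set.
  [ -n "$include" ] || include=$(range 0 "$last")
  out=""
  # Sorting gives numerical order whatever order the arguments were in.
  for n in $(printf '%s\n' $include | sort -n -u); do
    case " $exclude " in
      *" $n "*) ;;
      *) out="$out $n" ;;
    esac
  done
  echo "${out# }"
}

select_tests 25 3-15 "~10"
select_tests 25 11 2 7
```

With these definitions, the first call prints tests 3 to 15 without 10, and
the second prints 2 7 11, regardless of the order in which the numbers were
given.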
617
618You can also call RunTest with the single argument "list" to cause it to output
619a list of tests.
620
621The test sequence starts with "test 0", which is a special test that has no
622input file, and whose output is not checked. This is because it will be
623different on different hardware and with different configurations. The test
624exists in order to exercise some of pcre2test's code that would not otherwise
625be run.
626
Tests 1 and 2 can always be run, as they expect only plain text strings (not
UTF) and make no use of Unicode properties. The first test file can be fed
directly into the perltest.sh script to check that Perl gives the same results.
The only difference you should see is in the first few lines, where the Perl
version is given instead of the PCRE2 version. The second set of tests checks
auxiliary functions, error detection, and run-time flags that are specific to
PCRE2. It also uses the debugging flags to check some of the internals of
pcre2_compile().
635
636If you build PCRE2 with a locale setting that is not the standard C locale, the
637character tables may be different (see next paragraph). In some cases, this may
638cause failures in the second set of tests. For example, in a locale where the
639isprint() function yields TRUE for characters in the range 128-255, the use of
640[:isascii:] inside a character class defines a different set of characters, and
641this shows up in this test as a difference in the compiled code, which is being
642listed for checking. For example, where the comparison test output contains
643[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
644cases. This is not a bug in PCRE2.
645
646Test 3 checks pcre2_maketables(), the facility for building a set of character
647tables for a specific locale and using them instead of the default tables. The
648script uses the "locale" command to check for the availability of the "fr_FR",
649"french", or "fr" locale, and uses the first one that it finds. If the "locale"
650command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
651the list of available locales, the third test cannot be run, and a comment is
652output to say why. If running this test produces an error like this:
653
654  ** Failed to set locale "fr_FR"
655
656it means that the given locale is not available on your system, despite being
657listed by "locale". This does not mean that PCRE2 is broken. There are three
658alternative output files for the third test, because three different versions
659of the French locale have been encountered. The test passes if its output
660matches any one of them.
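RunTest makes its availability check with the "locale" command. An application
that wants to make a similar "first one that works" choice from C could ask
setlocale() directly, as in the sketch below. The function name
first_available_locale() is hypothetical, and the candidate names are supplied
by the caller; nothing here is taken from RunTest or pcre2test.

```c
#include <locale.h>
#include <stddef.h>

/* Return the first name in candidates[] that setlocale() accepts for
   LC_CTYPE, or NULL if none of them is accepted. On success the process
   is left running in the accepted locale. */
const char *first_available_locale(const char *const candidates[],
                                   size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (setlocale(LC_CTYPE, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}
```

For the locales that test 3 looks for, the candidates would be "fr_FR",
"french", and "fr", in that order. A NULL result corresponds to the case where
the test cannot be run.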

Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
with the perltest.sh script, and test 5 checking PCRE2-specific things.

Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively.

Test 8 checks some internal offsets and code size features, but it is run only
when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
32-bit modes and for different link sizes, so there are different output files
for each mode and link size.

Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
8-bit mode. Each pair covers general cases and Unicode support, respectively.

Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.

Test 14 contains some special UTF and UCP tests that give different output for
different code unit widths.

Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive
matcher.

Test 16 is run only when JIT support is not available. It checks that an
attempt to use JIT has the expected behaviour.

Test 17 is run only when JIT support is available. It checks JIT complete and
partial modes, match-limiting under JIT, and other JIT-specific features.

Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
the 8-bit library, without and with Unicode support, respectively.

Test 20 checks the serialization functions by writing a set of compiled
patterns to a file, and then reloading and checking them.

Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.

Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.


Character tables
----------------

For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, a set of tables that is
built into the library is used. The pcre2_maketables() function can be called
by an application to create a new set of tables in the current locale. These are
passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer into a
compile context.

The source file called pcre2_chartables.c contains the default set of tables.
By default, this is created as a copy of pcre2_chartables.c.dist, which
contains tables for ASCII coding. However, if --enable-rebuild-chartables is
specified for ./configure, a different version of pcre2_chartables.c is built
by the program dftables (compiled from dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
move pcre2_chartables.c.dist out of the way and replace it with your customized
tables.
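As an illustration of deriving table data from the C character handling
functions, the sketch below fills a lower-casing table and a case-flipping
table for code points 0 to 255 in whatever locale is current. It is not the
dftables source; the function name build_case_tables() is invented for this
example.

```c
#include <ctype.h>
#include <stdint.h>

/* Fill two 256-entry tables from the C library's view of the current
   locale: lower[c] is c converted to lower case, and flip[c] is c with
   its case swapped. Characters that have no case map to themselves. */
void build_case_tables(uint8_t lower[256], uint8_t flip[256])
{
    for (int c = 0; c < 256; c++) {
        lower[c] = (uint8_t)tolower(c);
        flip[c]  = (uint8_t)(islower(c) ? toupper(c) : tolower(c));
    }
}
```

A program that never calls setlocale() runs in the "C" locale, so in that case
only the ASCII letters are affected.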

When the dftables program is run as a result of --enable-rebuild-chartables,
it uses the default C locale that is set on your system. It does not pay
attention to the LC_xxx environment variables. In other words, it uses the
system's default locale rather than whatever the compiling user happens to have
set. If you really do want to build a source set of character tables in a
locale that is specified by the LC_xxx variables, you can run the dftables
program by hand with the -L option. For example:

  ./dftables -L pcre2_chartables.c.special

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes for code points less
than 256. The final 256-byte table has bits indicating various character types,
as follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE2 to malfunction.
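To make the layout more concrete, here is a sketch of a 32-byte bit map and of
a type table that uses the bit values listed above. The FLAG_ names and the
function names are invented for this example, the bit order within each byte
is an assumption, and the 128 bit is deliberately left out because the text
above does not say exactly which characters carry it.

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Bit values from the list above; the names are hypothetical. */
enum {
    FLAG_SPACE  = 1,
    FLAG_LETTER = 2,
    FLAG_DIGIT  = 4,
    FLAG_XDIGIT = 8,
    FLAG_WORD   = 16
};

/* A 32-byte map holds one bit for each code point from 0 to 255. */
void bitmap_set(uint8_t map[32], int c)
{
    map[c / 8] |= (uint8_t)(1u << (c % 8));
}

int bitmap_test(const uint8_t map[32], int c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Build the bit map that identifies digits in the current locale. */
void build_digit_map(uint8_t map[32])
{
    memset(map, 0, 32);
    for (int c = 0; c < 256; c++)
        if (isdigit(c)) bitmap_set(map, c);
}

/* Build a 256-byte table of type bits (without the 128 bit). */
void build_type_flags(uint8_t flags[256])
{
    for (int c = 0; c < 256; c++) {
        uint8_t x = 0;
        if (isspace(c))              x |= FLAG_SPACE;
        if (isalpha(c))              x |= FLAG_LETTER;
        if (isdigit(c))              x |= FLAG_DIGIT;
        if (isxdigit(c))             x |= FLAG_XDIGIT;
        if (isalnum(c) || c == '_')  x |= FLAG_WORD;
        flags[c] = x;
    }
}
```

A membership check then costs one index and one mask. For example,
bitmap_test(map, '7') tests the digit map, and flags['a'] & FLAG_WORD tests
the type table.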


File manifest
-------------

The distribution should contain the files listed below.

(A) Source files for the PCRE2 library functions and their headers are found in
    the src directory:

  src/dftables.c           auxiliary program for building pcre2_chartables.c
                           when --enable-rebuild-chartables is specified

  src/pcre2_chartables.c.dist  a default set of character tables that assume
                           ASCII coding; unless --enable-rebuild-chartables is
                           specified, used by copying to pcre2_chartables.c

  src/pcre2posix.c         )
  src/pcre2_auto_possess.c )
  src/pcre2_compile.c      )
  src/pcre2_config.c       )
  src/pcre2_context.c      )
  src/pcre2_convert.c      )
  src/pcre2_dfa_match.c    )
  src/pcre2_error.c        )
  src/pcre2_extuni.c       )
  src/pcre2_find_bracket.c )
  src/pcre2_jit_compile.c  )
  src/pcre2_jit_match.c    ) sources for the functions in the library,
  src/pcre2_jit_misc.c     )   and some internal functions that they use
  src/pcre2_maketables.c   )
  src/pcre2_match.c        )
  src/pcre2_match_data.c   )
  src/pcre2_newline.c      )
  src/pcre2_ord2utf.c      )
  src/pcre2_pattern_info.c )
  src/pcre2_serialize.c    )
  src/pcre2_string_utils.c )
  src/pcre2_study.c        )
  src/pcre2_substitute.c   )
  src/pcre2_substring.c    )
  src/pcre2_tables.c       )
  src/pcre2_ucd.c          )
  src/pcre2_valid_utf.c    )
  src/pcre2_xclass.c       )

  src/pcre2_printint.c     debugging function that is used by pcre2test
  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support

  src/config.h.in          template for config.h, when built by "configure"
  src/pcre2.h.in           template for pcre2.h when built by "configure"
  src/pcre2posix.h         header for the external POSIX wrapper API
  src/pcre2_internal.h     header for internal use
  src/pcre2_intmodedep.h   a mode-specific internal header
  src/pcre2_ucp.h          header for Unicode property handling

  sljit/*                  source files for the JIT compiler

(B) Source files for programs that use PCRE2:

  src/pcre2demo.c          simple demonstration of coding calls to PCRE2
  src/pcre2grep.c          source of a grep utility that uses PCRE2
  src/pcre2test.c          comprehensive test program
  src/pcre2_jit_test.c     JIT test program

(C) Auxiliary files:

  132html                  script to turn "man" pages into HTML
  AUTHORS                  information about the author of PCRE2
  ChangeLog                log of changes to the code
  CleanTxt                 script to clean nroff output for txt man pages
  Detrail                  script to remove trailing spaces
  HACKING                  some notes about the internals of PCRE2
  INSTALL                  generic installation instructions
  LICENCE                  conditions for the use of PCRE2
  COPYING                  the same, using GNU's standard name
  Makefile.in              ) template for Unix Makefile, which is built by
                           )   "configure"
  Makefile.am              ) the automake input that was used to create
                           )   Makefile.in
  NEWS                     important changes in this release
  NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
  PrepareRelease           script to make preparations for "make dist"
  README                   this file
  RunTest                  a Unix shell script for running tests
  RunGrepTest              a Unix shell script for pcre2grep tests
  aclocal.m4               m4 macros (generated by "aclocal")
  config.guess             ) files used by libtool,
  config.sub               )   used only when building a shared library
  configure                a configuring shell script (built by autoconf)
  configure.ac             ) the autoconf input that was used to build
                           )   "configure" and config.h
  depcomp                  ) script to find program dependencies, generated by
                           )   automake
  doc/*.3                  man page sources for PCRE2
  doc/*.1                  man page sources for pcre2grep and pcre2test
  doc/index.html.src       the base HTML page
  doc/html/*               HTML documentation
  doc/pcre2.txt            plain text version of the man pages
  doc/pcre2test.txt        plain text documentation of test program
  install-sh               a shell script for installing files
  libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
  libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
  libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
  libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
  ltmain.sh                file used to build a libtool script
  missing                  ) common stub for a few missing GNU programs while
                           )   installing, generated by automake
  mkinstalldirs            script for making install directories
  perltest.sh              script for running a Perl test program
  pcre2-config.in          source of script which retains PCRE2 information
  testdata/testinput*      test data for main library tests
  testdata/testoutput*     expected test results
  testdata/grep*           input and output for pcre2grep tests
  testdata/*               other supporting test files

(D) Auxiliary files for cmake support

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindEditline.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for building PCRE2 "by hand"

  src/pcre2.h.generic     ) a version of the public PCRE2 header file
                          )   for use in non-"configure" environments
  src/config.h.generic    ) a version of config.h for use in non-"configure"
                          )   environments

Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 June 2018