1README file for PCRE2 (Perl-compatible regular expression library)
2------------------------------------------------------------------
3
4PCRE2 is a re-working of the original PCRE1 library to provide an entirely new
5API. Since its initial release in 2015, there has been further development of
6the code and it now differs from PCRE1 in more than just the API. There are new
7features and the internals have been improved. The latest release of PCRE2 is
8always available in three alternative formats from:
9
10 ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.gz
11 ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.bz2
12 ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.zip
13
14There is a mailing list for discussion about the development of PCRE (both the
15original and new APIs) at pcre-dev@exim.org. You can access the archives and
16subscribe or manage your subscription here:
17
18 https://lists.exim.org/mailman/listinfo/pcre-dev
19
20Please read the NEWS file if you are upgrading from a previous release. The
21contents of this README file are:
22
23 The PCRE2 APIs
24 Documentation for PCRE2
25 Contributions by users of PCRE2
26 Building PCRE2 on non-Unix-like systems
27 Building PCRE2 without using autotools
28 Building PCRE2 using autotools
29 Retrieving configuration information
30 Shared libraries
31 Cross-compiling using autotools
32 Making new tarballs
33 Testing PCRE2
34 Character tables
35 File manifest
36
37
38The PCRE2 APIs
39--------------
40
41PCRE2 is written in C, and it has its own API. There are three sets of
42functions, one for the 8-bit library, which processes strings of bytes, one for
43the 16-bit library, which processes strings of 16-bit values, and one for the
4432-bit library, which processes strings of 32-bit values. Unlike PCRE1, there
45are no C++ wrappers.
46
47The distribution does contain a set of C wrapper functions for the 8-bit
48library that are based on the POSIX regular expression API (see the pcre2posix
49man page). These are built into a library called libpcre2-posix. Note that this
50just provides a POSIX calling interface to PCRE2; the regular expressions
51themselves still follow Perl syntax and semantics. The POSIX API is restricted,
52and does not give full access to all of PCRE2's facilities.
53
54The header file for the POSIX-style functions is called pcre2posix.h. The
55official POSIX name is regex.h, but I did not want to risk possible problems
56with existing files of that name by distributing it that way. To use PCRE2 with
57an existing program that uses the POSIX API, pcre2posix.h will have to be
58renamed or pointed at by a link (or the program modified, of course). See the
59pcre2posix documentation for more details.
60
61
62Documentation for PCRE2
63-----------------------
64
65If you install PCRE2 in the normal way on a Unix-like system, you will end up
66with a set of man pages whose names all start with "pcre2". The one that is
67just called "pcre2" lists all the others. In addition to these man pages, the
68PCRE2 documentation is supplied in two other forms:
69
70 1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
71 doc/pcre2test.txt in the source distribution. The first of these is a
72 concatenation of the text forms of all the section 3 man pages except the
73 listing of pcre2demo.c and those that summarize individual functions. The
74 other two are the text forms of the section 1 man pages for the pcre2grep
75 and pcre2test commands. These text forms are provided for ease of scanning
76 with text editors or similar tools. They are installed in
77 <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
78 (defaulting to /usr/local).
79
80 2. A set of files containing all the documentation in HTML form, hyperlinked
81 in various ways, and rooted in a file called index.html, is distributed in
82 doc/html and installed in <prefix>/share/doc/pcre2/html.
83
84
85Building PCRE2 on non-Unix-like systems
86---------------------------------------
87
88For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
89your system supports the use of "configure" and "make" you may be able to build
90PCRE2 using autotools in the same way as for many Unix-like systems.
91
92PCRE2 can also be configured using CMake, which can be run in various ways
93(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
94NON-AUTOTOOLS-BUILD has information about CMake.
95
96PCRE2 has been compiled on many different operating systems. It should be
97straightforward to build PCRE2 on any system that has a Standard C compiler and
98library, because it uses only Standard C functions.
99
100
101Building PCRE2 without using autotools
102--------------------------------------
103
104The use of autotools (in particular, libtool) is problematic in some
105environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
106file for ways of building PCRE2 without using autotools.
107
108
109Building PCRE2 using autotools
110------------------------------
111
112The following instructions assume the use of the widely used "configure; make;
113make install" (autotools) process.
114
To build PCRE2 on a system that supports autotools, first run the "configure"
116command from the PCRE2 distribution directory, with your current directory set
117to the directory where you want the files to be created. This command is a
118standard GNU "autoconf" configuration script, for which generic instructions
119are supplied in the file INSTALL.
120
121Most commonly, people build PCRE2 within its own distribution directory, and in
122this case, on many systems, just running "./configure" is sufficient. However,
123the usual methods of changing standard defaults are available. For example:
124
125CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
126
127This command specifies that the C compiler should be run with the flags '-O2
128-Wall' instead of the default, and that "make install" should install PCRE2
129under /opt/local instead of the default /usr/local.
130
131If you want to build in a different directory, just run "configure" with that
132directory as current. For example, suppose you have unpacked the PCRE2 source
133into /source/pcre2/pcre2-xxx, but you want to build it in
134/build/pcre2/pcre2-xxx:
135
136cd /build/pcre2/pcre2-xxx
137/source/pcre2/pcre2-xxx/configure
138
139PCRE2 is written in C and is normally compiled as a C library. However, it is
140possible to build it as a C++ library, though the provided building apparatus
141does not have any features to support this.
142
143There are some optional features that can be included or omitted from the PCRE2
144library. They are also documented in the pcre2build man page.
145
146. By default, both shared and static libraries are built. You can change this
147 by adding one of these options to the "configure" command:
148
149 --disable-shared
150 --disable-static
151
  (See also "Shared libraries" below.)
153
154. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
155 the "configure" command, the 16-bit library is also built. If you add
156 --enable-pcre2-32 to the "configure" command, the 32-bit library is also
157 built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
158 to disable building the 8-bit library.
159
160. If you want to include support for just-in-time (JIT) compiling, which can
161 give large performance improvements on certain platforms, add --enable-jit to
162 the "configure" command. This support is available only for certain hardware
163 architectures. If you try to enable it on an unsupported architecture, there
  will be a compile-time error. If in doubt, use --enable-jit=auto, which
165 enables JIT only if the current hardware is supported.
166
167. If you are enabling JIT under SELinux you may also want to add
168 --enable-jit-sealloc, which enables the use of an execmem allocator in JIT
169 that is compatible with SELinux. This has no effect if JIT is not enabled.
170
171. If you do not want to make use of the default support for UTF-8 Unicode
172 character strings in the 8-bit library, UTF-16 Unicode character strings in
173 the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
174 library, you can add --disable-unicode to the "configure" command. This
175 reduces the size of the libraries. It is not possible to configure one
176 library with Unicode support, and another without, in the same configuration.
177 It is also not possible to use --enable-ebcdic (see below) with Unicode
178 support, so if this option is set, you must also use --disable-unicode.
179
180 When Unicode support is available, the use of a UTF encoding still has to be
181 enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
  be either ASCII or UTF-8/16/32, even when running on EBCDIC platforms.
184
185 As well as supporting UTF strings, Unicode support includes support for the
186 \P, \p, and \X sequences that recognize Unicode character properties.
187 However, only the basic two-letter properties such as Lu are supported.
188 Escape sequences such as \d and \w in patterns do not by default make use of
189 Unicode properties, but can be made to do so by setting the PCRE2_UCP option
190 or starting a pattern with (*UCP).
191
192. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
193 of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
194 character as indicating the end of a line. Whatever you specify at build time
195 is the default; the caller of PCRE2 can change the selection at run time. The
196 default newline indicator is a single LF character (the Unix standard). You
197 can specify the default newline indicator by adding --enable-newline-is-cr,
198 --enable-newline-is-lf, --enable-newline-is-crlf,
199 --enable-newline-is-anycrlf, --enable-newline-is-any, or
200 --enable-newline-is-nul to the "configure" command, respectively.
201
202. By default, the sequence \R in a pattern matches any Unicode line ending
203 sequence. This is independent of the option specifying what PCRE2 considers
204 to be the end of a line (see above). However, the caller of PCRE2 can
205 restrict \R to match only CR, LF, or CRLF. You can make this the default by
206 adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
207
208. In a pattern, the escape sequence \C matches a single code unit, even in a
209 UTF mode. This can be dangerous because it breaks up multi-code-unit
210 characters. You can build PCRE2 with the use of \C permanently locked out by
211 adding --enable-never-backslash-C (note the upper case C) to the "configure"
212 command. When \C is allowed by the library, individual applications can lock
213 it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.
214
215. PCRE2 has a counter that limits the depth of nesting of parentheses in a
216 pattern. This limits the amount of system stack that a pattern uses when it
217 is compiled. The default is 250, but you can change it by setting, for
218 example,
219
220 --with-parens-nest-limit=500
221
222. PCRE2 has a counter that can be set to limit the amount of computing resource
223 it uses when matching a pattern. If the limit is exceeded during a match, the
224 match fails. The default is ten million. You can change the default by
225 setting, for example,
226
227 --with-match-limit=500000
228
229 on the "configure" command. This is just the default; individual calls to
230 pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
231 discussion in the pcre2api man page (search for pcre2_set_match_limit).
232
233. There is a separate counter that limits the depth of nested backtracking
234 (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
235 matching process, which indirectly limits the amount of heap memory that is
236 used, and in the case of pcre2_dfa_match() the amount of stack as well. This
237 counter also has a default of ten million, which is essentially "unlimited".
238 You can change the default by setting, for example,
239
240 --with-match-limit-depth=5000
241
242 There is more discussion in the pcre2api man page (search for
243 pcre2_set_depth_limit).
244
245. You can also set an explicit limit on the amount of heap memory used by
246 the pcre2_match() and pcre2_dfa_match() interpreters:
247
248 --with-heap-limit=500
249
250 The units are kibibytes (units of 1024 bytes). This limit does not apply when
251 the JIT optimization (which has its own memory control features) is used.
252 There is more discussion on the pcre2api man page (search for
253 pcre2_set_heap_limit).
254
255. In the 8-bit library, the default maximum compiled pattern size is around
256 64 kibibytes. You can increase this by adding --with-link-size=3 to the
257 "configure" command. PCRE2 then uses three bytes instead of two for offsets
258 to different parts of the compiled pattern. In the 16-bit library,
259 --with-link-size=3 is the same as --with-link-size=4, which (in both
260 libraries) uses four-byte offsets. Increasing the internal link size reduces
261 performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
262 link size setting is ignored, as 4-byte offsets are always used.
263
264. For speed, PCRE2 uses four tables for manipulating and identifying characters
265 whose code point values are less than 256. By default, it uses a set of
266 tables for ASCII encoding that is part of the distribution. If you specify
267
268 --enable-rebuild-chartables
269
270 a program called dftables is compiled and run in the default C locale when
271 you obey "make". It builds a source file called pcre2_chartables.c. If you do
272 not specify this option, pcre2_chartables.c is created as a copy of
273 pcre2_chartables.c.dist. See "Character tables" below for further
274 information.
275
276. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
277 character code (as opposed to ASCII/Unicode) by specifying
278
279 --enable-ebcdic --disable-unicode
280
281 This automatically implies --enable-rebuild-chartables (see above). However,
282 when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
283 both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
284 which specifies that the code value for the EBCDIC NL character is 0x25
285 instead of the default 0x15.
286
287. If you specify --enable-debug, additional debugging code is included in the
288 build. This option is intended for use by the PCRE2 maintainers.
289
290. In environments where valgrind is installed, if you specify
291
292 --enable-valgrind
293
294 PCRE2 will use valgrind annotations to mark certain memory regions as
295 unaddressable. This allows it to detect invalid memory accesses, and is
296 mostly useful for debugging PCRE2 itself.
297
298. In environments where the gcc compiler is used and lcov version 1.6 or above
299 is installed, if you specify
300
301 --enable-coverage
302
303 the build process implements a code coverage report for the test suite. The
304 report is generated by running "make coverage". If ccache is installed on
305 your system, it must be disabled when building PCRE2 for coverage reporting.
306 You can do this by setting the environment variable CCACHE_DISABLE=1 before
307 running "make" to build PCRE2. There is more information about coverage
308 reporting in the "pcre2build" documentation.
309
310. When JIT support is enabled, pcre2grep automatically makes use of it, unless
311 you add --disable-pcre2grep-jit to the "configure" command.
312
313. There is support for calling external programs during matching in the
314 pcre2grep command, using PCRE2's callout facility with string arguments. This
315 support can be disabled by adding --disable-pcre2grep-callout to the
316 "configure" command. There are two kinds of callout: one that generates
317 output from inbuilt code, and another that calls an external program. The
318 latter has special support for Windows and VMS; otherwise it assumes the
319 existence of the fork() function. This facility can be disabled by adding
320 --disable-pcre2grep-callout-fork to the "configure" command.
321
322. The pcre2grep program currently supports only 8-bit data files, and so
323 requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
324 libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
325 specifying one or both of
326
327 --enable-pcre2grep-libz
328 --enable-pcre2grep-libbz2
329
330 Of course, the relevant libraries must be installed on your system.
331
332. The default starting size (in bytes) of the internal buffer used by pcre2grep
333 can be set by, for example:
334
335 --with-pcre2grep-bufsize=51200
336
337 The value must be a plain integer. The default is 20480. The amount of memory
338 used by pcre2grep is actually three times this number, to allow for "before"
339 and "after" lines. If very long lines are encountered, the buffer is
340 automatically enlarged, up to a fixed maximum size.
341
342. The default maximum size of pcre2grep's internal buffer can be set by, for
343 example:
344
345 --with-pcre2grep-max-bufsize=2097152
346
347 The default is either 1048576 or the value of --with-pcre2grep-bufsize,
348 whichever is the larger.
349
350. It is possible to compile pcre2test so that it links with the libreadline
351 or libedit libraries, by specifying, respectively,
352
353 --enable-pcre2test-libreadline or --enable-pcre2test-libedit
354
355 If this is done, when pcre2test's input is from a terminal, it reads it using
356 the readline() function. This provides line-editing and history facilities.
357 Note that libreadline is GPL-licenced, so if you distribute a binary of
358 pcre2test linked in this way, there may be licensing issues. These can be
359 avoided by linking with libedit (which has a BSD licence) instead.
360
361 Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
363 readline library this is sufficient. However, in some environments (e.g. if
364 an unmodified distribution version of readline is in use), it may be
365 necessary to specify something like LIBS="-lncurses" as well. This is
366 because, to quote the readline INSTALL, "Readline uses the termcap functions,
367 but does not link with the termcap or curses library itself, allowing
368 applications which link with readline the to choose an appropriate library."
369 If you get error messages about missing functions tgetstr, tgetent, tputs,
370 tgetflag, or tgoto, this is the problem, and linking with the ncurses library
371 should fix it.
372
373. The C99 standard defines formatting modifiers z and t for size_t and
374 ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
375 environments other than Microsoft Visual Studio when __STDC_VERSION__ is
376 defined and has a value greater than or equal to 199901L (indicating C99).
377 However, there is at least one environment that claims to be C99 but does not
378 support these modifiers. If --disable-percent-zt is specified, no use is made
  of the z or t modifiers. Instead of %td or %zu, %lu is used, with a cast for
380 size_t values.
381
382. There is a special option called --enable-fuzz-support for use by people who
383 want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
384 library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
385 be built, but not installed. This contains a single function called
386 LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
387 length of the string. When called, this function tries to compile the string
388 as a pattern, and if that succeeds, to match it. This is done both with no
  options and with some random option bits that are generated from the string.
390 Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
391 be created. This is normally run under valgrind or used when PCRE2 is
392 compiled with address sanitizing enabled. It calls the fuzzing function and
  outputs information about what it is doing. The input strings are specified by
394 arguments: if an argument starts with "=" the rest of it is a literal input
395 string. Otherwise, it is assumed to be a file name, and the contents of the
396 file are the test string.
397
398. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
399 which caused pcre2_match() to use individual blocks on the heap for
400 backtracking instead of recursive function calls (which use the stack). This
401 is now obsolete since pcre2_match() was refactored always to use the heap (in
402 a much more efficient way than before). This option is retained for backwards
403 compatibility, but has no effect other than to output a warning.
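
As a rough illustration of how several of the options above combine, the
following sketch assembles one "configure" command line. The particular
selection of options, the /opt/local prefix, and the CONFIGURE_OPTS variable
name are assumptions made for the example. It only prints the command; remove
the "echo" to run it for real.

```shell
# Collect options described above in one variable (CONFIGURE_OPTS is just an
# illustrative name; the chosen values are examples, not recommendations).
CONFIGURE_OPTS="--prefix=/opt/local"
# Build the 16-bit and 32-bit libraries in addition to the default 8-bit one.
CONFIGURE_OPTS="$CONFIGURE_OPTS --enable-pcre2-16 --enable-pcre2-32"
# Enable JIT only if the current hardware is supported.
CONFIGURE_OPTS="$CONFIGURE_OPTS --enable-jit=auto"
# Treat CR, LF, or CRLF as the default newline.
CONFIGURE_OPTS="$CONFIGURE_OPTS --enable-newline-is-anycrlf"
# Lower the default match limit from ten million.
CONFIGURE_OPTS="$CONFIGURE_OPTS --with-match-limit=500000"

# Dry run: print the command instead of executing it.
echo "./configure $CONFIGURE_OPTS"
```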
404
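The pcre2grep buffer sizes described in the list above follow simple
arithmetic, which can be checked with a few lines of shell. This is only a
worked example of the stated rules (memory used is three times the starting
size; the default maximum is the larger of 1048576 and the starting size). The
variable names are invented for the illustration.

```shell
# Example starting size, as in --with-pcre2grep-bufsize=51200.
BUFSIZE=51200
# pcre2grep uses three times this amount, to allow for "before" and "after" lines.
MEMORY_USED=$((BUFSIZE * 3))
# The default maximum is 1048576 or the starting size, whichever is larger.
DEFAULT_MAX=1048576
if [ "$BUFSIZE" -gt "$DEFAULT_MAX" ]; then
  MAX_BUFSIZE=$BUFSIZE
else
  MAX_BUFSIZE=$DEFAULT_MAX
fi
echo "starting buffer: $BUFSIZE bytes, memory used: $MEMORY_USED bytes"
echo "default maximum buffer: $MAX_BUFSIZE bytes"
```

With the example value, the memory used is 153600 bytes and the default maximum
stays at 1048576 bytes.
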
405The "configure" script builds the following files for the basic C library:
406
. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                         that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries
417
418Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
419tarballs under the names config.h.generic and pcre2.h.generic. These are
420provided for those who have to build PCRE2 without using "configure" or CMake.
421If you use "configure" or CMake, the .generic versions are not used.
422
423The "configure" script also creates config.status, which is an executable
424script that can be run to recreate the configuration, and config.log, which
425contains compiler output from tests that "configure" runs.
426
427Once "configure" has run, you can run "make". This builds whichever of the
428libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
429program called pcre2test. If you enabled JIT support with --enable-jit, another
430test program called pcre2_jit_test is built as well. If the 8-bit library is
431built, libpcre2-posix and the pcre2grep command are also built. Running
432"make" with the -j option may speed up compilation on multiprocessor systems.
433
434The command "make check" runs all the appropriate tests. Details of the PCRE2
435tests are given below in a separate section of this document. The -j option of
436"make" can also be used when running the tests.
437
438You can use "make install" to install PCRE2 into live directories on your
439system. The following are installed (file names are all relative to the
440<prefix> that is set when "configure" is run):
441
442 Commands (bin):
443 pcre2test
444 pcre2grep (if 8-bit support is enabled)
445 pcre2-config
446
447 Libraries (lib):
448 libpcre2-8 (if 8-bit support is enabled)
449 libpcre2-16 (if 16-bit support is enabled)
450 libpcre2-32 (if 32-bit support is enabled)
451 libpcre2-posix (if 8-bit support is enabled)
452
453 Configuration information (lib/pkgconfig):
454 libpcre2-8.pc
455 libpcre2-16.pc
456 libpcre2-32.pc
457 libpcre2-posix.pc
458
459 Header files (include):
460 pcre2.h
461 pcre2posix.h
462
463 Man pages (share/man/man{1,3}):
464 pcre2grep.1
465 pcre2test.1
466 pcre2-config.1
467 pcre2.3
468 pcre2*.3 (lots more pages, all starting "pcre2")
469
470 HTML documentation (share/doc/pcre2/html):
471 index.html
472 *.html (lots more pages, hyperlinked from index.html)
473
474 Text file documentation (share/doc/pcre2):
475 AUTHORS
476 COPYING
477 ChangeLog
478 LICENCE
479 NEWS
480 README
481 pcre2.txt (a concatenation of the man(3) pages)
482 pcre2test.txt the pcre2test man page
483 pcre2grep.txt the pcre2grep man page
484 pcre2-config.txt the pcre2-config man page
485
486If you want to remove PCRE2 from your system, you can run "make uninstall".
487This removes all the files that "make install" installed. However, it does not
488remove any directories, because these are often shared with other programs.
489
490
491Retrieving configuration information
492------------------------------------
493
494Running "make install" installs the command pcre2-config, which can be used to
495recall information about the PCRE2 configuration and installation. For example:
496
497 pcre2-config --version
498
499prints the version number, and
500
501 pcre2-config --libs8
502
503outputs information about where the 8-bit library is installed. This command
504can be included in makefiles for programs that use PCRE2, saving the programmer
505from having to remember too many details. Run pcre2-config with no arguments to
506obtain a list of possible arguments.
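
A makefile or build script might use pcre2-config along the following lines.
This is a sketch only: the program name myprog.c is hypothetical, the --cflags
argument is not mentioned above (check the list printed by pcre2-config with no
arguments), and the fallback value is a guess for systems where pcre2-config is
not on the PATH.

```shell
# Ask pcre2-config for compiler and linker flags if it is installed.
if command -v pcre2-config >/dev/null 2>&1; then
  PCRE2_CFLAGS=$(pcre2-config --cflags)
  PCRE2_LIBS=$(pcre2-config --libs8)
else
  # Assumed fallback: rely on the compiler's default search paths.
  PCRE2_CFLAGS=""
  PCRE2_LIBS="-lpcre2-8"
fi
# Dry run: print the compile command for a hypothetical program myprog.c.
echo "cc $PCRE2_CFLAGS -o myprog myprog.c $PCRE2_LIBS"
```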
507
508The pkg-config command is another system for saving and retrieving information
509about installed libraries. Instead of separate commands for each library, a
510single command is used. For example:
511
512 pkg-config --libs libpcre2-16
513
514The data is held in *.pc files that are installed in a directory called
515<prefix>/lib/pkgconfig.
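
A script that wants to check for a library before using it could do something
like the following. The --exists, --modversion, and --cflags arguments and the
PKG_CONFIG_PATH variable are standard pkg-config features rather than anything
specific to PCRE2, and the choice of the 16-bit library simply follows the
example above.

```shell
PC_NAME=libpcre2-16
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists "$PC_NAME"; then
  # The .pc file was found: report what it contains.
  echo "$PC_NAME version: $(pkg-config --modversion "$PC_NAME")"
  echo "compile flags: $(pkg-config --cflags "$PC_NAME")"
  echo "link flags: $(pkg-config --libs "$PC_NAME")"
  PC_FOUND=yes
else
  # Either pkg-config is missing or it cannot see the .pc file.
  echo "$PC_NAME.pc not found; is <prefix>/lib/pkgconfig in PKG_CONFIG_PATH?"
  PC_FOUND=no
fi
```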
516
517
518Shared libraries
519----------------
520
521The default distribution builds PCRE2 as shared libraries and static libraries,
522as long as the operating system supports shared libraries. Shared library
523support relies on the "libtool" script which is built as part of the
524"configure" process.
525
526The libtool script is used to compile and link both shared and static
527libraries. They are placed in a subdirectory called .libs when they are newly
528built. The programs pcre2test and pcre2grep are built to use these uninstalled
529libraries (by means of wrapper scripts in the case of shared libraries). When
530you use "make install" to install shared libraries, pcre2grep and pcre2test are
531automatically re-built to use the newly installed shared libraries before being
532installed themselves. However, the versions left in the build directory still
533use the uninstalled libraries.
534
535To build PCRE2 using static libraries only you must use --disable-shared when
536configuring it. For example:
537
538./configure --prefix=/usr/gnu --disable-shared
539
540Then run "make" in the usual way. Similarly, you can use --disable-static to
541build only shared libraries.
542
543
544Cross-compiling using autotools
545-------------------------------
546
547You can specify CC and CFLAGS in the normal way to the "configure" command, in
548order to cross-compile PCRE2 for some other host. However, you should NOT
549specify --enable-rebuild-chartables, because if you do, the dftables.c source
550file is compiled and run on the local host, in order to generate the inbuilt
551character tables (the pcre2_chartables.c file). This will probably not work,
552because dftables.c needs to be compiled with the local compiler, not the cross
553compiler.
554
555When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
556created by making a copy of pcre2_chartables.c.dist, which is a default set of
557tables that assumes ASCII code. Cross-compiling with the default tables should
558not be a problem.
559
560If you need to modify the character tables when cross-compiling, you should
561move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
562and run it on the local host to make a new version of pcre2_chartables.c.dist.
563Then when you cross-compile PCRE2 this new version of the tables will be used.
564
565
566Making new tarballs
567-------------------
568
569The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
570zip formats. The command "make distcheck" does the same, but then does a trial
571build of the new distribution to ensure that it works.
572
573If you have modified any of the man page sources in the doc directory, you
574should first run the PrepareRelease script before making a distribution. This
575script creates the .txt and HTML forms of the documentation from the man pages.
576
577
578Testing PCRE2
579-------------
580
581To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
582There is another script called RunGrepTest that tests the pcre2grep command.
583When JIT support is enabled, a third test program called pcre2_jit_test is
584built. Both the scripts and all the program tests are run if you obey "make
585check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
586
587The RunTest script runs the pcre2test test program (which is documented in its
588own man page) on each of the relevant testinput files in the testdata
589directory, and compares the output with the contents of the corresponding
590testoutput files. RunTest uses a file called testtry to hold the main output
591from pcre2test. Other files whose names begin with "test" are used as working
592files in some tests.
593
594Some tests are relevant only when certain build-time options were selected. For
595example, the tests for UTF-8/16/32 features are run only when Unicode support
596is available. RunTest outputs a comment when it skips a test.
597
598Many (but not all) of the tests that are not skipped are run twice if JIT
599support is available. On the second run, JIT compilation is forced. This
600testing can be suppressed by putting "nojit" on the RunTest command line.
601
602The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
603libraries that are enabled. If you want to run just one set of tests, call
604RunTest with either the -8, -16 or -32 option.
605
606If valgrind is installed, you can run the tests under it by putting "valgrind"
607on the RunTest command line. To run pcre2test on just one or more specific test
608files, give their numbers as arguments to RunTest, for example:
609
610 RunTest 2 7 11
611
612You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
613end), or a number preceded by ~ to exclude a test. For example:
614
  RunTest 3-15 ~10
616
This runs tests 3 to 15, excluding test 10. Giving just ~13 as the argument
runs all the tests except test 13. Whatever order the arguments are in, the
tests are always run in numerical order.
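
The selection rules above can be sketched as a small shell function. This is an
illustrative re-implementation, not the code used by RunTest itself; the
function names, the MAXTEST value, and the assumption that the full set of
tests runs from 0 to MAXTEST are all invented for the example.

```shell
# MAXTEST is a hypothetical highest test number, chosen only for this example.
MAXTEST=20

# Print the integers from $1 to $2 inclusive, separated by spaces.
range() {
  i=$1
  while [ "$i" -le "$2" ]; do
    printf '%s ' "$i"
    i=$((i + 1))
  done
}

# Resolve RunTest-style arguments into the list of tests that would be run.
select_tests() {
  include=""
  exclude=""
  for arg in "$@"; do
    case "$arg" in
      # ~N excludes test N (strip the leading character to get N).
      "~"*) exclude="$exclude ${arg#?}" ;;
      # N- means test N to the end.
      *-)   include="$include $(range "${arg%-}" "$MAXTEST")" ;;
      # N-M is an inclusive range.
      *-*)  include="$include $(range "${arg%-*}" "${arg#*-}")" ;;
      # Anything else is taken as a single test number.
      *)    include="$include $arg" ;;
    esac
  done
  # If only exclusions were given, start from every test (0 to MAXTEST).
  if [ -z "$include" ]; then
    include=$(range 0 "$MAXTEST")
  fi
  result=""
  # Tests always run in numerical order, whatever the argument order.
  for t in $(printf '%s\n' $include | sort -n -u); do
    case " $exclude " in
      *" $t "*) ;;                  # skip excluded tests
      *) result="$result $t" ;;
    esac
  done
  echo $result
}

select_tests 3-15 "~10"
select_tests 11 2 7
```

The first call prints "3 4 5 6 7 8 9 11 12 13 14 15" and the second prints
"2 7 11", matching the behaviour described above.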
620
621You can also call RunTest with the single argument "list" to cause it to output
622a list of tests.
623
624The test sequence starts with "test 0", which is a special test that has no
625input file, and whose output is not checked. This is because it will be
626different on different hardware and with different configurations. The test
627exists in order to exercise some of pcre2test's code that would not otherwise
628be run.
629
630Tests 1 and 2 can always be run, as they expect only plain text strings (not
631UTF) and make no use of Unicode properties. The first test file can be fed
632directly into the perltest.sh script to check that Perl gives the same results.
633The only difference you should see is in the first few lines, where the Perl
version is given instead of the PCRE2 version. The second set of tests checks
635auxiliary functions, error detection, and run-time flags that are specific to
636PCRE2. It also uses the debugging flags to check some of the internals of
637pcre2_compile().
638
If you build PCRE2 with a locale setting that is not the standard C locale, the
character tables may be different (see "Character tables" below). In some
cases, this may cause failures in the second set of tests. For example, in a
locale where the isprint() function yields TRUE for characters in the range
128-255, the use of [:ascii:] inside a character class defines a different set
of characters, and this shows up in this test as a difference in the compiled
code, which is being listed for checking. For example, where the comparison
test output contains [\x00-\x7f] the actual output might contain [\x00-\xff],
and similarly in some other cases. This is not a bug in PCRE2.
648
649Test 3 checks pcre2_maketables(), the facility for building a set of character
650tables for a specific locale and using them instead of the default tables. The
651script uses the "locale" command to check for the availability of the "fr_FR",
652"french", or "fr" locale, and uses the first one that it finds. If the "locale"
653command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
654the list of available locales, the third test cannot be run, and a comment is
655output to say why. If running this test produces an error like this:
656
657 ** Failed to set locale "fr_FR"
658
659it means that the given locale is not available on your system, despite being
660listed by "locale". This does not mean that PCRE2 is broken. There are three
661alternative output files for the third test, because three different versions
662of the French locale have been encountered. The test passes if its output
663matches any one of them.
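
As an illustration of the locale check described above, here is a minimal
Python sketch. It is not the code in RunTest; the function name
pick_french_locale is hypothetical, and the sketch assumes that names must
match exactly (so "fr_FR.UTF-8" does not count as "fr_FR").

```python
# Candidate names, in the order in which they are tried.
CANDIDATES = ("fr_FR", "french", "fr")


def pick_french_locale(available):
    """Return the first candidate present in the list of locale names.

    'available' would typically be the lines printed by "locale -a".
    Returns None when no candidate is listed, in which case test 3 would
    be skipped.
    """
    names = set(available)
    for name in CANDIDATES:
        if name in names:
            return name
    return None
```

A locale that is listed may still fail to be set at run time, which is the
situation reported by the "Failed to set locale" message.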
664
665Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
666with the perltest.sh script, and test 5 checking PCRE2-specific things.
667
668Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
669non-UTF mode and UTF-mode with Unicode property support, respectively.
670
671Test 8 checks some internal offsets and code size features, but it is run only
672when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
67332-bit modes and for different link sizes, so there are different output files
674for each mode and link size.
675
Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These tests generate different output in 8-bit mode
than in the wider modes. In each pair, the first test covers general cases and
the second covers Unicode support.
679
680Test 13 checks the handling of non-UTF characters greater than 255 by
681pcre2_dfa_match() in 16-bit and 32-bit modes.
682
683Test 14 contains some special UTF and UCP tests that give different output for
684different code unit widths.
685
686Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive
688matcher.
689
690Test 16 is run only when JIT support is not available. It checks that an
691attempt to use JIT has the expected behaviour.
692
693Test 17 is run only when JIT support is available. It checks JIT complete and
694partial modes, match-limiting under JIT, and other JIT-specific features.
695
696Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
697the 8-bit library, without and with Unicode support, respectively.
698
699Test 20 checks the serialization functions by writing a set of compiled
700patterns to a file, and then reloading and checking them.
701
702Tests 21 and 22 test \C support when the use of \C is not locked out, without
703and with UTF support, respectively. Test 23 tests \C when it is locked out.
704
705Tests 24 and 25 test the experimental pattern conversion functions, without and
706with UTF support, respectively.
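
The conditions described above can be summarised as a small Python sketch. It
is one reading of the prose in this section, not the logic of the RunTest
script. The Build class and its field names are hypothetical, and test 15 is
treated as always runnable on the assumption that it is simply run without
JIT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    """Hypothetical description of how PCRE2 was built."""
    width: int = 8                    # code unit width: 8, 16 or 32
    unicode: bool = True              # Unicode (UTF and property) support
    jit: bool = False                 # JIT support available
    never_backslash_c: bool = False   # True when \C is locked out
    french_locale: bool = False       # a usable French locale was found


def runnable_tests(build):
    """Return the test numbers that apply to the given build."""
    narrow = build.width == 8
    wide = build.width in (16, 32)
    uni = build.unicode
    allow_c = not build.never_backslash_c
    rules = {
        0: True, 1: True, 2: True,
        3: build.french_locale,
        4: uni, 5: uni,
        6: True, 7: uni,
        8: uni,
        9: narrow, 10: narrow and uni,
        11: wide, 12: wide and uni,
        13: wide,
        14: uni,
        15: True,
        16: not build.jit,
        17: build.jit,
        18: narrow, 19: narrow and uni,
        20: True,
        21: allow_c, 22: allow_c and uni,
        23: build.never_backslash_c,
        24: True, 25: uni,
    }
    return [number for number, ok in sorted(rules.items()) if ok]
```

For an 8-bit build without Unicode or JIT support, this sketch selects tests
0, 1, 2, 6, 9, 15, 16, 18, 20, 21, and 24.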
707
708
709Character tables
710----------------
711
712For speed, PCRE2 uses four tables for manipulating and identifying characters
713whose code point values are less than 256. By default, a set of tables that is
714built into the library is used. The pcre2_maketables() function can be called
by an application to create a new set of tables in the current locale. These are
716passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer into a
717compile context.
718
719The source file called pcre2_chartables.c contains the default set of tables.
720By default, this is created as a copy of pcre2_chartables.c.dist, which
721contains tables for ASCII coding. However, if --enable-rebuild-chartables is
722specified for ./configure, a different version of pcre2_chartables.c is built
723by the program dftables (compiled from dftables.c), which uses the ANSI C
724character handling functions such as isalnum(), isalpha(), isupper(),
725islower(), etc. to build the table sources. This means that the default C
726locale that is set for your system will control the contents of these default
727tables. You can change the default tables by editing pcre2_chartables.c and
728then re-building PCRE2. If you do this, you should take care to ensure that the
729file does not get automatically re-generated. The best way to do this is to
730move pcre2_chartables.c.dist out of the way and replace it with your customized
731tables.
732
733When the dftables program is run as a result of --enable-rebuild-chartables,
734it uses the default C locale that is set on your system. It does not pay
735attention to the LC_xxx environment variables. In other words, it uses the
736system's default locale rather than whatever the compiling user happens to have
737set. If you really do want to build a source set of character tables in a
738locale that is specified by the LC_xxx variables, you can run the dftables
739program by hand with the -L option. For example:
740
741 ./dftables -L pcre2_chartables.c.special
742
743The first two 256-byte tables provide lower casing and case flipping functions,
744respectively. The next table consists of three 32-byte bit maps which identify
745digits, "word" characters, and white space, respectively. These are used when
746building 32-byte bit maps that represent character classes for code points less
747than 256. The final 256-byte table has bits indicating various character types,
748as follows:
749
    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero
756
757You should not alter the set of characters that contain the 128 bit, as that
758will cause PCRE2 to malfunction.
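
To make the layout of the four tables concrete, here is a minimal Python
sketch that builds ASCII-only equivalents. It is not the output of dftables
and does not reproduce the library's internal data. The function names are
hypothetical, the bit order within each bit map byte (least significant bit
first) is an assumption of the sketch, and the 128 bit is omitted because the
set of metacharacters is not listed here.

```python
import string

# Bit values for the character-type table, as listed above.
CTYPE_SPACE = 1
CTYPE_LETTER = 2
CTYPE_DIGIT = 4
CTYPE_XDIGIT = 8
CTYPE_WORD = 16

SPACE_CHARS = " \t\n\v\f\r"   # what isspace() accepts in the C locale
WORD_CHARS = string.ascii_letters + string.digits + "_"


def make_bitmap(members):
    """Build a 32-byte bit map with one bit per code point below 256."""
    bitmap = bytearray(32)
    for ch in members:
        code = ord(ch)
        bitmap[code >> 3] |= 1 << (code & 7)   # assumed: LSB first
    return bytes(bitmap)


def in_bitmap(bitmap, code):
    """Test whether a code point below 256 is set in a 32-byte bit map."""
    return bool(bitmap[code >> 3] & (1 << (code & 7)))


def ctype_flags(ch):
    """Combine the type bits that apply to one ASCII character."""
    flags = 0
    if ch in SPACE_CHARS:
        flags |= CTYPE_SPACE
    if ch in string.ascii_letters:
        flags |= CTYPE_LETTER
    if ch in string.digits:
        flags |= CTYPE_DIGIT
    if ch in string.hexdigits:
        flags |= CTYPE_XDIGIT
    if ch in WORD_CHARS:
        flags |= CTYPE_WORD
    return flags


def build_tables():
    """Return (lower-casing, case-flipping, bit maps, type) tables."""
    lower = bytes(c + 32 if 65 <= c <= 90 else c for c in range(256))
    flip = bytes(c ^ 32 if chr(c) in string.ascii_letters else c
                 for c in range(256))
    bitmaps = (make_bitmap(string.digits),    # digits
               make_bitmap(WORD_CHARS),       # "word" characters
               make_bitmap(SPACE_CHARS))      # white space
    ctypes = bytes(ctype_flags(chr(c)) for c in range(256))
    return lower, flip, bitmaps, ctypes
```

In this sketch the entry for "a" in the final table is 2 + 8 + 16 = 26,
because "a" is a letter, a hexadecimal digit, and a "word" character.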
759
760
761File manifest
762-------------
763
764The distribution should contain the files listed below.
765
766(A) Source files for the PCRE2 library functions and their headers are found in
767 the src directory:
768
  src/dftables.c           auxiliary program for building pcre2_chartables.c
                           when --enable-rebuild-chartables is specified

  src/pcre2_chartables.c.dist  a default set of character tables that assume
                           ASCII coding; unless --enable-rebuild-chartables is
                           specified, used by copying to pcre2_chartables.c

  src/pcre2posix.c         )
  src/pcre2_auto_possess.c )
  src/pcre2_compile.c      )
  src/pcre2_config.c       )
  src/pcre2_context.c      )
  src/pcre2_convert.c      )
  src/pcre2_dfa_match.c    )
  src/pcre2_error.c        )
  src/pcre2_extuni.c       )
  src/pcre2_find_bracket.c )
  src/pcre2_jit_compile.c  )
  src/pcre2_jit_match.c    ) sources for the functions in the library,
  src/pcre2_jit_misc.c     )   and some internal functions that they use
  src/pcre2_maketables.c   )
  src/pcre2_match.c        )
  src/pcre2_match_data.c   )
  src/pcre2_newline.c      )
  src/pcre2_ord2utf.c      )
  src/pcre2_pattern_info.c )
  src/pcre2_script_run.c   )
  src/pcre2_serialize.c    )
  src/pcre2_string_utils.c )
  src/pcre2_study.c        )
  src/pcre2_substitute.c   )
  src/pcre2_substring.c    )
  src/pcre2_tables.c       )
  src/pcre2_ucd.c          )
  src/pcre2_valid_utf.c    )
  src/pcre2_xclass.c       )

  src/pcre2_printint.c     debugging function that is used by pcre2test
  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support

  src/config.h.in          template for config.h, when built by "configure"
  src/pcre2.h.in           template for pcre2.h when built by "configure"
  src/pcre2posix.h         header for the external POSIX wrapper API
  src/pcre2_internal.h     header for internal use
  src/pcre2_intmodedep.h   a mode-specific internal header
  src/pcre2_ucp.h          header for Unicode property handling

  sljit/*                  source files for the JIT compiler
817
818(B) Source files for programs that use PCRE2:
819
  src/pcre2demo.c          simple demonstration of coding calls to PCRE2
  src/pcre2grep.c          source of a grep utility that uses PCRE2
  src/pcre2test.c          comprehensive test program
  src/pcre2_jit_test.c     JIT test program
824
825(C) Auxiliary files:
826
  132html                  script to turn "man" pages into HTML
  AUTHORS                  information about the author of PCRE2
  ChangeLog                log of changes to the code
  CleanTxt                 script to clean nroff output for txt man pages
  Detrail                  script to remove trailing spaces
  HACKING                  some notes about the internals of PCRE2
  INSTALL                  generic installation instructions
  LICENCE                  conditions for the use of PCRE2
  COPYING                  the same, using GNU's standard name
  Makefile.in              ) template for Unix Makefile, which is built by
                           )   "configure"
  Makefile.am              ) the automake input that was used to create
                           )   Makefile.in
  NEWS                     important changes in this release
  NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
  PrepareRelease           script to make preparations for "make dist"
  README                   this file
  RunTest                  a Unix shell script for running tests
  RunGrepTest              a Unix shell script for pcre2grep tests
  aclocal.m4               m4 macros (generated by "aclocal")
  config.guess             ) files used by libtool,
  config.sub               )   used only when building a shared library
  configure                a configuring shell script (built by autoconf)
  configure.ac             ) the autoconf input that was used to build
                           )   "configure" and config.h
  depcomp                  ) script to find program dependencies, generated by
                           )   automake
  doc/*.3                  man page sources for PCRE2
  doc/*.1                  man page sources for pcre2grep and pcre2test
  doc/index.html.src       the base HTML page
  doc/html/*               HTML documentation
  doc/pcre2.txt            plain text version of the man pages
  doc/pcre2test.txt        plain text documentation of test program
  install-sh               a shell script for installing files
  libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
  libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
  libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
  libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
  ltmain.sh                file used to build a libtool script
  missing                  ) common stub for a few missing GNU programs while
                           )   installing, generated by automake
  mkinstalldirs            script for making install directories
  perltest.sh              script for running a Perl test program
  pcre2-config.in          source of script which retains PCRE2 information
  testdata/testinput*      test data for main library tests
  testdata/testoutput*     expected test results
  testdata/grep*           input and output for pcre2grep tests
  testdata/*               other supporting test files
875
876(D) Auxiliary files for cmake support
877
878 cmake/COPYING-CMAKE-SCRIPTS
879 cmake/FindPackageHandleStandardArgs.cmake
880 cmake/FindEditline.cmake
881 cmake/FindReadline.cmake
882 CMakeLists.txt
883 config-cmake.h.in
884
885(E) Auxiliary files for building PCRE2 "by hand"
886
  src/pcre2.h.generic      ) a version of the public PCRE2 header file
                           )   for use in non-"configure" environments
  src/config.h.generic     ) a version of config.h for use in non-"configure"
                           )   environments
891
892Philip Hazel
893Email local part: ph10
894Email domain: cam.ac.uk
895Last updated: 16 April 2019
896