Name                  Date         Size       #Lines   LOC
cmake/                03-May-2024  -          130      112
doc/                  03-May-2024  -          47,825   43,208
m4/                   03-May-2024  -          9,455    8,547
src/                  03-May-2024  -          100,864  76,137
testdata/             03-May-2024  -          86,772   74,147
132html               03-May-2024  6.8 KiB    314      218
AUTHORS               03-May-2024  728        37       24
CMakeLists.txt        03-May-2024  28.4 KiB   807      672
COPYING               03-May-2024  97         6        3
ChangeLog             03-May-2024  42.2 KiB   935      678
CheckMan              03-May-2024  1.5 KiB    68       54
CleanTxt              03-May-2024  2.9 KiB    114      72
Detrail               03-May-2024  643        36       23
HACKING               03-May-2024  27.6 KiB   610      478
INSTALL               03-May-2024  15.4 KiB   371      289
LICENCE               03-May-2024  2.9 KiB    84       59
Makefile.am           03-May-2024  22.9 KiB   800      612
Makefile.in           03-May-2024  195.5 KiB  3,139    2,807
NEWS                  03-May-2024  3.9 KiB    111      72
NON-AUTOTOOLS-BUILD   03-May-2024  17.3 KiB   393      295
PrepareRelease        03-May-2024  6.8 KiB    236      199
README                03-May-2024  38.6 KiB   849      665
RunGrepTest           03-May-2024  31.5 KiB   662      466
RunTest               03-May-2024  25.3 KiB   871      631
RunTest.bat           03-May-2024  13.7 KiB   533      481
aclocal.m4            03-May-2024  53.3 KiB   1,496    1,357
ar-lib                03-May-2024  5.7 KiB    271      210
compile               03-May-2024  7.2 KiB    348      258
config-cmake.h.in     03-May-2024  1.3 KiB    49       38
config.guess          03-May-2024  41.9 KiB   1,422    1,230
config.sub            03-May-2024  35.1 KiB   1,808    1,670
configure             03-May-2024  529.4 KiB  18,305   15,314
configure.ac          03-May-2024  34.3 KiB   950      802
depcomp               03-May-2024  23 KiB     792      502
install-sh            03-May-2024  14.3 KiB   502      327
libpcre2-16.pc.in     03-May-2024  393        14       11
libpcre2-32.pc.in     03-May-2024  393        14       11
libpcre2-8.pc.in      03-May-2024  390        14       11
libpcre2-posix.pc.in  03-May-2024  329        14       11
ltmain.sh             03-May-2024  316.5 KiB  11,148   7,979
missing               03-May-2024  6.7 KiB    216      143
pcre2-config.in       03-May-2024  2.2 KiB    122      109
perltest.sh           03-May-2024  8.1 KiB    298      169
test-driver           03-May-2024  4.5 KiB    149      87

README

1README file for PCRE2 (Perl-compatible regular expression library)
2------------------------------------------------------------------
3
4PCRE2 is a re-working of the original PCRE library to provide an entirely new
5API. The latest release of PCRE2 is always available in three alternative
6formats from:
7
8  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.gz
9  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.bz2
10  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.zip
11
12There is a mailing list for discussion about the development of PCRE (both the
13original and new APIs) at pcre-dev@exim.org. You can access the archives and
14subscribe or manage your subscription here:
15
16   https://lists.exim.org/mailman/listinfo/pcre-dev
17
18Please read the NEWS file if you are upgrading from a previous release.
19The contents of this README file are:
20
21  The PCRE2 APIs
22  Documentation for PCRE2
23  Contributions by users of PCRE2
24  Building PCRE2 on non-Unix-like systems
25  Building PCRE2 without using autotools
26  Building PCRE2 using autotools
27  Retrieving configuration information
28  Shared libraries
29  Cross-compiling using autotools
30  Making new tarballs
31  Testing PCRE2
32  Character tables
33  File manifest
34
35
36The PCRE2 APIs
37--------------
38
39PCRE2 is written in C, and it has its own API. There are three sets of
40functions, one for the 8-bit library, which processes strings of bytes, one for
41the 16-bit library, which processes strings of 16-bit values, and one for the
4232-bit library, which processes strings of 32-bit values. There are no C++
43wrappers.
44
45The distribution does contain a set of C wrapper functions for the 8-bit
46library that are based on the POSIX regular expression API (see the pcre2posix
47man page). These can be found in a library called libpcre2posix. Note that this
48just provides a POSIX calling interface to PCRE2; the regular expressions
49themselves still follow Perl syntax and semantics. The POSIX API is restricted,
50and does not give full access to all of PCRE2's facilities.
51
52The header file for the POSIX-style functions is called pcre2posix.h. The
53official POSIX name is regex.h, but I did not want to risk possible problems
54with existing files of that name by distributing it that way. To use PCRE2 with
55an existing program that uses the POSIX API, pcre2posix.h will have to be
56renamed or pointed at by a link.
57
58If you are using the POSIX interface to PCRE2 and there is already a POSIX
59regex library installed on your system, as well as worrying about the regex.h
60header file (as mentioned above), you must also take care when linking programs
61to ensure that they link with PCRE2's libpcre2posix library. Otherwise they may
62pick up the POSIX functions of the same name from the other library.
63
64One way of avoiding this confusion is to compile PCRE2 with the addition of
65-Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
66compiler flags (CFLAGS if you are using "configure" -- see below). This has the
67effect of renaming the functions so that the names no longer clash. Of course,
68you have to do the same thing for your applications, or write them using the
69new names.
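
As a rough sketch of the renaming approach just described, the fragment below
assembles one -D flag for each of the four POSIX functions (regcomp, regexec,
regerror, and regfree) and prints a "configure" command that uses them. The
PCRE2 prefix follows the example above; the -O2 flag and the variable name
posix_flags are only illustrative.

```shell
# Build one -D flag per POSIX function so that each gets a PCRE2 prefix.
posix_flags=""
for fn in regcomp regexec regerror regfree; do
  posix_flags="$posix_flags -D$fn=PCRE2$fn"
done
posix_flags=${posix_flags# }   # drop the leading space

# Print the command rather than running it; -O2 is just an example flag.
echo "CFLAGS='-O2 $posix_flags' ./configure"
```

The same flags would have to be given when compiling each application that
calls the renamed functions.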
70
71
72Documentation for PCRE2
73-----------------------
74
75If you install PCRE2 in the normal way on a Unix-like system, you will end up
76with a set of man pages whose names all start with "pcre2". The one that is
77just called "pcre2" lists all the others. In addition to these man pages, the
78PCRE2 documentation is supplied in two other forms:
79
80  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
81     doc/pcre2test.txt in the source distribution. The first of these is a
82     concatenation of the text forms of all the section 3 man pages except the
83     listing of pcre2demo.c and those that summarize individual functions. The
84     other two are the text forms of the section 1 man pages for the pcre2grep
85     and pcre2test commands. These text forms are provided for ease of scanning
86     with text editors or similar tools. They are installed in
87     <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
88     (defaulting to /usr/local).
89
90  2. A set of files containing all the documentation in HTML form, hyperlinked
91     in various ways, and rooted in a file called index.html, is distributed in
92     doc/html and installed in <prefix>/share/doc/pcre2/html.
93
94
95Building PCRE2 on non-Unix-like systems
96---------------------------------------
97
98For a non-Unix-like system, please read the comments in the file
99NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
100"make" you may be able to build PCRE2 using autotools in the same way as for
101many Unix-like systems.
102
103PCRE2 can also be configured using CMake, which can be run in various ways
104(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
105NON-AUTOTOOLS-BUILD has information about CMake.
106
107PCRE2 has been compiled on many different operating systems. It should be
108straightforward to build PCRE2 on any system that has a Standard C compiler and
109library, because it uses only Standard C functions.
110
111
112Building PCRE2 without using autotools
113--------------------------------------
114
115The use of autotools (in particular, libtool) is problematic in some
116environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
117file for ways of building PCRE2 without using autotools.
118
119
120Building PCRE2 using autotools
121------------------------------
122
123The following instructions assume the use of the widely used "configure; make;
124make install" (autotools) process.
125
126To build PCRE2 on a system that supports autotools, first run the "configure"
127command from the PCRE2 distribution directory, with your current directory set
128to the directory where you want the files to be created. This command is a
129standard GNU "autoconf" configuration script, for which generic instructions
130are supplied in the file INSTALL.
131
132Most commonly, people build PCRE2 within its own distribution directory, and in
133this case, on many systems, just running "./configure" is sufficient. However,
134the usual methods of changing standard defaults are available. For example:
135
136CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
137
138This command specifies that the C compiler should be run with the flags '-O2
139-Wall' instead of the default, and that "make install" should install PCRE2
140under /opt/local instead of the default /usr/local.
141
142If you want to build in a different directory, just run "configure" with that
143directory as current. For example, suppose you have unpacked the PCRE2 source
144into /source/pcre2/pcre2-xxx, but you want to build it in
145/build/pcre2/pcre2-xxx:
146
147cd /build/pcre2/pcre2-xxx
148/source/pcre2/pcre2-xxx/configure
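
The two commands above can be wrapped in a small shell function, sketched here
under the assumption that the source directory is given as an absolute path.
The function name configure_out_of_tree is hypothetical and is not part of the
PCRE2 distribution.

```shell
# Hypothetical helper: run "configure" from a separate build directory.
# Usage: configure_out_of_tree /abs/path/to/source /path/to/build [options...]
configure_out_of_tree() {
  src_dir=$1
  build_dir=$2
  shift 2
  # Create the build directory if it does not exist yet.
  mkdir -p "$build_dir" || return 1
  # Use a subshell so that the caller's current directory is left unchanged.
  ( cd "$build_dir" && "$src_dir/configure" "$@" )
}

# Example call, using the directories from the text above:
#   configure_out_of_tree /source/pcre2/pcre2-xxx /build/pcre2/pcre2-xxx
```

Any further arguments are passed through to "configure", so options such as
--prefix can be added after the two directory names.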
149
150PCRE2 is written in C and is normally compiled as a C library. However, it is
151possible to build it as a C++ library, though the provided building apparatus
152does not have any features to support this.
153
154There are some optional features that can be included or omitted from the PCRE2
155library. They are also documented in the pcre2build man page.
156
157. By default, both shared and static libraries are built. You can change this
158  by adding one of these options to the "configure" command:
159
160  --disable-shared
161  --disable-static
162
163  (See also "Shared libraries" below.)
164
165. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
166  the "configure" command, the 16-bit library is also built. If you add
167  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
168  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
169  to disable building the 8-bit library.
170
171. If you want to include support for just-in-time (JIT) compiling, which can
172  give large performance improvements on certain platforms, add --enable-jit to
173  the "configure" command. This support is available only for certain hardware
174  architectures. If you try to enable it on an unsupported architecture, there
175  will be a compile time error.
176
177. If you do not want to make use of the support for UTF-8 Unicode character
178  strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit
179  library, or UTF-32 Unicode character strings in the 32-bit library, you can
180  add --disable-unicode to the "configure" command. This reduces the size of
181  the libraries. It is not possible to configure one library with Unicode
182  support, and another without, in the same configuration.
183
184  When Unicode support is available, the use of a UTF encoding still has to be
185  enabled by setting the PCRE2_UTF option at run time or starting a pattern
186  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
187  be either ASCII or UTF-8/16/32, even when running on EBCDIC platforms. It is
188  not possible to use both --enable-unicode and --enable-ebcdic at the same
189  time.
190
191  As well as supporting UTF strings, Unicode support includes support for the
192  \P, \p, and \X sequences that recognize Unicode character properties.
193  However, only the basic two-letter properties such as Lu are supported.
194  Escape sequences such as \d and \w in patterns do not by default make use of
195  Unicode properties, but can be made to do so by setting the PCRE2_UCP option
196  or starting a pattern with (*UCP).
197
198. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
199  of the preceding, or any of the Unicode newline sequences, as indicating the
200  end of a line. Whatever you specify at build time is the default; the caller
201  of PCRE2 can change the selection at run time. The default newline indicator
202  is a single LF character (the Unix standard). You can specify the default
203  newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf,
204  --enable-newline-is-crlf, --enable-newline-is-anycrlf, or
205  --enable-newline-is-any to the "configure" command, respectively.
206
207  If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
208  the standard tests will fail, because the lines in the test files end with
209  LF. Even if the files are edited to change the line endings, there are likely
210  to be some failures. With --enable-newline-is-anycrlf or
211  --enable-newline-is-any, many tests should succeed, but there may be some
212  failures.
213
214. By default, the sequence \R in a pattern matches any Unicode line ending
215  sequence. This is independent of the option specifying what PCRE2 considers
216  to be the end of a line (see above). However, the caller of PCRE2 can
217  restrict \R to match only CR, LF, or CRLF. You can make this the default by
218  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").
219
220. In a pattern, the escape sequence \C matches a single code unit, even in a
221  UTF mode. This can be dangerous because it breaks up multi-code-unit
222  characters. You can build PCRE2 with the use of \C permanently locked out by
223  adding --enable-never-backslash-C (note the upper case C) to the "configure"
224  command. When \C is allowed by the library, individual applications can lock
225  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.
226
227. PCRE2 has a counter that limits the depth of nesting of parentheses in a
228  pattern. This limits the amount of system stack that a pattern uses when it
229  is compiled. The default is 250, but you can change it by setting, for
230  example,
231
232  --with-parens-nest-limit=500
233
234. PCRE2 has a counter that can be set to limit the amount of resources it uses
235  when matching a pattern. If the limit is exceeded during a match, the match
236  fails. The default is ten million. You can change the default by setting, for
237  example,
238
239  --with-match-limit=500000
240
241  on the "configure" command. This is just the default; individual calls to
242  pcre2_match() can supply their own value. There is more discussion on the
243  pcre2api man page.
244
245. There is a separate counter that limits the depth of recursive function calls
246  during a matching process. This also has a default of ten million, which is
247  essentially "unlimited". You can change the default by setting, for example,
248
249  --with-match-limit-recursion=500000
250
251  Recursive function calls use up the runtime stack; running out of stack can
252  cause programs to crash in strange ways. There is a discussion about stack
253  sizes in the pcre2stack man page.
254
255. In the 8-bit library, the default maximum compiled pattern size is around
256  64K. You can increase this by adding --with-link-size=3 to the "configure"
257  command. PCRE2 then uses three bytes instead of two for offsets to different
258  parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
259  the same as --with-link-size=4, which (in both libraries) uses four-byte
260  offsets. Increasing the internal link size reduces performance in the 8-bit
261  and 16-bit libraries. In the 32-bit library, the link size setting is
262  ignored, as 4-byte offsets are always used.
263
264. You can build PCRE2 so that its internal match() function that is called from
265  pcre2_match() does not call itself recursively. Instead, it uses memory
266  blocks obtained from the heap to save data that would otherwise be saved on
267  the stack. To build PCRE2 like this, use
268
269  --disable-stack-for-recursion
270
271  on the "configure" command. PCRE2 runs more slowly in this mode, but it may
272  be necessary in environments with limited stack sizes. This applies only to
273  the normal execution of the pcre2_match() function; if JIT support is being
274  successfully used, it is not relevant. Equally, it does not apply to
275  pcre2_dfa_match(), which does not use deeply nested recursion. There is a
276  discussion about stack sizes in the pcre2stack man page.
277
278. For speed, PCRE2 uses four tables for manipulating and identifying characters
279  whose code point values are less than 256. By default, it uses a set of
280  tables for ASCII encoding that is part of the distribution. If you specify
281
282  --enable-rebuild-chartables
283
284  a program called dftables is compiled and run in the default C locale when
285  you obey "make". It builds a source file called pcre2_chartables.c. If you do
286  not specify this option, pcre2_chartables.c is created as a copy of
287  pcre2_chartables.c.dist. See "Character tables" below for further
288  information.
289
290. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
291  character code (as opposed to ASCII/Unicode) by specifying
292
293  --enable-ebcdic --disable-unicode
294
295  This automatically implies --enable-rebuild-chartables (see above). However,
296  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
297  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
298  which specifies that the code value for the EBCDIC NL character is 0x25
299  instead of the default 0x15.
300
301. If you specify --enable-debug, additional debugging code is included in the
302  build. This option is intended for use by the PCRE2 maintainers.
303
304. In environments where valgrind is installed, if you specify
305
306  --enable-valgrind
307
308  PCRE2 will use valgrind annotations to mark certain memory regions as
309  unaddressable. This allows it to detect invalid memory accesses, and is
310  mostly useful for debugging PCRE2 itself.
311
312. In environments where the gcc compiler is used and lcov version 1.6 or above
313  is installed, if you specify
314
315  --enable-coverage
316
317  the build process implements a code coverage report for the test suite. The
318  report is generated by running "make coverage". If ccache is installed on
319  your system, it must be disabled when building PCRE2 for coverage reporting.
320  You can do this by setting the environment variable CCACHE_DISABLE=1 before
321  running "make" to build PCRE2. There is more information about coverage
322  reporting in the "pcre2build" documentation.
323
324. When JIT support is enabled, pcre2grep automatically makes use of it, unless
325  you add --disable-pcre2grep-jit to the "configure" command.
326
327. On non-Windows systems there is support for calling external scripts during
328  matching in the pcre2grep command via PCRE2's callout facility with string
329  arguments. This support can be disabled by adding --disable-pcre2grep-callout
330  to the "configure" command.
331
332. The pcre2grep program currently supports only 8-bit data files, and so
333  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
334  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
335  specifying one or both of
336
337  --enable-pcre2grep-libz
338  --enable-pcre2grep-libbz2
339
340  Of course, the relevant libraries must be installed on your system.
341
342. The default size (in bytes) of the internal buffer used by pcre2grep can be
343  set by, for example:
344
345  --with-pcre2grep-bufsize=51200
346
347  The value must be a plain integer. The default is 20480.
348
349. It is possible to compile pcre2test so that it links with the libreadline
350  or libedit libraries, by specifying, respectively,
351
352  --enable-pcre2test-libreadline or --enable-pcre2test-libedit
353
354  If this is done, when pcre2test's input is from a terminal, it reads it using
355  the readline() function. This provides line-editing and history facilities.
356  Note that libreadline is GPL-licenced, so if you distribute a binary of
357  pcre2test linked in this way, there may be licensing issues. These can be
358  avoided by linking with libedit (which has a BSD licence) instead.
359
360  Enabling libreadline causes the -lreadline option to be added to the
361  pcre2test build. In many operating environments with a system-installed
362  readline library this is sufficient. However, in some environments (e.g. if
363  an unmodified distribution version of readline is in use), it may be
364  necessary to specify something like LIBS="-lncurses" as well. This is
365  because, to quote the readline INSTALL, "Readline uses the termcap functions,
366  but does not link with the termcap or curses library itself, allowing
367  applications which link with readline the to choose an appropriate library."
368  If you get error messages about missing functions tgetstr, tgetent, tputs,
369  tgetflag, or tgoto, this is the problem, and linking with the ncurses library
370  should fix it.
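
Several of the options listed above are usually combined on one "configure"
command line. The sketch below shows one way to derive the library-width
options from a list of wanted code unit widths, relying only on the rule given
above that the 8-bit library is built by default. The helper name
pcre2_width_options is hypothetical, and the final command is printed rather
than run.

```shell
# Hypothetical helper: turn a list of code unit widths into "configure" options.
# Usage: pcre2_width_options 8 16   (at least one width is required)
pcre2_width_options() {
  [ $# -gt 0 ] || { echo "no width given" >&2; return 1; }
  want8=no
  opts=""
  for w in "$@"; do
    case $w in
      8)     want8=yes ;;
      16|32) opts="$opts --enable-pcre2-$w" ;;
      *)     echo "unknown width: $w" >&2; return 1 ;;
    esac
  done
  # The 8-bit library is on by default, so it only ever needs switching off.
  [ "$want8" = yes ] || opts="--disable-pcre2-8$opts"
  echo $opts
}

# Example: 16-bit library only, with JIT support, CR/LF/CRLF newlines,
# and a lower default match limit.
echo ./configure $(pcre2_width_options 16) --enable-jit \
  --enable-newline-is-anycrlf --with-match-limit=500000
```

Remember that pcre2grep needs the 8-bit library, so a build like the one in
the example does not include it.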
371
372The "configure" script builds the following files for the basic C library:
373
374. Makefile             the makefile that builds the library
375. src/config.h         build-time configuration options for the library
376. src/pcre2.h          the public PCRE2 header file
377. pcre2-config          script that shows the building settings such as CFLAGS
378                         that were set for "configure"
379. libpcre2-8.pc        )
380. libpcre2-16.pc       ) data for the pkg-config command
381. libpcre2-32.pc       )
382. libpcre2-posix.pc    )
383. libtool              script that builds shared and/or static libraries
384
385Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
386tarballs under the names config.h.generic and pcre2.h.generic. These are
387provided for those who have to build PCRE2 without using "configure" or CMake.
388If you use "configure" or CMake, the .generic versions are not used.
389
390The "configure" script also creates config.status, which is an executable
391script that can be run to recreate the configuration, and config.log, which
392contains compiler output from tests that "configure" runs.
393
394Once "configure" has run, you can run "make". This builds whichever of the
395libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
396program called pcre2test. If you enabled JIT support with --enable-jit, another
397test program called pcre2_jit_test is built as well. If the 8-bit library is
398built, libpcre2-posix and the pcre2grep command are also built. Running
399"make" with the -j option may speed up compilation on multiprocessor systems.
400
401The command "make check" runs all the appropriate tests. Details of the PCRE2
402tests are given below in a separate section of this document. The -j option of
403"make" can also be used when running the tests.
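
A minimal sketch of choosing a value for the -j option follows. It assumes
that "getconf _NPROCESSORS_ONLN" reports the number of online processors,
which is the case on many but not all Unix-like systems, and it falls back to
a serial build otherwise. The function name detect_jobs is hypothetical.

```shell
# Hypothetical helper: pick a job count for "make -j".
detect_jobs() {
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
  case $n in
    ''|*[!0-9]*) n=1 ;;   # getconf missing or unhelpful: build serially
  esac
  echo "$n"
}

# Printed rather than executed, so the sketch is harmless outside a build tree.
echo "make -j$(detect_jobs) && make -j$(detect_jobs) check"
```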
404
405You can use "make install" to install PCRE2 into live directories on your
406system. The following are installed (file names are all relative to the
407<prefix> that is set when "configure" is run):
408
409  Commands (bin):
410    pcre2test
411    pcre2grep (if 8-bit support is enabled)
412    pcre2-config
413
414  Libraries (lib):
415    libpcre2-8      (if 8-bit support is enabled)
416    libpcre2-16     (if 16-bit support is enabled)
417    libpcre2-32     (if 32-bit support is enabled)
418    libpcre2-posix  (if 8-bit support is enabled)
419
420  Configuration information (lib/pkgconfig):
421    libpcre2-8.pc
422    libpcre2-16.pc
423    libpcre2-32.pc
424    libpcre2-posix.pc
425
426  Header files (include):
427    pcre2.h
428    pcre2posix.h
429
430  Man pages (share/man/man{1,3}):
431    pcre2grep.1
432    pcre2test.1
433    pcre2-config.1
434    pcre2.3
435    pcre2*.3 (lots more pages, all starting "pcre2")
436
437  HTML documentation (share/doc/pcre2/html):
438    index.html
439    *.html (lots more pages, hyperlinked from index.html)
440
441  Text file documentation (share/doc/pcre2):
442    AUTHORS
443    COPYING
444    ChangeLog
445    LICENCE
446    NEWS
447    README
448    pcre2.txt         (a concatenation of the man(3) pages)
449    pcre2test.txt     the pcre2test man page
450    pcre2grep.txt     the pcre2grep man page
451    pcre2-config.txt  the pcre2-config man page
452
453If you want to remove PCRE2 from your system, you can run "make uninstall".
454This removes all the files that "make install" installed. However, it does not
455remove any directories, because these are often shared with other programs.
456
457
458Retrieving configuration information
459------------------------------------
460
461Running "make install" installs the command pcre2-config, which can be used to
462recall information about the PCRE2 configuration and installation. For example:
463
464  pcre2-config --version
465
466prints the version number, and
467
468  pcre2-config --libs8
469
470outputs information about where the 8-bit library is installed. This command
471can be included in makefiles for programs that use PCRE2, saving the programmer
472from having to remember too many details. Run pcre2-config with no arguments to
473obtain a list of possible arguments.
474
475The pkg-config command is another system for saving and retrieving information
476about installed libraries. Instead of separate commands for each library, a
477single command is used. For example:
478
479  pkg-config --libs libpcre2-16
480
481The data is held in *.pc files that are installed in a directory called
482<prefix>/lib/pkgconfig.
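
The two mechanisms can be combined in a build script. The following sketch
tries pkg-config first and falls back to pcre2-config; the function name
pcre2_flags and the file name myprog.c are placeholders, and the order of
preference is only one reasonable choice.

```shell
# Hypothetical helper: print compile and link flags for one PCRE2 library.
# Usage: pcre2_flags [8|16|32]   (defaults to the 8-bit library)
pcre2_flags() {
  width=${1:-8}
  if command -v pkg-config >/dev/null 2>&1 &&
     pkg-config --exists "libpcre2-$width"; then
    pkg-config --cflags --libs "libpcre2-$width"
  elif command -v pcre2-config >/dev/null 2>&1; then
    echo "$(pcre2-config --cflags) $(pcre2-config "--libs$width")"
  else
    echo "PCRE2 does not appear to be installed" >&2
    return 1
  fi
}

# Typical use (myprog.c is a placeholder for your own source file):
#   cc -o myprog myprog.c $(pcre2_flags 8)
```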
483
484
485Shared libraries
486----------------
487
488The default distribution builds PCRE2 as shared libraries and static libraries,
489as long as the operating system supports shared libraries. Shared library
490support relies on the "libtool" script which is built as part of the
491"configure" process.
492
493The libtool script is used to compile and link both shared and static
494libraries. They are placed in a subdirectory called .libs when they are newly
495built. The programs pcre2test and pcre2grep are built to use these uninstalled
496libraries (by means of wrapper scripts in the case of shared libraries). When
497you use "make install" to install shared libraries, pcre2grep and pcre2test are
498automatically re-built to use the newly installed shared libraries before being
499installed themselves. However, the versions left in the build directory still
500use the uninstalled libraries.
501
502To build PCRE2 using static libraries only you must use --disable-shared when
503configuring it. For example:
504
505./configure --prefix=/usr/gnu --disable-shared
506
507Then run "make" in the usual way. Similarly, you can use --disable-static to
508build only shared libraries.
509
510
511Cross-compiling using autotools
512-------------------------------
513
514You can specify CC and CFLAGS in the normal way to the "configure" command, in
515order to cross-compile PCRE2 for some other host. However, you should NOT
516specify --enable-rebuild-chartables, because if you do, the dftables.c source
517file is compiled and run on the local host, in order to generate the inbuilt
518character tables (the pcre2_chartables.c file). This will probably not work,
519because dftables.c needs to be compiled with the local compiler, not the cross
520compiler.
521
522When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
523created by making a copy of pcre2_chartables.c.dist, which is a default set of
524tables that assumes ASCII code. Cross-compiling with the default tables should
525not be a problem.
526
527If you need to modify the character tables when cross-compiling, you should
528move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
529and run it on the local host to make a new version of pcre2_chartables.c.dist.
530Then when you cross-compile PCRE2 this new version of the tables will be used.
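
The restriction described in this section can be checked before "configure" is
run. The sketch below assumes that a cross build is requested with the
standard autoconf --host option, which this README does not itself mention;
the function name check_cross_args and the toolchain name in the example are
hypothetical.

```shell
# Hypothetical pre-flight check for a cross-compiling "configure" command line.
check_cross_args() {
  cross=no
  rebuild=no
  for arg in "$@"; do
    case $arg in
      --host=*)                    cross=yes ;;
      --enable-rebuild-chartables) rebuild=yes ;;
    esac
  done
  if [ "$cross" = yes ] && [ "$rebuild" = yes ]; then
    echo "refusing: dftables would be built by the cross compiler" \
         "and could not be run on the local host" >&2
    return 1
  fi
}

# Example (the toolchain name is made up for illustration):
#   check_cross_args --host=arm-linux-gnueabihf CC=arm-linux-gnueabihf-gcc &&
#     ./configure --host=arm-linux-gnueabihf CC=arm-linux-gnueabihf-gcc
```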
531
532
533Making new tarballs
534-------------------
535
536The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
537zip formats. The command "make distcheck" does the same, but then does a trial
538build of the new distribution to ensure that it works.
539
540If you have modified any of the man page sources in the doc directory, you
541should first run the PrepareRelease script before making a distribution. This
542script creates the .txt and HTML forms of the documentation from the man pages.
543
544
545Testing PCRE2
546-------------
547
548To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
549There is another script called RunGrepTest that tests the pcre2grep command.
550When JIT support is enabled, a third test program called pcre2_jit_test is
551built. Both the scripts and all the program tests are run if you obey "make
552check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
553
554The RunTest script runs the pcre2test test program (which is documented in its
555own man page) on each of the relevant testinput files in the testdata
556directory, and compares the output with the contents of the corresponding
557testoutput files. RunTest uses a file called testtry to hold the main output
558from pcre2test. Other files whose names begin with "test" are used as working
559files in some tests.
560
561Some tests are relevant only when certain build-time options were selected. For
562example, the tests for UTF-8/16/32 features are run only when Unicode support
563is available. RunTest outputs a comment when it skips a test.
564
565Many (but not all) of the tests that are not skipped are run twice if JIT
566support is available. On the second run, JIT compilation is forced. This
567testing can be suppressed by putting "nojit" on the RunTest command line.
568
569The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
570libraries that are enabled. If you want to run just one set of tests, call
571RunTest with either the -8, -16 or -32 option.
572
573If valgrind is installed, you can run the tests under it by putting "valgrind"
574on the RunTest command line. To run pcre2test on just one or more specific test
575files, give their numbers as arguments to RunTest, for example:
576
577  RunTest 2 7 11
578
579You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
580end), or a number preceded by ~ to exclude a test. For example:
581
582  RunTest 3-15 ~10
583
584This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
585except test 13. Whatever order the arguments are in, the tests are always run
586in numerical order.
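
The selection rules can be illustrated with a small shell function. This is
only a model of the behaviour described above, not the code that RunTest
itself uses; it assumes that tests are numbered from 0 up to a last number
given as the first argument, and the name select_tests is hypothetical.

```shell
# Hypothetical illustration of the selection rules; not RunTest's own code.
# Usage: select_tests LAST ARG...   (tests are assumed to be numbered 0..LAST)
select_tests() {
  last=$1
  shift
  include=""
  exclude=""
  for arg in "$@"; do
    case $arg in
      "~"*) exclude="$exclude ${arg#"~"}" ;;                        # ~N
      *-)   include="$include $(seq "${arg%-}" "$last" | tr '\n' ' ')" ;;        # N-
      *-*)  include="$include $(seq "${arg%-*}" "${arg#*-}" | tr '\n' ' ')" ;;   # N-M
      *)    include="$include $arg" ;;                              # N
    esac
  done
  # With only exclusions (or no arguments at all), start from every test.
  [ -n "$include" ] || include=$(seq 0 "$last" | tr '\n' ' ')
  for n in $include; do
    case " $exclude " in
      *" $n "*) ;;          # excluded: print nothing
      *) echo "$n" ;;
    esac
  done | sort -n -u | xargs echo   # numerical order, whatever the argument order
}

# Example: with tests 0 to 20, this prints tests 3 to 15 without test 10.
select_tests 20 3-15 "~10"
```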
587
588You can also call RunTest with the single argument "list" to cause it to output
589a list of tests.
590
591The test sequence starts with "test 0", which is a special test that has no
592input file, and whose output is not checked. This is because it will be
593different on different hardware and with different configurations. The test
594exists in order to exercise some of pcre2test's code that would not otherwise
595be run.
596
597Tests 1 and 2 can always be run, as they expect only plain text strings (not
598UTF) and make no use of Unicode properties. The first test file can be fed
599directly into the perltest.sh script to check that Perl gives the same results.
600The only difference you should see is in the first few lines, where the Perl
601version is given instead of the PCRE2 version. The second set of tests checks
602auxiliary functions, error detection, and run-time flags that are specific to
603PCRE2. It also uses the debugging flags to check some of the internals of
604pcre2_compile().
605
606If you build PCRE2 with a locale setting that is not the standard C locale, the
607character tables may be different (see next paragraph). In some cases, this may
608cause failures in the second set of tests. For example, in a locale where the
609isprint() function yields TRUE for characters in the range 128-255, the use of
610[:isascii:] inside a character class defines a different set of characters, and
611this shows up in this test as a difference in the compiled code, which is being
612listed for checking. For example, where the comparison test output contains
613[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
614cases. This is not a bug in PCRE2.
615
616Test 3 checks pcre2_maketables(), the facility for building a set of character
617tables for a specific locale and using them instead of the default tables. The
618script uses the "locale" command to check for the availability of the "fr_FR",
619"french", or "fr" locale, and uses the first one that it finds. If the "locale"
620command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
621the list of available locales, the third test cannot be run, and a comment is
622output to say why. If running this test produces an error like this:
623
624  ** Failed to set locale "fr_FR"
625
626it means that the given locale is not available on your system, despite being
627listed by "locale". This does not mean that PCRE2 is broken. There are three
628alternative output files for the third test, because three different versions
629of the French locale have been encountered. The test passes if its output
630matches any one of them.
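
The script's check relies on the "locale" command. A program can make a
similar check directly with setlocale(), which returns NULL when the requested
locale cannot be set. The following is a minimal sketch under that assumption;
first_available_locale is a hypothetical helper and is not part of PCRE2 or of
RunTest.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Return the first name in names[] that setlocale() accepts for LC_CTYPE,
   or NULL if none can be set. The previous LC_CTYPE setting is restored
   before returning, so probing has no lasting effect. */
const char *first_available_locale(const char *const names[], size_t count)
{
    const char *found = NULL;
    const char *current = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;

    assert(names != NULL || count == 0);

    /* Copy the current name: later setlocale() calls may overwrite it. */
    if (current != NULL) {
        saved = malloc(strlen(current) + 1);
        if (saved != NULL) strcpy(saved, current);
    }

    for (size_t i = 0; i < count && found == NULL; i++)
        if (setlocale(LC_CTYPE, names[i]) != NULL) found = names[i];

    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);
        free(saved);
    }
    return found;
}
```

Calling it with the names "fr_FR", "french", and "fr" mirrors the order in
which the script looks for a French locale. A NULL result corresponds to the
case where a locale may be listed but cannot actually be set.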
631
632Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
633with the perltest.sh script, and test 5 checking PCRE2-specific things.
634
635Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
636non-UTF mode and UTF mode with Unicode property support, respectively.
637
638Test 8 checks some internal offsets and code size features; it is run only when
639the default "link size" of 2 is set (in other cases the sizes change) and when
640Unicode support is enabled.
641
642Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
64316-bit and 32-bit modes. These tests generate different output in 8-bit mode
644than in the wider modes. Each pair covers general cases and Unicode support,
645respectively. Test 13 checks the handling of non-UTF characters greater than
646255 by pcre2_dfa_match() in 16-bit and 32-bit modes.
647
648Test 14 contains a number of tests that must not be run with JIT. They check,
649among other non-JIT things, the match-limiting features of the interpretive
650matcher.
651
652Test 15 is run only when JIT support is not available. It checks that an
653attempt to use JIT has the expected behaviour.
654
655Test 16 is run only when JIT support is available. It checks JIT complete and
656partial modes, match-limiting under JIT, and other JIT-specific features.
657
658Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
659the 8-bit library, without and with Unicode support, respectively.
660
661Test 19 checks the serialization functions by writing a set of compiled
662patterns to a file, and then reloading and checking them.
663
664
665Character tables
666----------------
667
668For speed, PCRE2 uses four tables for manipulating and identifying characters
669whose code point values are less than 256. By default, a set of tables that is
670built into the library is used. The pcre2_maketables() function can be called
671by an application to create a new set of tables in the current locale. These are
672passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer into a
673compile context.
674
675The source file called pcre2_chartables.c contains the default set of tables.
676By default, this is created as a copy of pcre2_chartables.c.dist, which
677contains tables for ASCII coding. However, if --enable-rebuild-chartables is
678specified for ./configure, a different version of pcre2_chartables.c is built
679by the program dftables (compiled from dftables.c), which uses the ANSI C
680character handling functions such as isalnum(), isalpha(), isupper(),
681islower(), etc. to build the table sources. This means that the default C
682locale which is set for your system will control the contents of these default
683tables. You can change the default tables by editing pcre2_chartables.c and
684then re-building PCRE2. If you do this, you should take care to ensure that the
685file does not get automatically re-generated. The best way to do this is to
686move pcre2_chartables.c.dist out of the way and replace it with your customized
687tables.
688
689When the dftables program is run as a result of --enable-rebuild-chartables,
690it uses the default C locale that is set on your system. It does not pay
691attention to the LC_xxx environment variables. In other words, it uses the
692system's default locale rather than whatever the compiling user happens to have
693set. If you really do want to build a source set of character tables in a
694locale that is specified by the LC_xxx variables, you can run the dftables
695program by hand with the -L option. For example:
696
697  ./dftables -L pcre2_chartables.c.special
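
As an illustration of how tables can be derived from the C character handling
functions, the sketch below builds one table that maps each code point to its
lower-case form and one that flips case. It is not the dftables source;
build_case_tables and its use_env_locale parameter are names invented for this
example. A C program starts in the "C" locale, so the LC_xxx environment
variables take effect only after an explicit setlocale(LC_ALL, "") call, which
is the behaviour the flag models.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>

/* Fill two 256-entry tables using the current (or environment) locale:
   lower[c] is the lower-case form of c, flip[c] is c with its case swapped. */
void build_case_tables(unsigned char lower[256], unsigned char flip[256],
                       bool use_env_locale)
{
    assert(lower != NULL && flip != NULL);

    /* Only an explicit request switches to the locale named by LC_xxx. */
    if (use_env_locale) setlocale(LC_ALL, "");

    for (int c = 0; c < 256; c++) {
        lower[c] = (unsigned char)tolower(c);
        flip[c]  = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
    }
}
```

With use_env_locale set to false the result depends only on the "C" locale, so
code points from 128 upwards map to themselves in both tables.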
698
699The first two 256-byte tables provide lower casing and case flipping functions,
700respectively. The next table consists of three 32-byte bit maps which identify
701digits, "word" characters, and white space, respectively. These are used when
702building 32-byte bit maps that represent character classes for code points less
703than 256. The final 256-byte table has bits indicating various character types,
704as follows:
705
706    1   white space character
707    2   letter
708    4   decimal digit
709    8   hexadecimal digit
710   16   alphanumeric or '_'
711  128   regular expression metacharacter or binary zero
712
713You should not alter the set of characters that contain the 128 bit, as that
714will cause PCRE2 to malfunction.
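
The layout of the bit maps and of the type table can be illustrated with a
short C sketch. The flag values are those listed above; the function names,
the one-bit-per-code-point layout (byte c/8, bit c%8), and the list of
metacharacters are assumptions made for the example and should be checked
against the PCRE2 sources.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Flag values as listed in the table above. */
enum {
    CTYPE_SPACE  = 1,
    CTYPE_LETTER = 2,
    CTYPE_DIGIT  = 4,
    CTYPE_XDIGIT = 8,
    CTYPE_WORD   = 16,
    CTYPE_META   = 128
};

/* Build a 32-byte bit map with one bit for each code point below 256 that
   the given classification function accepts. */
void build_bitmap(unsigned char map[32], int (*is_member)(int))
{
    assert(map != NULL && is_member != NULL);
    memset(map, 0, 32);
    for (int c = 0; c < 256; c++)
        if (is_member(c)) map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return 1 if code point c is present in the bit map, otherwise 0. */
int bitmap_contains(const unsigned char map[32], unsigned char c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Compute the type flags for one code point. The metacharacter list is an
   assumption. strchr() also matches the string's terminating zero, which is
   how binary zero ends up carrying the 128 flag as well. */
unsigned char ctype_flags(unsigned char c)
{
    unsigned char flags = 0;
    if (isspace(c))             flags |= CTYPE_SPACE;
    if (isalpha(c))             flags |= CTYPE_LETTER;
    if (isdigit(c))             flags |= CTYPE_DIGIT;
    if (isxdigit(c))            flags |= CTYPE_XDIGIT;
    if (isalnum(c) || c == '_') flags |= CTYPE_WORD;
    if (strchr("\\*+?{^.$|()[", c) != NULL) flags |= CTYPE_META;
    return flags;
}
```

For example, build_bitmap(map, isdigit) produces a map in which only the bits
for '0' to '9' are set, and ctype_flags('a') combines the letter, hexadecimal
digit, and alphanumeric flags.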
715
716
717File manifest
718-------------
719
720The distribution should contain the files listed below.
721
722(A) Source files for the PCRE2 library functions and their headers are found in
723    the src directory:
724
725  src/dftables.c           auxiliary program for building pcre2_chartables.c
726                           when --enable-rebuild-chartables is specified
727
728  src/pcre2_chartables.c.dist  a default set of character tables that assume
729                           ASCII coding; unless --enable-rebuild-chartables is
730                           specified, used by copying to pcre2_chartables.c
731
732  src/pcre2posix.c         )
733  src/pcre2_auto_possess.c )
734  src/pcre2_compile.c      )
735  src/pcre2_config.c       )
736  src/pcre2_context.c      )
737  src/pcre2_dfa_match.c    )
738  src/pcre2_error.c        )
739  src/pcre2_find_bracket.c )
740  src/pcre2_jit_compile.c  )
741  src/pcre2_jit_match.c    ) sources for the functions in the library,
742  src/pcre2_jit_misc.c     )   and some internal functions that they use
743  src/pcre2_maketables.c   )
744  src/pcre2_match.c        )
745  src/pcre2_match_data.c   )
746  src/pcre2_newline.c      )
747  src/pcre2_ord2utf.c      )
748  src/pcre2_pattern_info.c )
749  src/pcre2_serialize.c    )
750  src/pcre2_string_utils.c )
751  src/pcre2_study.c        )
752  src/pcre2_substitute.c   )
753  src/pcre2_substring.c    )
754  src/pcre2_tables.c       )
755  src/pcre2_ucd.c          )
756  src/pcre2_valid_utf.c    )
757  src/pcre2_xclass.c       )
758
759  src/pcre2_printint.c     debugging function that is used by pcre2test
760
761  src/config.h.in          template for config.h, when built by "configure"
762  src/pcre2.h.in           template for pcre2.h when built by "configure"
763  src/pcre2posix.h         header for the external POSIX wrapper API
764  src/pcre2_internal.h     header for internal use
765  src/pcre2_intmodedep.h   a mode-specific internal header
766  src/pcre2_ucp.h          header for Unicode property handling
767
768  src/sljit/*              source files for the JIT compiler
769
770(B) Source files for programs that use PCRE2:
771
772  src/pcre2demo.c          simple demonstration of coding calls to PCRE2
773  src/pcre2grep.c          source of a grep utility that uses PCRE2
774  src/pcre2test.c          comprehensive test program
775  src/pcre2_printint.c     part of pcre2test
776  src/pcre2_jit_test.c     JIT test program
777
778(C) Auxiliary files:
779
780  132html                  script to turn "man" pages into HTML
781  AUTHORS                  information about the author of PCRE2
782  ChangeLog                log of changes to the code
783  CleanTxt                 script to clean nroff output for txt man pages
784  Detrail                  script to remove trailing spaces
785  HACKING                  some notes about the internals of PCRE2
786  INSTALL                  generic installation instructions
787  LICENCE                  conditions for the use of PCRE2
788  COPYING                  the same, using GNU's standard name
789  Makefile.in              ) template for Unix Makefile, which is built by
790                           )   "configure"
791  Makefile.am              ) the automake input that was used to create
792                           )   Makefile.in
793  NEWS                     important changes in this release
794  NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
795  PrepareRelease           script to make preparations for "make dist"
796  README                   this file
797  RunTest                  a Unix shell script for running tests
798  RunGrepTest              a Unix shell script for pcre2grep tests
799  aclocal.m4               m4 macros (generated by "aclocal")
800  config.guess             ) files used by libtool,
801  config.sub               )   used only when building a shared library
802  configure                a configuring shell script (built by autoconf)
803  configure.ac             ) the autoconf input that was used to build
804                           )   "configure" and config.h
805  depcomp                  ) script to find program dependencies, generated by
806                           )   automake
807  doc/*.3                  man page sources for PCRE2
808  doc/*.1                  man page sources for pcre2grep and pcre2test
809  doc/index.html.src       the base HTML page
810  doc/html/*               HTML documentation
811  doc/pcre2.txt            plain text version of the man pages
812  doc/pcre2test.txt        plain text documentation of test program
813  install-sh               a shell script for installing files
814  libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
815  libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
816  libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
817  libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
818  ltmain.sh                file used to build a libtool script
819  missing                  ) common stub for a few missing GNU programs while
820                           )   installing, generated by automake
821  mkinstalldirs            script for making install directories
822  perltest.sh              script for running a Perl test program
823  pcre2-config.in          source of script which retains PCRE2 information
824  testdata/testinput*      test data for main library tests
825  testdata/testoutput*     expected test results
826  testdata/grep*           input and output for pcre2grep tests
827  testdata/*               other supporting test files
828
829(D) Auxiliary files for cmake support:
830
831  cmake/COPYING-CMAKE-SCRIPTS
832  cmake/FindPackageHandleStandardArgs.cmake
833  cmake/FindEditline.cmake
834  cmake/FindReadline.cmake
835  CMakeLists.txt
836  config-cmake.h.in
837
838(E) Auxiliary files for building PCRE2 "by hand":
839
840  pcre2.h.generic         ) a version of the public PCRE2 header file
841                          )   for use in non-"configure" environments
842  config.h.generic        ) a version of config.h for use in non-"configure"
843                          )   environments
844
845Philip Hazel
846Email local part: ph10
847Email domain: cam.ac.uk
848Last updated: 01 April 2016
849