README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------

PCRE2 is a re-working of the original PCRE1 library to provide an entirely new
API. Since its initial release in 2015, there has been further development of
the code and it now differs from PCRE1 in more than just the API. There are new
features and the internals have been improved. The latest release of PCRE2 is
available in three alternative formats from:

https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.gz
https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.bz2
https://ftp.pcre.org/pub/pcre/pcre2-10.xx.tar.zip

There is a mailing list for discussion about the development of PCRE (both the
original and new APIs) at pcre-dev@exim.org. You can access the archives and
subscribe or manage your subscription here:

   https://lists.exim.org/mailman/listinfo/pcre-dev

Please read the NEWS file if you are upgrading from a previous release. The
contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Contributions by users of PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest


The PCRE2 APIs
--------------

PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there
are no C++ wrappers.

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page).
These are built into a library called libpcre2-posix. Note that this
just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link (or the program modified, of course). See the
pcre2posix documentation for more details.


Documentation for PCRE2
-----------------------

If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
     doc/pcre2test.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except the
     listing of pcre2demo.c and those that summarize individual functions. The
     other two are the text forms of the section 1 man pages for the pcre2grep
     and pcre2test commands. These text forms are provided for ease of scanning
     with text editors or similar tools. They are installed in
     <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
     (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre2/html.


Building PCRE2 on non-Unix-like systems
---------------------------------------

For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
your system supports the use of "configure" and "make" you may be able to build
PCRE2 using autotools in the same way as for many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE2 without using autotools
--------------------------------------

The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.


Building PCRE2 using autotools
------------------------------

The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.

To build PCRE2 on a system that supports autotools, first run the "configure"
command from the PCRE2 distribution directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.

Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available.
For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure

PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built. You can change this
  by adding one of these options to the "configure" command:

  --disable-shared
  --disable-static

  (See also "Shared libraries" below.)

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  the "configure" command, the 16-bit library is also built. If you add
  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error.
  If in doubt, use --enable-jit=auto, which enables JIT only if the current
  hardware is supported.

. If you are enabling JIT under an SELinux environment you may also want to add
  --enable-jit-sealloc, which enables the use of an executable memory allocator
  that is compatible with SELinux. Warning: this allocator is experimental!
  It does not support the fork() operation and may crash when no disk space is
  available. This option has no effect if JIT is disabled.

. If you do not want to make use of the default support for UTF-8 Unicode
  character strings in the 8-bit library, UTF-16 Unicode character strings in
  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
  library, you can add --disable-unicode to the "configure" command. This
  reduces the size of the libraries. It is not possible to configure one
  library with Unicode support, and another without, in the same configuration.
  It is also not possible to use --enable-ebcdic (see below) with Unicode
  support, so if this option is set, you must also use --disable-unicode.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
  either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
  However, only the basic two-letter properties such as Lu are supported.
  Escape sequences such as \d and \w in patterns do not by default make use of
  Unicode properties, but can be made to do so by setting the PCRE2_UCP option
  or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
  character as indicating the end of a line. Whatever you specify at build time
  is the default; the caller of PCRE2 can change the selection at run time. The
  default newline indicator is a single LF character (the Unix standard). You
  can specify the default newline indicator by adding --enable-newline-is-cr,
  --enable-newline-is-lf, --enable-newline-is-crlf,
  --enable-newline-is-anycrlf, --enable-newline-is-any, or
  --enable-newline-is-nul to the "configure" command, respectively.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. In a pattern, the escape sequence \C matches a single code unit, even in a
  UTF mode. This can be dangerous because it breaks up multi-code-unit
  characters. You can build PCRE2 with the use of \C permanently locked out by
  adding --enable-never-backslash-C (note the upper case C) to the "configure"
  command. When \C is allowed by the library, individual applications can lock
  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.

. PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
  example,

  --with-parens-nest-limit=500

. PCRE2 has a counter that can be set to limit the amount of computing resource
  it uses when matching a pattern.
  If the limit is exceeded during a match, the match fails. The default is ten
  million. You can change the default by setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
  discussion in the pcre2api man page (search for pcre2_set_match_limit).

. There is a separate counter that limits the depth of nested backtracking
  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
  matching process, which indirectly limits the amount of heap memory that is
  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
  counter also has a default of ten million, which is essentially "unlimited".
  You can change the default by setting, for example,

  --with-match-limit-depth=5000

  There is more discussion in the pcre2api man page (search for
  pcre2_set_depth_limit).

. You can also set an explicit limit on the amount of heap memory used by
  the pcre2_match() and pcre2_dfa_match() interpreters:

  --with-heap-limit=500

  The units are kibibytes (units of 1024 bytes). This limit does not apply when
  the JIT optimization (which has its own memory control features) is used.
  There is more discussion on the pcre2api man page (search for
  pcre2_set_heap_limit).

. In the 8-bit library, the default maximum compiled pattern size is around
  64 kibibytes. You can increase this by adding --with-link-size=3 to the
  "configure" command. PCRE2 then uses three bytes instead of two for offsets
  to different parts of the compiled pattern. In the 16-bit library,
  --with-link-size=3 is the same as --with-link-size=4, which (in both
  libraries) uses four-byte offsets. Increasing the internal link size reduces
  performance in the 8-bit and 16-bit libraries.
  In the 32-bit library, the link size setting is ignored, as 4-byte offsets
  are always used.

. For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called pcre2_dftables is compiled and run in the default C locale
  when you obey "make". It builds a source file called pcre2_chartables.c. If
  you do not specify this option, pcre2_chartables.c is created as a copy of
  pcre2_chartables.c.dist. See "Character tables" below for further
  information.

. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  character code (as opposed to ASCII/Unicode) by specifying

  --enable-ebcdic --disable-unicode

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  which specifies that the code value for the EBCDIC NL character is 0x25
  instead of the default 0x15.

. If you specify --enable-debug, additional debugging code is included in the
  build. This option is intended for use by the PCRE2 maintainers.

. In environments where valgrind is installed, if you specify

  --enable-valgrind

  PCRE2 will use valgrind annotations to mark certain memory regions as
  unaddressable. This allows it to detect invalid memory accesses, and is
  mostly useful for debugging PCRE2 itself.

. In environments where the gcc compiler is used and lcov is installed, if you
  specify

  --enable-coverage

  the build process implements a code coverage report for the test suite. The
  report is generated by running "make coverage".
  If ccache is installed on your system, it must be disabled when building
  PCRE2 for coverage reporting. You can do this by setting the environment
  variable CCACHE_DISABLE=1 before running "make" to build PCRE2. There is more
  information about coverage reporting in the "pcre2build" documentation.

. When JIT support is enabled, pcre2grep automatically makes use of it, unless
  you add --disable-pcre2grep-jit to the "configure" command.

. There is support for calling external programs during matching in the
  pcre2grep command, using PCRE2's callout facility with string arguments. This
  support can be disabled by adding --disable-pcre2grep-callout to the
  "configure" command. There are two kinds of callout: one that generates
  output from inbuilt code, and another that calls an external program. The
  latter has special support for Windows and VMS; otherwise it assumes the
  existence of the fork() function. This second kind can be disabled on its
  own by adding --disable-pcre2grep-callout-fork to the "configure" command.

. The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  specifying one or both of

  --enable-pcre2grep-libz
  --enable-pcre2grep-libbz2

  Of course, the relevant libraries must be installed on your system.

. The default starting size (in bytes) of the internal buffer used by pcre2grep
  can be set by, for example:

  --with-pcre2grep-bufsize=51200

  The value must be a plain integer. The default is 20480. The amount of memory
  used by pcre2grep is actually three times this number, to allow for "before"
  and "after" lines. If very long lines are encountered, the buffer is
  automatically enlarged, up to a fixed maximum size.

. The default maximum size of pcre2grep's internal buffer can be set by, for
  example:

  --with-pcre2grep-max-bufsize=2097152

  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
  whichever is the larger.

. It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,

  --enable-pcre2test-libreadline or --enable-pcre2test-libedit

  If this is done, when pcre2test's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcre2test linked in this way, there may be licensing issues. These can be
  avoided by linking with libedit (which has a BSD licence) instead.

  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the to choose an appropriate library."
  If you get error messages about missing functions tgetstr, tgetent, tputs,
  tgetflag, or tgoto, this is the problem, and linking with the ncurses library
  should fix it.

. The C99 standard defines formatting modifiers z and t for size_t and
  ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
  environments other than Microsoft Visual Studio when __STDC_VERSION__ is
  defined and has a value greater than or equal to 199901L (indicating C99).
  However, there is at least one environment that claims to be C99 but does not
  support these modifiers. If --disable-percent-zt is specified, no use is made
  of the z or t modifiers. Instead of %td or %zu, %lu is used, with a cast for
  size_t values.

. There is a special option called --enable-fuzz-support for use by people who
  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
  be built, but not installed. This contains a single function called
  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
  length of the string. When called, this function tries to compile the string
  as a pattern, and if that succeeds, to match it. This is done both with no
  options and with some random option bits that are generated from the string.
  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
  be created. This is normally run under valgrind or used when PCRE2 is
  compiled with address sanitizing enabled. It calls the fuzzing function and
  outputs information about what it is doing. The input strings are specified
  by arguments: if an argument starts with "=" the rest of it is a literal
  input string. Otherwise, it is assumed to be a file name, and the contents of
  the file are the test string.

. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
  which caused pcre2_match() to use individual blocks on the heap for
  backtracking instead of recursive function calls (which use the stack). This
  is now obsolete since pcre2_match() was refactored always to use the heap (in
  a much more efficient way than before). This option is retained for backwards
  compatibility, but has no effect other than to output a warning.

The "configure" script builds the following files for the basic C library:

. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                         that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries

Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. If the 8-bit library is
built, libpcre2-posix and the pcre2grep command are also built. Running
"make" with the -j option may speed up compilation on multiprocessor systems.

The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document. The -j option of
"make" can also be used when running the tests.

You can use "make install" to install PCRE2 into live directories on your
system.
The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcre2test
    pcre2grep (if 8-bit support is enabled)
    pcre2-config

  Libraries (lib):
    libpcre2-8      (if 8-bit support is enabled)
    libpcre2-16     (if 16-bit support is enabled)
    libpcre2-32     (if 32-bit support is enabled)
    libpcre2-posix  (if 8-bit support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre2-8.pc
    libpcre2-16.pc
    libpcre2-32.pc
    libpcre2-posix.pc

  Header files (include):
    pcre2.h
    pcre2posix.h

  Man pages (share/man/man{1,3}):
    pcre2grep.1
    pcre2test.1
    pcre2-config.1
    pcre2.3
    pcre2*.3 (lots more pages, all starting "pcre2")

  HTML documentation (share/doc/pcre2/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre2):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre2.txt         (a concatenation of the man(3) pages)
    pcre2test.txt     the pcre2test man page
    pcre2grep.txt     the pcre2grep man page
    pcre2-config.txt  the pcre2-config man page

If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information
------------------------------------

Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:

  pcre2-config --version

prints the version number, and

  pcre2-config --libs8

outputs information about where the 8-bit library is installed.
This command can be included in makefiles for programs that use PCRE2, saving
the programmer from having to remember too many details. Run pcre2-config with
no arguments to obtain a list of possible arguments.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --libs libpcre2-16

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.


Shared libraries
----------------

The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE2 using static libraries only you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling using autotools
-------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host.
However, you should NOT specify --enable-rebuild-chartables, because if you do,
the pcre2_dftables.c source file is compiled and run on the local host, in
order to generate the inbuilt character tables (the pcre2_chartables.c file).
This will probably not work, because pcre2_dftables.c needs to be compiled with
the local compiler, not the cross compiler.

When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
hand and run it on the local host to make a new version of
pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
at build time" for more details.


Making new tarballs
-------------------

The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.


Testing PCRE2
-------------

To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
There is another script called RunGrepTest that tests the pcre2grep command.
When JIT support is enabled, a third test program called pcre2_jit_test is
built. Both the scripts and all the program tests are run if you obey "make
check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.

The RunTest script runs the pcre2test test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. RunTest uses a file called testtry to hold the main output
from pcre2test. Other files whose names begin with "test" are used as working
files in some tests.

Some tests are relevant only when certain build-time options were selected. For
example, the tests for UTF-8/16/32 features are run only when Unicode support
is available. RunTest outputs a comment when it skips a test.

Many (but not all) of the tests that are not skipped are run twice if JIT
support is available. On the second run, JIT compilation is forced. This
testing can be suppressed by putting "nojit" on the RunTest command line.

The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
libraries that are enabled. If you want to run just one set of tests, call
RunTest with either the -8, -16 or -32 option.

If valgrind is installed, you can run the tests under it by putting "valgrind"
on the RunTest command line. To run pcre2test on just one or more specific test
files, give their numbers as arguments to RunTest, for example:

  RunTest 2 7 11

You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
end), or a number preceded by ~ to exclude a test. For example:

  RunTest 3-15 ~10

This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
except test 13. Whatever order the arguments are in, the tests are always run
in numerical order.

You can also call RunTest with the single argument "list" to cause it to output
a list of tests.

The test sequence starts with "test 0", which is a special test that has no
input file, and whose output is not checked.
This is because it will be different on different hardware and with different
configurations. The test exists in order to exercise some of pcre2test's code
that would not otherwise be run.

Tests 1 and 2 can always be run, as they expect only plain text strings (not
UTF) and make no use of Unicode properties. The first test file can be fed
directly into the perltest.sh script to check that Perl gives the same results.
The only difference you should see is in the first few lines, where the Perl
version is given instead of the PCRE2 version. The second set of tests checks
auxiliary functions, error detection, and run-time flags that are specific to
PCRE2. It also uses the debugging flags to check some of the internals of
pcre2_compile().

If you build PCRE2 with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. For example, where the comparison test output contains
[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
cases. This is not a bug in PCRE2.

Test 3 checks pcre2_maketables(), the facility for building a set of character
tables for a specific locale and using them instead of the default tables. The
script uses the "locale" command to check for the availability of the "fr_FR",
"french", or "fr" locale, and uses the first one that it finds. If the "locale"
command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
the list of available locales, the third test cannot be run, and a comment is
output to say why.
If running this test produces an error like this:

  ** Failed to set locale "fr_FR"

it means that the given locale is not available on your system, despite being
listed by "locale". This does not mean that PCRE2 is broken. There are three
alternative output files for the third test, because three different versions
of the French locale have been encountered. The test passes if its output
matches any one of them.

Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
with the perltest.sh script, and test 5 checking PCRE2-specific things.

Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively.

Test 8 checks some internal offsets and code size features, but it is run only
when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
32-bit modes and for different link sizes, so there are different output files
for each mode and link size.

Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
8-bit mode. Each pair covers general cases and Unicode support, respectively.

Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.

Test 14 contains some special UTF and UCP tests that give different output for
different code unit widths.

Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive
matcher.

Test 16 is run only when JIT support is not available. It checks that an
attempt to use JIT has the expected behaviour.

Test 17 is run only when JIT support is available. It checks JIT complete and
partial modes, match-limiting under JIT, and other JIT-specific features.

Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
the 8-bit library, without and with Unicode support, respectively.

Test 20 checks the serialization functions by writing a set of compiled
patterns to a file, and then reloading and checking them.

Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.

Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.


Character tables
----------------

For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, a set of tables that is
built into the library is used. The pcre2_maketables() function can be called
by an application to create a new set of tables in the current locale. These
are passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer
into a compile context.

The source file called pcre2_chartables.c contains the default set of tables.
By default, this is created as a copy of pcre2_chartables.c.dist, which
contains tables for ASCII coding. However, if --enable-rebuild-chartables is
specified for ./configure, a new version of pcre2_chartables.c is built by the
program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated.
The best way to do this is to move pcre2_chartables.c.dist out of the way and
replace it with your customized tables.

When the pcre2_dftables program is run as a result of specifying
--enable-rebuild-chartables, it uses the default C locale that is set on your
system. It does not pay attention to the LC_xxx environment variables. In other
words, it uses the system's default locale rather than whatever the compiling
user happens to have set. If you really do want to build a source set of
character tables in a locale that is specified by the LC_xxx variables, you can
run the pcre2_dftables program by hand with the -L option. For example:

  ./pcre2_dftables -L pcre2_chartables.c.special

The second argument names the file where the source code for the tables is
written. The first two 256-byte tables provide lower casing and case flipping
functions, respectively. The next table consists of a number of 32-byte bit
maps which identify certain character classes such as digits, "word"
characters, white space, etc. These are used when building 32-byte bit maps
that represent character classes for code points less than 256. The final
256-byte table has bits indicating various character types, as follows:

    1   white space character
    2   letter
    4   lower case letter
    8   decimal digit
   16   alphanumeric or '_'

You can also specify -b (with or without -L) when running pcre2_dftables. This
causes the tables to be written in binary instead of as source code. A set of
binary tables can be loaded into memory by an application and passed to
pcre2_compile() in the same way as tables created dynamically by calling
pcre2_maketables(). The tables are just a string of bytes, independent of
hardware characteristics such as endianness. This means they can be bundled
with an application that runs in different environments, to ensure consistent
behaviour.
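
As a rough illustration of the tables described in this section, the sketch
below fills a 256-byte character type table, plus lower casing and case
flipping tables, using the ANSI C character handling functions. It is not the
code of pcre2_dftables or pcre2_maketables(): the function and macro names
(sketch_build_ctypes, sketch_build_case_tables, SKETCH_*) are hypothetical, and
only the bit values are taken from the list above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as listed in this README; the macro names are hypothetical. */
#define SKETCH_SPACE   1   /* white space character */
#define SKETCH_LETTER  2   /* letter */
#define SKETCH_LOWER   4   /* lower case letter */
#define SKETCH_DIGIT   8   /* decimal digit */
#define SKETCH_WORD   16   /* alphanumeric or '_' */

/* Fill a 256-entry table with character type bits, as reported by the
   <ctype.h> functions for whatever locale is currently in effect. */
void sketch_build_ctypes(uint8_t table[256])
{
  assert(table != NULL);
  for (int c = 0; c < 256; c++)
    {
    uint8_t bits = 0;
    if (isspace(c)) bits |= SKETCH_SPACE;
    if (isalpha(c)) bits |= SKETCH_LETTER;
    if (islower(c)) bits |= SKETCH_LOWER;
    if (isdigit(c)) bits |= SKETCH_DIGIT;
    if (isalnum(c) || c == '_') bits |= SKETCH_WORD;
    table[c] = bits;
    }
}

/* Fill a lower casing table and a case flipping table in the same way. */
void sketch_build_case_tables(uint8_t lower[256], uint8_t flip[256])
{
  assert(lower != NULL && flip != NULL);
  for (int c = 0; c < 256; c++)
    {
    lower[c] = (uint8_t)tolower(c);
    flip[c] = (uint8_t)(islower(c) ? toupper(c) : tolower(c));
    }
}
```

A program that never calls setlocale() runs in the standard C locale, so the
entry for 'a' would come out as 2 + 4 + 16 = 22 and the entry for '5' as
8 + 16 = 24. Calling setlocale() first changes what the <ctype.h> functions
report for code points in the range 128-255, which is the kind of locale
dependence that this section describes.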

See also the pcre2build section "Creating character tables at build time".


File manifest
-------------

The distribution should contain the files listed below.

(A) Source files for the PCRE2 library functions and their headers are found in
    the src directory:

  src/pcre2_dftables.c     auxiliary program for building pcre2_chartables.c
                           when --enable-rebuild-chartables is specified

  src/pcre2_chartables.c.dist  a default set of character tables that assume
                           ASCII coding; unless --enable-rebuild-chartables is
                           specified, used by copying to pcre2_chartables.c

  src/pcre2posix.c         )
  src/pcre2_auto_possess.c )
  src/pcre2_compile.c      )
  src/pcre2_config.c       )
  src/pcre2_context.c      )
  src/pcre2_convert.c      )
  src/pcre2_dfa_match.c    )
  src/pcre2_error.c        )
  src/pcre2_extuni.c       )
  src/pcre2_find_bracket.c )
  src/pcre2_jit_compile.c  )
  src/pcre2_jit_match.c    ) sources for the functions in the library,
  src/pcre2_jit_misc.c     )   and some internal functions that they use
  src/pcre2_maketables.c   )
  src/pcre2_match.c        )
  src/pcre2_match_data.c   )
  src/pcre2_newline.c      )
  src/pcre2_ord2utf.c      )
  src/pcre2_pattern_info.c )
  src/pcre2_script_run.c   )
  src/pcre2_serialize.c    )
  src/pcre2_string_utils.c )
  src/pcre2_study.c        )
  src/pcre2_substitute.c   )
  src/pcre2_substring.c    )
  src/pcre2_tables.c       )
  src/pcre2_ucd.c          )
  src/pcre2_valid_utf.c    )
  src/pcre2_xclass.c       )

  src/pcre2_printint.c     debugging function that is used by pcre2test
  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support

  src/config.h.in          template for config.h, when built by "configure"
  src/pcre2.h.in           template for pcre2.h when built by "configure"
  src/pcre2posix.h         header for the external POSIX wrapper API
  src/pcre2_internal.h     header for internal use
  src/pcre2_intmodedep.h   a mode-specific internal header
  src/pcre2_ucp.h          header for Unicode property handling

  sljit/*                  source files for the JIT compiler

(B) Source files for programs that use PCRE2:

  src/pcre2demo.c          simple demonstration of coding calls to PCRE2
  src/pcre2grep.c          source of a grep utility that uses PCRE2
  src/pcre2test.c          comprehensive test program
  src/pcre2_jit_test.c     JIT test program

(C) Auxiliary files:

  132html                 script to turn "man" pages into HTML
  AUTHORS                 information about the author of PCRE2
  ChangeLog               log of changes to the code
  CleanTxt                script to clean nroff output for txt man pages
  Detrail                 script to remove trailing spaces
  HACKING                 some notes about the internals of PCRE2
  INSTALL                 generic installation instructions
  LICENCE                 conditions for the use of PCRE2
  COPYING                 the same, using GNU's standard name
  Makefile.in             ) template for Unix Makefile, which is built by
                          )   "configure"
  Makefile.am             ) the automake input that was used to create
                          )   Makefile.in
  NEWS                    important changes in this release
  NON-AUTOTOOLS-BUILD     notes on building PCRE2 without using autotools
  PrepareRelease          script to make preparations for "make dist"
  README                  this file
  RunTest                 a Unix shell script for running tests
  RunGrepTest             a Unix shell script for pcre2grep tests
  aclocal.m4              m4 macros (generated by "aclocal")
  config.guess            ) files used by libtool,
  config.sub              )   used only when building a shared library
  configure               a configuring shell script (built by autoconf)
  configure.ac            ) the autoconf input that was used to build
                          )   "configure" and config.h
  depcomp                 ) script to find program dependencies, generated by
                          )   automake
  doc/*.3                 man page sources for PCRE2
  doc/*.1                 man page sources for pcre2grep and pcre2test
  doc/index.html.src      the base HTML page
  doc/html/*              HTML documentation
  doc/pcre2.txt           plain text version of the man pages
  doc/pcre2test.txt       plain text documentation of test program
  install-sh              a shell script for installing files
  libpcre2-8.pc.in        template for libpcre2-8.pc for pkg-config
  libpcre2-16.pc.in       template for libpcre2-16.pc for pkg-config
  libpcre2-32.pc.in       template for libpcre2-32.pc for pkg-config
  libpcre2-posix.pc.in    template for libpcre2-posix.pc for pkg-config
  ltmain.sh               file used to build a libtool script
  missing                 ) common stub for a few missing GNU programs while
                          )   installing, generated by automake
  mkinstalldirs           script for making install directories
  perltest.sh             script for running a Perl test program
  pcre2-config.in         source of script which retains PCRE2 information
  testdata/testinput*     test data for main library tests
  testdata/testoutput*    expected test results
  testdata/grep*          input and output for pcre2grep tests
  testdata/*              other supporting test files

(D) Auxiliary files for cmake support

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindEditline.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for building PCRE2 "by hand"

  src/pcre2.h.generic     ) a version of the public PCRE2 header file
                          )   for use in non-"configure" environments
  src/config.h.generic    ) a version of config.h for use in non-"configure"
                          )   environments

Philip Hazel
Email local part: Philip.Hazel
Email domain: gmail.com
Last updated: 04 December 2020