README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------

PCRE2 is a re-working of the original PCRE library to provide an entirely new
API. The latest release of PCRE2 is always available in three alternative
formats from:

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.gz
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.bz2
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.zip

There is a mailing list for discussion about the development of PCRE (both the
original and new APIs) at pcre-dev@exim.org. You can access the archives and
subscribe or manage your subscription here:

  https://lists.exim.org/mailman/listinfo/pcre-dev

Please read the NEWS file if you are upgrading from a previous release.
The contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest


The PCRE2 APIs
--------------

PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. There are no C++
wrappers.

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page). These can be found in a library called libpcre2-posix.
Note that this just provides a POSIX calling interface to PCRE2; the regular
expressions themselves still follow Perl syntax and semantics. The POSIX API is
restricted, and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link.

If you are using the POSIX interface to PCRE2 and there is already a POSIX
regex library installed on your system, as well as worrying about the regex.h
header file (as mentioned above), you must also take care when linking programs
to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
may pick up the POSIX functions of the same name from the other library.

One way of avoiding this confusion is to compile PCRE2 with the addition of
-Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
compiler flags (CFLAGS if you are using "configure" -- see below). This has the
effect of renaming the functions so that the names no longer clash. Of course,
you have to do the same thing for your applications, or write them using the
new names.


Documentation for PCRE2
-----------------------

If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
     doc/pcre2test.txt in the source distribution.
     The first of these is a concatenation of the text forms of all the
     section 3 man pages except the listing of pcre2demo.c and those that
     summarize individual functions. The other two are the text forms of the
     section 1 man pages for the pcre2grep and pcre2test commands. These text
     forms are provided for ease of scanning with text editors or similar
     tools. They are installed in <prefix>/share/doc/pcre2, where <prefix> is
     the installation prefix (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre2/html.


Building PCRE2 on non-Unix-like systems
---------------------------------------

For a non-Unix-like system, please read the comments in the file
NON-AUTOTOOLS-BUILD, though if your system supports the use of "configure" and
"make" you may be able to build PCRE2 using autotools in the same way as for
many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE2 without using autotools
--------------------------------------

The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.
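
As an illustration of the CMake route mentioned in the previous section, a
command-line build might look like the sketch below. It assumes that CMake is
installed and that the commands are run from the top of the unpacked PCRE2
source tree. The directory name build-cmake is arbitrary and chosen only for
this example; the NON-AUTOTOOLS-BUILD file remains the authoritative
description.

```shell
# Sketch only. Assumes CMake is installed and that this is run from the top of
# the unpacked PCRE2 source tree (the directory that holds CMakeLists.txt).
# The directory name "build-cmake" is arbitrary.
if [ -f CMakeLists.txt ]; then
  mkdir -p build-cmake &&
  cd build-cmake &&
  cmake .. &&          # configure: writes Makefiles or project files here
  cmake --build .      # build with whatever tool CMake generated files for
else
  echo "CMakeLists.txt not found: run this from the PCRE2 source tree" >&2
fi
```

Building in a separate directory keeps the generated files out of the source
tree, so the directory can simply be deleted to start again.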


Building PCRE2 using autotools
------------------------------

The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.

To build PCRE2 on a system that supports autotools, first run the "configure"
command from the PCRE2 distribution directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.

Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure

PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built.
  You can change this by adding one of these options to the "configure"
  command:

  --disable-shared
  --disable-static

  (See also "Shared libraries" below.)

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  the "configure" command, the 16-bit library is also built. If you add
  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error.

. If you do not want to make use of the support for UTF-8 Unicode character
  strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit
  library, or UTF-32 Unicode character strings in the 32-bit library, you can
  add --disable-unicode to the "configure" command. This reduces the size of
  the libraries. It is not possible to configure one library with Unicode
  support, and another without, in the same configuration.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
  be either ASCII or UTF-8/16/32, even when running on EBCDIC platforms. It is
  not possible to use both --enable-unicode and --enable-ebcdic at the same
  time.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
  However, only the basic two-letter properties such as Lu are supported.
  Escape sequences such as \d and \w in patterns do not by default make use of
  Unicode properties, but can be made to do so by setting the PCRE2_UCP option
  or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  of the preceding, or any of the Unicode newline sequences, as indicating the
  end of a line. Whatever you specify at build time is the default; the caller
  of PCRE2 can change the selection at run time. The default newline indicator
  is a single LF character (the Unix standard). You can specify the default
  newline indicator by adding --enable-newline-is-cr, --enable-newline-is-lf,
  --enable-newline-is-crlf, --enable-newline-is-anycrlf, or
  --enable-newline-is-any to the "configure" command, respectively.

  If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
  the standard tests will fail, because the lines in the test files end with
  LF. Even if the files are edited to change the line endings, there are likely
  to be some failures. With --enable-newline-is-anycrlf or
  --enable-newline-is-any, many tests should succeed, but there may be some
  failures.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. In a pattern, the escape sequence \C matches a single code unit, even in a
  UTF mode. This can be dangerous because it breaks up multi-code-unit
  characters.
  You can build PCRE2 with the use of \C permanently locked out by adding
  --enable-never-backslash-C (note the upper case C) to the "configure"
  command. When \C is allowed by the library, individual applications can lock
  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.

. PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
  example,

  --with-parens-nest-limit=500

  on the "configure" command.

. PCRE2 has a counter that can be set to limit the amount of resources it uses
  when matching a pattern. If the limit is exceeded during a match, the match
  fails. The default is ten million. You can change the default by setting, for
  example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre2_match() can supply their own value. There is more discussion in the
  pcre2api man page.

. There is a separate counter that limits the depth of recursive function calls
  during a matching process. This also has a default of ten million, which is
  essentially "unlimited". You can change the default by setting, for example,

  --with-match-limit-recursion=500000

  Recursive function calls use up the runtime stack; running out of stack can
  cause programs to crash in strange ways. There is a discussion about stack
  sizes in the pcre2stack man page.

. In the 8-bit library, the default maximum compiled pattern size is around
  64K. You can increase this by adding --with-link-size=3 to the "configure"
  command. PCRE2 then uses three bytes instead of two for offsets to different
  parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
  the same as --with-link-size=4, which (in both libraries) uses four-byte
  offsets.
  Increasing the internal link size reduces performance in the 8-bit and
  16-bit libraries. In the 32-bit library, the link size setting is ignored,
  as 4-byte offsets are always used.

. You can build PCRE2 so that its internal match() function that is called from
  pcre2_match() does not call itself recursively. Instead, it uses memory
  blocks obtained from the heap to save data that would otherwise be saved on
  the stack. To build PCRE2 like this, use

  --disable-stack-for-recursion

  on the "configure" command. PCRE2 runs more slowly in this mode, but it may
  be necessary in environments with limited stack sizes. This applies only to
  the normal execution of the pcre2_match() function; if JIT support is being
  successfully used, it is not relevant. Equally, it does not apply to
  pcre2_dfa_match(), which does not use deeply nested recursion. There is a
  discussion about stack sizes in the pcre2stack man page.

. For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called dftables is compiled and run in the default C locale when
  you obey "make". It builds a source file called pcre2_chartables.c. If you do
  not specify this option, pcre2_chartables.c is created as a copy of
  pcre2_chartables.c.dist. See "Character tables" below for further
  information.

. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  character code (as opposed to ASCII/Unicode) by specifying

  --enable-ebcdic --disable-unicode

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8/16/32.
  There is a second option, --enable-ebcdic-nl25, which specifies that the
  code value for the EBCDIC NL character is 0x25 instead of the default 0x15.

. If you specify --enable-debug, additional debugging code is included in the
  build. This option is intended for use by the PCRE2 maintainers.

. In environments where valgrind is installed, if you specify

  --enable-valgrind

  PCRE2 will use valgrind annotations to mark certain memory regions as
  unaddressable. This allows it to detect invalid memory accesses, and is
  mostly useful for debugging PCRE2 itself.

. In environments where the gcc compiler is used and lcov version 1.6 or above
  is installed, if you specify

  --enable-coverage

  the build process implements a code coverage report for the test suite. The
  report is generated by running "make coverage". If ccache is installed on
  your system, it must be disabled when building PCRE2 for coverage reporting.
  You can do this by setting the environment variable CCACHE_DISABLE=1 before
  running "make" to build PCRE2. There is more information about coverage
  reporting in the "pcre2build" documentation.

. When JIT support is enabled, pcre2grep automatically makes use of it, unless
  you add --disable-pcre2grep-jit to the "configure" command.

. On non-Windows systems there is support for calling external scripts during
  matching in the pcre2grep command via PCRE2's callout facility with string
  arguments. This support can be disabled by adding --disable-pcre2grep-callout
  to the "configure" command.

. The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library.
  It is possible to compile pcre2grep to use libz and/or libbz2, in order to
  read .gz and .bz2 files (respectively), by specifying one or both of

  --enable-pcre2grep-libz
  --enable-pcre2grep-libbz2

  Of course, the relevant libraries must be installed on your system.

. The default size (in bytes) of the internal buffer used by pcre2grep can be
  set by, for example:

  --with-pcre2grep-bufsize=51200

  The value must be a plain integer. The default is 20480.

. It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,

  --enable-pcre2test-libreadline or --enable-pcre2test-libedit

  If this is done, when pcre2test's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcre2test linked in this way, there may be licensing issues. These can be
  avoided by linking with libedit (which has a BSD licence) instead.

  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the to choose an appropriate library."
  If you get error messages about missing functions tgetstr, tgetent, tputs,
  tgetflag, or tgoto, this is the problem, and linking with the ncurses library
  should fix it.

The "configure" script builds the following files for the basic C library:

. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                         that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries

Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. If the 8-bit library is
built, libpcre2-posix and the pcre2grep command are also built. Running
"make" with the -j option may speed up compilation on multiprocessor systems.

The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document. The -j option of
"make" can also be used when running the tests.

You can use "make install" to install PCRE2 into live directories on your
system.
The following are installed (file names are all relative to the <prefix> that
is set when "configure" is run):

  Commands (bin):
    pcre2test
    pcre2grep (if 8-bit support is enabled)
    pcre2-config

  Libraries (lib):
    libpcre2-8      (if 8-bit support is enabled)
    libpcre2-16     (if 16-bit support is enabled)
    libpcre2-32     (if 32-bit support is enabled)
    libpcre2-posix  (if 8-bit support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre2-8.pc
    libpcre2-16.pc
    libpcre2-32.pc
    libpcre2-posix.pc

  Header files (include):
    pcre2.h
    pcre2posix.h

  Man pages (share/man/man{1,3}):
    pcre2grep.1
    pcre2test.1
    pcre2-config.1
    pcre2.3
    pcre2*.3 (lots more pages, all starting "pcre2")

  HTML documentation (share/doc/pcre2/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre2):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre2.txt         (a concatenation of the man(3) pages)
    pcre2test.txt     the pcre2test man page
    pcre2grep.txt     the pcre2grep man page
    pcre2-config.txt  the pcre2-config man page

If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information
------------------------------------

Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:

  pcre2-config --version

prints the version number, and

  pcre2-config --libs8

outputs information about where the 8-bit library is installed.
This command can be included in makefiles for programs that use PCRE2, saving
the programmer from having to remember too many details. Run pcre2-config with
no arguments to obtain a list of possible arguments.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --libs libpcre2-16

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.


Shared libraries
----------------

The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE2 using static libraries only you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling using autotools
-------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host.
However, you should NOT specify --enable-rebuild-chartables, because if you
do, the dftables.c source file is compiled and run on the local host, in order
to generate the inbuilt character tables (the pcre2_chartables.c file). This
will probably not work, because dftables.c needs to be compiled with the local
compiler, not the cross compiler.

When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
and run it on the local host to make a new version of pcre2_chartables.c.dist.
Then when you cross-compile PCRE2 this new version of the tables will be used.


Making new tarballs
-------------------

The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.


Testing PCRE2
-------------

To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
There is another script called RunGrepTest that tests the pcre2grep command.
When JIT support is enabled, a third test program called pcre2_jit_test is
built. Both the scripts and all the program tests are run if you obey "make
check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.
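
The scripts and test program named above can also be run one at a time. The
following is a minimal sketch, not part of the distribution. It assumes an
in-tree build, that is, "configure" and "make" have already been run in the
PCRE2 distribution directory and that directory is current. The checks for
pcre2grep and pcre2_jit_test are there only because those programs are not
built in every configuration.

```shell
# Sketch only. Assumes an in-tree build: "configure" and "make" have already
# been run in the PCRE2 distribution directory, which is the current directory.
if [ -x ./RunTest ] && [ -x ./pcre2test ]; then
  # Library tests, driven through pcre2test.
  ./RunTest
  # pcre2grep is built only when the 8-bit library is enabled.
  if [ -x ./pcre2grep ]; then ./RunGrepTest; fi
  # pcre2_jit_test is built only when JIT support is enabled.
  if [ -x ./pcre2_jit_test ]; then ./pcre2_jit_test; fi
else
  echo "pcre2test not found: build PCRE2 first" >&2
fi
```

Running "make check" covers the same scripts and programs in one step, so
running them individually is mainly useful when chasing a particular failure.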

The RunTest script runs the pcre2test test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. RunTest uses a file called testtry to hold the main output
from pcre2test. Other files whose names begin with "test" are used as working
files in some tests.

Some tests are relevant only when certain build-time options were selected. For
example, the tests for UTF-8/16/32 features are run only when Unicode support
is available. RunTest outputs a comment when it skips a test.

Many (but not all) of the tests that are not skipped are run twice if JIT
support is available. On the second run, JIT compilation is forced. This
testing can be suppressed by putting "nojit" on the RunTest command line.

The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
libraries that are enabled. If you want to run just one set of tests, call
RunTest with either the -8, -16 or -32 option.

If valgrind is installed, you can run the tests under it by putting "valgrind"
on the RunTest command line. To run pcre2test on just one or more specific test
files, give their numbers as arguments to RunTest, for example:

  RunTest 2 7 11

You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
end), or a number preceded by ~ to exclude a test. For example:

  RunTest 3-15 ~10

This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
except test 13. Whatever order the arguments are in, the tests are always run
in numerical order.

You can also call RunTest with the single argument "list" to cause it to output
a list of tests.

The test sequence starts with "test 0", which is a special test that has no
input file, and whose output is not checked.
This is because it will be different on different hardware and with different
configurations. The test exists in order to exercise some of pcre2test's code
that would not otherwise be run.

Tests 1 and 2 can always be run, as they expect only plain text strings (not
UTF) and make no use of Unicode properties. The first test file can be fed
directly into the perltest.sh script to check that Perl gives the same results.
The only difference you should see is in the first few lines, where the Perl
version is given instead of the PCRE2 version. The second set of tests checks
auxiliary functions, error detection, and run-time flags that are specific to
PCRE2. It also uses the debugging flags to check some of the internals of
pcre2_compile().

If you build PCRE2 with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:ascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. For example, where the comparison test output contains
[\x00-\x7f] the actual output might contain [\x00-\xff], and similarly in some
other cases. This is not a bug in PCRE2.

Test 3 checks pcre2_maketables(), the facility for building a set of character
tables for a specific locale and using them instead of the default tables. The
script uses the "locale" command to check for the availability of the "fr_FR",
"french", or "fr" locale, and uses the first one that it finds. If the "locale"
command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
the list of available locales, the third test cannot be run, and a comment is
output to say why.
If running this test produces an error like this:

  ** Failed to set locale "fr_FR"

it means that the given locale is not available on your system, despite being
listed by "locale". This does not mean that PCRE2 is broken. There are three
alternative output files for the third test, because three different versions
of the French locale have been encountered. The test passes if its output
matches any one of them.

Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
with the perltest.sh script, and test 5 checking PCRE2-specific things.

Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF mode with Unicode property support, respectively.

Test 8 checks some internal offsets and code size features; it is run only when
the default "link size" of 2 is set (in other cases the sizes change) and when
Unicode support is enabled.

Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
8-bit mode. Each pair covers general cases and Unicode support, respectively.
Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.

Test 14 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive
matcher.

Test 15 is run only when JIT support is not available. It checks that an
attempt to use JIT has the expected behaviour.

Test 16 is run only when JIT support is available. It checks JIT complete and
partial modes, match-limiting under JIT, and other JIT-specific features.

Tests 17 and 18 are run only in 8-bit mode. They check the POSIX interface to
the 8-bit library, without and with Unicode support, respectively.

Test 19 checks the serialization functions by writing a set of compiled
patterns to a file, and then reloading and checking them.


Character tables
----------------

For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, a set of tables that is
built into the library is used. The pcre2_maketables() function can be called
by an application to create a new set of tables in the current locale. These
are passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer
into a compile context.

The source file called pcre2_chartables.c contains the default set of tables.
By default, this is created as a copy of pcre2_chartables.c.dist, which
contains tables for ASCII coding. However, if --enable-rebuild-chartables is
specified for ./configure, a different version of pcre2_chartables.c is built
by the program dftables (compiled from dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale which is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
move pcre2_chartables.c.dist out of the way and replace it with your customized
tables.

When the dftables program is run as a result of --enable-rebuild-chartables,
it uses the default C locale that is set on your system. It does not pay
attention to the LC_xxx environment variables. In other words, it uses the
system's default locale rather than whatever the compiling user happens to have
set.
If you really do want to build a source set of character tables in a
locale that is specified by the LC_xxx variables, you can run the dftables
program by hand with the -L option. For example:

  ./dftables -L pcre2_chartables.c.special

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes for code points less
than 256. The final 256-byte table has bits indicating various character types,
as follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE2 to malfunction.


File manifest
-------------

The distribution should contain the files listed below.

(A) Source files for the PCRE2 library functions and their headers are found in
    the src directory:

  src/dftables.c           auxiliary program for building pcre2_chartables.c
                           when --enable-rebuild-chartables is specified

  src/pcre2_chartables.c.dist  a default set of character tables that assume
                           ASCII coding; unless --enable-rebuild-chartables is
                           specified, used by copying to pcre2_chartables.c

  src/pcre2posix.c         )
  src/pcre2_auto_possess.c )
  src/pcre2_compile.c      )
  src/pcre2_config.c       )
  src/pcre2_context.c      )
  src/pcre2_dfa_match.c    )
  src/pcre2_error.c        )
  src/pcre2_find_bracket.c )
  src/pcre2_jit_compile.c  )
  src/pcre2_jit_match.c    ) sources for the functions in the library,
  src/pcre2_jit_misc.c     )   and some internal functions that they use
  src/pcre2_maketables.c   )
  src/pcre2_match.c        )
  src/pcre2_match_data.c   )
  src/pcre2_newline.c      )
  src/pcre2_ord2utf.c      )
  src/pcre2_pattern_info.c )
  src/pcre2_serialize.c    )
  src/pcre2_string_utils.c )
  src/pcre2_study.c        )
  src/pcre2_substitute.c   )
  src/pcre2_substring.c    )
  src/pcre2_tables.c       )
  src/pcre2_ucd.c          )
  src/pcre2_valid_utf.c    )
  src/pcre2_xclass.c       )

  src/pcre2_printint.c     debugging function that is used by pcre2test

  src/config.h.in          template for config.h, when built by "configure"
  src/pcre2.h.in           template for pcre2.h when built by "configure"
  src/pcre2posix.h         header for the external POSIX wrapper API
  src/pcre2_internal.h     header for internal use
  src/pcre2_intmodedep.h   a mode-specific internal header
  src/pcre2_ucp.h          header for Unicode property handling

  sljit/*                  source files for the JIT compiler

(B) Source files for programs that use PCRE2:

  src/pcre2demo.c          simple demonstration of coding calls to PCRE2
  src/pcre2grep.c          source of a grep utility that uses PCRE2
  src/pcre2test.c          comprehensive test program
  src/pcre2_printint.c     part of pcre2test
  src/pcre2_jit_test.c     JIT test program

(C) Auxiliary files:

  132html                  script to turn "man" pages into HTML
  AUTHORS                  information about the author of PCRE2
  ChangeLog                log of changes to the code
  CleanTxt                 script to clean nroff output for txt man pages
  Detrail                  script to remove trailing spaces
  HACKING                  some notes about the internals of PCRE2
  INSTALL                  generic installation instructions
  LICENCE                  conditions for the use of PCRE2
  COPYING                  the same, using GNU's standard name
  Makefile.in              ) template for Unix Makefile, which is built by
                           )   "configure"
  Makefile.am              ) the automake input that was used to create
                           )   Makefile.in
  NEWS                     important changes in this release
  NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
  PrepareRelease           script to make preparations for "make dist"
  README                   this file
  RunTest                  a Unix shell script for running tests
  RunGrepTest              a Unix shell script for pcre2grep tests
  aclocal.m4               m4 macros (generated by "aclocal")
  config.guess             ) files used by libtool,
  config.sub               )   used only when building a shared library
  configure                a configuring shell script (built by autoconf)
  configure.ac             ) the autoconf input that was used to build
                           )   "configure" and config.h
  depcomp                  ) script to find program dependencies, generated by
                           )   automake
  doc/*.3                  man page sources for PCRE2
  doc/*.1                  man page sources for pcre2grep and pcre2test
  doc/index.html.src       the base HTML page
  doc/html/*               HTML documentation
  doc/pcre2.txt            plain text version of the man pages
  doc/pcre2test.txt        plain text documentation of test program
  install-sh               a shell script for installing files
  libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
  libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
  libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
  libpcre2posix.pc.in      template for libpcre2posix.pc for pkg-config
  ltmain.sh                file used to build a libtool
                           script
  missing                  ) common stub for a few missing GNU programs while
                           )   installing, generated by automake
  mkinstalldirs            script for making install directories
  perltest.sh              script for running a Perl test program
  pcre2-config.in          source of script which retains PCRE2 information
  testdata/testinput*      test data for main library tests
  testdata/testoutput*     expected test results
  testdata/grep*           input and output for pcre2grep tests
  testdata/*               other supporting test files

(D) Auxiliary files for cmake support

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindEditline.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for building PCRE2 "by hand"

  pcre2.h.generic          ) a version of the public PCRE2 header file
                           )   for use in non-"configure" environments
  config.h.generic         ) a version of config.h for use in non-"configure"
                           )   environments

Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 01 April 2016