Name | Date | Size | #Lines | LOC
---|---|---|---|---
.github/workflows/ | 03-May-2024 | - | 261 | 172
cmake/ | 03-May-2024 | - | 291 | 258
doc/ | 03-May-2024 | - | 56,447 | 51,163
include/ | 03-May-2024 | - | 6 | 3
m4/ | 03-May-2024 | - | 9,485 | 8,573
maint/ | 03-May-2024 | - | 71,017 | 68,015
src/ | 03-May-2024 | - | 121,615 | 88,557
testdata/ | 03-May-2024 | - | 106,678 | 89,866
.bazelrc | 03-May-2024 | 127 | 4 | 3
.gitignore | 03-May-2024 | 926 | 88 | 77
132html | 03-May-2024 | 6.9 KiB | 315 | 218
AUTHORS | 03-May-2024 | 749 | 37 | 24
Android.bp | 03-May-2024 | 2.9 KiB | 108 | 103
BUILD.bazel | 03-May-2024 | 1.8 KiB | 73 | 67
CMakeLists.txt | 03-May-2024 | 46.3 KiB | 1,202 | 1,022
COPYING | 03-May-2024 | 97 | 6 | 3
ChangeLog | 03-May-2024 | 129.1 KiB | 2,828 | 2,057
CheckMan | 03-May-2024 | 1.7 KiB | 79 | 65
CleanTxt | 03-May-2024 | 2.9 KiB | 114 | 72
Detrail | 03-May-2024 | 643 | 36 | 23
HACKING | 03-May-2024 | 37.9 KiB | 831 | 656
INSTALL | 03-May-2024 | 15.4 KiB | 369 | 287
LICENCE | 03-May-2024 | 3.4 KiB | 95 | 67
METADATA | 03-May-2024 | 912 | 27 | 25
MODULE.bazel | 03-May-2024 | 183 | 9 | 7
MODULE_LICENSE_BSD | 03-May-2024 | 0 | |
Makefile.am | 03-May-2024 | 26.3 KiB | 901 | 698
Makefile.in | 03-May-2024 | 235.8 KiB | 3,670 | 3,315
NEWS | 03-May-2024 | 15.7 KiB | 437 | 286
NON-AUTOTOOLS-BUILD | 03-May-2024 | 18.3 KiB | 416 | 312
NOTICE | 03-May-2024 | 3 KiB | 93 | 64
OWNERS | 03-May-2024 | 46 | 2 | 1
PrepareRelease | 03-May-2024 | 6.9 KiB | 240 | 203
README | 03-May-2024 | 43 KiB | 925 | 729
README.md | 03-May-2024 | 2.3 KiB | 57 | 40
RunGrepTest | 03-May-2024 | 51.4 KiB | 1,046 | 755
RunGrepTest.bat | 03-May-2024 | 34.4 KiB | 700 | 526
RunTest | 03-May-2024 | 26.1 KiB | 917 | 661
RunTest.bat | 03-May-2024 | 13.5 KiB | 527 | 474
WORKSPACE.bazel | 03-May-2024 | 19 | 2 | 1
aclocal.m4 | 03-May-2024 | 53.5 KiB | 1,494 | 1,354
ar-lib | 03-May-2024 | 5.7 KiB | 272 | 211
autogen.sh | 03-May-2024 | 1.2 KiB | 46 | 25
compile | 03-May-2024 | 7.2 KiB | 349 | 259
config-cmake.h.in | 03-May-2024 | 1.5 KiB | 55 | 45
config.guess | 03-May-2024 | 48.2 KiB | 1,749 | 1,522
config.sub | 03-May-2024 | 34.4 KiB | 1,885 | 1,698
configure | 03-May-2024 | 534.2 KiB | 18,666 | 15,639
configure.ac | 03-May-2024 | 40.8 KiB | 1,130 | 958
depcomp | 03-May-2024 | 23 KiB | 792 | 502
index.md | 03-May-2024 | 2.3 KiB | 57 | 40
install-sh | 03-May-2024 | 15 KiB | 542 | 352
libpcre2-16.pc.in | 03-May-2024 | 406 | 14 | 11
libpcre2-32.pc.in | 03-May-2024 | 406 | 14 | 11
libpcre2-8.pc.in | 03-May-2024 | 403 | 14 | 11
libpcre2-posix.pc.in | 03-May-2024 | 342 | 14 | 11
ltmain.sh | 03-May-2024 | 325.3 KiB | 11,437 | 8,214
missing | 03-May-2024 | 6.7 KiB | 216 | 143
pcre2-config.in | 03-May-2024 | 2.3 KiB | 122 | 109
pcre2_fuzzer.dict | 03-May-2024 | 435 | 51 | 45
pcre2_fuzzer.options | 03-May-2024 | 37 | 3 | 2
perltest.sh | 03-May-2024 | 11.1 KiB | 401 | 227
test-driver | 03-May-2024 | 4.8 KiB | 154 | 89
README
README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------

PCRE2 is a re-working of the original PCRE1 library to provide an entirely new
API. Since its initial release in 2015, there has been further development of
the code and it now differs from PCRE1 in more than just the API. There are new
features, and the internals have been improved. The original PCRE1 library is
now obsolete and no longer maintained. The latest release of PCRE2 is available
in .tar.gz, .tar.bz2, or .zip form from this GitHub repository:

https://github.com/PCRE2Project/pcre2/releases

There is a mailing list for discussion about the development of PCRE2 at
pcre2-dev@googlegroups.com. You can subscribe by sending an email to
pcre2-dev+subscribe@googlegroups.com.

You can access the archives and also subscribe or manage your subscription
here:

https://groups.google.com/g/pcre2-dev

Please read the NEWS file if you are upgrading from a previous release. The
contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Contributions by users of PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest


The PCRE2 APIs
--------------

PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. Unlike PCRE1, there
are no C++ wrappers.

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page). These are built into a library called libpcre2-posix. Note that this
just provides a POSIX calling interface to PCRE2; the regular expressions
themselves still follow Perl syntax and semantics. The POSIX API is restricted,
and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link (or the program modified, of course). See the
pcre2posix documentation for more details.


Documentation for PCRE2
-----------------------

If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
     doc/pcre2test.txt in the source distribution. The first of these is a
     concatenation of the text forms of all the section 3 man pages except the
     listing of pcre2demo.c and those that summarize individual functions. The
     other two are the text forms of the section 1 man pages for the pcre2grep
     and pcre2test commands. These text forms are provided for ease of scanning
     with text editors or similar tools. They are installed in
     <prefix>/share/doc/pcre2, where <prefix> is the installation prefix
     (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre2/html.


Building PCRE2 on non-Unix-like systems
---------------------------------------

For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
your system supports the use of "configure" and "make" you may be able to build
PCRE2 using autotools in the same way as for many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE2 without using autotools
--------------------------------------

The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.


Building PCRE2 using autotools
------------------------------

The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.

If you have downloaded and unpacked a PCRE2 release tarball, run the
"configure" command from the PCRE2 directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.

The files in the GitHub repository do not contain "configure". If you have
downloaded the PCRE2 source files from GitHub, before you can run "configure"
you must run the shell script called autogen.sh. This runs a number of
autotools to create a "configure" script (you must of course have the autotools
commands installed in order to do this).

Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure

PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built. You can change this
  by adding one of these options to the "configure" command:

  --disable-shared
  --disable-static

  (See also "Shared libraries" below.)

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  the "configure" command, the 16-bit library is also built. If you add
  --enable-pcre2-32 to the "configure" command, the 32-bit library is also
  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error. If in doubt, use --enable-jit=auto, which
  enables JIT only if the current hardware is supported.

. If you are enabling JIT under an SELinux environment you may also want to add
  --enable-jit-sealloc, which enables the use of an executable memory allocator
  that is compatible with SELinux. Warning: this allocator is experimental!
  It does not support fork() operation and may crash when no disk space is
  available. This option has no effect if JIT is disabled.

. If you do not want to make use of the default support for UTF-8 Unicode
  character strings in the 8-bit library, UTF-16 Unicode character strings in
  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
  library, you can add --disable-unicode to the "configure" command. This
  reduces the size of the libraries. It is not possible to configure one
  library with Unicode support, and another without, in the same configuration.
  It is also not possible to use --enable-ebcdic (see below) with Unicode
  support, so if this option is set, you must also use --disable-unicode.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can be
  only ASCII or UTF-8/16/32, even when running on EBCDIC platforms.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
  However, only a subset of Unicode properties are supported; see the
  pcre2pattern man page for details. Escape sequences such as \d and \w in
  patterns do not by default make use of Unicode properties, but can be made to
  do so by setting the PCRE2_UCP option or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
  character as indicating the end of a line. Whatever you specify at build time
  is the default; the caller of PCRE2 can change the selection at run time. The
  default newline indicator is a single LF character (the Unix standard). You
  can specify the default newline indicator by adding --enable-newline-is-cr,
  --enable-newline-is-lf, --enable-newline-is-crlf,
  --enable-newline-is-anycrlf, --enable-newline-is-any, or
  --enable-newline-is-nul to the "configure" command, respectively.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. In a pattern, the escape sequence \C matches a single code unit, even in a
  UTF mode. This can be dangerous because it breaks up multi-code-unit
  characters. You can build PCRE2 with the use of \C permanently locked out by
  adding --enable-never-backslash-C (note the upper case C) to the "configure"
  command. When \C is allowed by the library, individual applications can lock
  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.

. PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
  example,

  --with-parens-nest-limit=500

. PCRE2 has a counter that can be set to limit the amount of computing resource
  it uses when matching a pattern. If the limit is exceeded during a match, the
  match fails. The default is ten million. You can change the default by
  setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
  discussion in the pcre2api man page (search for pcre2_set_match_limit).

. There is a separate counter that limits the depth of nested backtracking
  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
  matching process, which indirectly limits the amount of heap memory that is
  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
  counter also has a default of ten million, which is essentially "unlimited".
  You can change the default by setting, for example,

  --with-match-limit-depth=5000

  There is more discussion in the pcre2api man page (search for
  pcre2_set_depth_limit).

. You can also set an explicit limit on the amount of heap memory used by
  the pcre2_match() and pcre2_dfa_match() interpreters:

  --with-heap-limit=500

  The units are kibibytes (units of 1024 bytes). This limit does not apply when
  the JIT optimization (which has its own memory control features) is used.
  There is more discussion on the pcre2api man page (search for
  pcre2_set_heap_limit).

. In the 8-bit library, the default maximum compiled pattern size is around
  64 kibibytes. You can increase this by adding --with-link-size=3 to the
  "configure" command. PCRE2 then uses three bytes instead of two for offsets
  to different parts of the compiled pattern. In the 16-bit library,
  --with-link-size=3 is the same as --with-link-size=4, which (in both
  libraries) uses four-byte offsets. Increasing the internal link size reduces
  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
  link size setting is ignored, as 4-byte offsets are always used.

. For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called pcre2_dftables is compiled and run in the default C locale
  when you obey "make". It builds a source file called pcre2_chartables.c. If
  you do not specify this option, pcre2_chartables.c is created as a copy of
  pcre2_chartables.c.dist. See "Character tables" below for further
  information.

. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  character code (as opposed to ASCII/Unicode) by specifying

  --enable-ebcdic --disable-unicode

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  which specifies that the code value for the EBCDIC NL character is 0x25
  instead of the default 0x15.

. If you specify --enable-debug, additional debugging code is included in the
  build. This option is intended for use by the PCRE2 maintainers.

. In environments where valgrind is installed, if you specify

  --enable-valgrind

  PCRE2 will use valgrind annotations to mark certain memory regions as
  unaddressable. This allows it to detect invalid memory accesses, and is
  mostly useful for debugging PCRE2 itself.

. In environments where the gcc compiler is used and lcov is installed, if you
  specify

  --enable-coverage

  the build process implements a code coverage report for the test suite. The
  report is generated by running "make coverage". If ccache is installed on
  your system, it must be disabled when building PCRE2 for coverage reporting.
  You can do this by setting the environment variable CCACHE_DISABLE=1 before
  running "make" to build PCRE2. There is more information about coverage
  reporting in the "pcre2build" documentation.

. When JIT support is enabled, pcre2grep automatically makes use of it, unless
  you add --disable-pcre2grep-jit to the "configure" command.

. There is support for calling external programs during matching in the
  pcre2grep command, using PCRE2's callout facility with string arguments. This
  support can be disabled by adding --disable-pcre2grep-callout to the
  "configure" command. There are two kinds of callout: one that generates
  output from inbuilt code, and another that calls an external program. The
  latter has special support for Windows and VMS; otherwise it assumes the
  existence of the fork() function. This facility can be disabled by adding
  --disable-pcre2grep-callout-fork to the "configure" command.

. The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  specifying one or both of

  --enable-pcre2grep-libz
  --enable-pcre2grep-libbz2

  Of course, the relevant libraries must be installed on your system.

. The default starting size (in bytes) of the internal buffer used by pcre2grep
  can be set by, for example:

  --with-pcre2grep-bufsize=51200

  The value must be a plain integer. The default is 20480. The amount of memory
  used by pcre2grep is actually three times this number, to allow for "before"
  and "after" lines. If very long lines are encountered, the buffer is
  automatically enlarged, up to a fixed maximum size.

. The default maximum size of pcre2grep's internal buffer can be set by, for
  example:

  --with-pcre2grep-max-bufsize=2097152

  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
  whichever is the larger.

. It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,

  --enable-pcre2test-libreadline or --enable-pcre2test-libedit

  If this is done, when pcre2test's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcre2test linked in this way, there may be licensing issues. These can be
  avoided by linking with libedit (which has a BSD licence) instead.

  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the option to choose an appropriate
  library." If you get error messages about missing functions tgetstr, tgetent,
  tputs, tgetflag, or tgoto, this is the problem, and linking with the ncurses
  library should fix it.

. The C99 standard defines formatting modifiers z and t for size_t and
  ptrdiff_t values, respectively. By default, PCRE2 uses these modifiers in
  environments other than Microsoft Visual Studio versions earlier than 2013
  when __STDC_VERSION__ is defined and has a value greater than or equal to
  199901L (indicating C99). However, there is at least one environment that
  claims to be C99 but does not support these modifiers. If
  --disable-percent-zt is specified, no use is made of the z or t modifiers.
  Instead of %td or %zu, %lu is used, with a cast for size_t values.

. There is a special option called --enable-fuzz-support for use by people who
  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
  be built, but not installed. This contains a single function called
  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
  length of the string. When called, this function tries to compile the string
  as a pattern, and if that succeeds, to match it. This is done both with no
  options and with some random options bits that are generated from the string.
  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
  be created. This is normally run under valgrind or used when PCRE2 is
  compiled with address sanitizing enabled. It calls the fuzzing function and
  outputs information about what it is doing. The input strings are specified
  by arguments: if an argument starts with "=" the rest of it is a literal
  input string. Otherwise, it is assumed to be a file name, and the contents
  of the file are the test string.

. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
  which caused pcre2_match() to use individual blocks on the heap for
  backtracking instead of recursive function calls (which use the stack). This
  is now obsolete because pcre2_match() was refactored always to use the heap
  (in a much more efficient way than before). This option is retained for
  backwards compatibility, but has no effect other than to output a warning.

The "configure" script builds the following files for the basic C library:

. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                         that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries

Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. If the 8-bit library is
built, libpcre2-posix, pcre2posix_test, and the pcre2grep command are also
built. Running "make" with the -j option may speed up compilation on
multiprocessor systems.

The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document. The -j option of
"make" can also be used when running the tests.

You can use "make install" to install PCRE2 into live directories on your
system. The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcre2test
    pcre2grep (if 8-bit support is enabled)
    pcre2-config

  Libraries (lib):
    libpcre2-8      (if 8-bit support is enabled)
    libpcre2-16     (if 16-bit support is enabled)
    libpcre2-32     (if 32-bit support is enabled)
    libpcre2-posix  (if 8-bit support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre2-8.pc
    libpcre2-16.pc
    libpcre2-32.pc
    libpcre2-posix.pc

  Header files (include):
    pcre2.h
    pcre2posix.h

  Man pages (share/man/man{1,3}):
    pcre2grep.1
    pcre2test.1
    pcre2-config.1
    pcre2.3
    pcre2*.3 (lots more pages, all starting "pcre2")

  HTML documentation (share/doc/pcre2/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre2):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre2.txt         (a concatenation of the man(3) pages)
    pcre2test.txt     the pcre2test man page
    pcre2grep.txt     the pcre2grep man page
    pcre2-config.txt  the pcre2-config man page

If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information
------------------------------------

Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:

  pcre2-config --version

prints the version number, and

  pcre2-config --libs8

outputs information about where the 8-bit library is installed. This command
can be included in makefiles for programs that use PCRE2, saving the programmer
from having to remember too many details. Run pcre2-config with no arguments to
obtain a list of possible arguments.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --libs libpcre2-16

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.


Shared libraries
----------------

The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE2 using static libraries only you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling using autotools
-------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host. However, you should NOT
specify --enable-rebuild-chartables, because if you do, the pcre2_dftables.c
source file is compiled and run on the local host, in order to generate the
inbuilt character tables (the pcre2_chartables.c file). This will probably not
work, because pcre2_dftables.c needs to be compiled with the local compiler,
not the cross compiler.

When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile pcre2_dftables.c by
hand and run it on the local host to make a new version of
pcre2_chartables.c.dist. See the pcre2build section "Creating character tables
at build time" for more details.


Making new tarballs
-------------------

The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.


Testing PCRE2
-------------

To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
There is another script called RunGrepTest that tests the pcre2grep command.
When the 8-bit library is built, a test program for the POSIX wrapper, called
pcre2posix_test, is compiled, and when JIT support is enabled, a test program
called pcre2_jit_test is built. The scripts and the program tests are all run
when you obey "make check". For other environments, see the instructions in
NON-AUTOTOOLS-BUILD.

The RunTest script runs the pcre2test test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. RunTest uses a file called testtry to hold the main output
from pcre2test. Other files whose names begin with "test" are used as working
files in some tests.

Some tests are relevant only when certain build-time options were selected. For
example, the tests for UTF-8/16/32 features are run only when Unicode support
is available. RunTest outputs a comment when it skips a test.

Many (but not all) of the tests that are not skipped are run twice if JIT
support is available. On the second run, JIT compilation is forced. This
testing can be suppressed by putting "-nojit" on the RunTest command line.

The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
libraries that are enabled. If you want to run just one set of tests, call
RunTest with either the -8, -16 or -32 option.

If valgrind is installed, you can run the tests under it by putting "-valgrind"
on the RunTest command line. To run pcre2test on just one or more specific test
files, give their numbers as arguments to RunTest, for example:

  RunTest 2 7 11

You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
end), or a number preceded by ~ to exclude a test. For example:

  RunTest 3-15 ~10

This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
except test 13. Whatever order the arguments are in, the tests are always run
in numerical order.

You can also call RunTest with the single argument "list" to cause it to output
a list of tests.

The test sequence starts with "test 0", which is a special test that has no
input file, and whose output is not checked. This is because it will be
different on different hardware and with different configurations. The test
exists in order to exercise some of pcre2test's code that would not otherwise
be run.

Tests 1 and 2 can always be run, as they expect only plain text strings (not
UTF) and make no use of Unicode properties. The first test file can be fed
directly into the perltest.sh script to check that Perl gives the same results.
The only difference you should see is in the first few lines, where the Perl
version is given instead of the PCRE2 version. The second set of tests checks
auxiliary functions, error detection, and run-time flags that are specific to
PCRE2. It also uses the debugging flags to check some of the internals of
pcre2_compile().

If you build PCRE2 with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests.
For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. For example, where the comparison test output contains
[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
cases. This is not a bug in PCRE2.

Test 3 checks pcre2_maketables(), the facility for building a set of character
tables for a specific locale and using them instead of the default tables. The
script uses the "locale" command to check for the availability of the "fr_FR",
"french", or "fr" locale, and uses the first one that it finds. If the "locale"
command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
the list of available locales, the third test cannot be run, and a comment is
output to say why. If running this test produces an error like this:

  ** Failed to set locale "fr_FR"

it means that the given locale is not available on your system, despite being
listed by "locale". This does not mean that PCRE2 is broken. There are three
alternative output files for the third test, because three different versions
of the French locale have been encountered. The test passes if its output
matches any one of them.

Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
with the perltest.sh script, and test 5 checking PCRE2-specific things.

Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively.

Test 8 checks some internal offsets and code size features, but it is run only
when Unicode support is enabled.
The output is different in 8-bit, 16-bit, and
32-bit modes and for different link sizes, so there are different output files
for each mode and link size.

Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
8-bit mode. Each pair are for general cases and Unicode support, respectively.

Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.

Test 14 contains some special UTF and UCP tests that give different output for
different code unit widths.

Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive
matcher.

Test 16 is run only when JIT support is not available. It checks that an
attempt to use JIT has the expected behaviour.

Test 17 is run only when JIT support is available. It checks JIT complete and
partial modes, match-limiting under JIT, and other JIT-specific features.

Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
the 8-bit library, without and with Unicode support, respectively.

Test 20 checks the serialization functions by writing a set of compiled
patterns to a file, and then reloading and checking them.

Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.

Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.

Test 26 checks Unicode property support using tests that are generated
automatically from the Unicode data tables.
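
The test-selection arguments that RunTest accepts (single numbers, ranges such
as 3-15 or 3-, and ~N exclusions) can be modelled in a few lines of C. The
sketch below only illustrates the behaviour described in this section; it is
not the logic of the RunTest script itself. The function name select_tests and
the constant MAX_TEST (set to 26, the highest test number mentioned above) are
assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_TEST 26  /* assumption: highest test number described in this README */

/* Hypothetical helper: sets run[t] to 1 for every test t that the given
   arguments select, and to 0 otherwise. Returns 0 on success or -1 if an
   argument is malformed or out of range. */
int select_tests(int argc, const char **argv, int run[MAX_TEST + 1])
{
    int skip[MAX_TEST + 1] = {0};
    int any_selected = 0;

    assert(run != NULL);
    memset(run, 0, sizeof(int) * (MAX_TEST + 1));

    for (int i = 0; i < argc; i++) {
        const char *arg = argv[i];
        char *end;
        long first, last;

        if (arg[0] == '~') {                 /* ~N excludes test N */
            first = strtol(arg + 1, &end, 10);
            if (end == arg + 1 || *end != '\0' || first < 0 || first > MAX_TEST)
                return -1;
            skip[first] = 1;
            continue;
        }

        first = strtol(arg, &end, 10);       /* N, N-M, or N- */
        if (end == arg || first < 0 || first > MAX_TEST)
            return -1;
        last = first;
        if (*end == '-') {
            const char *rest = end + 1;
            if (*rest == '\0') {
                last = MAX_TEST;             /* "N-" means N to the end */
            } else {
                last = strtol(rest, &end, 10);
                if (end == rest || *end != '\0' || last < first || last > MAX_TEST)
                    return -1;
            }
        } else if (*end != '\0') {
            return -1;
        }

        for (long t = first; t <= last; t++)
            run[t] = 1;
        any_selected = 1;
    }

    /* With only exclusions (or no arguments at all), every test is selected
       before the exclusions are applied. Walking run[] from 0 upwards gives
       numerical order regardless of the order of the arguments. */
    for (int t = 0; t <= MAX_TEST; t++) {
        if (!any_selected)
            run[t] = 1;
        if (skip[t])
            run[t] = 0;
    }
    return 0;
}
```

Under these assumptions, calling select_tests() with the arguments "3-15" and
"~10" marks tests 3 to 15 except test 10, and calling it with just "~13" marks
every test except test 13.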


Character tables
----------------

For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, a set of tables that is
built into the library is used. The pcre2_maketables() function can be called
by an application to create a new set of tables in the current locale. These
are passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer
into a compile context.

The source file called pcre2_chartables.c contains the default set of tables.
By default, this is created as a copy of pcre2_chartables.c.dist, which
contains tables for ASCII coding. However, if --enable-rebuild-chartables is
specified for ./configure, a new version of pcre2_chartables.c is built by the
program pcre2_dftables (compiled from pcre2_dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated. The best way to do this is to
move pcre2_chartables.c.dist out of the way and replace it with your customized
tables.

When the pcre2_dftables program is run as a result of specifying
--enable-rebuild-chartables, it uses the default C locale that is set on your
system. It does not pay attention to the LC_xxx environment variables. In other
words, it uses the system's default locale rather than whatever the compiling
user happens to have set. If you really do want to build a source set of
character tables in a locale that is specified by the LC_xxx variables, you can
run the pcre2_dftables program by hand with the -L option.
For example:

  ./pcre2_dftables -L pcre2_chartables.c.special

The second argument names the file where the source code for the tables is
written. The first two 256-byte tables provide lower casing and case flipping
functions, respectively. The next table consists of a number of 32-byte bit
maps which identify certain character classes such as digits, "word"
characters, white space, etc. These are used when building 32-byte bit maps
that represent character classes for code points less than 256. The final
256-byte table has bits indicating various character types, as follows:

    1   white space character
    2   letter
    4   lower case letter
    8   decimal digit
   16   alphanumeric or '_'

You can also specify -b (with or without -L) when running pcre2_dftables. This
causes the tables to be written in binary instead of as source code. A set of
binary tables can be loaded into memory by an application and passed to
pcre2_compile() in the same way as tables created dynamically by calling
pcre2_maketables(). The tables are just a string of bytes, independent of
hardware characteristics such as endianness. This means they can be bundled
with an application that runs in different environments, to ensure consistent
behaviour.

See also the pcre2build section "Creating character tables at build time".


File manifest
-------------

The distribution should contain the files listed below.

(A) Source files for the PCRE2 library functions and their headers are found in
    the src directory:

  src/pcre2_dftables.c      auxiliary program for building pcre2_chartables.c
                            when --enable-rebuild-chartables is specified

  src/pcre2_chartables.c.dist  a default set of character tables that assume
                            ASCII coding; unless --enable-rebuild-chartables is
                            specified, used by copying to pcre2_chartables.c

  src/pcre2posix.c          )
  src/pcre2_auto_possess.c  )
  src/pcre2_compile.c       )
  src/pcre2_config.c        )
  src/pcre2_context.c       )
  src/pcre2_convert.c       )
  src/pcre2_dfa_match.c     )
  src/pcre2_error.c         )
  src/pcre2_extuni.c        )
  src/pcre2_find_bracket.c  )
  src/pcre2_jit_compile.c   )
  src/pcre2_jit_match.c     ) sources for the functions in the library,
  src/pcre2_jit_misc.c      )   and some internal functions that they use
  src/pcre2_maketables.c    )
  src/pcre2_match.c         )
  src/pcre2_match_data.c    )
  src/pcre2_newline.c       )
  src/pcre2_ord2utf.c       )
  src/pcre2_pattern_info.c  )
  src/pcre2_script_run.c    )
  src/pcre2_serialize.c     )
  src/pcre2_string_utils.c  )
  src/pcre2_study.c         )
  src/pcre2_substitute.c    )
  src/pcre2_substring.c     )
  src/pcre2_tables.c        )
  src/pcre2_ucd.c           )
  src/pcre2_ucptables.c     )
  src/pcre2_valid_utf.c     )
  src/pcre2_xclass.c        )

  src/pcre2_printint.c      debugging function that is used by pcre2test,
  src/pcre2_fuzzsupport.c   function for (optional) fuzzing support

  src/config.h.in           template for config.h, when built by "configure"
  src/pcre2.h.in            template for pcre2.h when built by "configure"
  src/pcre2posix.h          header for the external POSIX wrapper API
  src/pcre2_internal.h      header for internal use
  src/pcre2_intmodedep.h    a mode-specific internal header
  src/pcre2_jit_neon_inc.h  header used by JIT
  src/pcre2_jit_simd_inc.h  header used by JIT
  src/pcre2_ucp.h           header for Unicode property handling

  sljit/*                   source files for the JIT compiler

(B)
    Source files for programs that use PCRE2:

  src/pcre2demo.c           simple demonstration of coding calls to PCRE2
  src/pcre2grep.c           source of a grep utility that uses PCRE2
  src/pcre2test.c           comprehensive test program
  src/pcre2_jit_test.c      JIT test program
  src/pcre2posix_test.c     POSIX wrapper API test program

(C) Auxiliary files:

  132html                   script to turn "man" pages into HTML
  AUTHORS                   information about the author of PCRE2
  ChangeLog                 log of changes to the code
  CleanTxt                  script to clean nroff output for txt man pages
  Detrail                   script to remove trailing spaces
  HACKING                   some notes about the internals of PCRE2
  INSTALL                   generic installation instructions
  LICENCE                   conditions for the use of PCRE2
  COPYING                   the same, using GNU's standard name
  Makefile.in               ) template for Unix Makefile, which is built by
                            )   "configure"
  Makefile.am               ) the automake input that was used to create
                            )   Makefile.in
  NEWS                      important changes in this release
  NON-AUTOTOOLS-BUILD       notes on building PCRE2 without using autotools
  PrepareRelease            script to make preparations for "make dist"
  README                    this file
  RunTest                   a Unix shell script for running tests
  RunGrepTest               a Unix shell script for pcre2grep tests
  aclocal.m4                m4 macros (generated by "aclocal")
  config.guess              ) files used by libtool,
  config.sub                )   used only when building a shared library
  configure                 a configuring shell script (built by autoconf)
  configure.ac              ) the autoconf input that was used to build
                            )   "configure" and config.h
  depcomp                   ) script to find program dependencies, generated by
                            )   automake
  doc/*.3                   man page sources for PCRE2
  doc/*.1                   man page sources for pcre2grep and pcre2test
  doc/index.html.src        the base HTML page
  doc/html/*                HTML documentation
  doc/pcre2.txt             plain text version of the man pages
  doc/pcre2test.txt         plain text documentation of test program
  install-sh                a shell script for installing files
  libpcre2-8.pc.in
                            template for libpcre2-8.pc for pkg-config
  libpcre2-16.pc.in         template for libpcre2-16.pc for pkg-config
  libpcre2-32.pc.in         template for libpcre2-32.pc for pkg-config
  libpcre2-posix.pc.in      template for libpcre2-posix.pc for pkg-config
  ltmain.sh                 file used to build a libtool script
  missing                   ) common stub for a few missing GNU programs while
                            )   installing, generated by automake
  mkinstalldirs             script for making install directories
  perltest.sh               Script for running a Perl test program
  pcre2-config.in           source of script which retains PCRE2 information
  testdata/testinput*       test data for main library tests
  testdata/testoutput*      expected test results
  testdata/grep*            input and output for pcre2grep tests
  testdata/*                other supporting test files

(D) Auxiliary files for cmake support

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindEditline.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for building PCRE2 "by hand"

  src/pcre2.h.generic       ) a version of the public PCRE2 header file
                            )   for use in non-"configure" environments
  src/config.h.generic      ) a version of config.h for use in non-"configure"
                            )   environments

Philip Hazel
Email local part: Philip.Hazel
Email domain: gmail.com
Last updated: 10 December 2022
README.md
# PCRE2 - Perl-Compatible Regular Expressions

The PCRE2 library is a set of C functions that implement regular expression
pattern matching using the same syntax and semantics as Perl 5. PCRE2 has its
own native API, as well as a set of wrapper functions that correspond to the
POSIX regular expression API. The PCRE2 library is free, even for building
proprietary software. It comes in three forms, for processing 8-bit, 16-bit,
or 32-bit code units, in either literal or UTF encoding.

PCRE2 was first released in 2015 to replace the API in the original PCRE
library, which is now obsolete and no longer maintained. As well as a more
flexible API, the code of PCRE2 has been much improved since the fork.

## Download

As well as downloading from the
[GitHub site](https://github.com/PCRE2Project/pcre2), you can download PCRE2
or the older, unmaintained PCRE1 library from an
[*unofficial* mirror](https://sourceforge.net/projects/pcre/files/) at SourceForge.

You can check out the PCRE2 source code via Git or Subversion:

    git clone https://github.com/PCRE2Project/pcre2.git
    svn co    https://github.com/PCRE2Project/pcre2.git

## Contributed Ports

If you just need the command-line PCRE2 tools on Windows, precompiled binary
versions are available at this
[Rexegg page](http://www.rexegg.com/pcregrep-pcretest.html).

A PCRE2 port for z/OS, a mainframe operating system which uses EBCDIC as its
default character encoding, can be found at
[http://www.cbttape.org](http://www.cbttape.org/) (File 939).

## Documentation

You can read the PCRE2 documentation
[here](https://PCRE2Project.github.io/pcre2/doc/html/index.html).

Comparisons to Perl's regular expression semantics can be found in the
community authored Wikipedia entry for PCRE.

There is a curated summary of changes for each PCRE release, copies of
documentation from older releases, and other useful information from the third
party authored
[RexEgg PCRE Documentation and Change Log page](http://www.rexegg.com/pcre-documentation.html).

## Contact

To report a problem with the PCRE2 library, or to make a feature request, please
use the PCRE2 GitHub issues tracker. There is a mailing list for discussion of
PCRE2 issues and development at pcre2-dev@googlegroups.com, which is where any
announcements will be made. You can browse the
[list archives](https://groups.google.com/g/pcre2-dev).