Change Log for PCRE2 - see also the Git log
-------------------------------------------


Version 10.42 11-December-2022
------------------------------

1. Change 19 of 10.41 wasn't quite right; it put the definition of a default,
empty value for PCRE2_CALL_CONVENTION in src/pcre2posix.c instead of
src/pcre2posix.h, which meant that programs that included pcre2posix.h but not
pcre2.h failed to compile.

2. To catch similar issues to the above in future, a new small test program
that includes pcre2posix.h but not pcre2.h has been added to the test suite.

3. When the -S option of pcre2test was used to set a stack size greater than
the allowed maximum, the error message displayed the hard limit incorrectly.
This was pointed out on GitHub pull request #171, but the suggested patch
didn't cope with all cases. Some further modification was required.

4. Supplying an ovector count of more than 65535 to pcre2_match_data_create()
caused a crash because the field in the match data block is only 16 bits. A
maximum of 65535 is now silently applied.

5. Merged @carenas patch #175 which fixes #86 - segfault on aarch64 (ARM).


Version 10.41 06-December-2022
------------------------------

1. Add fflush() before and after a fork callout in pcre2grep to get its output
to be the same on all systems. (There were previously ordering differences in
Alpine Linux).

2. Merged patch from @carenas (GitHub #110) for pthreads support in CMake.

3. SSF scorecards grumbled about possible overflow in an expression in
pcre2test. It never would have overflowed in practice, but some casts have been
added and at the same time there's been some tidying of fprintf() calls that
output size_t values.

4. PR #94 showed up an unused enum in pcre2_convert.c, which is now removed.

5. Minor code re-arrangement to remove gcc warning about realloc() in
pcre2test.

6. Change a number of int variables that hold buffer and line lengths in
pcre2grep to PCRE2_SIZE (aka size_t).

7. Added an #ifdef to cut out a call to PRIV(jit_free) when JIT is not
supported (even though that function would do nothing in that case) at the
request of a user who doesn't even want to link with pcre2_jit_compile.o. Also
tidied up an untidy #ifdef arrangement in pcre2test.

8. Fixed an issue in the backtracking optimization of character repeats in
JIT. Furthermore, star repetitions are now optimized, not just plus
repetitions.

9. Removed the use of an initial backtracking frames vector on the system stack
in pcre2_match() so that it now always uses the heap. (In a multi-thread
environment with very small stacks there had been an issue.) This also is
tidier for JIT matching, which didn't need that vector. The heap vector is now
remembered in the match data block and re-used if that block itself is re-used.
It is freed with the match data block.

10. Adjusted the find_limits code in pcre2test to work with change 9 above.

11. Added find_limits_noheap to pcre2test, because the heap limits are now
different in different environments and so cannot be included in the standard
tests.

12. Created a test for pcre2_match() heap processing that is not part of the
tests run by 'make check', but can be run manually. The current output is from
a 64-bit system.

13. Implemented -Z aka --null in pcre2grep.

14. A minor change to pcre2test and the addition of several new pcre2grep tests
have improved LCOV coverage statistics. At the same time, code in pcre2grep and
elsewhere that can never be obeyed in normal testing has been excluded from
coverage.

15. Fixed a bug in pcre2grep that could cause an extra newline to be written
after output generated by --output.

16. If a file has a .bz2 extension but is not in fact compressed, pcre2grep
should process it as a plain text file. A bug stopped this happening; now fixed
and added to the tests.

17. When pcre2grep was not running in UTF mode, if a string specified by
--output or obtained from a callout in a pattern contained a character (byte)
greater than 127, it was incorrectly output in UTF-8 format.

18. Added some casts after warnings from Clang sanitize.

19. Merged patch from cbouc (GitHub #139): 4 function prototypes were missing
PCRE2_CALL_CONVENTION in src/pcre2posix.h. All function prototypes returning
pointers had out of place PCRE2_CALL_CONVENTION in src/pcre2.h.*. These
produced errors when building for Windows with #define PCRE2_CALL_CONVENTION
__stdcall.

20. A negative repeat value in a pcre2test subject line was not being
diagnosed, leading to infinite looping.

21. Updated RunGrepTest to discard the warning that Bash now gives when setting
LC_CTYPE to a bad value (because older versions didn't).

22. Updated pcre2grep so that it behaves like GNU grep when matching more than
one pattern and a later pattern matches at an earlier point in the subject when
the matched substrings are being identified by colour or by offsets.

23. Updated the PrepareRelease script so that the man page that it makes for
the pcre2demo demonstration program is more standard and does not cause errors
when processed by lexgrog or mandb -c (GitHub issue #160).

24. The JIT compiler was updated.


Version 10.40 15-April-2022
---------------------------

1. Merged patch from @carenas (GitHub #35, 7db87842) to fix pcre2grep incorrect
handling of multiple passes.

2. Merged patch from @carenas (GitHub #36, dae47509) to fix portability issue
in pcre2grep with buffered fseek(stdin).

3. Merged patch from @carenas (GitHub #37, acc520924) to fix tests when -S is
not supported.

4. Revert an unintended change in JIT repeat detection.

5. Merged patch from @carenas (GitHub #52, b037bfa1) to fix build on GNU Hurd.

6. Merged documentation and comments patches from @carenas (GitHub #47).

7. Merged patch from @carenas (GitHub #49) to remove obsolete JFriedl test code
from pcre2grep.

8. Merged patch from @carenas (GitHub #48) to fix CMake install issue #46.

9. Merged patch from @carenas (GitHub #53) fixing NULL checks in matching and
substituting.

10. Add null_subject and null_replacement modifiers to pcre2test.

11. Add check for NULL subject to POSIX regexec() function.

12. Add check for NULL replacement to pcre2_substitute().

13. For the subject arguments of pcre2_match(), pcre2_dfa_match(), and
pcre2_substitute(), and the replacement argument of the latter, if the pointer
is NULL and the length is zero, treat as an empty string. Apparently a number
of applications treat NULL/0 in this way.

14. Added support for Bidi_Class and a number of binary Unicode properties,
including Bidi_Control.

15. Fix some minor issues raised by clang sanitize.

16. Very minor code speed up for maximizing character property matches.

17. A number of changes to script matching for \p and \P:

  (a) Script extensions for a character are now coded as a bitmap instead of
      a list of script numbers, which should be faster and does not need a
      loop.

  (b) Added the syntax \p{script:xxx} and \p{script_extensions:xxx} (synonyms
      sc and scx).

  (c) Changed \p{scriptname} from being the same as \p{sc:scriptname} to being
      the same as \p{scx:scriptname} because this change happened in Perl at
      release 5.26.

  (d) The standard Unicode 4-letter abbreviations for script names are now
      recognized.

  (e) In accordance with Unicode and Perl's "loose matching" rules, spaces,
      hyphens, and underscores are ignored in property names, which are then
      matched independent of case.

18. The Python scripts in the maint directory have been refactored. There are
now three scripts that generate pcre2_ucd.c, pcre2_ucp.h, and pcre2_ucptables.c
(which is #included by pcre2_tables.c). The data lists that used to be
duplicated are now held in a single common Python module.

19. On CHERI, and thus Arm's Morello prototype, pointers are represented as
hardware capabilities, which consist of both an integer address and additional
metadata, meaning they are twice the size of the platform's size_t type, i.e.
16 bytes on a 64-bit system. The ovector member of heapframe happens to only be
8 byte aligned, and so computing frame_size ended up with a multiple of 8 but
not 16. Whilst the first frame was always suitably aligned, this then
misaligned the frame that follows, resulting in an alignment fault when storing
a pointer to Fecode at the start of match. Patch to fix this issue by Jessica
Clarke PR#72.

20. Added -LP and -LS listing options to pcre2test.

21. A user discovered that the library names in CMakeLists.txt for MSVC
debugger (PDB) files were incorrect - perhaps never tried for PCRE2?

22. An item such as [Aa] is optimized into a caseless single character match.
When this was quantified (e.g. [Aa]{2}) and was also the last literal item in a
pattern, the optimizing "must be present for a match" character check was not
being flagged as caseless, causing some matches that should have succeeded to
fail.

23. Fixed a Unicode property matching issue in JIT. The character was not
fully read in caseless matching.

24. Fixed an issue affecting recursions in JIT caused by duplicated data
transfers.

25. Merged patch from @carenas (GitHub #96) which fixes some problems with
pcre2test and readline/readedit:

  * Use the right header for libedit in FreeBSD with autoconf
  * Really allow libedit with cmake
  * Avoid using readline headers with libedit


Version 10.39 29-October-2021
-----------------------------

1. Fix incorrect detection of alternatives in first character search in JIT.

2. Merged patch from @carenas (GitHub #28):

    Visual Studio 2013 includes support for %zu and %td, so let newer
    versions of it avoid the fallback, and while at it, make sure that
    the first check is for DISABLE_PERCENT_ZT so it will be always
    honoured if chosen.

    ptrdiff_t is signed, so use a signed type instead, and make sure
    that an appropriate width is chosen if pointers are 64bit wide and
    long is not (ex: Windows 64bit).

    IMHO removing the cast (and therefore the possibility of truncation)
    makes the code cleaner and the fallback is likely portable enough
    with all 64-bit POSIX systems doing LP64 except for Windows.

3. Merged patch from @carenas (GitHub #29) to update to Unicode 14.0.0.

4. Merged patch from @carenas (GitHub #30):

  * Cleanup: remove references to no longer used stdint.h

  Since 19c50b9d (Unconditionally use inttypes.h instead of trying for stdint.h
  (simplification) and remove the now unnecessary inclusion in
  pcre2_internal.h., 2018-11-14), stdint.h is no longer used.

  Remove checks for it in autotools and CMake and document better the expected
  build failures for systems that might have stdint.h (C99) and not inttypes.h
  (from POSIX), like old Windows.

  * Cleanup: remove detection for inttypes.h which is a hard dependency

  CMake checks for standard headers are not meant to be used for hard
  dependencies, so will prevent a possible fallback to work.

  Alternatively, the header could be checked to make the configuration fail
  instead of breaking the build, but that was punted, as it was missing anyway
  from autotools.

5. Merged patch from @carenas (GitHub #32):

  * jit: allow building with ancient MSVC versions

  Visual Studio older than 2013 fails to build with JIT enabled, because it is
  unable to parse non C89 compatible syntax, with mixed declarations and code.
  While most recent compilers wouldn't even report this as a warning since it
  is valid C99, it could be also made visible by adding to gcc/clang the
  -Wdeclaration-after-statement flag at build time.

  Move the code below the affected definitions.

  * pcre2grep: avoid mixing declarations with code

  Since d5a61ee8 (Patch to detect (and ignore) symlink loops in pcre2grep,
  2021-08-28), code will fail to build in a strict C89 compiler.

  Reformat slightly to make it C89 compatible again.


Version 10.38 01-October-2021
-----------------------------

1. Fix invalid single character repetition issues in JIT when the repetition
is inside a capturing bracket and the bracket is preceded by character
literals.

2. Installed revised CMake configuration files provided by Jan-Willem Blokland.
This extends the CMake build system to build both static and shared libraries
in one go, builds the static library with PIC, and exposes PCRE2 libraries
using the CMake config files. JWB provided these notes:

- Introduced CMake variable BUILD_STATIC_LIBS to build the static library.

- Make a small modification to config-cmake.h.in by removing the PCRE2_STATIC
  variable. Added PCRE2_STATIC variable to the static build using the
  target_compile_definitions() function.

- Extended the CMake config files.

  - Introduced CMake variable PCRE2_USE_STATIC_LIBS to easily switch between
    the static and shared libraries.

  - Added the PCRE_STATIC variable to the target compile definitions for the
    import of the static library.

Building static and shared libraries using MSVC results in a name clash of
the libraries. Both static and shared library builds create, for example, the
file pcre2-8.lib. Therefore, I decided to change the static library names by
adding "-static". For example, pcre2-8.lib has become pcre2-8-static.lib.
[Comment by PH: this is MSVC-specific. It doesn't happen on Linux.]

3. Increased the minimum release number for CMake to 3.0.0 because older than
2.8.12 is deprecated (it was set to 2.8.5) and causes warnings. Even 3.0.0 is
quite old; it was released in 2014.

4. Implemented a modified version of Thomas Tempelmann's pcre2grep patch for
detecting symlink loops. This is dependent on the availability of realpath(),
which is now tested for in ./configure and CMakeLists.txt.

5. Implemented a modified version of Thomas Tempelmann's patch for faster
case-independent "first code unit" searches for unanchored patterns in 8-bit
mode in the interpreters. Instead of just remembering whether one case matched
or not, it remembers the position of a previous match so as to avoid
unnecessary repeated searching.

6. Perl now locks out \K in lookarounds, so PCRE2 now does the same by default.
However, just in case anybody was relying on the old behaviour, there is an
option called PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK that enables the old behaviour.
An option has also been added to pcre2grep to enable this.

7. Re-enable a JIT optimization which was unintentionally disabled in 10.35.

8. There is a loop counter to catch excessively crazy patterns when checking
the lengths of lookbehinds at compile time. This was incorrectly getting reset
whenever a lookahead was processed, leading to some fuzzer-generated patterns
taking a very long time to compile when (?|) was present in the pattern,
because (?|) disables caching of group lengths.


Version 10.37 26-May-2021
-------------------------

1. Change RunGrepTest to use tr instead of sed when testing with binary
zero bytes, because sed varies a lot from system to system and has problems
with binary zeros. This is from Bugzilla #2681. Patch from Jeremie
Courreges-Anglas via Nam Nguyen. This fixes RunGrepTest for OpenBSD. Later:
it broke it for at least one version of Solaris, where tr can't handle binary
zeros. However, that system had /usr/xpg4/bin/tr installed, which works OK, so
RunGrepTest now checks for that command and uses it if found.

2. Compiling with gcc 10.2's -fanalyzer option showed up a hypothetical problem
with a NULL dereference. I don't think this case could ever occur in practice,
but I have put in a check in order to get rid of the compiler error.

3. An alternative patch for CMakeLists.txt because 10.36 #4 breaks CMake on
Windows. Patch from email@cs-ware.de fixes bugzilla #2688.

4. Two bugs related to over-large numbers have been fixed so the behaviour is
now the same as Perl.

  (a) A pattern such as /\214748364/ gave an overflow error instead of being
      treated as the octal number \214 followed by literal digits.

  (b) A sequence such as {65536 that has no terminating }, and so is not a
      quantifier, nevertheless provoked a complaint that a quantifier number
      was too big.

5. A run of autoconf suggested that configure.ac was out-of-date with respect
to the latest autoconf. Running autoupdate made some valid changes, some valid
suggestions, and also some invalid changes, which were fixed by hand. Autoconf
now runs clean and the resulting "configure" seems to work, so I hope nothing
is broken. Later: the requirement for autoconf 2.70 broke some automatic test
robots. It doesn't seem to be necessary: trying a reduction to 2.60.

6. The pattern /a\K.(?0)*/ when matched against "abac" by the interpreter gave
the answer "bac", whereas Perl and JIT both yield "c". This was because the
effect of \K was not propagating back from the full pattern recursion. Other
recursions such as /(a\K.(?1)*)/ did not have this problem.

7. Restore single character repetition optimization in JIT. Currently fewer
character repetitions are optimized than in 10.34.

8. When the names of the functions in the POSIX wrapper were changed to
pcre2_regcomp() etc. (see change 10.33 #4 below), functions with the original
names were left in the library so that pre-compiled programs would still work.
However, this has proved troublesome when programs link with several libraries,
some of which use PCRE2 via the POSIX interface while others use a native POSIX
library. For this reason, the POSIX function names are removed in this release.
The macros in pcre2posix.h should ensure that re-compiling fixes any programs
that haven't been compiled since before 10.33.


Version 10.36 04-December-2020
------------------------------

1. Add CET_CFLAGS so that when Intel CET is enabled, -mshstk is passed to the
compiler. This fixes https://bugs.exim.org/show_bug.cgi?id=2578. Patch for
Makefile.am and configure.ac by H.J. Lu. Equivalent patch for CMakeLists.txt
invented by PH.

2. Fix infinite loop when a single-byte newline is searched for in JIT when
invalid UTF-8 mode is enabled.

3. Updated CMakeLists.txt with patch from Wolfgang Stöggl (Bugzilla #2584):

  - Include GNUInstallDirs and use ${CMAKE_INSTALL_LIBDIR} instead of hardcoded
    lib. This allows differentiation between lib and lib64.
    CMAKE_INSTALL_LIBDIR is used for installation of libraries and also for
    pkgconfig file generation.

  - Add the version of PCRE2 to the configuration summary like ./configure
    does.

  - Fix typo: MACTHED_STRING->MATCHED_STRING

4. Updated CMakeLists.txt with another patch from Wolfgang Stöggl (Bugzilla
#2588):

  - Add escaped double quotes around include directory in CMakeLists.txt to
    allow spaces in directory names.

  - This fixes a cmake error, if the path of the pcre2 source contains a space.

5. Updated CMakeLists.txt with a patch from B. Scott Michel: CMake's
documentation suggests using CHECK_SYMBOL_EXISTS over CHECK_FUNCTION_EXISTS.
Moreover, these functions come from specific header files, which need to be
specified (and, thankfully, are the same on both the Linux and WinXX
platforms.)

6. Added a (uint32_t) cast to prevent a compiler warning in pcre2_compile.c.

7. Applied a patch from Wolfgang Stöggl (Bugzilla #2600) to fix postfix for
debug Windows builds using CMake. This also updated configure so that it
generates *.pc files and pcre2-config with the same content, as in the past.

8. If a pattern ended with (?(VERSION=n.d where n is any number but d is just a
single digit, the code unit beyond d was being read (i.e. there was a read
buffer overflow). Fixes ClusterFuzz 23779.

9. After the rework in r1235, certain character ranges were incorrectly
handled by an optimization in JIT. Furthermore a wrong offset was used to
read a value from a buffer which could lead to memory overread.

10. Unnoticed for many years was the fact that delimiters other than / in the
testinput1 and testinput4 files could cause incorrect behaviour when these
files were processed by perltest.sh. There were several tests that used quotes
as delimiters, and it was just luck that they didn't go wrong with perltest.sh.
All the patterns in testinput1 and testinput4 now use / as their delimiter.
This fixes Bugzilla #2641.

11. Perl has started to give an error for \K within lookarounds (though there
are cases where it doesn't). PCRE2 still allows this, so the tests that include
this case have been moved from test 1 to test 2.

12. Further to 10 above, pcre2test has been updated to detect and grumble if a
delimiter other than / is used after #perltest.

13. Fixed a bug with PCRE2_MATCH_INVALID_UTF in 8-bit mode when PCRE2_CASELESS
was set and PCRE2_NO_START_OPTIMIZE was not set. The optimization for finding
the start of a match was not resetting correctly after a failed match on the
first valid fragment of the subject, possibly causing incorrect "no match"
returns on subsequent fragments. For example, the pattern /A/ failed to match
the subject \xe5A. Fixes Bugzilla #2642.

14. Fixed a bug in character set matching when JIT is enabled and both Unicode
scripts and Unicode classes are present at the same time.

15. Added GNU grep's -m (aka --max-count) option to pcre2grep.

16. Refactored substitution processing in pcre2grep strings, both for the -O
option and when dealing with callouts. There is now a single function that
handles $ expansion in all cases (instead of multiple copies of almost
identical code). This means that the same escape sequences are available
everywhere, which was not previously the case. At the same time, the escape
sequences $x{...} and $o{...} have been introduced, to allow for characters
whose code points are greater than 255 in Unicode mode.

17. Applied the patch from Bugzilla #2628 to RunGrepTest. This does an explicit
test for a version of sed that can handle binary zero, instead of assuming that
any Linux version will work. Later: replaced $(...) by `...` because not all
shells recognize the former.

18. Fixed a word boundary check bug in JIT when partial matching is enabled.

19. Fix ARM64 compilation warning in JIT. Patch by Carlo.

20. A bug in the RunTest script meant that if the first part of test 2 failed,
the failure was not reported.

21. Test 2 was failing when run from a directory other than the source
directory. This failure was previously missed in RunTest because of 20 above.
Fixes added to both RunTest and RunTest.bat.

22. Patch to CMakeLists.txt from Daniel to fix problem with testing under
Windows.


Version 10.35 09-May-2020
-------------------------

1. Use PCRE2_MATCH_EMPTY flag to detect empty matches in JIT.

2. Fix ARMv5 JIT improper handling of labels right after a constant pool.

3. Fixed a JIT bug that allowed the fields of the compiled pattern to be read
before its existence was checked.

4. Back in the PCRE1 day, capturing groups that contained recursive back
references to themselves were made atomic (version 8.01, change 18) because
after the end of a repeated group, the captured substrings had their values
from the final repetition, not from an earlier repetition that might be the
destination of a backtrack. This feature was documented, and was carried over
into PCRE2. However, it has now been realized that the major refactoring that
was done for 10.30 has made this atomicizing unnecessary, and it is confusing
when users are unaware of it, making some patterns appear not to be working as
expected. Capture values of recursive back references in repeated groups are
now correctly backtracked, so this unnecessary restriction has been removed.

5. Added PCRE2_SUBSTITUTE_LITERAL.

6. Avoid some VS compiler warnings.

7. Added PCRE2_SUBSTITUTE_MATCHED.

8. Added (?* and (?<* as synonyms for (*napla: and (*naplb: to match another
regex engine. The Perl regex folks are aware of this usage and have made a note
about it.

9. When an assertion is repeated, PCRE2 used to limit the maximum repetition to
1, believing that repeating an assertion is pointless. However, if a positive
assertion contains capturing groups, repetition can be useful. In any case, an
assertion could always be wrapped in a repeated group. The only restriction
that is now imposed is that an unlimited maximum is changed to one more than
the minimum.

10. Fix *THEN verbs in lookahead assertions in JIT.

11. Added PCRE2_SUBSTITUTE_REPLACEMENT_ONLY.

12. The JIT stack should be freed when the low-level stack allocation fails.

13. In pcre2grep, if the final line in a scanned file is output but does not
end with a newline sequence, add a newline according to the --newline setting.

14. (?(DEFINE)...) groups were not being handled correctly when checking for
the fixed length of a lookbehind assertion. Such a group within a lookbehind
should be skipped, as it does not contribute to the length of the group.
Instead, the (DEFINE) group was being processed, and if at the end of the
lookbehind, that end was not correctly recognized. Errors such as "lookbehind
assertion is not fixed length" and also "internal error: bad code value in
parsed_skip()" could result.

15. Put a limit of 1000 on recursive calls in pcre2_study() when searching
nested groups for starting code units, in order to avoid stack overflow issues.
If the limit is reached, it just gives up trying for this optimization.

16. The control verb chain list must always be restored when exiting from a
recurse function in JIT.

17. Fix a crash which occurs when the character type of an invalid UTF
character is decoded in JIT.

18. Changes in many areas of the code so that when Unicode is supported and
PCRE2_UCP is set without PCRE2_UTF, Unicode character properties are used for
upper/lower case computations on characters whose code points are greater than
127.

19. The function for checking UTF-16 validity was returning an incorrect offset
for the start of the error when a high surrogate was not followed by a valid
low surrogate. This caused incorrect behaviour, for example when
PCRE2_MATCH_INVALID_UTF was set and a match started immediately following the
invalid high surrogate, such as /aa/ matching "\x{d800}aa".

20. If a DEFINE group immediately preceded a lookbehind assertion, the pattern
could be mis-compiled and therefore not match correctly. This is the example
that found this: /(?(DEFINE)(?<foo>bar))(?<![-a-z0-9])word/ which failed to
match "word" because the "move back" value was set to zero.

21. Following a request from a user, some extensions and tidies to the
character tables handling have been done:

  (a) The dftables auxiliary program is renamed pcre2_dftables, but it is still
      not installed for public use.

  (b) There is now a -b option for pcre2_dftables, which causes the tables to
      be written in binary. There is also a -help option.

  (c) PCRE2_CONFIG_TABLES_LENGTH is added to pcre2_config() so that an
      application that wants to save tables in binary knows how long they are.

22. Changed setting of CMAKE_MODULE_PATH in CMakeLists.txt from SET to
LIST(APPEND...) to allow a setting from the command line to be included.

23. Updated to Unicode 13.0.0.

24. CMake build now checks for secure_getenv() and strerror(). Patch by Carlo.

25. Avoid using [-1] as a suffix in pcre2test because it can provoke a compiler
warning.

26. Added tests for __attribute__((uninitialized)) to both the configure and
CMake build files, and then applied this attribute to the variable called
stack_frames_vector[] in pcre2_match(). When implemented, this disables
automatic initialization (a facility in clang), which can take time on big
variables.

27. Updated CMakeLists.txt (patches by Uwe Korn) to add support for
pcre2-config, the libpcre*.pc files, SOVERSION, VERSION and the
MACHO_*_VERSIONS settings for CMake builds.

28. Another patch to CMakeLists.txt to check for mkostemp (configure already
does). Patch by Carlo Marcelo Arenas Belon.

29. Check for the existence of memfd_create in both CMake and configure
configurations. Patch by Carlo Marcelo Arenas Belon.

30. Restrict the configuration setting for the SELinux compatible execmem
allocator (change 10.30/44) to Linux and NetBSD.


Version 10.34 21-November-2019
------------------------------

1. The maximum number of capturing subpatterns is 65535 (documented), but no
check on this was ever implemented. This omission has been rectified; it fixes
ClusterFuzz 14376.

2. Improved the invalid UTF-32 support of the JIT compiler. Now it correctly
detects invalid characters in the 0xd800-0xdfff range.

3. Fix minor typo bug in JIT compile when \X is used in a non-UTF string.

4. Add support for matching in invalid UTF strings to the pcre2_match()
interpreter, and integrate with the existing JIT support via the new
PCRE2_MATCH_INVALID_UTF compile-time option.

5. Give more error detail for invalid UTF-8 when detected in pcre2grep.

6. Add support for invalid UTF-8 to pcre2grep.

7. Adjust the limit for "must have" code unit searching, in particular,
increase it substantially for non-anchored patterns.

8. Allow (*ACCEPT) to be quantified, because an ungreedy quantifier with a zero
minimum is potentially useful.

9. Some changes to the way the minimum subject length is handled:

  * When PCRE2_NO_START_OPTIMIZE is set, no minimum length is computed;
    pcre2test now omits this item instead of showing a value of zero.

  * An incorrect minimum length could be calculated for a pattern that
    contained (*ACCEPT) inside a quantified group whose minimum repetition was
    zero, for example /A(?:(*ACCEPT))?B/, which incorrectly computed a minimum
    of 2. The minimum length scan no longer happens for a pattern that
    contains (*ACCEPT).

  * When no minimum length is set by the normal scan, but a first and/or last
    code unit is recorded, set the minimum to 1 or 2 as appropriate.

  * When a pattern contains multiple groups with the same number, a back
    reference cannot know which one to scan for a minimum length. This used to
    cause the minimum length finder to give up with no result. Now it treats
    such references as not adding to the minimum length (which it should have
    done all along).

  * Furthermore, the above action now happens only if the back reference is to
    a group that exists more than once in a pattern instead of any back
    reference in a pattern with duplicate numbers.

10. A (*MARK) value inside a successful condition was not being returned by the
interpretive matcher (it was returned by JIT). This bug has been mended.

11. A bug in pcre2grep meant that -o without an argument (or -o0) didn't work
if the pattern had more than 32 capturing parentheses. This is fixed. In
addition (a) the default limit for groups requested by -o<n> has been raised to
50, (b) the new --om-capture option changes the limit, (c) an error is raised
if -o asks for a group that is above the limit.

12. The quantifier {1} was always being ignored, but this is incorrect when it
is made possessive and applied to an item in parentheses, because a
parenthesized item may contain multiple branches or other backtracking points,
for example /(a|ab){1}+c/ or /(a+){1}+a/.

13. For partial matches, pcre2test was always showing the maximum lookbehind
characters, flagged with "<", which is misleading when the lookbehind didn't
actually look behind the start (because it was later in the pattern). Showing
all consulted preceding characters for partial matches is now controlled by the
existing "allusedtext" modifier and, as for complete matches, this facility is
available only for non-JIT matching, because JIT does not maintain the first
and last consulted characters.

14. DFA matching (using pcre2_dfa_match()) was not recognising a partial match
if the end of the subject was encountered in a lookahead (conditional or
otherwise), an atomic group, or a recursion.

15. Give error if pcre2test -t, -T, -tm or -TM is given an argument of zero.

16. Check for integer overflow when computing lookbehind lengths. Fixes
ClusterFuzz issue 15636.

17. Implemented non-atomic positive lookaround assertions.

18. If a lookbehind contained a lookahead that contained another lookbehind
within it, the nested lookbehind was not correctly processed. For example, if
/(?<=(?=(?<=a)))b/ was matched to "ab" it gave no match instead of matching
"b".

19. Implemented pcre2_get_match_data_size().

20. Three alterations to partial matching:

  (a) The definition of a partial match is slightly changed: if a pattern
      contains any lookbehinds, an empty partial match may be given, because
      this is another situation where adding characters to the current subject
      can lead to a full match. Example: /c*+(?<=[bc])/ with subject "ab".

  (b) Similarly, if a pattern could match an empty string, an empty partial
      match may be given. Example: /(?![ab]).*/ with subject "ab". This case
      applies only to PCRE2_PARTIAL_HARD.

  (c) An empty string partial hard match can be returned for \z and \Z as it
      is documented that they shouldn't match.

21. A branch that started with (*ACCEPT) was not being recognized as one that
could match an empty string.

22. Corrected pcre2_set_character_tables() tables data type: was const unsigned
char * instead of const uint8_t *, as generated by pcre2_maketables().

23. Upgraded to Unicode 12.1.0.

24. Add -jitfast command line option to pcre2test (to make all the jit options
available directly).

25. Make pcre2test -C show if libreadline or libedit is supported.

26. If the length of one branch of a group exceeded 65535 (the maximum value
that is remembered as a minimum length), the whole group's length was
incorrectly recorded as 65535, leading to incorrect "no match" when start-up
optimizations were in force.

27. The "rightmost consulted character" value was not always correct; in
particular, if a pattern ended with a negative lookahead, characters that were
inspected in that lookahead were not included.

28. Add the pcre2_maketables_free() function.

29. The start-up optimization that looks for a unique initial matching
code unit in the interpretive engines uses memchr() in 8-bit mode. When the
search is caseless, it was doing so inefficiently, which ended up slowing down
the match drastically when the subject was very long. The revised code (a)
remembers if one case is not found, so it never repeats the search for that
case after a bumpalong and (b) when one case has been found, it searches only
up to that position for an earlier occurrence of the other case. This fix
applies to both interpretive pcre2_match() and to pcre2_dfa_match().

30. While scanning to find the minimum length of a group, if any branch has
minimum length zero, there is no need to scan any subsequent branches (a small
compile-time performance improvement).

31. Installed a .gitignore file on a user's suggestion.
When using the svn repository with git (through git svn) this helps keep it
tidy.

32. Add underflow check in JIT which may occur when the value of subject
string pointer is close to 0.

33. Arrange for classes such as [Aa] which contain just the two cases of the
same character, to be treated as a single caseless character. This causes the
first and required code unit optimizations to kick in where relevant.

34. Improve the bitmap of starting bytes for positive classes that include wide
characters, but no property types, in UTF-8 mode. Previously, on encountering
such a class, the bits for all bytes greater than \xc4 were set, thus
specifying any character with codepoint >= 0x100. Now the only bits that are
set are for the relevant bytes that start the wide characters. This can give a
noticeable performance improvement.

35. If the bitmap of starting code units contains only 1 or 2 bits, replace it
with a single starting code unit (1 bit) or a caseless single starting code
unit if the two relevant characters are case-partners. This is particularly
relevant to the 8-bit library, though it applies to all. It can give a
performance boost for patterns such as [Ww]ord and (word|WORD). However, this
optimization doesn't happen if there is a "required" code unit of the same
value (because the search for a "required" code unit starts at the match start
for non-unique first code unit patterns, but after a unique first code unit,
and patterns such as a*a need the former action).

36. Small patch to pcre2posix.c to set the erroroffset field to -1 immediately
after a successful compile, instead of at the start of matching to avoid a
sanitizer complaint (regexec is supposed to be thread safe).

37. Add NEON vectorization to JIT to speed up matching of first character and
pairs of characters on ARM64 CPUs.

38. If a non-ASCII character was the first in a starting assertion in a
caseless match, the "first code unit" optimization did not get the casing
right, and the assertion failed to match a character in the other case if it
did not start with the same code unit.

39. Fixed the incorrect computation of jump sizes on x86 CPUs in JIT. A masking
operation was incorrectly removed in r1136. Reported by Ralf Junker.


Version 10.33 16-April-2019
---------------------------

1. Added "allvector" to pcre2test to make it easy to check the part of the
ovector that shouldn't be changed, in particular after substitute and failed or
partial matches.

2. Fix subject buffer overread in JIT when UTF is disabled and \X or \R has
a greater than 1 fixed quantifier. This issue was found by Yunho Kim.

3. Added support for callouts from pcre2_substitute(). After 10.33-RC1, but
prior to release, fixed a bug that caused a crash if pcre2_substitute() was
called with a NULL match context.

4. The POSIX functions are now all called pcre2_regcomp() etc., with wrapper
functions that use the standard POSIX names. However, in pcre2posix.h the POSIX
names are defined as macros. This should help avoid linking with the wrong
library in some environments while still exporting the POSIX names for
pre-existing programs that use them. (The Debian alternative names are also
defined as macros, but not documented.)

5. Fix an xclass matching issue in JIT.

6. Implement PCRE2_EXTRA_ESCAPED_CR_IS_LF (see Bugzilla 2315).

7. Implement the Perl 5.28 experimental alphabetic names for atomic groups and
lookaround assertions, for example, (*pla:...) and (*atomic:...). These are
characterized by a lower case letter following (* and to simplify coding for
this, the character tables created by pcre2_maketables() were updated to add a
new "is lower case letter" bit.
At the same time, the now unused "is hexadecimal digit" bit was removed. The
default tables in src/pcre2_chartables.c.dist are updated.

8. Implement the new Perl "script run" features (*script_run:...) and
(*atomic_script_run:...) aka (*sr:...) and (*asr:...).

9. Fixed two typos in change 22 for 10.21, which added special handling for
ranges such as a-z in EBCDIC environments. The original code probably never
worked, though there were no bug reports.

10. Implement PCRE2_COPY_MATCHED_SUBJECT for pcre2_match() (including JIT via
pcre2_match()) and pcre2_dfa_match(), but *not* the pcre2_jit_match() fast
path. Also, when a match fails, set the subject field in the match data to NULL
for tidiness - none of the substring extractors should reference this after
match failure.

11. If a pattern started with a subroutine call that had a quantifier with a
minimum of zero, an incorrect "match must start with this character" could be
recorded. Example: /(?&xxx)*ABC(?<xxx>XYZ)/ would (incorrectly) expect 'A' to
be the first character of a match.

12. The heap limit checking code in pcre2_dfa_match() could suffer from
overflow if the heap limit was set very large. This could cause incorrect "heap
limit exceeded" errors.

13. Add "kibibytes" to the heap limit output from pcre2test -C to make the
units clear.

14. Add a call to pcre2_jit_free_unused_memory() in pcre2grep, for tidiness.

15. Updated the VMS-specific code in pcre2test on the advice of a VMS user.

16. Removed the unnecessary inclusion of stdint.h (or inttypes.h) from
pcre2_internal.h as it is now included by pcre2.h. Also, change 17 for 10.32
below was unnecessarily complicated, as inttypes.h is a Standard C header,
which is defined to be a superset of stdint.h. Instead of conditionally
including stdint.h or inttypes.h, pcre2.h now unconditionally includes
inttypes.h.
This supports environments that do not have stdint.h but do have inttypes.h,
which are known to exist. A note in the autotools documentation says (November
2018) that there are none known that are the other way round.

17. Added --disable-percent-zt to "configure" (and equivalent to CMake) to
forcibly disable the use of %zu and %td in formatting strings because there is
at least one version of VMS that claims to be C99 but does not support these
modifiers.

18. Added --disable-pcre2grep-callout-fork, which restricts the callout support
in pcre2grep to the inbuilt echo facility. This may be useful in environments
that do not support fork().

19. Fix two instances of <= 0 being applied to unsigned integers (the VMS
compiler complains).

20. Added "fork" support for VMS to pcre2grep, for running an external program
via a string callout.

21. Improve MAP_JIT flag usage on MacOS. Patch by Rich Siegel.

22. If a pattern started with (*MARK), (*COMMIT), (*PRUNE), (*SKIP), or (*THEN)
followed by ^ it was not recognized as anchored.

23. The RunGrepTest script used to cut out the test of NUL characters for
Solaris and MacOS as printf and sed can't handle them. It seems that the *BSD
systems can't either. I've inverted the test so that only those OS that are
known to work (currently only Linux) try to run this test.

24. Some tests in RunGrepTest appended to testtrygrep from two different file
descriptors instead of redirecting stderr to stdout. This worked on Linux, but
it was reported not to on other systems, causing the tests to fail.

25. In the RunTest script, make the test for stack setting use the same value
for the stack as it needs for -bigstack.

26. Insert a cast in pcre2_dfa_match.c to suppress a compiler warning.

26. With PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL set, escape sequences such as \s
which are valid in character classes, but not as the end of ranges, were being
treated as literals. An example is [_-\s] (but not [\s-_] because that gave an
error at the *start* of a range). Now an "invalid range" error is given
independently of PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

27. Related to 26 above, PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL was affecting known
escape sequences such as \eX when they appeared invalidly in a character class.
Now the option applies only to unrecognized or malformed escape sequences.

28. Fix word boundary in JIT compiler. Patch by Mike Munday.

29. The pcre2_dfa_match() function was incorrectly handling conditional version
tests such as (?(VERSION>=0)...) when the version test was true. Incorrect
processing or a crash could result.

30. When PCRE2_UTF is set, allow non-ASCII letters and decimal digits in group
names, as Perl does. There was a small bug in this new code, found by
ClusterFuzz 12950, fixed before release.

31. Implemented PCRE2_EXTRA_ALT_BSUX to support ECMAScript 6's \u{hhh}
construct.

32. Compile \p{Any} to be the same as . in DOTALL mode, so that it benefits
from auto-anchoring if \p{Any}* starts a pattern.

33. Compile invalid UTF check in JIT test when only pcre32 is enabled.

34. For some time now, CMake has been warning about the setting of policy
CMP0026 to "OLD" in CMakeLists.txt, and hinting that the feature might be
removed in a future version. A request for CMake expertise on the list produced
no result, so I have now hacked CMakeLists.txt along the lines of some changes
I found on the Internet. The new code no longer needs the policy setting, and
it appears to work fine on Linux.

35. Setting --enable-jit=auto for an out-of-tree build failed because the
source directory wasn't in the search path for AC_TRY_COMPILE always.
Patch from Ross Burton.

36. Disable SSE2 JIT optimizations in x86 CPUs when SSE2 is not available.
Patch by Guillem Jover.

37. Changed expressions such as 1<<10 to 1u<<10 in many places because compiler
warnings were reported.

38. Using the clang compiler with sanitizing options causes runtime complaints
about truncation for statements such as x = ~x when x is an 8-bit value; it
seems to compute ~x as a 32-bit value. Changing such statements to x = 255 ^ x
gets rid of the warnings. There were also two missing casts in pcre2test.


Version 10.32 10-September-2018
-------------------------------

1. When matching using the REG_STARTEND feature of the POSIX API with a
non-zero starting offset, unset capturing groups with lower numbers than a
group that did capture something were not being correctly returned as "unset"
(that is, with offset values of -1).

2. When matching using the POSIX API, pcre2test used to omit listing unset
groups altogether. Now it shows those that come before any actual captures as
"<unset>", as happens for non-POSIX matching.

3. Running "pcre2test -C" always stated "\R matches CR, LF, or CRLF only",
whatever the build configuration was. It now correctly says "\R matches all
Unicode newlines" in the default case when --enable-bsr-anycrlf has not been
specified. Similarly, running "pcre2test -C bsr" never produced the result
ANY.

4. Matching the pattern /(*UTF)\C[^\v]+\x80/ against an 8-bit string containing
multi-code-unit characters caused bad behaviour and possibly a crash. This
issue was fixed for other kinds of repeat in release 10.20 by change 19, but
repeating character classes were overlooked.

5. pcre2grep now supports the inclusion of binary zeros in patterns that are
read from files via the -f option.

6. A small fix to pcre2grep to avoid compiler warnings for -Wformat-overflow=2.

7. Added --enable-jit=auto support to configure.ac.

8. Added some dummy variables to the heapframe structure in 16-bit and 32-bit
modes for the benefit of m68k, where pointers can be 16-bit aligned. The
dummies force 32-bit alignment and this ensures that the structure is a
multiple of PCRE2_SIZE, a requirement that is tested at compile time. In other
architectures, alignment requirements take care of this automatically.

9. When returning an error from pcre2_pattern_convert(), ensure the error
offset is set to zero for early errors.

10. A number of patches for Windows support from Daniel Richard G:

  (a) List of error numbers in Runtest.bat corrected (it was not the same as in
      Runtest).

  (b) pcre2grep snprintf() workaround as used elsewhere in the tree.

  (c) Support for non-C99 snprintf() that returns -1 in the overflow case.

11. Minor tidy of pcre2_dfa_match() code.

12. Refactored pcre2_dfa_match() so that the internal recursive calls no longer
use the stack for local workspace and local ovectors. Instead, an initial block
of stack is reserved, but if this is insufficient, heap memory is used. The
heap limit parameter now applies to pcre2_dfa_match().

13. If a "find limits" test of DFA matching in pcre2test resulted in too many
matches for the ovector, no matches were displayed.

14. Removed an occurrence of ctrl/Z from test 6 because Windows treats it as
EOF. The test looks to have come from a fuzzer.

15. If PCRE2 was built with a default match limit a lot greater than the
default default of 10 000 000, some JIT tests of the match limit no longer
failed. All such tests now set 10 000 000 as the upper limit.

16. Another Windows related patch for pcre2grep to ensure that WIN32 is
undefined under Cygwin.

17. Test for the presence of stdint.h and inttypes.h in configure and CMake and
include whichever exists (stdint preferred) instead of unconditionally
including stdint. This makes life easier for old and non-standard systems.

18. Further changes to improve portability, especially to old and/or
non-standard systems:

  (a) Put all printf arguments in RunGrepTest into single, not double, quotes,
      and use \0 not \x00 for binary zero.

  (b) Avoid the use of C++ (i.e. BCPL) // comments.

  (c) Parameterize the use of %zu in pcre2test to make it like %td. For both of
      these now, if using MSVC or a standard C before C99, %lu is used with a
      cast if necessary.

19. Applied a contributed patch to CMakeLists.txt to increase the stack size
when linking pcre2test with MSVC. This gets rid of a stack overflow error in
the standard set of tests.

20. Output a warning in pcre2test when ignoring the "altglobal" modifier when
it is given with the "replace" modifier.

21. In both pcre2test and pcre2_substitute(), with global matching, a pattern
that matched an empty string, but never at the starting match offset, was not
handled in a Perl-compatible way. The pattern /(?<=\G.)/ is an example of such
a pattern. Because \G is in a lookbehind assertion, there has to be a
"bumpalong" before there can be a match. The automatic "advance by one
character after an empty string match" rule is therefore inappropriate. A more
complicated algorithm has now been implemented.

22. When checking to see if a lookbehind is of fixed length, lookaheads were
correctly ignored, but quantifiers on lookaheads were not being ignored,
leading to an incorrect "lookbehind assertion is not fixed length" error.

23. The VERSION condition test was reading fractional PCRE2 version numbers
such as the 04 in 10.04 incorrectly and hence giving wrong results.

24. Updated to Unicode version 11.0.0. As well as the usual addition of new
scripts and characters, this involved re-jigging the grapheme break property
algorithm because Unicode has changed the way emojis are handled.

25. Fixed an obscure bug that struck when there were two atomic groups not
separated by something with a backtracking point. There could be an incorrect
backtrack into the first of the atomic groups. A complicated example is
/(?>a(*:1))(?>b)(*SKIP:1)x|.*/ matched against "abc", where the *SKIP
shouldn't find a MARK (because it is in an atomic group), but it did.

26. Upgraded the perltest.sh script: (1) #pattern lines can now be used to set
a list of modifiers for all subsequent patterns - only those that the script
recognizes are meaningful; (2) #subject lines can be used to set or unset a
default "mark" modifier; (3) Unsupported #command lines give a warning when
they are ignored; (4) Mark data is output only if the "mark" modifier is
present.

27. (*ACCEPT:ARG), (*FAIL:ARG), and (*COMMIT:ARG) are now supported.

28. A (*MARK) name was not being passed back for positive assertions that were
terminated by (*ACCEPT).

29. Add support for \N{U+dddd}, but only in Unicode mode.

30. Add support for (?^) for unsetting all imnsx options.

31. The PCRE2_EXTENDED (/x) option only ever discarded space characters whose
code point was less than 256 and that were recognized by the lookup table
generated by pcre2_maketables(), which uses isspace() to identify white space.
Now, when Unicode support is compiled, PCRE2_EXTENDED also discards U+0085,
U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by
Unicode as "Pattern White Space". This makes PCRE2 compatible with Perl.

32. In certain circumstances, option settings within patterns were not being
correctly processed.
For example, the pattern /((?i)A)(?m)B/ incorrectly matched "ab". (The (?m)
setting lost the fact that (?i) should be reset at the end of its group during
the parse process, but without another setting such as (?m) the compile phase
got it right.) This bug was introduced by the refactoring in release 10.23.

33. PCRE2 uses bcopy() if available when memmove() is not, and it used just to
define memmove() as a function call to bcopy(). This hasn't been tested for a
long time because in pcre2test the result of memmove() was being used, whereas
bcopy() doesn't return a result. This feature is now refactored always to call
an emulation function when there is no memmove(). The emulation makes use of
bcopy() when available.

34. When serializing a pattern, set the memctl, executable_jit, and tables
fields (that is, all the fields that contain pointers) to zeros so that the
result of serializing is always the same. These fields are re-set when the
pattern is deserialized.

35. In a pattern such as /[^\x{100}-\x{ffff}]*[\x80-\xff]/ which has a repeated
negative class with no characters less than 0x100 followed by a positive class
with only characters less than 0x100, the first class was incorrectly being
auto-possessified, causing incorrect match failures.

36. Removed the character type bit ctype_meta, which dates from PCRE1 and is
not used in PCRE2.

37. Tidied up unnecessarily complicated macros used in the escapes table.

38. Since 10.21, the new testoutput8-16-4 file has accidentally been omitted
from distribution tarballs, owing to a typo in Makefile.am which had
testoutput8-16-3 twice. Now fixed.

39. If the only branch in a conditional subpattern was anchored, the whole
subpattern was treated as anchored, when it should not have been, since the
assumed empty second branch cannot be anchored.
Demonstrated by test patterns such as /(?(1)^())b/ or /(?(?=^))b/.

40. A repeated conditional subpattern that could match an empty string was
always assumed to be unanchored. Now it is checked just like any other
repeated conditional subpattern, and can be found to be anchored if the minimum
quantifier is one or more. I can't see much use for a repeated anchored
pattern, but the behaviour is now consistent.

41. Minor addition to pcre2_jit_compile.c to avoid static analyzer complaint
(for an event that could never occur but you had to have external information
to know that).

42. If before the first match in a file that was being searched by pcre2grep
there was a line that was sufficiently long to cause the input buffer to be
expanded, the variable holding the location of the end of the previous match
was being adjusted incorrectly, and could cause an overflow warning from a code
sanitizer. However, as the value is used only to print pending "after" lines
when the next match is reached (and there are no such lines in this case) this
bug could do no damage.


Version 10.31 12-February-2018
------------------------------

1. Fix typo (missing ]) in VMS code in pcre2test.c.

2. Replace the replicated code for matching extended Unicode grapheme sequences
(which got a lot more complicated by change 10.30/49) by a single subroutine
that is called by both pcre2_match() and pcre2_dfa_match().

3. Add idempotent guard to pcre2_internal.h.

4. Add new pcre2_config() options: PCRE2_CONFIG_NEVER_BACKSLASH_C and
PCRE2_CONFIG_COMPILED_WIDTHS.

5. Cut out \C tests in the JIT regression tests when NEVER_BACKSLASH_C is
defined (e.g. by --enable-never-backslash-C).

6. Defined public names for all the pcre2_compile() error numbers, and used
the public names in pcre2_convert.c.

7. Fixed a small memory leak in pcre2test (convert contexts).

8. Added two casts to compile.c and one to match.c to avoid compiler warnings.

9. Added code to pcre2grep when compiled under VMS to set the symbol
PCRE2GREP_RC to the exit status, because VMS does not distinguish between
exit(0) and exit(1).

10. Added the -LM (list modifiers) option to pcre2test. Also made -C complain
about a bad option only if the following argument item does not start with a
hyphen.

11. pcre2grep was truncating components of file names to 128 characters when
processing files with the -r option, and also (some very odd code) truncating
path names to 512 characters. There is now a check on the absolute length of
full path file names, which may be up to 2047 characters long.

12. When an assertion contained (*ACCEPT) it caused all open capturing groups
to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to
misbehaviour for subsequent references to groups that started outside the
assertion. ACCEPT in an assertion now closes only those groups that were
started within that assertion. Fixes oss-fuzz issues 3852 and 3891.

13. Multiline matching in pcre2grep was misbehaving if the pattern matched
within a line, and then matched again at the end of the line and over into
subsequent lines. Behaviour was different with and without colouring, and
sometimes context lines were incorrectly printed and/or line endings were lost.
All these issues should now be fixed.

14. If --line-buffered was specified for pcre2grep when input was from a
compressed file (.gz or .bz2) a segfault occurred. (Line buffering should be
ignored for compressed files.)

15. Although pcre2_jit_match checks whether the pattern is compiled
in a given mode, it was also expected that at least one mode is available.
This is fixed and pcre2_jit_match returns with PCRE2_ERROR_JIT_BADOPTION
when the pattern is not optimized by JIT at all.

16. The line number and related variables such as match counts in pcre2grep
were all int variables, causing overflow when files with more than 2147483647
lines were processed (assuming 32-bit ints). They have all been changed to
unsigned long ints.

17. If a backreference with a minimum repeat count of zero was first in a
pattern, apart from assertions, an incorrect first matching character could be
recorded. For example, for the pattern /(?=(a))\1?b/, "b" was incorrectly set
as the first character of a match.

18. Characters in a leading positive assertion are considered for recording a
first character of a match when the rest of the pattern does not provide one.
However, a character in a non-assertive group within a leading assertion such
as in the pattern /(?=(a))\1?b/ caused this process to fail. This was an
infelicity rather than an outright bug, because it did not affect the result of
a match, just its speed. (In fact, in this case, the starting 'a' was
subsequently picked up in the study.)

19. A minor tidy in pcre2_match(): making all PCRE2_ERROR_ returns use "return"
instead of "RRETURN" saves unwinding the backtracks in these cases (only one
didn't).

20. Allocate a single callout block on the stack at the start of pcre2_match()
and set its never-changing fields once only. Do the same for pcre2_dfa_match().

21. Save the extra compile options (set in the compile context) with the
compiled pattern (they were not previously saved), add PCRE2_INFO_EXTRAOPTIONS
to retrieve them, and update pcre2test to show them.

22. Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new
field callout_flags in callout blocks. The bits are set by pcre2_match(), but
not by JIT or pcre2_dfa_match().
Their settings are shown in pcre2test callouts if the callout_extra subject
modifier is set. These bits are provided to help with tracking how a
backtracking match is proceeding.

23. Updated the pcre2demo.c demonstration program, which was missing the extra
code for -g that handles the case when \K in an assertion causes the match to
end at the original start point. Also arranged for it to detect when \K causes
the end of a match to be before its start.

24. Similar to 23 above, strange things (including loops) could happen in
pcre2grep when \K was used in an assertion when --colour was used or in
multiline mode. The "end at original start point" bug is fixed, and if the end
point is found to be before the start point, they are swapped.

25. When PCRE2_FIRSTLINE without PCRE2_NO_START_OPTIMIZE was used in non-JIT
matching (both pcre2_match() and pcre2_dfa_match()) and the matched string
started with the first code unit of a newline sequence, matching failed because
it was not tried at the newline.

26. Code for giving up a non-partial match after failing to find a starting
code unit anywhere in the subject was missing when searching for one of a
number of code units (the bitmap case) in both pcre2_match() and
pcre2_dfa_match(). This was a missing optimization rather than a bug.

27. Tidied up the ACROSSCHAR macro to be like FORWARDCHAR and BACKCHAR, using a
pointer argument rather than a code unit value. This should not have affected
the generated code.

28. The JIT compiler has been updated.

29. Avoid pointer overflow for unset captures in pcre2_substring_list_get().
This could not actually cause a crash because it was always used in a memcpy()
call with zero length.

30. Some internal structures have a variable-length ovector[] as their last
element.
Their actual memory is obtained dynamically, giving an ovector of
appropriate length. However, they are defined in the structure as
ovector[NUMBER], where NUMBER is large so that array bound checkers don't
grumble. The value of NUMBER was 10000, but a fuzzer exceeded 5000 capturing
groups, making the ovector larger than this. The number has been increased to
131072, which allows for the maximum number of captures (65535) plus the
overall match. This fixes oss-fuzz issue 5415.

31. Auto-possessification at the end of a capturing group was dependent on what
follows the group (e.g. /(a+)b/ would auto-possessify the a+) but this caused
incorrect behaviour when the group was called recursively from elsewhere in the
pattern where something different might follow. This bug is an unforeseen
consequence of change #1 for 10.30 - the implementation of backtracking into
recursions. Iterators at the ends of capturing groups are no longer considered
for auto-possessification if the pattern contains any recursions. Fixes
Bugzilla #2232.


Version 10.30 14-August-2017
----------------------------

1. The main interpreter, pcre2_match(), has been refactored into a new version
that does not use recursive function calls (and therefore the stack) for
remembering backtracking positions. This makes --disable-stack-for-recursion a
NOOP. The new implementation allows backtracking into recursive group calls in
patterns, making it more compatible with Perl, and also fixes some other
hard-to-do issues such as #1887 in Bugzilla. The code is also cleaner because
the old code had a number of fudges to try to reduce stack usage. It seems to
run no slower than the old code.

A number of bugs in the refactored code were subsequently fixed during testing
before release, but after the code was made available in the repository.
These
bugs were never in fully released code, but are noted here for the record.

  (a) If a pattern had fewer capturing parentheses than the ovector supplied in
      the match data block, a memory error (detectable by ASAN) occurred after
      a match, because the external block was being set from non-existent
      internal ovector fields. Fixes oss-fuzz issue 781.

  (b) A pattern with very many capturing parentheses (when the internal frame
      size was greater than the initial frame vector on the stack) caused a
      crash. A vector on the heap is now set up at the start of matching if the
      vector on the stack is not big enough to handle at least 10 frames.
      Fixes oss-fuzz issue 783.

  (c) Handling of (*VERB)s in recursions was wrong in some cases.

  (d) Captures in negative assertions that were used as conditions were not
      happening if the assertion matched via (*ACCEPT).

  (e) Mark values were not being passed out of recursions.

  (f) Refactor some code in do_callout() to avoid picky compiler warnings about
      negative indices. Fixes oss-fuzz issue 1454.

  (g) Similarly refactor the way the variable length ovector is addressed for
      similar reasons. Fixes oss-fuzz issue 1465.

2. Now that pcre2_match() no longer uses recursive function calls (see above),
the "match limit recursion" value seems misnamed. It still exists, and limits
the depth of tree that is searched. To avoid future confusion, it has been
renamed as "depth limit" in all relevant places (--with-depth-limit,
(*LIMIT_DEPTH), pcre2_set_depth_limit(), etc) but the old names are still
available for backwards compatibility.

3. Hardened pcre2test so as to reduce the number of bugs reported by fuzzers:

  (a) Check for malloc failures when getting memory for the ovector (POSIX) or
      the match data block (non-POSIX).

4.
In the 32-bit library in non-UTF mode, an attempt to find a Unicode property
for a character with a code point greater than 0x10ffff (the Unicode maximum)
caused a crash.

5. If a lookbehind assertion that contained a back reference to a group
appearing later in the pattern was compiled with the PCRE2_ANCHORED option,
undefined actions (often a segmentation fault) could occur, depending on what
other options were set. An example assertion is (?<!\1(abc)) where the
reference \1 precedes the group (abc). This fixes oss-fuzz issue 865.

6. Added the PCRE2_INFO_FRAMESIZE item to pcre2_pattern_info() and arranged for
pcre2test to use it to output the frame size when the "framesize" modifier is
given.

7. Reworked the recursive pattern matching in the JIT compiler to follow the
interpreter changes.

8. When the zero_terminate modifier was specified on a pcre2test subject line
for global matching, unpredictable things could happen. For example, in UTF-8
mode, the pattern //g,zero_terminate read random memory when matched against an
empty string with zero_terminate. This was a bug in pcre2test, not the library.

9. Moved some Windows-specific code in pcre2grep (introduced in 10.23/13) out
of the section that is compiled when Unix-style directory scanning is
available, and into a new section that is always compiled for Windows.

10. In pcre2test, explicitly close the file after an error during serialization
or deserialization (the "load" or "save" commands).

11. Fix memory leak in pcre2_serialize_decode() when the input is invalid.

12. Fix potential NULL dereference in pcre2_callout_enumerate() if called with
a NULL pattern pointer when Unicode support is available.

13. When the 32-bit library was being tested by pcre2test, error messages that
were longer than 64 code units could cause a buffer overflow.
This was a bug in
pcre2test.

14. The alternative matching function, pcre2_dfa_match() misbehaved if it
encountered a character class with a possessive repeat, for example [a-f]{3}+.

15. The depth (formerly recursion) limit now applies to DFA matching (as
of 10.23/36); pcre2test has been upgraded so that \=find_limits works with DFA
matching to find the minimum value for this limit.

16. Since 10.21, if pcre2_match() was called with a null context, default
memory allocation functions were used instead of whatever was used when the
pattern was compiled.

17. Changes to the pcre2test "memory" modifier on a subject line. These apply
only to pcre2_match():

  (a) Warn if null_context is set on both pattern and subject, because the
      memory details cannot then be shown.

  (b) Remember (up to a certain number of) memory allocations and their
      lengths, and list only the lengths, so as to be system-independent.
      (In practice, the new interpreter never has more than 2 blocks allocated
      simultaneously.)

18. Make pcre2test detect an error return from pcre2_get_error_message(), give
a message, and abandon the run (this would have detected #13 above).

19. Implemented PCRE2_ENDANCHORED.

20. Applied Jason Hood's patches (slightly modified) to pcre2grep, to implement
the --output=text (-O) option and the inbuilt callout echo.

21. Extend auto-anchoring etc. to ignore groups with a zero qualifier and
single-branch conditions with a false condition (e.g. DEFINE) at the start of a
branch. For example, /(?(DEFINE)...)^A/ and /(...){0}^B/ are now flagged as
anchored.

22. Added an explicit limit on the amount of heap used by pcre2_match(), set by
pcre2_set_heap_limit() or (*LIMIT_HEAP=xxx).
Upgraded pcre2test to show the
heap limit along with other pattern information, and to find the minimum when
the find_limits modifier is set.

23. Write to the last 8 bytes of the pcre2_real_code structure when a compiled
pattern is set up so as to initialize any padding the compiler might have
included. This avoids valgrind warnings when a compiled pattern is copied, in
particular when it is serialized.

24. Remove a redundant line of code left in accidentally a long time ago.

25. Remove a duplication typo in pcre2_tables.c

26. Correct an incorrect cast in pcre2_valid_utf.c

27. Update pcre2test, remove some unused code in pcre2_match(), and upgrade the
tests to improve coverage.

28. Some fixes/tidies as a result of looking at Coverity Scan output:

  (a) Typo: ">" should be ">=" in opcode check in pcre2_auto_possess.c.
  (b) Added some casts to avoid "suspicious implicit sign extension".
  (c) Resource leaks in pcre2test in rare error cases.
  (d) Avoid warning for never-used case OP_TABLE_LENGTH which is just a fudge
      for checking at compile time that tables are the right size.
  (e) Add missing "fall through" comment.

29. Implemented PCRE2_EXTENDED_MORE and related /xx and (?xx) features.

30. Implement (?n: for PCRE2_NO_AUTO_CAPTURE, because Perl now has this.

31. If more than one of "push", "pushcopy", or "pushtablescopy" were set in
pcre2test, a crash could occur.

32. Make -bigstack in RunTest allocate a 64MiB stack (instead of 16MiB) so
that all the tests can run with clang's sanitizing options.

33. Implement extra compile options in the compile context and add the first
one: PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES.

34. Implement newline type PCRE2_NEWLINE_NUL.

35. A lookbehind assertion that had a zero-length branch caused undefined
behaviour when processed by pcre2_dfa_match().
This is oss-fuzz issue 1859.

36. The match limit value now also applies to pcre2_dfa_match() as there are
patterns that can use up a lot of resources without necessarily recursing very
deeply. (Compare item 10.23/36.) This should fix oss-fuzz #1761.

37. Implement PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL.

38. Fix returned offsets from regexec() when REG_STARTEND is used with a
starting offset greater than zero.

39. Implement REG_PEND (GNU extension) for the POSIX wrapper.

40. Implement the subject_literal modifier in pcre2test, and allow jitstack on
pattern lines.

41. Implement PCRE2_LITERAL and use it to support REG_NOSPEC.

42. Implement PCRE2_EXTRA_MATCH_LINE and PCRE2_EXTRA_MATCH_WORD for the benefit
of pcre2grep.

43. Re-implement pcre2grep's -F, -w, and -x options using PCRE2_LITERAL,
PCRE2_EXTRA_MATCH_WORD, and PCRE2_EXTRA_MATCH_LINE. This fixes two bugs:

  (a) The -F option did not work for fixed strings containing \E.
  (b) The -w option did not work for patterns with multiple branches.

44. Added configuration options for the SELinux compatible execmem allocator in
JIT.

45. Increased the limit for searching for a "must be present" code unit in
subjects from 1000 to 2000 for 8-bit searches, since they use memchr() and are
much faster.

46. Arrange for anchored patterns to record and use "first code unit" data,
because this can give a fast "no match" without searching for a "required code
unit". Previously only non-anchored patterns did this.

47. Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.

48. Add the callout_no_where modifier to pcre2test.

49. Update extended grapheme breaking rules to the latest set that are in
Unicode Standard Annex #29.

50. Added experimental foreign pattern conversion facilities
(pcre2_pattern_convert() and friends).

51.
Change the macro FWRITE, used in pcre2grep, to FWRITE_IGNORE because FWRITE
is defined in a system header in cygwin. Also modified some of the #ifdefs in
pcre2grep related to Windows and Cygwin support.

52. Change 3(g) for 10.23 was a bit too zealous. If a hyphen that follows a
character class is the last character in the class, Perl does not give a
warning. PCRE2 now also treats this as a literal.

53. Related to 52, though PCRE2 was throwing an error for [[:digit:]-X] it was
not doing so for [\d-X] (and similar escapes), as is documented.

54. Fixed a MIPS issue in the JIT compiler reported by Joshua Kinard.

55. Fixed a "maybe uninitialized" warning for class_uchardata in \p handling in
pcre2_compile() which could never actually trigger (code should have been cut
out when Unicode support is disabled).


Version 10.23 14-February-2017
------------------------------

1. Extended pcre2test with the utf8_input modifier so that it is able to
generate all possible 16-bit and 32-bit code unit values in non-UTF modes.

2. In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode), without
PCRE2_UCP set, a negative character type such as \D in a positive class should
cause all characters greater than 255 to match, whatever else is in the class.
There was a bug that caused this not to happen if a Unicode property item was
added to such a class, for example [\D\P{Nd}] or [\W\pL].

3. There has been a major re-factoring of the pcre2_compile.c file. Most syntax
checking is now done in the pre-pass that identifies capturing groups. This has
reduced the amount of duplication and made the code tidier. While doing this,
some minor bugs and Perl incompatibilities were fixed, including:

  (a) \Q\E in the middle of a quantifier such as A+\Q\E+ is now ignored instead
      of giving an invalid quantifier error.

  (b) {0} can now be used after a group in a lookbehind assertion; previously
      this caused an "assertion is not fixed length" error.

  (c) Perl always treats (?(DEFINE) as a "define" group, even if a group with
      the name "DEFINE" exists. PCRE2 now does likewise.

  (d) A recursion condition test such as (?(R2)...) must now refer to an
      existing subpattern.

  (e) A conditional recursion test such as (?(R)...) misbehaved if there was a
      group whose name began with "R".

  (f) When testing zero-terminated patterns under valgrind, the terminating
      zero is now marked "no access". This catches bugs that would otherwise
      show up only with non-zero-terminated patterns.

  (g) A hyphen appearing immediately after a POSIX character class (for example
      /[[:ascii:]-z]/) now generates an error. Perl does accept this as a
      literal, but gives a warning, so it seems best to fail it in PCRE.

  (h) An empty \Q\E sequence may appear after a callout that precedes an
      assertion condition (it is, of course, ignored).

One effect of the refactoring is that some error numbers and messages have
changed, and the pattern offset given for compiling errors is not always the
right-most character that has been read. In particular, for a variable-length
lookbehind assertion it now points to the start of the assertion. Another
change is that when a callout appears before a group, the "length of next
pattern item" that is passed now just gives the length of the opening
parenthesis item, not the length of the whole group. A length of zero is now
given only for a callout at the end of the pattern. Automatic callouts are no
longer inserted before and after explicit callouts in the pattern.

A number of bugs in the refactored code were subsequently fixed during testing
before release, but after the code was made available in the repository.
Many
of the bugs were discovered by fuzz testing. Several of them were related to
the change from assuming a zero-terminated pattern (which previously had
required non-zero terminated strings to be copied). These bugs were never in
fully released code, but are noted here for the record.

  (a) An overall recursion such as (?0) inside a lookbehind assertion was not
      being diagnosed as an error.

  (b) In utf mode, the length of a *MARK (or other verb) name was being checked
      in characters instead of code units, which could lead to bad code being
      compiled, leading to unpredictable behaviour.

  (c) In extended /x mode, characters whose code was greater than 255 caused
      a lookup outside one of the global tables. A similar bug existed for wide
      characters in *VERB names.

  (d) The amount of memory needed for a compiled pattern was miscalculated if a
      lookbehind contained more than one toplevel branch and the first branch
      was of length zero.

  (e) In UTF-8 or UTF-16 modes with PCRE2_EXTENDED (/x) set and a non-zero-
      terminated pattern, if a # comment ran on to the end of the pattern, one
      or more code units past the end were being read.

  (f) An unterminated repeat at the end of a non-zero-terminated pattern (e.g.
      "{2,2") could cause reading beyond the pattern.

  (g) When reading a callout string, if the end delimiter was at the end of the
      pattern one further code unit was read.

  (h) An unterminated number after \g' could cause reading beyond the pattern.

  (i) An insufficient memory size was being computed for compiling with
      PCRE2_AUTO_CALLOUT.

  (j) A conditional group with an assertion condition used more memory than was
      allowed for it during parsing, so too many of them could therefore
      overrun a buffer.

  (k) If parsing a pattern exactly filled the buffer, the internal test for
      overrun did not check when the final META_END item was added.

  (l) If a lookbehind contained a subroutine call, and the called group
      contained an option setting such as (?s), and the PCRE2_ANCHORED option
      was set, unpredictable behaviour could occur. The underlying bug was
      incorrect code and insufficient checking while searching for the end of
      the called subroutine in the parsed pattern.

  (m) Quantifiers following (*VERB)s were not being diagnosed as errors.

  (n) The use of \Q...\E in a (*VERB) name when PCRE2_ALT_VERBNAMES and
      PCRE2_AUTO_CALLOUT were both specified caused undetermined behaviour.

  (o) If \Q was preceded by a quantified item, and the following \E was
      followed by '?' or '+', and there was at least one literal character
      between them, an internal error "unexpected repeat" occurred (example:
      /.+\QX\E+/).

  (p) A buffer overflow could occur while sorting the names in the group name
      list (depending on the order in which the names were seen).

  (q) A conditional group that started with a callout was not doing the right
      check for a following assertion, leading to compiling bad code. Example:
      /(?(C'XX))?!XX/

  (r) If a character whose code point was greater than 0xffff appeared within
      a lookbehind that was within another lookbehind, the calculation of the
      lookbehind length went wrong and could provoke an internal error.

  (t) The sequence \E- or \Q\E- after a POSIX class in a character class caused
      an internal error. Now the hyphen is treated as a literal.

4. Back references are now permitted in lookbehind assertions when there are
no duplicated group numbers (that is, (?| has not been used), and, if the
reference is by name, there is only one group of that name.
The referenced
group must, of course, be of fixed length.

5. pcre2test has been upgraded so that, when run under valgrind with valgrind
support enabled, reading past the end of the pattern is detected, both when
compiling and during callout processing.

6. \g{+<number>} (e.g. \g{+2} ) is now supported. It is a "forward back
reference" and can be useful in repetitions (compare \g{-<number>} ). Perl does
not recognize this syntax.

7. Automatic callouts are no longer generated before and after callouts in the
pattern.

8. When pcre2test was outputting information from a callout, the caret indicator
for the current position in the subject line was incorrect if it was after an
escape sequence for a character whose code point was greater than \x{ff}.

9. Change 19 for 10.22 had a typo (PCRE_STATIC_RUNTIME should be
PCRE2_STATIC_RUNTIME). Fix from David Gaussmann.

10. Added --max-buffer-size to pcre2grep, to allow for automatic buffer
expansion when long lines are encountered. Original patch by Dmitry
Cherniachenko.

11. If pcre2grep was compiled with JIT support, but the library was compiled
without it (something that neither ./configure nor CMake allow, but it can be
done by editing config.h), pcre2grep was giving a JIT error. Now it detects
this situation and does not try to use JIT.

12. Added some "const" qualifiers to variables in pcre2grep.

13. Added Dmitry Cherniachenko's patch for colouring output in Windows
(untested by me). Also, look for GREP_COLOUR or GREP_COLOR if the environment
variables PCRE2GREP_COLOUR and PCRE2GREP_COLOR are not found.

14. Add the -t (grand total) option to pcre2grep.

15. A number of bugs have been mended relating to match start-up optimizations
when the first thing in a pattern is a positive lookahead.
These all applied
only when PCRE2_NO_START_OPTIMIZE was *not* set:

  (a) A pattern such as (?=.*X)X$ was incorrectly optimized as if it needed
      both an initial 'X' and a following 'X'.
  (b) Some patterns starting with an assertion that started with .* were
      incorrectly optimized as having to match at the start of the subject or
      after a newline. There are cases where this is not true, for example,
      (?=.*[A-Z])(?=.{8,16})(?!.*[\s]) matches after the start in lines that
      start with spaces. Starting .* in an assertion is no longer taken as an
      indication of matching at the start (or after a newline).

16. The "offset" modifier in pcre2test was not being ignored (as documented)
when the POSIX API was in use.

17. Added --enable-fuzz-support to "configure", causing a non-installed
library containing a test function that can be called by fuzzers to be
compiled. A non-installed binary to run the test function locally, called
pcre2fuzzcheck, is also compiled.

18. A pattern with PCRE2_DOTALL (/s) set but not PCRE2_NO_DOTSTAR_ANCHOR, and
which started with .* inside a positive lookahead was incorrectly being
compiled as implicitly anchored.

19. Removed all instances of "register" declarations, as they are considered
obsolete these days and in any case had become very haphazard.

20. Add strerror() to pcre2test for failed file opening.

21. Make pcre2test -C list valgrind support when it is enabled.

22. Add the use_length modifier to pcre2test.

23. Fix an off-by-one bug in pcre2test for the list of names for 'get' and
'copy' modifiers.

24. Add PCRE2_CALL_CONVENTION into the prototype declarations in pcre2.h as it
is apparently needed there as well as in the function definitions. (Why did
nobody ask for this in PCRE1?)

25.
Change the _PCRE2_H and _PCRE2_UCP_H guard macros in the header files to
PCRE2_H_IDEMPOTENT_GUARD and PCRE2_UCP_H_IDEMPOTENT_GUARD to be more standard
compliant and unique.

26. pcre2-config --libs-posix was listing -lpcre2posix instead of
-lpcre2-posix. Also, the CMake build process was building the library with the
wrong name.

27. In pcre2test, give some offset information for errors in hex patterns.
This uses the C99 formatting sequence %td, except for MSVC which doesn't
support it - %lu is used instead.

28. Implemented pcre2_code_copy_with_tables(), and added pushtablescopy to
pcre2test for testing it.

29. Fix small memory leak in pcre2test.

30. Fix out-of-bounds read for partial matching of /./ against an empty string
when the newline type is CRLF.

31. Fix a bug in pcre2test that caused a crash when a locale was set either in
the current pattern or a previous one and a wide character was matched.

32. The appearance of \p, \P, or \X in a substitution string when
PCRE2_SUBSTITUTE_EXTENDED was set caused a segmentation fault (NULL
dereference).

33. If the starting offset was specified as greater than the subject length in
a call to pcre2_substitute() an out-of-bounds memory reference could occur.

34. When PCRE2 was compiled to use the heap instead of the stack for recursive
calls to match(), a repeated minimizing caseless back reference, or a
maximizing one where the two cases had different numbers of code units,
followed by a caseful back reference, could lose the caselessness of the first
repeated back reference (example: /(Z)(a)\2{1,2}?(?-i)\1X/i should match ZaAAZX
but didn't).

35. When a pattern is too complicated, PCRE2 gives up trying to find a minimum
matching length and just records zero. Typically this happens when there are
too many nested or recursive back references.
If the limit was reached in
certain recursive cases it failed to be triggered and an internal error could
be the result.

36. The pcre2_dfa_match() function now takes note of the recursion limit for
the internal recursive calls that are used for lookrounds and recursions within
the pattern.

37. More refactoring has got rid of the internal could_be_empty_branch()
function (around 400 lines of code, including comments) by keeping track of
could-be-emptiness as the pattern is compiled instead of scanning compiled
groups. (This would have been much harder before the refactoring of #3 above.)
This lifts a restriction on the number of branches in a group (more than about
1100 would give "pattern is too complicated").

38. Add the "-ac" command line option to pcre2test as a synonym for "-pattern
auto_callout".

39. In a library with Unicode support, incorrect data was compiled for a
pattern with PCRE2_UCP set without PCRE2_UTF if a class required all wide
characters to match (for example, /[\s[:^ascii:]]/).

40. The callout_error modifier has been added to pcre2test to make it possible
to return PCRE2_ERROR_CALLOUT from a callout.

41. A minor change to pcre2grep: colour reset is now "<esc>[0m" instead of
"<esc>[00m".

42. The limit in the auto-possessification code that was intended to catch
overly-complicated patterns and not spend too much time auto-possessifying was
being reset too often, resulting in very long compile times for some patterns.
Now such patterns are no longer completely auto-possessified.

43. Applied Jason Hood's revised patch for RunTest.bat.

44. Added a new Windows script RunGrepTest.bat, courtesy of Jason Hood.

45. Minor cosmetic fix to pcre2test: move a variable that is not used under
Windows into the "not Windows" code.

46.
Applied Jason Hood's patches to upgrade pcre2grep under Windows and tidy
some of the code:

  * normalised the Windows condition by ensuring WIN32 is defined;
  * enables the callout feature under Windows;
  * adds globbing (Microsoft's implementation expands quoted args),
    using a tweaked opendirectory;
  * implements the is_*_tty functions for Windows;
  * --color=always will write the ANSI sequences to file;
  * add sequences 4 (underline works on Win10) and 5 (blink as bright
    background, relatively standard on DOS/Win);
  * remove the (char *) casts for the now-const strings;
  * remove GREP_COLOUR (grep's command line allowed the 'u', but not
    the environment), parsing GREP_COLORS instead;
  * uses the current colour if not set, rather than black;
  * add print_match for the undefined case;
  * fixes a typo.

In addition, colour settings containing anything other than digits and
semicolon are ignored, and the colour controls are no longer output for empty
strings.

47. Detecting patterns that are too large inside the length-measuring loop
saves processing ridiculously long patterns to their end.

48. Ignore PCRE2_CASELESS when processing \h, \H, \v, and \V in classes as it
just wastes time. In the UTF case it can also produce redundant entries in
XCLASS lists caused by characters with multiple other cases and pairs of
characters in the same "not-x" sublists.

49. A pattern such as /(?=(a\K))/ can report the end of the match being before
its start; pcre2test was not handling this correctly when using the POSIX
interface (it was OK with the native interface).

50. In pcre2grep, ignore all JIT compile errors. This means that pcre2grep will
continue to work, falling back to interpretation if anything goes wrong with
JIT.

51.
Applied patches from Christian Persch to configure.ac to make use of the
AC_USE_SYSTEM_EXTENSIONS macro and to test for functions used by the JIT
modules.

52. Minor fixes to pcre2grep from Jason Hood:
  * fixed some spacing;
  * Windows doesn't usually use single quotes, so I've added a define
    to use appropriate quotes [in an example];
  * LC_ALL was displayed as "LCC_ALL";
  * numbers 11, 12 & 13 should end in "th";
  * use double quotes in usage message.

53. When autopossessifying, skip empty branches without recursion, to reduce
stack usage for the benefit of clang with -fsanitize-address, which uses huge
stack frames. Example pattern: /X?(R||){3335}/. Fixes oss-fuzz issue 553.

54. A pattern with very many explicit back references to a group that is a long
way from the start of the pattern could take a long time to compile because
searching for the referenced group in order to find the minimum length was
being done repeatedly. Now up to 128 group minimum lengths are cached and the
attempt to find a minimum length is abandoned if there is a back reference to a
group whose number is greater than 128. (In that case, the pattern is so
complicated that this optimization probably isn't worth it.) This fixes
oss-fuzz issue 557.

55. Issue 32 for 10.22 below was not correctly fixed. If pcre2grep in multiline
mode with --only-matching matched several lines, it restarted scanning at the
next line instead of moving on to the end of the matched string, which can be
several lines after the start.

56. Applied Jason Hood's new patch for RunGrepTest.bat that updates it in line
with updates to the non-Windows version.



Version 10.22 29-July-2016
--------------------------

1. Applied Jason Hood's patches to RunTest.bat and testdata/wintestoutput3
to fix problems with running the tests under Windows.

2.
Implemented a facility for quoting literal characters within hexadecimal
patterns in pcre2test, to make it easier to create patterns with just a few
non-printing characters.

3. Binary zeros are not supported in pcre2test input files. It now detects them
and gives an error.

4. Updated the valgrind parameters in RunTest: (a) changed smc-check=all to
smc-check=all-non-file; (b) changed obj:* in the suppression file to obj:??? so
that it matches only unknown objects.

5. Updated the maintenance script maint/ManyConfigTests to make it easier to
select individual groups of tests.

6. When the POSIX wrapper function regcomp() is called, the REG_NOSUB option
used to set PCRE2_NO_AUTO_CAPTURE when calling pcre2_compile(). However, this
disables the use of back references (and subroutine calls), which are supported
by other implementations of regcomp() with REG_NOSUB. Therefore, REG_NOSUB no
longer causes PCRE2_NO_AUTO_CAPTURE to be set, though it still ignores nmatch
and pmatch when regexec() is called.

7. Because of 6 above, pcre2test has been modified with a new modifier called
posix_nosub, to call regcomp() with REG_NOSUB. Previously the no_auto_capture
modifier had this effect. That option is now ignored when the POSIX API is in
use.

8. Minor tidies to the pcre2demo.c sample program, including more comments
about its 8-bit-ness.

9. Detect unmatched closing parentheses and give the error in the pre-scan
instead of later. Previously the pre-scan carried on and could give a
misleading incorrect error message. For example, /(?J)(?'a'))(?'a')/ gave a
message about invalid duplicate group names.

10. It has happened that pcre2test was accidentally linked with another POSIX
regex library instead of libpcre2-posix.
In this situation, a call to regcomp()
(in the other library) may succeed, returning zero, but of course putting its
own data into the regex_t block. In one example the re_pcre2_code field was
left as NULL, which made pcre2test think it had not got a compiled POSIX regex,
so it treated the next line as another pattern line, resulting in a confusing
error message. A check has been added to pcre2test to see if the data returned
from a successful call of regcomp() are valid for PCRE2's regcomp(). If they
are not, an error message is output and the pcre2test run is abandoned. The
message points out the possibility of a mis-linking. Hopefully this will avoid
some head-scratching the next time this happens.

11. A pattern such as /(?<=((?C)0))/, which has a callout inside a lookbehind
assertion, caused pcre2test to output a very large number of spaces when the
callout was taken, making the program appear to loop.

12. A pattern that included (*ACCEPT) in the middle of a sufficiently deeply
nested set of parentheses of sufficient size caused an overflow of the
compiling workspace (which was diagnosed, but of course is not desirable).

13. Detect missing closing parentheses during the pre-pass for group
identification.

14. Changed some integer variable types and put in a number of casts, following
a report of compiler warnings from Visual Studio 2013 and a few tests with
gcc's -Wconversion (which still throws up a lot).

15. Implemented pcre2_code_copy(), and added pushcopy and #popcopy to pcre2test
for testing it.

16. Change 66 for 10.21 introduced the use of snprintf() in PCRE2's version of
regerror(). When the error buffer is too small, my version of snprintf() puts a
binary zero in the final byte. Bug #1801 seems to show that other versions do
not do this, leading to bad output from pcre2test when it was checking for
buffer overflow.
It no longer assumes a binary zero at the end of a too-small
regerror() buffer.

17. Fixed typo ("&&" for "&") in pcre2_study(). Fortunately, this could not
actually affect anything, by sheer luck.

18. Two minor fixes for MSVC compilation: (a) removal of apparently incorrect
"const" qualifiers in pcre2test and (b) defining snprintf as _snprintf for
older MSVC compilers. This has been done both in src/pcre2_internal.h for most
of the library, and also in src/pcre2posix.c, which no longer includes
pcre2_internal.h (see 24 below).

19. Applied Chris Wilson's patch (Bugzilla #1681) to CMakeLists.txt for MSVC
static compilation. Subsequently applied Chris Wilson's second patch, putting
the first patch under a new option instead of being unconditional when
PCRE_STATIC is set.

20. Updated pcre2grep to set stdout as binary when run under Windows, so as not
to convert \r\n at the ends of reflected lines into \r\r\n. This required
ensuring that other output that is written to stdout (e.g. file names) uses the
appropriate line terminator: \r\n for Windows, \n otherwise.

21. When a line is too long for pcre2grep's internal buffer, show the maximum
length in the error message.

22. Added support for string callouts to pcre2grep (Zoltan's patch with PH
additions).

23. RunTest.bat was missing a "set type" line for test 22.

24. The pcre2posix.c file was including pcre2_internal.h, and using some
"private" knowledge of the data structures. This is unnecessary; the code has
been re-factored and no longer includes pcre2_internal.h.

25. A race condition in JIT, reported by Mozilla, has been fixed.

26. Minor code refactor to avoid "array subscript is below array bounds"
compiler warning.

27. Minor code refactor to avoid "left shift of negative number" warning.

28.
Add a bit more sanity checking to pcre2_serialize_decode() and document
that it expects trusted data.

29. Fix typo in pcre2_jit_test.c

30. Due to an oversight, pcre2grep was not making use of JIT when available.
This is now fixed.

31. The RunGrepTest script is updated to use the valgrind suppressions file
when testing with JIT under valgrind (compare 10.21/51 below). The suppressions
file is updated so that it is now the same as for PCRE1: it suppresses the
Memcheck warnings Addr16 and Cond in unknown objects (that is, JIT-compiled
code). Also changed smc-check=all to smc-check=all-non-file as was done for
RunTest (see 4 above).

32. Implemented the PCRE2_NO_JIT option for pcre2_match().

33. Fix typo that gave a compiler error when JIT not supported.

34. Fix comment describing the returns from find_fixedlength().

35. Fix potential negative index in pcre2test.

36. Calls to pcre2_get_error_message() with error numbers that are never
returned by PCRE2 functions were returning empty strings. Now the error code
PCRE2_ERROR_BADDATA is returned. A facility has been added to pcre2test to
show the texts for given error numbers (i.e. to call pcre2_get_error_message()
and display what it returns) and a few representative error codes are now
checked in RunTest.

37. Added "&& !defined(__INTEL_COMPILER)" to the test for __GNUC__ in
pcre2_match.c, in anticipation that this is needed for the same reason it was
recently added to pcrecpp.cc in PCRE1.

38. Using -o with -M in pcre2grep could cause unnecessary repeated output when
the match extended over a line boundary, as it tried to find more matches "on
the same line" - but it was already over the end.

39. Allow \C in lookbehinds and DFA matching in UTF-32 mode (by converting it
to the same code as '.' when PCRE2_DOTALL is set).

40.
Fix two clang compiler warnings in pcre2test when only one code unit width
is supported.

41. Upgrade RunTest to automatically re-run test 2 with a large (64MiB) stack
if it fails when running the interpreter with a 16MiB stack (and if changing
the stack size via pcre2test is possible). This avoids having to manually set a
large stack size when testing with clang.

42. Fix register overwrite in JIT when SSE2 acceleration is enabled.

43. Detect integer overflow in pcre2test pattern and data repetition counts.

44. In pcre2test, ignore "allcaptures" after DFA matching.

45. Fix unaligned accesses on x86. Patch by Marc Mutz.

46. Fix some more clang compiler warnings.


Version 10.21 12-January-2016
-----------------------------

1. Improve matching speed of patterns starting with + or * in JIT.

2. Use memchr() to find the first character in an unanchored match in 8-bit
mode in the interpreter. This gives a significant speed improvement.

3. Removed a redundant copy of the opcode_possessify table in the
pcre2_auto_possessify.c source.

4. Fix typos in dftables.c for z/OS.

5. Change 36 for 10.20 broke the handling of [[:>:]] and [[:<:]] in that
processing them could involve a buffer overflow if the following character was
an opening parenthesis.

6. Change 36 for 10.20 also introduced a bug in processing this pattern:
/((?x)(*:0))#(?'/. Specifically: if a setting of (?x) was followed by a (*MARK)
setting (which (*:0) is), then (?x) did not get unset at the end of its group
during the scan for named groups, and hence the external # was incorrectly
treated as a comment and the invalid (?' at the end of the pattern was not
diagnosed. This caused a buffer overflow during the real compile. This bug was
discovered by Karl Skomski with the LLVM fuzzer.

7.
Moved the pcre2_find_bracket() function from src/pcre2_compile.c into its
own source module to avoid a circular dependency between src/pcre2_compile.c
and src/pcre2_study.c.

8. A callout with a string argument containing an opening square bracket, for
example /(?C$[$)(?<]/, was incorrectly processed and could provoke a buffer
overflow. This bug was discovered by Karl Skomski with the LLVM fuzzer.

9. The handling of callouts during the pre-pass for named group identification
has been tightened up.

10. The quantifier {1} can be ignored, whether greedy, non-greedy, or
possessive. This is a very minor optimization.

11. A possessively repeated conditional group that could match an empty string,
for example, /(?(R))*+/, was incorrectly compiled.

12. The Unicode tables have been updated to Unicode 8.0.0 (thanks to Christian
Persch).

13. An empty comment (?#) in a pattern was incorrectly processed and could
provoke a buffer overflow. This bug was discovered by Karl Skomski with the
LLVM fuzzer.

14. Fix infinite recursion in the JIT compiler when certain patterns such as
/(?:|a|){100}x/ are analysed.

15. Some patterns with character classes involving [: and \\ were incorrectly
compiled and could cause reading from uninitialized memory or an incorrect
error diagnosis. Examples are: /[[:\\](?<[::]/ and /[[:\\](?'abc')[a:]. The
first of these bugs was discovered by Karl Skomski with the LLVM fuzzer.

16. Pathological patterns containing many nested occurrences of [: caused
pcre2_compile() to run for a very long time. This bug was found by the LLVM
fuzzer.

17. A missing closing parenthesis for a callout with a string argument was not
being diagnosed, possibly leading to a buffer overflow. This bug was found by
the LLVM fuzzer.

18.
A conditional group with only one branch has an implicit empty alternative
branch and must therefore be treated as potentially matching an empty string.

19. If (?R was followed by - or + incorrect behaviour happened instead of a
diagnostic. This bug was discovered by Karl Skomski with the LLVM fuzzer.

20. Another bug that was introduced by change 36 for 10.20: conditional groups
whose condition was an assertion preceded by an explicit callout with a string
argument might be incorrectly processed, especially if the string contained \Q.
This bug was discovered by Karl Skomski with the LLVM fuzzer.

21. Compiling PCRE2 with the sanitize options of clang showed up a number of
very pedantic coding infelicities and a buffer overflow while checking a UTF-8
string if the final multi-byte UTF-8 character was truncated.

22. For Perl compatibility in EBCDIC environments, ranges such as a-z in a
class, where both values are literal letters in the same case, omit the
non-letter EBCDIC code points within the range.

23. Finding the minimum matching length of complex patterns with back
references and/or recursions can take a long time. There is now a cut-off that
gives up trying to find a minimum length when things get too complex.

24. An optimization has been added that speeds up finding the minimum matching
length for patterns containing repeated capturing groups or recursions.

25. If a pattern contained a back reference to a group whose number was
duplicated as a result of appearing in a (?|...) group, the computation of the
minimum matching length gave a wrong result, which could cause incorrect "no
match" errors. For such patterns, a minimum matching length cannot at present
be computed.

26. Added a check for integer overflow in conditions (?(<digits>) and
(?(R<digits>).
This omission was discovered by Karl Skomski with the LLVM
fuzzer.

27. Fixed an issue when \p{Any} inside an xclass did not read the current
character.

28. If pcre2grep was given the -q option with -c or -l, or when handling a
binary file, it incorrectly wrote output to stdout.

29. The JIT compiler did not restore the control verb head in case of *THEN
control verbs. This issue was found by Karl Skomski with a custom LLVM fuzzer.

30. The way recursive references such as (?3) are compiled has been re-written
because the old way was the cause of many issues. Now, conversion of the group
number into a pattern offset does not happen until the pattern has been
completely compiled. This does mean that detection of all infinitely looping
recursions is postponed till match time. In the past, some easy ones were
detected at compile time. This re-writing was done in response to yet another
bug found by the LLVM fuzzer.

31. A test for a back reference to a non-existent group was missing for items
such as \987. This caused incorrect code to be compiled. This issue was found
by Karl Skomski with a custom LLVM fuzzer.

32. Error messages for syntax errors following \g and \k were giving inaccurate
offsets in the pattern.

33. Improve the performance of starting single character repetitions in JIT.

34. (*LIMIT_MATCH=) now gives an error instead of setting the value to 0.

35. Error messages for syntax errors in *LIMIT_MATCH and *LIMIT_RECURSION now
give the right offset instead of zero.

36. The JIT compiler should not check repeats after a {0,1} repeat byte code.
This issue was found by Karl Skomski with a custom LLVM fuzzer.

37. The JIT compiler should restore the control chain for empty possessive
repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.

38.
A bug which was introduced by the single character repetition optimization
was fixed.

39. Match limit check added to recursion. This issue was found by Karl Skomski
with a custom LLVM fuzzer.

40. Arrange for the UTF check in pcre2_match() and pcre2_dfa_match() to look
only at the part of the subject that is relevant when the starting offset is
non-zero.

41. Improve first character match in JIT with SSE2 on x86.

42. Fix two assertion fails in JIT. These issues were found by Karl Skomski
with a custom LLVM fuzzer.

43. Correct the setting of CMAKE_C_FLAGS in CMakeLists.txt (patch from Roy Ivy
III).

44. Fix bug in RunTest.bat for new test 14, and adjust the script for the added
test (there are now 20 in total).

45. Fixed a corner case of range optimization in JIT.

46. Add the ${*MARK} facility to pcre2_substitute().

47. Modifier lists in pcre2test were splitting at spaces without the required
commas.

48. Implemented PCRE2_ALT_VERBNAMES.

49. Fixed two issues in JIT. These were found by Karl Skomski with a custom
LLVM fuzzer.

50. The pcre2test program has been extended by adding the #newline_default
command. This has made it possible to run the standard tests when PCRE2 is
compiled with either CR or CRLF as the default newline convention. As part of
this work, the new command was added to several test files and the testing
scripts were modified. The pcre2grep tests can now also be run when there is no
LF in the default newline convention.

51. The RunTest script has been modified so that, when JIT is used and valgrind
is specified, a valgrind suppressions file is set up to ignore "Invalid read of
size 16" errors because these are false positives when the hardware supports
the SSE2 instruction set.

52.
It is now possible to have comment lines amid the subject strings in
pcre2test (and perltest.sh) input.

53. Implemented PCRE2_USE_OFFSET_LIMIT and pcre2_set_offset_limit().

54. Add the null_context modifier to pcre2test so that calling pcre2_compile()
and the matching functions with NULL contexts can be tested.

55. Implemented PCRE2_SUBSTITUTE_EXTENDED.

56. In a character class such as [\W\p{Any}] where both a negative-type escape
("not a word character") and a property escape were present, the property
escape was being ignored.

57. Fixed integer overflow for patterns whose minimum matching length is very,
very large.

58. Implemented --never-backslash-C.

59. Change 55 above introduced a bug by which certain patterns provoked the
erroneous error "\ at end of pattern".

60. The special sequences [[:<:]] and [[:>:]] gave rise to incorrect compiling
errors or other strange effects if compiled in UCP mode. Found with libFuzzer
and AddressSanitizer.

61. Whitespace at the end of a pcre2test pattern line caused a spurious error
message if there were only single-character modifiers. It should be ignored.

62. The use of PCRE2_NO_AUTO_CAPTURE could cause incorrect compilation results
or segmentation errors for some patterns. Found with libFuzzer and
AddressSanitizer.

63. Very long names in (*MARK) or (*THEN) etc. items could provoke a buffer
overflow.

64. Improve error message for overly-complicated patterns.

65. Implemented an optional replication feature for patterns in pcre2test, to
make it easier to test long repetitive patterns. The tests for 63 above are
converted to use the new feature.

66. In the POSIX wrapper, if regerror() was given too small a buffer, it could
misbehave.

67.
In pcre2_substitute() in UTF mode, the UTF validity check on the
replacement string was happening before the length setting when the replacement
string was zero-terminated.

68. In pcre2_substitute() in UTF mode, PCRE2_NO_UTF_CHECK can be set for the
second and subsequent calls to pcre2_match().

69. There was no check for integer overflow for a replacement group number in
pcre2_substitute(). An added check for a number greater than the largest group
number in the pattern means this is not now needed.

70. The PCRE2-specific VERSION condition didn't work correctly if only one
digit was given after the decimal point, or if more than two digits were given.
It now works with one or two digits, and gives a compile time error if more are
given.

71. In pcre2_substitute() there was the possibility of reading one code unit
beyond the end of the replacement string.

72. The code for checking a subject's UTF-32 validity for a pattern with a
lookbehind involved an out-of-bounds pointer, which could potentially cause
trouble in some environments.

73. The maximum lookbehind length was incorrectly calculated for patterns such
as /(?<=(a)(?-1))x/ which have a recursion within a backreference.

74. Give an error if a lookbehind assertion is longer than 65535 code units.

75. Give an error in pcre2_substitute() if a match ends before it starts (as a
result of the use of \K).

76. Check the length of subpattern names and the names in (*MARK:xx) etc.
dynamically to avoid the possibility of integer overflow.

77. Implement pcre2_set_max_pattern_length() so that programs can restrict the
size of patterns that they are prepared to handle.

78. (*NO_AUTO_POSSESS) was not working.

79.
Adding group information caching improves the speed of compiling when
checking whether a group has a fixed length and/or could match an empty string,
especially when recursion or subroutine calls are involved. However, this
cannot be used when (?| is present in the pattern because the same number may
be used for groups of different sizes. To catch runaway patterns in this
situation, counts have been introduced to the functions that scan for empty
branches or compute fixed lengths.

80. Allow for the possibility of the size of the nest_save structure not being
a factor of the size of the compiling workspace (it currently is).

81. Check for integer overflow in minimum length calculation and cap it at
65535.

82. Small optimizations in code for finding the minimum matching length.

83. Lock out configuring for EBCDIC with non-8-bit libraries.

84. Test for error code <= 0 in regerror().

85. Check for too many replacements (more than INT_MAX) in pcre2_substitute().

86. Avoid the possibility of computing with an out-of-bounds pointer (though
not dereferencing it) while handling lookbehind assertions.

87. Failure to get memory for the match data in regcomp() is now given as a
regcomp() error instead of waiting for regexec() to pick it up.

88. In pcre2_substitute(), ensure that CRLF is not split when it is a valid
newline sequence.

89. Paranoid check in regcomp() for bad error code from pcre2_compile().

90. Run test 8 (internal offsets and code sizes) for link sizes 3 and 4 as well
as for link size 2.

91. Document that JIT has a limit on pattern size, and give more information
about JIT compile failures in pcre2test.

92. Implement PCRE2_INFO_HASBACKSLASHC.

93. Re-arrange valgrind support code in pcre2test to avoid spurious reports
with JIT (possibly caused by SSE2?).

94.
Support offset_limit in JIT.

95. A sequence such as [[:punct:]b], that is, a POSIX character class followed
by a single ASCII character in a class item, was incorrectly compiled in UCP
mode. The POSIX class got lost, but only if the single character followed it.

96. [:punct:] in UCP mode was matching some characters in the range 128-255
that should not have been matched.

97. If [:^ascii:] or [:^xdigit:] are present in a non-negated class, all
characters with code points greater than 255 are in the class. When a Unicode
property was also in the class (if PCRE2_UCP is set, escapes such as \w are
turned into Unicode properties), wide characters were not correctly handled,
and could fail to match.

98. In pcre2test, make the "startoffset" modifier a synonym of "offset",
because it sets the "startoffset" parameter for pcre2_match().

99. If PCRE2_AUTO_CALLOUT was set on a pattern that had a (?# comment between
an item and its qualifier (for example, A(?#comment)?B) pcre2_compile()
misbehaved. This bug was found by the LLVM fuzzer.

100. The error for an invalid UTF pattern string always gave the code unit
offset as zero instead of where the invalidity was found.

101. Further to 97 above, negated classes such as [^[:^ascii:]\d] were also not
working correctly in UCP mode.

102. Similar to 99 above, if an isolated \E was present between an item and its
qualifier when PCRE2_AUTO_CALLOUT was set, pcre2_compile() misbehaved. This bug
was found by the LLVM fuzzer.

103. The POSIX wrapper function regexec() crashed if the option REG_STARTEND
was set when the pmatch argument was NULL. It now returns REG_INVARG.

104. Allow for up to 32-bit numbers in the ordin() function in pcre2grep.

105. An empty \Q\E sequence between an item and its qualifier caused
pcre2_compile() to misbehave when auto callouts were enabled.
This bug
was found by the LLVM fuzzer.

106. If both PCRE2_ALT_VERBNAMES and PCRE2_EXTENDED were set, and a (*MARK) or
other verb "name" ended with whitespace immediately before the closing
parenthesis, pcre2_compile() misbehaved. Example: /(*:abc )/, but only when
both those options were set.

107. In a number of places pcre2_compile() was not handling NULL characters
correctly, and pcre2test with the "bincode" modifier was not always correctly
displaying fields containing NULLS:

    (a) Within /x extended #-comments
    (b) Within the "name" part of (*MARK) and other *verbs
    (c) Within the text argument of a callout

108. If a pattern that was compiled with PCRE2_EXTENDED started with white
space or a #-type comment that was followed by (?-x), which turns off
PCRE2_EXTENDED, and there was no subsequent (?x) to turn it on again,
pcre2_compile() assumed that (?-x) applied to the whole pattern and
consequently mis-compiled it. This bug was found by the LLVM fuzzer. The fix
for this bug means that a setting of any of the (?imsxJU) options at the start
of a pattern is no longer transferred to the options that are returned by
PCRE2_INFO_ALLOPTIONS. In fact, this was an anachronism that should have
changed when the effects of those options were all moved to compile time.

109. An escaped closing parenthesis in the "name" part of a (*verb) when
PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug
was found by the LLVM fuzzer.

110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it
possible to test it.

111. "Harden" pcre2test against ridiculously large values in modifiers and
command line arguments.

112. Implemented PCRE2_SUBSTITUTE_UNKNOWN_UNSET and
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.

113. Fix printing of *MARK names that contain binary zeroes in pcre2test.
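
Changes 107 and 113 above both concern names that may contain binary zeroes.
Such a name cannot be printed with a plain "%s", because output stops at the
first zero byte; it has to be handled as a pointer plus a length. The following
is a minimal sketch of that idea only. It is not the pcre2test code: the
function name format_counted_name() and the \xHH output style are assumptions
made for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not taken from pcre2test): render a length-counted
   name into out[], copying printable bytes unchanged and escaping every other
   byte, including binary zero, as \xHH. Returns the number of characters
   written, not counting the terminating zero. If out[] is too small, the
   output is truncated at a whole-item boundary. */
static size_t format_counted_name(const unsigned char *name, size_t length,
                                  char *out, size_t outsize)
{
  size_t used = 0;

  assert(out != NULL && outsize > 0);

  for (size_t i = 0; i < length; i++)
    {
    unsigned char c = name[i];
    size_t need = isprint(c) ? 1 : 4;     /* "\xHH" takes four characters */

    if (used + need >= outsize) break;    /* keep room for the final zero */

    if (need == 1)
      out[used++] = (char)c;
    else
      used += (size_t)snprintf(out + used, outsize - used, "\\x%02x",
                               (unsigned int)c);
    }

  out[used] = '\0';
  return used;
}
```

The length is passed explicitly, so an embedded zero is treated as an ordinary
byte of the name and shows up in the output as \x00 instead of cutting it
short.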


Version 10.20 30-June-2015
--------------------------

1. Callouts with string arguments have been added.

2. Assertion code generator in JIT has been optimized.

3. The invalid pattern (?(?C) has a missing assertion condition at the end. The
pcre2_compile() function read past the end of the input before diagnosing an
error. This bug was discovered by the LLVM fuzzer.

4. Implemented pcre2_callout_enumerate().

5. Fix JIT compilation of conditional blocks whose assertion is converted to
(*FAIL). E.g: /(?(?!))/.

6. The pattern /(?(?!)^)/ caused references to random memory. This bug was
discovered by the LLVM fuzzer.

7. The assertion (?!) is optimized to (*FAIL). This was not handled correctly
when this assertion was used as a condition, for example (?(?!)a|b). In
pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect
error about an unsupported item.

8. For some types of pattern, for example /Z*(|d*){216}/, the auto-
possessification code could take exponential time to complete. A recursion
depth limit of 1000 has been imposed to limit the resources used by this
optimization. This infelicity was discovered by the LLVM fuzzer.

9. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special
class such as \S in non-UCP mode, explicit wide characters (> 255) can be
ignored because \S ensures they are all in the class. The code for doing this
was interacting badly with the code for computing the amount of space needed to
compile the pattern, leading to a buffer overflow. This bug was discovered by
the LLVM fuzzer.

10. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside
other kinds of group caused stack overflow at compile time. This bug was
discovered by the LLVM fuzzer.

11.
A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
between a subroutine call and its quantifier was incorrectly compiled, leading
to buffer overflow or other errors. This bug was discovered by the LLVM fuzzer.

12. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
assertion after (?(. The code was failing to check the character after (?(?<
for the ! or = that would indicate a lookbehind assertion. This bug was
discovered by the LLVM fuzzer.

13. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
a fixed maximum following a group that contains a subroutine reference was
incorrectly compiled and could trigger buffer overflow. This bug was discovered
by the LLVM fuzzer.

14. Negative relative recursive references such as (?-7) to non-existent
subpatterns were not being diagnosed and could lead to unpredictable behaviour.
This bug was discovered by the LLVM fuzzer.

15. The bug fixed in 14 was due to an integer variable that was unsigned when
it should have been signed. Some other "int" variables, having been checked,
have either been changed to uint32_t or commented as "must be signed".

16. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
caused a stack overflow instead of the diagnosis of a non-fixed length
lookbehind assertion. This bug was discovered by the LLVM fuzzer.

17. The use of \K in a positive lookbehind assertion in a non-anchored pattern
(e.g. /(?<=\Ka)/) could make pcre2grep loop.

18. There was a similar problem to 17 in pcre2test for global matches, though
the code there did catch the loop.

19. If a greedy quantified \X was preceded by \C in UTF mode (e.g.
\C\X*),
and a subsequent item in the pattern caused a non-match, backtracking over the
repeated \X did not stop, but carried on past the start of the subject, causing
reference to random memory and/or a segfault. There were also some other cases
where backtracking after \C could crash. This set of bugs was discovered by the
LLVM fuzzer.

20. The function for finding the minimum length of a matching string could take
a very long time if mutual recursion was present many times in a pattern, for
example, /((?2){73}(?2))((?1))/. A better mutual recursion detection method has
been implemented. This infelicity was discovered by the LLVM fuzzer.

21. Implemented PCRE2_NEVER_BACKSLASH_C.

22. The feature for string replication in pcre2test could read from freed
memory if the replication required a buffer to be extended, and it was not
working properly in 16-bit and 32-bit modes. This issue was discovered by a
fuzzer: see http://lcamtuf.coredump.cx/afl/.

23. Added the PCRE2_ALT_CIRCUMFLEX option.

24. Adjust the treatment of \8 and \9 to be the same as the current Perl
behaviour.

25. Static linking against the PCRE2 library using the pkg-config module was
failing on missing pthread symbols.

26. If a group that contained a recursive back reference also contained a
forward reference subroutine call followed by a non-forward-reference
subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to
compile correct code, leading to undefined behaviour or an internally detected
error. This bug was discovered by the LLVM fuzzer.

27. Quantification of certain items (e.g. atomic back references) could cause
incorrect code to be compiled when recursive forward references were involved.
For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. This bug was
discovered by the LLVM fuzzer.

28.
A repeated conditional group whose condition was a reference by name caused
a buffer overflow if there was more than one group with the given name. This
bug was discovered by the LLVM fuzzer.

29. A recursive back reference by name within a group that had the same name as
another group caused a buffer overflow. For example: /(?J)(?'d'(?'d'\g{d}))/.
This bug was discovered by the LLVM fuzzer.

30. A forward reference by name to a group whose number is the same as the
current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused a
buffer overflow at compile time. This bug was discovered by the LLVM fuzzer.

31. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1
as an int; fixed by writing it as 1u).

32. Fix pcre2grep compile when -std=c99 is used with gcc, though it still gives
a warning for "fileno" unless -std=gnu99 is used.

33. A lookbehind assertion within a set of mutually recursive subpatterns could
provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.

34. Give an error for an empty subpattern name such as (?'').

35. Make pcre2test give an error if a pattern that follows #forbid_utf contains
\P, \p, or \X.

36. The way named subpatterns are handled has been refactored. There is now a
pre-pass over the regex which does nothing other than identify named
subpatterns and count the total captures. This means that information about
named patterns is known before the rest of the compile. In particular, it means
that forward references can be checked as they are encountered. Previously, the
code for handling forward references was contorted and led to several errors in
computing the memory requirements for some patterns, leading to buffer
overflows.

37. There was no check for integer overflow in subroutine calls such as (?123).

38.
The table entry for \l in EBCDIC environments was incorrect, leading to its
being treated as a literal 'l' instead of causing an error.

39. If a non-capturing group containing a conditional group that could match
an empty string was repeated, it was not identified as matching an empty string
itself. For example: /^(?:(?(1)x|)+)+$()/.

40. In an EBCDIC environment, pcretest was mishandling the escape sequences
\a and \e in test subject lines.

41. In an EBCDIC environment, \a in a pattern was converted to the ASCII
instead of the EBCDIC value.

42. The handling of \c in an EBCDIC environment has been revised so that it is
now compatible with the specification in Perl's perlebcdic page.

43. Single character repetition in JIT has been improved. 20-30% speedup
was achieved on certain patterns.

44. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
ASCII/Unicode. This has now been added to the list of characters that are
recognized as white space in EBCDIC.

45. When PCRE2 was compiled without Unicode support, the use of \p and \P gave
an error (correctly) when used outside a class, but did not give an error
within a class.

46. \h within a class was incorrectly compiled in EBCDIC environments.

47. JIT should return with error when the compiled pattern requires
more stack space than the maximum.

48. Fixed a memory leak in pcre2grep when a locale is set.


Version 10.10 06-March-2015
---------------------------

1. When a pattern is compiled, it remembers the highest back reference so that
when matching, if the ovector is too small, extra memory can be obtained to
use instead.
A conditional subpattern whose condition is a check on a capture 2652having happened, such as, for example in the pattern /^(?:(a)|b)(?(1)A|B)/, is 2653another kind of back reference, but it was not setting the highest 2654backreference number. This mattered only if pcre2_match() was called with an 2655ovector that was too small to hold the capture, and there was no other kind of 2656back reference (a situation which is probably quite rare). The effect of the 2657bug was that the condition was always treated as FALSE when the capture could 2658not be consulted, leading to a incorrect behaviour by pcre2_match(). This bug 2659has been fixed. 2660 26612. Functions for serialization and deserialization of sets of compiled patterns 2662have been added. 2663 26643. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove 2665excess code units at the end of the data block that may occasionally occur if 2666the code for calculating the size over-estimates. This change stops the 2667serialization code copying uninitialized data, to which valgrind objects. The 2668documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not 2669include the general overhead. This has been corrected. 2670 26714. All code units in every slot in the table of group names are now set, again 2672in order to avoid accessing uninitialized data when serializing. 2673 26745. The (*NO_JIT) feature is implemented. 2675 26766. If a bug that caused pcre2_compile() to use more memory than allocated was 2677triggered when using valgrind, the code in (3) above passed a stupidly large 2678value to valgrind. This caused a crash instead of an "internal error" return. 2679 26807. A reference to a duplicated named group (either a back reference or a test 2681for being set in a conditional) that occurred in a part of the pattern where 2682PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern 2683to be incorrectly calculated, leading to overwriting. 

8. A mutually recursive set of back references such as (\2)(\1) caused a
segfault at compile time (while trying to find the minimum matching length).
The infinite loop is now broken (with the minimum length unset, that is, zero).

9. If an assertion that was used as a condition was quantified with a minimum
of zero, matching went wrong. In particular, if the whole group had unlimited
repetition and could match an empty string, a segfault was likely. The pattern
(?(?=0)?)+ is an example that caused this. Perl allows assertions to be
quantified, but not if they are being used as conditions, so the above pattern
is faulted by Perl. PCRE2 has now been changed so that it also rejects such
patterns.

10. The error message for an invalid quantifier has been changed from "nothing
to repeat" to "quantifier does not follow a repeatable item".

11. If a bad UTF string is compiled with NO_UTF_CHECK, it may succeed, but
scanning the compiled pattern in subsequent auto-possessification can get out
of step and lead to an unknown opcode. Previously this could have caused an
infinite loop. Now it generates an "internal error" error. This is a tidyup,
not a bug fix; passing bad UTF with NO_UTF_CHECK is documented as having an
undefined outcome.

12. A UTF pattern containing a "not" match of a non-ASCII character and a
subroutine reference could loop at compile time. Example: /[^\xff]((?1))/.

13. The locale test (RunTest 3) has been upgraded. It now checks that a locale
that is found in the output of "locale -a" can actually be set by pcre2test
before it is accepted. Previously, in an environment where a locale was listed
but would not set (an example does exist), the test would "pass" without
actually doing anything. Also the fr_CA locale has been added to the list of
locales that can be used.

14. Fixed a bug in pcre2_substitute(). If a replacement string ended in a
capturing group number without parentheses, the last character was incorrectly
literally included at the end of the replacement string.

15. A possessive capturing group such as (a)*+ with a minimum repeat of zero
failed to allow the zero-repeat case if pcre2_match() was called with an
ovector too small to capture the group.

16. Improved error message in pcre2test when setting the stack size (-S) fails.

17. Fixed two bugs in CMakeLists.txt: (1) Some lines had got lost in the
transfer from PCRE1, meaning that CMake configuration failed if "build tests"
was selected. (2) The file src/pcre2_serialize.c had not been added to the list
of PCRE2 sources, which caused a failure to build pcre2test.

18. Fixed typo in pcre2_serialize.c (DECL instead of DEFN) that causes problems
only on Windows.

19. Use binary input when reading back saved serialized patterns in pcre2test.

20. Added RunTest.bat for running the tests under Windows.

21. "make distclean" was not removing config.h, a file that may be created for
use with CMake.

22. A pattern such as "((?2){0,1999}())?", which has a group containing a
forward reference repeated a large (but limited) number of times within a
repeated outer group that has a zero minimum quantifier, caused incorrect code
to be compiled, leading to the error "internal error: previously-checked
referenced subpattern not found" when an incorrect memory address was read.
This bug was reported as "heap overflow", discovered by Kai Lu of Fortinet's
FortiGuard Labs. (Added 24-March-2015: CVE-2015-2325 was given to this.)

23. A pattern such as "((?+1)(\1))/" containing a forward reference subroutine
call within a group that also contained a recursive back reference caused
incorrect code to be compiled. This bug was reported as "heap overflow",
discovered by Kai Lu of Fortinet's FortiGuard Labs. (Added 24-March-2015:
CVE-2015-2326 was given to this.)

24. Computing the size of the JIT read-only data in advance has been a source
of various issues, and new ones still appear unfortunately. To fix
existing and future issues, size computation is eliminated from the code,
and replaced by on-demand memory allocation.

25. A pattern such as /(?i)[A-`]/, where characters in the other case are
adjacent to the end of the range, and the range contained characters with more
than one other case, caused incorrect behaviour when compiled in UTF mode. In
that example, the range a-j was left out of the class.


Version 10.00 05-January-2015
-----------------------------

Version 10.00 is the first release of PCRE2, a revised API for the PCRE
library. Changes prior to 10.00 are logged in the ChangeLog file for the old
API, up to item 20 for release 8.36.

The code of the library was heavily revised as part of the new API
implementation. Details of each and every modification were not individually
logged. In addition to the API changes, the following changes were made. They
are either new functionality, or bug fixes and other noticeable changes of
behaviour that were implemented after the code had been forked.

1. Including Unicode support at build time is now enabled by default, but it
can optionally be disabled. It is not enabled by default at run time (no
change).

2. The test program, now called pcre2test, was re-specified and almost
completely re-written. Its input is not compatible with input for pcretest.

3. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the
PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is
matched by that pattern.

4. For the benefit of those who use PCRE2 via some other application, that is,
not writing the function calls themselves, it is possible to check the PCRE2
version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a
string such as "yesno".

5. There are case-equivalent Unicode characters whose encodings use different
numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is
theoretically possible for this to happen in UTF-16 too.) If a backreference to
a group containing one of these characters was greedily repeated, and during
the match a backtrack occurred, the subject might be backtracked by the wrong
number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly
(and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should
capture the final character, which is the three bytes E2, B1, and A5 in UTF-8.
Incorrect backtracking meant that group 2 captured only the last two bytes.
This bug has been fixed; the new code is slower, but it is used only when the
strings matched by the repetition are not all the same length.

6. A pattern such as /()a/ was not setting the "first character must be 'a'"
information. This applied to any pattern with a group that matched no
characters, for example: /(?:(?=.)|(?<!x))a/.

7. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges for
those parentheses to be closed with whatever has been captured so far. However,
it was failing to mark any other groups between the highest capture so far and
the current group as "unset". Thus, the ovector for those groups contained
whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when
matched against "abcd".

8. The pcre2_substitute() function has been implemented.

9. If an assertion used as a condition was quantified with a minimum of zero
(an odd thing to do, but it happened), SIGSEGV or other misbehaviour could
occur.

10. The PCRE2_NO_DOTSTAR_ANCHOR option has been implemented.

****