Lines Matching full:is
7 PCRE2 is a new API for PCRE, starting at release 10.0. This document contains a
278 backward compatibility. They should not be used in new code. The first is
279 replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
309 patterns that can be processed by \fBpcre2_compile()\fP. This facility is
322 units, respectively. However, there is just one header file, \fBpcre2.h\fP.
342 example, PCRE2_UCHAR16 is usually defined as `uint16_t'. The SPTR types are
343 constant pointers to the equivalent UCHAR types, that is, they are pointers to
350 PCRE2_CODE_UNIT_WIDTH is not defined by default. An application must define it
356 including \fBpcre2.h\fP, and then use the real function names. Any code that is
357 to be included in an environment where the value of PCRE2_CODE_UNIT_WIDTH is
358 unknown should also use the real function names. (Unfortunately, it is not
361 If PCRE2_CODE_UNIT_WIDTH is not defined before including \fBpcre2.h\fP, a
378 PCRE2 has its own native API, which is described in this document. There are
399 sample program that demonstrates the simplest way of using them is provided in
401 of this program is given in the
417 Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
422 support is not available.
428 JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
429 unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
437 A second matching function, \fBpcre2_dfa_match()\fP, which is not
438 Perl-compatible, is also provided. This uses a different algorithm for the
443 and disadvantages is given in the
447 documentation. There is no JIT support for \fBpcre2_dfa_match()\fP.
465 functions is called with a NULL argument, the function returns immediately
480 blocks of various sorts. In all cases, if one of these functions is called with
488 several places. These values are always of type PCRE2_SIZE, which is an
490 value that can be stored in such a type (that is ~(PCRE2_SIZE)0) is reserved
492 Therefore, the longest string that can be handled is one less than this
508 Each of the first three conventions is used by at least one operating system as
509 its standard newline sequence. When PCRE2 is built, a default can be specified.
510 If it is not, the default is set to LF, which is the Unix standard. However,
519 In the PCRE2 documentation the word "newline" is used to mean "the character or
522 metacharacters, the handling of #-comments in /x mode, and, when CRLF is a
524 non-anchored pattern. There is more detail about this in the
539 In a multithreaded application it is important to keep thread-specific data
541 itself is thread-safe: it contains no static or global variables. The API is
552 A pointer to the compiled form of a pattern is returned to the user when
553 \fBpcre2_compile()\fP is successful. The data in the compiled pattern is fixed,
554 and does not change when the pattern is matched. Therefore, it is thread-safe,
555 that is, the same compiled pattern can be used by more than one thread
558 just-in-time (JIT) optimization feature is being used, it needs separate memory
568 is somewhat tricky to do correctly. If you know that writing to a pointer is
582 The reason for checking the pointer a second time is as follows: Several
592 is not sufficient. The thread that is doing the compiling may be descheduled
610 If JIT is being used, but the JIT compilation is not being done immediately
611 (perhaps waiting to see if the pattern is used often enough), similar logic is
623 functions are called. A context is nothing more than a collection of parameters
625 in a context is a convenient way of passing them to a PCRE2 function without
653 directly. A context is just a block of memory that holds the parameter values.
655 NULL when a context pointer is required.
657 There are three different types of context: a general context that is relevant
666 library. The context is named `general' rather than specifically `memory'
669 general context. A general context is created by:
683 Whenever code in PCRE2 calls these functions, the final argument is the value
686 \fImalloc()\fP and \fIfree()\fP are used. (This is not currently useful, as
688 The \fIprivate_malloc()\fP function is used (if supplied) to obtain memory for
693 used. When the time comes to free the block, this function is called.
708 If this function is passed a NULL argument, it returns immediately without
716 A compile context is required if you want to provide an external function for
727 A compile context is also required if you are using custom memory management.
731 A compile context is created, copied, and freed by the following functions:
743 A compile context is created with default values for its parameters. These can
745 PCRE2_ERROR_BADDATA if invalid data is detected.
754 ending sequence. The value is used by the JIT compiler and by the two
764 argument is a general context. This function builds a set of character tables
789 This sets a maximum length, in code units, for any pattern string that is
790 compiled with this context. If the pattern is longer, an error is generated.
791 This facility is provided so that applications that accept patterns from
792 external sources can limit their size. The default is the largest number that a
793 PCRE2_SIZE variable can hold, which is effectively unlimited.
805 NUL character, that is a binary zero).
814 When a pattern is compiled with the PCRE2_EXTENDED or PCRE2_EXTENDED_MORE
816 comments starting with #. The value is saved with the compiled pattern for
825 This parameter adjusts the limit, set when PCRE2 is built (default 250), on the
835 There is at least one application that runs PCRE2 in threads with very limited
836 system stack, where running out of stack is to be avoided at all costs. The
837 parenthesis limit above cannot take account of how much stack is actually
839 that is called whenever \fBpcre2_compile()\fP starts to compile a parenthesized
844 nesting, and the second is user data that is set up by the last argument of
846 zero if all is well, or non-zero to force an error.
853 A match context is required if you want to:
865 A match context is created, copied, and freed by the following functions:
877 A match context is created with default values for its parameters. These can
879 PCRE2_ERROR_BADDATA if invalid data is detected.
914 advance in the subject string. The default value is PCRE2_UNSET. The
917 offset is not found. The \fBpcre2_substitute()\fP function makes no more
920 For example, if the pattern /abc/ is matched against "123abc" with an offset
921 limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match can never be
923 \fBpcre2_dfa_match()\fP, or \fBpcre2_substitute()\fP is greater than the offset
927 calling \fBpcre2_compile()\fP so that when JIT is in use, different code can be
928 compiled. If a match is started with a non-default offset limit when
929 PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
934 newline that follows the start of matching in the subject. If this is set with
936 offset limit. In other words, whichever limit comes first is used.
953 documentation for more details). If the limit is reached, the negative error
954 code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
955 is built; if it is not, the default is set very large and is essentially
963 where ddd is a decimal number. However, such a setting is ignored unless ddd is
965 limit is set, less than the default.
969 there are (that is, the deeper the search tree), the more memory is needed.
970 Heap memory is used only if the initial vector is too small. If the heap limit
975 Similarly, for \fBpcre2_dfa_match()\fP, a vector on the system stack is used
977 this is not big enough is heap memory used. In this case, too, setting a value
988 trees. The classic example is a pattern that uses nested unlimited repeats.
990 There is an internal counter in \fBpcre2_match()\fP that is incremented each
996 though the counting is done in a different way.
998 When \fBpcre2_match()\fP is called with a pattern that was successfully
999 processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed
1000 is entirely different. However, there is still the possibility of runaway
1005 The default value for the limit can be set when PCRE2 is built; the default
1006 default is 10 million, which handles all but the most extreme cases. A value
1012 where ddd is a decimal number. However, such a setting is ignored unless ddd is
1014 \fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
1022 Each time a nested backtracking point is passed, a new memory "frame" is used
1024 indirectly limits the amount of memory that is used in a match. However,
1030 The depth limit is not relevant, and is ignored, when matching is done using
1031 JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
1034 limits, indirectly, the amount of system stack that is used. It was more useful
1040 If the depth of internal recursive function calls is great enough, local
1042 depth limit also indirectly limits the amount of heap memory that is used. A
1044 using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
1048 The default value for the depth limit can be set when PCRE2 is built; if it is
1049 not, the default is set to the same value as the default for the match limit.
1050 If the limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP
1056 where ddd is a decimal number. However, such a setting is ignored unless ddd is
1058 \fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
1074 The first argument for \fBpcre2_config()\fP specifies which information is
1075 required. The second argument is a pointer to memory into which the information
1076 is placed. If NULL is passed, the function returns the amount of memory that is
1078 the value is in bytes; when requesting these values, \fIwhere\fP should point
1080 length is given in code units, not counting the terminating zero.
1082 When requesting information, the returned value from \fBpcre2_config()\fP is
1084 the value in the first argument is not recognized. The following information is
1089 The output is a uint32_t integer whose value indicates what character
1093 default can be overridden when a pattern is compiled.
1097 The output is a uint32_t integer whose lower bits indicate which code unit
1103 The output is a uint32_t integer that gives the default limit for the depth of
1110 The output is a uint32_t integer that gives, in kibibytes, the default limit
1117 The output is a uint32_t integer that is set to one if support for just-in-time
1118 compiling is available; otherwise it is set to zero.
1122 The \fIwhere\fP argument should point to a buffer that is at least 48 code
1124 \fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with a
1125 string that contains the name of the architecture for which the JIT compiler is
1127 is not available, PCRE2_ERROR_BADOPTION is returned, otherwise the number of
1128 code units used is returned. This is the length of the string, plus one unit
1133 The output is a uint32_t integer that contains the number of bytes used for
1134 internal linkage in compiled regular expressions. When PCRE2 is configured, the
1135 value can be set to 2, 3, or 4, with the default being 2. This is the value
1136 that is returned by \fBpcre2_config()\fP. However, when the 16-bit library is
1137 compiled, a value of 3 is rounded up to 4, and when the 32-bit library is
1138 compiled, internal linkages always use 4 bytes, so the configured value is not
1141 The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
1148 The output is a uint32_t integer that gives the default match limit for
1154 The output is a uint32_t integer whose value specifies the default character
1155 sequence that is recognized as meaning "newline". The values are:
1169 The output is a uint32_t integer that is set to one if the use of \eC was
1170 permanently disabled when PCRE2 was built; otherwise it is set to zero.
1174 The output is a uint32_t integer that gives the maximum depth of nesting
1175 of parentheses (of any kind) in a pattern. This limit is imposed to cap the
1176 amount of system stack used when a pattern is compiled. It is specified when
1177 PCRE2 is built; the default is 250. This limit does not take into account the
1183 This parameter is obsolete and should not be used in new code. The output is a
1184 uint32_t integer that is always set to zero.
1188 The output is a uint32_t integer that gives the length of PCRE2's character
1198 The \fIwhere\fP argument should point to a buffer that is at least 24 code
1201 without Unicode support, the buffer is filled with the text "Unicode not
1202 supported". Otherwise, the Unicode version string (for example, "8.0.0") is
1203 inserted. The number of code units used is returned. This is the length of the
1208 The output is a uint32_t integer that is set to one if Unicode support is
1209 available; otherwise it is set to zero. Unicode support implies UTF support.
1213 The \fIwhere\fP argument should point to a buffer that is at least 24 code
1215 \fBpcre2_config()\fP with \fBwhere\fP set to NULL.) The buffer is filled with
1216 the PCRE2 version string, zero-terminated. The number of code units used is
1217 returned. This is the length of the string plus one unit for the terminating
1238 The pattern is defined by a pointer to a string of code units and a length (in
1239 code units). If the pattern is zero-terminated, the length can be specified as
1243 If the compile context argument \fIccontext\fP is NULL, memory for the compiled
1244 pattern is obtained by calling \fBmalloc()\fP. Otherwise, it is obtained from
1246 free the memory by calling \fBpcre2_code_free()\fP when it is no longer needed.
1247 If \fBpcre2_code_free()\fP is called with a NULL argument, it returns
1257 the JIT information cannot be copied (because it is position-dependent).
1259 passed to \fBpcre2_jit_compile()\fP if required. If \fBpcre2_code_copy()\fP is
1272 tables are used throughout, so this behaviour is appropriate. Nevertheless,
1276 the new tables. The memory for the new tables is automatically freed when
1277 \fBpcre2_code_free()\fP is called for the new copy of the compiled code. If
1278 \fBpcre2_code_copy_with_tables()\fP is called with a NULL argument, it returns
1281 NOTE: When one of the matching functions is called, pointers to the compiled
1291 PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
1322 If \fIerrorcode\fP or \fIerroroffset\fP is NULL, \fBpcre2_compile()\fP returns
1326 error has occurred. The values are not defined when compilation is successful
1331 that are used for invalid UTF strings when validity checking is in force. These
1337 documentation. There is no separate documentation for the positive error codes,
1348 The value returned in \fIerroroffset\fP is an indication of where in the
1349 pattern the error occurred. It is not necessarily the furthest point in the
1350 pattern that was read. For example, after the error "lookbehind assertion is
1352 assertion. For an invalid UTF-8 or UTF-16 string, the offset is that of the
1356 cases, the offset passed back is the length of the pattern. Note that the
1357 offset is in code units, not characters, even in a UTF mode. It may sometimes
1368 PCRE2_ZERO_TERMINATED, /* the pattern is zero-terminated */
1384 If this bit is set, the pattern is forced to be "anchored", that is, it is
1385 constrained to match only at the first matching point in the string that is
1387 appropriate constructs in the pattern itself, which is the only way to do it in
1393 immediately follows an opening one is treated as a data character for the
1394 class. When PCRE2_ALLOW_EMPTY_CLASS is set, it terminates the class, which
1400 makes PCRE2's behaviour more like ECMAScript (aka JavaScript). When it is set:
1405 (2) \eu matches a lower case "u" character unless it is followed by four
1410 (3) \ex matches a lower case "x" character unless it is followed by two
1412 to match. By default, as in Perl, a hexadecimal number is always expected after
1428 In multiline mode (when PCRE2_MULTILINE is set), the circumflex metacharacter
1429 matches at the start of the subject (unless PCRE2_NOTBOL is set), and also
1438 (*MARK:NAME) is any sequence of characters that does not include a closing
1439 parenthesis. The name is not processed in any way, and it is not possible to
1441 option is set, normal backslash processing is applied to verb names and only an
1444 or PCRE2_EXTENDED_MORE option is set with PCRE2_ALT_VERBNAMES, unescaped
1445 whitespace in verb names is skipped and #-comments are recognized, exactly as
1450 If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
1461 If this bit is set, letters in the pattern match both upper and lower case
1462 letters in the subject. It is equivalent to Perl's /i option, and it can be
1464 PCRE2_UCP is set, Unicode properties are used for all characters with more than
1469 one other case, a lookup table is used for speed. When neither PCRE2_UTF nor
1470 PCRE2_UCP is set, a lookup table is used for all code points less than 256, and
1476 If this bit is set, a dollar metacharacter in the pattern matches only at the
1479 newlines). The PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is
1480 set. There is no equivalent to this option in Perl, and no way to set it within
1485 If this bit is set, a dot metacharacter in the pattern matches any character,
1488 not match when the current position in the subject is at a newline. This option
1496 If this bit is set, names used to identify capture groups need not be unique.
1497 This can be helpful for certain types of pattern when it is known that only one
1507 If this bit is set, the end of any pattern match must be right at the end of
1520 achieved by appropriate constructs in the pattern itself, which is the only way
1524 to the first (that is, the longest) matched string. Other parallel matches,
1530 If this bit is set, most white space characters in the pattern are totally
1534 white space is permitted between an item and a following quantifier and between
1535 a quantifier and a following + that indicates possessiveness. PCRE2_EXTENDED is
1539 When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as
1541 flagged as white space in its low-character table. The table is normally
1551 When PCRE2 is compiled with Unicode support, in addition to these characters,
1555 separator). This set of characters is the same as recognized by Perl's /x
1562 complicated patterns. Note that the end of this type of comment is a literal
1567 the compile context that is passed to \fBpcre2_compile()\fP or by a special
1573 in the \fBpcre2pattern\fP documentation. A default is defined when PCRE2 is
1581 characters that are ignored outside a character class. PCRE2_EXTENDED_MORE is
1587 If this option is set, the start of an unanchored pattern match must be before
1589 though the matched text may continue over the newline. If \fIstartoffset\fP is
1590 non-zero, the limiting newline is not necessarily the first newline in the
1591 subject. For example, if the subject string is "abc\enxyz" (where \en
1593 PCRE2_FIRSTLINE if \fIstartoffset\fP is greater than 3. See also
1595 PCRE2_FIRSTLINE is set with an offset limit, a match must occur in the first
1597 first is used.
1601 If this option is set, all meta-characters in the pattern are disabled, and it
1603 expression engine is not the most efficient way of doing it. If you are doing a
1616 This facility is not supported for DFA matching. For details, see the
1624 If this option is set, a backreference to an unset capture group matches an
1626 A pattern such as (\e1)(a) succeeds when this option is set (assuming it can
1638 (except when PCRE2_DOLLAR_ENDONLY is set). Note, however, that unless
1639 PCRE2_DOTALL is set, the "any character" metacharacter (.) does not match at a
1640 newline. This behaviour (for ^, $, and dot) is the same as Perl.
1642 When PCRE2_MULTILINE is set, the "start of line" and "end of line"
1654 This option locks out the use of \eC in the pattern that is being compiled.
1658 external sources. Note that there is also a build-time option that permanently
1673 UTF-32, depending on which library is in use. In particular, it prevents the
1681 If this option is set, it disables the use of numbered capturing parentheses in
1682 the pattern. Any opening parenthesis that is not followed by ? behaves as if it
1684 they acquire numbers in the usual way). This is the same as Perl's /n option.
1685 Note that, when this option is set, references to capture groups
1691 If this option is set, it disables "auto-possessification", which is an
1696 search and run all the callouts, but it is mainly provided for testing
1701 If this option is set, it disables an optimization that is applied when .* is
1703 other branches also start with .* or with \eA or \eG or ^. The optimization is
1704 automatically disabled for .* if it is inside an atomic group or a capture
1705 group that is the subject of a backreference, or if the pattern contains
1706 (*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
1707 automatically anchored if PCRE2_DOTALL is set for all the .* items and
1708 PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match
1709 must start either at the start of the subject or following a newline is
1714 This is an option whose main effect is at matching time. It does not change
1719 order to speed up the process. For example, if it is known that an unanchored
1723 such as (*COMMIT) at the start of a pattern is not considered until after a
1726 skipped if the pattern is never actually used. The start-up optimizations are
1727 in effect a pre-scan of the subject that takes place before the pattern is run.
1731 result is "no match", the callouts do occur, and that items such as (*COMMIT)
1740 When this is compiled, PCRE2 records the fact that a match must start with the
1741 character "A". Suppose the subject string is "DEFABC". The start-up
1745 match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
1746 subject string does not happen. The first match attempt is run starting from
1748 the overall result is "no match".
1751 subject, which is recorded when possible. Consider the pattern
1755 The minimum length for a match is two characters. If the subject is "XXBB", the
1757 is long enough. In the process, (*MARK:2) is encountered and remembered. When
1758 the match attempt fails, the next "B" is found, but there is only one character
1759 left, so there are no more attempts, and "no match" is returned with the "last
1760 mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
1762 (*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
1763 returned is "1". In this case, the optimizations do not affect the overall
1764 match result, which is still "no match", but they do affect the auxiliary
1765 information that is returned.
1769 When PCRE2_UTF is set, the validity of the pattern as a UTF string is
1788 document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
1791 If you know that your pattern is a valid UTF string, and you want to skip this
1793 it is set, the effect of passing an invalid UTF string as a pattern is
1801 error that is given if an escape sequence for an invalid Unicode code point is
1810 However, this is possible only in UTF-8 and UTF-32 modes, because these values
1817 default, only ASCII characters are recognized, but if PCRE2_UCP is set, Unicode
1831 The second effect of PCRE2_UCP is to force the use of Unicode properties for
1833 even when PCRE2_UTF is not set. This makes it possible, for example, to process
1834 strings in the 16-bit UCS-2 code. This option is available only if PCRE2 has
1835 been compiled with Unicode support (which is the default).
1840 greedy by default, but become greedy if followed by "?". It is not compatible
1846 \fBpcre2_set_offset_limit()\fP is going to be used to set a non-default offset
1847 limit in a match context for matches that use this pattern. An error is
1848 generated if an offset limit is set without this option. For more details, see
1861 single-code-unit strings. It is available when PCRE2 is built to include
1862 Unicode support (which is the default). If Unicode support is not available,
1881 This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is
1887 in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2.
1896 If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
1899 characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
1907 character code, where hhh.. is any number of hexadecimal digits.
1911 This is a dangerous option. Use with care. By default, an unrecognized escape
1913 detected by \fBpcre2_compile()\fP. Perl is somewhat inconsistent in handling
1914 such items: for example, \ej is treated as a literal "j", and non-hexadecimal
1916 Perl's warning switch is enabled. However, a malformed octal number after \eo{
1919 If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
1921 treated as single-character escapes. For example, \ej is a literal "j" and
1922 \ex{2z} is treated as the literal string "x{2z}". Setting this option means
1924 that a sequence such as [\eN{] is interpreted as a malformed attempt at
1925 [\eN{...}] and so is treated as [N{] whereas [\eN] gives an error because an
1926 unqualified \eN is a valid escape sequence but is not supported in a character
1927 class. To reiterate: this is a dangerous option. Use with great care.
1932 is expected to match a newline. If this option is set, \er in a pattern is
1939 This option is provided for use by the \fB-x\fP option of \fBpcre2grep\fP. It
1940 causes the pattern only to match complete lines. This is achieved by
1942 pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set, the matched
1948 This option is provided for use by the \fB-w\fP option of \fBpcre2grep\fP. It
1950 and the end. This is achieved by automatically inserting the code for "\eb(?:"
1952 used with PCRE2_LITERAL. However, it is ignored if PCRE2_EXTRA_MATCH_LINE is
1980 compiler is available, further processes a compiled pattern into machine code
1988 JIT compilation is a heavyweight optimization. It can take some time for
2011 When PCRE2 is built with Unicode support (the default), the Unicode properties
2013 PCRE2_UCP option can be set when a pattern is compiled; this causes \ew and
2017 when PCRE2_UTF is not set.
2019 The use of locales with Unicode is discouraged. If you are handling characters
2025 recognize only ASCII characters. However, when PCRE2 is built, it is possible
2032 support is expected to die away.
2035 the relevant locale. The only argument to this function is a general context,
2036 which can be used to pass a custom memory allocator. If the argument is NULL,
2037 the system \fBmalloc()\fP is used. The result can be passed to
2051 The locale name "fr_FR" is used on Linux and other Unix-like systems; if you
2052 are using Windows, the name for the French locale is "french".
2054 The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP
2060 It is the caller's responsibility to ensure that the memory containing the
2072 processor is 32-bit or 64-bit. A copy of the result of \fBpcre2_maketables()\fP
2077 return this value. Note that the \fBpcre2_dftables\fP program, which is part of
2101 The first argument for \fBpcre2_pattern_info()\fP is a pointer to the compiled
2102 pattern. The second argument specifies which piece of information is required,
2103 and the third argument is a pointer to a variable to receive the data. If the
2104 third argument is NULL, the first argument is ignored, and the function returns
2105 the size in bytes of the variable that is required for the information
2106 requested. Otherwise, the yield of the function is zero for success, or one of
2112 PCRE2_ERROR_UNSET the requested field is not set
2114 The "magic number" is placed at the start of each compiled pattern as a simple
2115 check against passing an arbitrary memory pointer. Here is a typical call of
2122 PCRE2_INFO_SIZE, /* what is required */
2140 For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED
2141 option, the result for PCRE2_INFO_ALLOPTIONS is PCRE2_EXTENDED and PCRE2_UTF.
2146 A pattern compiled without PCRE2_ANCHORED is automatically anchored by PCRE2 if
2147 the first significant item in every top-level branch is one of the following:
2149 ^ unless PCRE2_MULTILINE is set
2154 When .* is the first significant item, anchoring is possible only when all the
2157 .* is not in an atomic group
2159 .* is not in a capture group that is the subject
2161 PCRE2_DOTALL is in force for .*
2163 PCRE2_NO_DOTSTAR_ANCHOR is not set
2165 For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
2175 group is set in a conditional group such as (?(3)a|b) is also a backreference.
2176 Zero is returned if there are no backreferences.
2180 The output is a uint32_t integer whose value indicates what character sequences
2188 is not used, this is also the total number of capture groups. The third
2194 (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
2197 limit will only be used during matching if it is less than the limit set or
2207 value 255 or above". If such a table was constructed, a pointer to it is
2208 returned. Otherwise NULL is returned. The third argument should point to a
2215 variable. If there is a fixed first value, for example, the letter "c" from a
2216 pattern such as (cat|cow|coyote), 1 is returned, and the value can be retrieved
2217 using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, but it is
2219 newline in the subject, 2 is returned. Otherwise, and for anchored patterns, 0
2227 value is always less than 256. In the 16-bit library the value can be up to
2234 backtracking positions when the pattern is processed by \fBpcre2_match()\fP
2248 explicit match is either a literal CR or LF character, or \er or \en or one of
2254 (*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
2257 limit will only be used during matching if it is less than the limit set or
2262 Return 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise
2274 Returns 1 if there is a rightmost literal code unit that must exist in any
2276 \fBuint32_t\fP variable. If there is no such value, 0 is returned. When 1 is
2278 PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last literal value is
2280 pattern /^a\ed+z\ed+/ the returned value is 1 (with "z" returned from
2281 PCRE2_INFO_LASTCODEUNIT), but for /^a\edz\ed/ the returned value is 0.
2294 recursive subroutine calls it is not always possible to determine whether or
2301 (*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
2304 limit will only be used during matching if it is less than the limit set or
2317 Note that this information is useful for multi-segment matching only
2319 (?<=a(?<=ba)c) returns a maximum lookbehind of 2, but when it is processed, the
2323 PCRE2_INFO_MAXLOOKBEHIND is really only useful as a debugging tool. See the
2331 If a minimum length for matching subject strings was computed, its value is
2332 returned. Otherwise the returned value is 0. This value is not computed when
2333 PCRE2_NO_START_OPTIMIZE is set. The value is a number of characters, which in
2335 should point to a \fBuint32_t\fP variable. The value is a lower bound to the
2337 do actually match, but every string that does match is at least that long.
2347 substrings by name. It is also possible to extract the data directly, by first
2350 you need to use the name-to-number map, which is described by these three
2358 PCRE2_INFO_NAMETABLE returns a pointer to the first entry of the table. This is
2364 the parenthesis number. The rest of the entry is the corresponding name, zero
2367 The names are in alphabetical order. If (?| is used to create multiple capture
2377 page, the groups may be given the same name, but there is only one entry in the
2381 only if PCRE2_DUPNAMES is set. They appear in the table in the order in which
2382 they were found in the pattern. In the absence of (?| this is the order of
2383 increasing number; when (?| is used this is not necessarily the case because
2387 after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white
2388 space - including newlines - is ignored):
2395 entry in the table is eight bytes long. The table is as follows, with
2404 name-to-number map, remember that the length of the entries is likely to be
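The name-table layout described in the lines above can be sketched in plain C. This is an illustration only: it does not call PCRE2, and `lookup_group_number` and `sample_table` are hypothetical names. It assumes the 8-bit layout, where each entry starts with a two-byte group number (most significant byte first) followed by the zero-terminated name. The sample table is hand-built for four names (date, day, month, year) with an entry size of eight bytes; the unused trailing bytes are zeroed here, whereas in a real table they are undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up a group name in an 8-bit name table. Each entry is entry_size
   bytes long: two bytes of group number (most significant byte first),
   then the zero-terminated name. Returns the group number, or -1 if the
   name is not present. */
static int lookup_group_number(const unsigned char *table, size_t name_count,
                               size_t entry_size, const char *name)
{
    assert(entry_size > 2);  /* room for the number and at least a zero */
    for (size_t i = 0; i < name_count; i++) {
        const unsigned char *entry = table + i * entry_size;
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}

/* Hand-built sample: names in alphabetical order, entry size 8. */
static const unsigned char sample_table[4 * 8] = {
    0, 1, 'd', 'a', 't', 'e', 0, 0,
    0, 5, 'd', 'a', 'y', 0, 0, 0,
    0, 4, 'm', 'o', 'n', 't', 'h', 0,
    0, 2, 'y', 'e', 'a', 'r', 0, 0
};
```

In a real program the table pointer, entry count and entry size would come from the three name-table information requests rather than from a literal array, and the entry size would have to be read at run time because it depends on the longest name.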
2409 The output is one of the following \fBuint32_t\fP values:
2426 pattern itself. The value that is used when \fBpcre2_compile()\fP is getting
2445 be done by calling \fBpcre2_callout_enumerate()\fP. The first argument is a
2447 the third is arbitrary user data. The callback function is called for every
2448 callout in the pattern in the order in which they appear. Its first argument is
2449 a pointer to a callout enumeration block, and its second argument is the
2461 It is possible to save compiled patterns on disc or elsewhere, and reload them
2466 "serialized" form, which in the case of PCRE2 is really just a bytecode dump.
2490 Information about a successful or unsuccessful match is placed in a match
2491 data block, which is an opaque structure that is accessed by function calls. In
2494 captured. This is known as the \fIovector\fP.
2499 argument is the number of pairs of offsets in the \fIovector\fP. One pair of
2500 offsets is required to identify the string that matched the whole pattern, with
2503 captured substrings. A minimum of 1 pair is imposed by
2504 \fBpcre2_match_data_create()\fP, so it is always possible to return the overall
2507 The second argument of \fBpcre2_match_data_create()\fP is a pointer to a
2512 For \fBpcre2_match_data_create_from_pattern()\fP, the first argument is a
2513 pointer to a compiled pattern. The ovector is created to be exactly the right
2514 size to hold all the substrings a pattern might capture. The second argument is
2515 again a pointer to a general context, but in this case if NULL is passed, the
2516 memory is obtained using the same allocator that was used for the compiled
2533 When a call of \fBpcre2_match()\fP fails, valid data is available in the match
2534 block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one
2535 of the error codes for an invalid UTF string. Exactly what is available depends
2536 on the error, and is detailed below.
2538 When one of the matching functions is called, pointers to the compiled pattern
2544 PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
2551 When a match data block itself is no longer needed, it should be freed by
2552 calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL
2566 The function \fBpcre2_match()\fP is called to match a subject string against a
2567 compiled pattern, which is passed in the \fIcode\fP argument. You can call
2572 This function is the main matching facility of the library, and it operates in
2573 a Perl-like manner. For specialist use there is also an alternative matching
2574 function, which is described
2581 Here is an example of a simple call to \fBpcre2_match()\fP:
2593 If the subject string is zero-terminated, the length can be given as
2606 The subject string is passed to \fBpcre2_match()\fP as a pointer in
2609 That is, they are in bytes for the 8-bit library, 16-bit code units for the
2611 UTF processing is enabled.
2613 If \fIstartoffset\fP is greater than the length of the subject,
2614 \fBpcre2_match()\fP returns PCRE2_ERROR_BADOFFSET. When the starting offset is
2621 A non-zero starting offset is useful when searching for another match in the
2630 the current position in the subject is not a word boundary.) When applied to
2632 occurrence. If \fBpcre2_match()\fP is called again with just the remainder of
2633 the subject, namely "issipi", it does not match, because \eB is always false at
2634 the start of the subject, which is deemed to be a word boundary. However, if
2635 \fBpcre2_match()\fP is passed the entire string again, but with
2637 is able to look behind the starting point to discover that it is preceded by a
2640 Finding all the matches in a subject is tricky when the pattern can match an
2641 empty string. It is possible to emulate Perl's /g behaviour by first trying the
2644 and trying an ordinary match again. There is some code that demonstrates how to
2651 character is CR followed by LF, advance the starting offset by two characters
2654 If a non-zero starting offset is passed when the pattern is anchored, a single
2655 attempt to match at the given offset is made. This can only succeed if the
2669 PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
2671 Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
2672 the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
2673 interpretive code in \fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT
2686 By default, a pointer to the subject is remembered in the match data block so
2690 lifetime of the subject string is not guaranteed, it may be necessary to make a
2691 copy of the subject string, but it is wasteful to do this unless the match is
2692 successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
2693 subject is copied and the new pointer is remembered in the match data block
2695 the match block itself is used. The copy is automatically freed when
2696 \fBpcre2_match_data_free()\fP is called to free the match data block. It is also
2697 automatically freed if the match data block is re-used for another match
2702 If the PCRE2_ENDANCHORED option is set, any string that \fBpcre2_match()\fP
2708 This option specifies that the first character of the subject string is not the
2716 This option specifies that the end of the subject string is not the end of a
2725 An empty string is not considered to be a valid match if this option is set. If
2732 string at the start of the subject. With PCRE2_NOTEMPTY set, this match is not
2738 This is like PCRE2_NOTEMPTY, except that it locks out an empty string match
2739 only at the first matching position, that is, at the start of the subject plus
2740 the starting offset. An empty string match later in the subject is permitted.
2741 If the pattern is anchored, such a match can occur only if the pattern contains
2747 \fBpcre2_jit_compile()\fP, JIT is automatically used when \fBpcre2_match()\fP
2753 When PCRE2_UTF is set at compile time, the validity of the subject as a UTF
2754 string is checked unless PCRE2_NO_UTF_CHECK is passed to \fBpcre2_match()\fP or
2756 case is discussed in detail in the
2762 In the default case, if a non-zero starting offset is given, the check is
2764 matching, and there is a check that the starting offset points to the first
2772 The check is carried out before any other processing takes place, and a
2773 negative error code is returned if the check fails. There are several UTF error
2795 If you know that your subject is valid, and you want to skip this check for
2802 PCRE2_NO_UTF_CHECK is set at match time the effect of passing an invalid
2803 string as a subject, or an invalid value of \fIstartoffset\fP, is undefined.
2810 the end of the subject string is reached successfully, but there are not enough
2817 complete match can be found is PCRE2_ERROR_PARTIAL returned instead of
2819 caller is prepared to handle a partial match, but only if no complete match can
2822 If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this case, if
2823 a partial match is found, \fBpcre2_match()\fP immediately returns
2825 words, when PCRE2_PARTIAL_HARD is set, a partial match is considered to be more
2828 There is a more detailed discussion of partial and multi-segment matching, with
2840 When PCRE2 is built, a default newline convention is set; this is usually the
2859 starting position is advanced after a match failure for an unanchored pattern.
2861 When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is set as
2863 when the current starting position is at a CRLF sequence, and the pattern
2864 contains no explicit matches for CR or LF characters, the match position is
2867 The above rule is a compromise that makes the most common cases work as
2868 expected. For example, if the pattern is .+A (and the PCRE2_DOTALL option is
2874 An explicit match for CR or LF is either a literal appearance of one of those
2879 Notwithstanding the above, anomalous effects may still occur when CRLF is a
2896 book, this is called "capturing" in what follows, and the phrase "capture
2897 group" (Perl terminology) is used for a fragment of a pattern that picks out a
2916 called the \fBovector\fP, which contains the offsets of captured strings. It is
2926 Within the ovector, the first in each pair of values is set to the offset of
2927 the first code unit of a substring, and the second is set to the offset of the
2929 offsets, not character offsets. That is, they are byte offsets in the 8-bit
2934 of offsets (that is, \fIovector[0]\fP and \fIovector[1]\fP) are set. They
2943 pair is used for the first captured substring, and so on. The value returned by
2944 \fBpcre2_match()\fP is one more than the highest numbered pair that has been
2945 set. For example, if two substrings have been captured, the returned value is
2947 match is 1, indicating that just the first pair of offsets has been set.
2951 For example, if the pattern (?=ab\eK) is matched against "ab", the start and
2954 If a capture group is matched repeatedly within a single match operation, it is
2955 the last portion of the subject that it matched that is returned.
2957 If the ovector is too small to hold all the captured substring offsets, as much
2958 as possible is filled in, and the function returns a value of zero. If captured
2960 data block whose ovector is of minimum length (that is, one pair).
2962 It is possible for capture group number \fIn+1\fP to match some part of the
2964 "abc" is matched against the pattern (a|(z))(bc) the return from the function
2965 is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
2970 also set to PCRE2_UNSET. For example, if the string "abc" is matched against
2972 function is 2, because the highest used capture group number is 1. The offsets
2973 for the second and third capture groups (assuming the vector is large
2977 pattern are never changed. That is, if a pattern contains \fIn\fP capturing
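How the offset pairs and the return value fit together can be sketched without calling the library. This is a minimal illustration under stated assumptions: `copy_group`, `OFFSET_UNSET` and `sample_ovector` are hypothetical names, `OFFSET_UNSET` stands in for PCRE2_UNSET (the all-ones value of the size type), and the sample data is written by hand to mirror the case of "abc" matched against (a|(z))(bc), where the return value is 4 and group 2 is unset.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OFFSET_UNSET (~(size_t)0)  /* stand-in for PCRE2_UNSET */

/* Copy the text of capture group `group` from `subject` into `buf`.
   `rc` is the value returned by the match: one more than the highest
   numbered pair that was set. Returns the length copied, or -1 if the
   group is out of range, unset, or does not fit in the buffer. */
static long copy_group(const char *subject, const size_t *ovector, int rc,
                       int group, char *buf, size_t buflen)
{
    if (group < 0 || group >= rc)
        return -1;                       /* pair was not set by this match */
    size_t start = ovector[2 * group];
    size_t end = ovector[2 * group + 1];
    if (start == OFFSET_UNSET || end == OFFSET_UNSET)
        return -1;                       /* group did not participate */
    if (end < start)
        return -1;                       /* can happen when \K is used in an assertion */
    size_t len = end - start;
    if (len + 1 > buflen)
        return -1;
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (long)len;
}

/* Hand-written offsets for "abc" against (a|(z))(bc): pairs 0, 1 and 3
   are set, pair 2 is unset. */
static const size_t sample_ovector[8] = {
    0, 3,  0, 1,  OFFSET_UNSET, OFFSET_UNSET,  1, 3
};
```

With real match data, the offsets would be read from the match data block instead of a literal array; the substring extraction functions described later perform equivalent checks.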
2993 As well as the offsets in the ovector, other information about a match is
2995 appropriate circumstances. If they are called at other times, the result is
3003 the zero-terminated name, which is within the compiled pattern. If no name is
3004 available, NULL is returned. The length of the name (excluding the terminating
3005 zero) is stored in the code unit that precedes the name. You should use this
3009 After a successful match, the name that is returned is the last mark name
3012 contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a
3013 partial match, the last encountered name is returned. For example, consider
3018 When it matches "bc", the returned name is A. The B mark is "seen" in the first
3019 branch of the group, but it is not on the matching path. On the other hand,
3020 when this pattern fails to match "bx", the returned name is B.
3024 is removed from the pattern above, there is an initial check for the presence
3035 escape sequence. After a partial match, however, this value is always the same
3058 with them. The codes are given names in the header file. If UTF checking is in
3059 force and an invalid UTF subject string is detected, one of a number of
3060 UTF-specific negative error codes is returned. Details are given in the
3082 catch the case when it is passed a junk pointer. This is the error that is
3083 returned when the magic number is not present.
3087 This error is given when a compiled pattern is passed to a function in a
3089 the 8-bit library is passed to a 16-bit or 32-bit library function.
3108 This error is never generated by \fBpcre2_match()\fP itself. It is provided for
3131 This error is returned when a pattern that was successfully studied using JIT
3133 stack is not large enough. See the
3145 If a pattern contains many nested backtracking points, heap memory is used to
3146 remember them. This error is given when the memory allocation function (default
3147 or custom) fails. Note that a different error, PCRE2_ERROR_HEAPLIMIT, is given
3148 if the amount of memory needed exceeds the heap limit. PCRE2_ERROR_NOMEMORY is
3149 also returned if PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
3158 This error is returned when \fBpcre2_match()\fP detects a recursion loop within
3164 matching is attempted.
3179 code unit buffer and its length in code units, into which the text message is
3180 placed. The message is returned in code units of the appropriate width for the
3181 library that is being used.
3183 The returned message is terminated with a trailing zero, and the function
3185 error number is unknown, the negative error code PCRE2_ERROR_BADDATA is
3186 returned. If the buffer is too small, the message is truncated (but still with
3187 a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
3188 None of the messages are very long; a buffer size of 120 code units is ample.
3217 a binary zero is correctly extracted and has a further zero added on the end,
3218 but the result is not, of course, a C string.
3223 substring zero is available. An attempt to extract any other substring gives
3229 For example, if the pattern (?=ab\eK) is matched against "ab", the start and
3235 argument is a pointer to the match data block, the second is the group number,
3236 and the third is a pointer to a variable into which the length is placed. If
3248 This is updated to contain the actual number of code units used for the
3254 zero. When the substring is no longer needed, the memory should be freed by
3257 The return value from all these functions is zero for success, or a negative
3258 error code. If the pattern match failed, the match failure code is returned.
3259 If a substring number greater than zero is used after a partial match,
3260 PCRE2_ERROR_PARTIAL is returned. Other possible error codes are:
3269 There is no substring with that number in the pattern, that is, the number is
3275 pattern, is greater than the number of slots in the ovector, so the substring
3280 The substring did not participate in the match. For example, if the pattern is
3281 (abc)|(def) and the subject is "def", and the ovector contains at least two
3282 capturing slots, substring number 1 is unset.
3298 that is added to each of them. All this is done in a single block of memory
3299 that is obtained using the same memory allocation function that was used to get
3303 partial match, the error code PCRE2_ERROR_PARTIAL is returned.
3305 The address of the memory block is returned via \fIlistptr\fP, which is also
3306 the start of the list of string pointers. The end of the list is marked by a
3307 NULL pointer. The address of the list of lengths is returned via
3311 function is zero if all went well, or PCRE2_ERROR_NOMEMORY if the memory block
3312 could not be obtained. When the list is no longer needed, it should be freed by
3315 If this function encounters a substring that is unset, which can happen when
3348 the number of the capture group called "xxx" is 2. If the name is known to be
3350 calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
3351 compiled pattern, and the second is the name. The yield of the function is the
3352 group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or
3353 PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name.
3358 "bynumber" functions, the only difference being that the second argument is a
3359 name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
3361 captured substring from the first named group that is set.
3363 If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
3365 number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there
3366 is at least one group with a slot in the ovector, but no group is found to be
3367 set, PCRE2_ERROR_UNSET is returned.
3400 the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP. This
3401 can be given as PCRE2_ZERO_TERMINATED for a zero-terminated string. There is an
3403 replacement string(s). The default action is to perform just one replacement if
3404 the pattern matches, but there is an option that requests multiple replacements
3408 that were carried out. This may be zero if no match was found, and is never
3409 greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
3410 returned if an error is detected.
3421 data block is obtained and freed within this function, using memory management
3425 If \fImatch_data\fP is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
3426 provided block is used for all calls to \fBpcre2_match()\fP, and its contents
3433 One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
3436 (return code, offset vector) is used for the first substitution instead of
3442 PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
3443 \fBpcre2_match()\fP is called after the first substitution to check for further
3444 matches, but this is done using an internally obtained match data block, thus
3447 The \fIcode\fP argument is not used for matching before the first substitution
3448 when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when
3449 PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains information such as the
3452 The default action of \fBpcre2_substitute()\fP is to return a copy of the
3454 PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
3465 function is successful, the value is updated to contain the length in code
3466 units of the new string, excluding the trailing zero that is automatically
3469 If the function is not successful, the value set via \fIoutlengthptr\fP depends
3470 on the type of error. For syntax errors in the replacement string, the value is
3472 errors, the value is PCRE2_UNSET by default. This includes the case of the
3473 output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
3475 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
3476 too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
3477 this option is set, however, \fBpcre2_substitute()\fP continues to go through
3479 in order to compute the size of buffer that is needed. This value is passed
3483 Passing a buffer size of zero is a permitted way of finding out how much memory
3485 operation is carried out twice. Depending on the application, it may be more
3489 The replacement string, which is interpreted as a UTF string in UTF mode, is
3490 checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An invalid UTF
3493 If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not interpreted
3494 in any way. By default, however, a dollar character is an escape character that
3506 For example, if the pattern a(b)c is matched with "=abc=" and the replacement
3507 string "+$1$0$1+", the result is "=+babcb+=".
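The "+$1$0$1+" example can be traced with a small stand-alone sketch. It is not the library's implementation and does not call PCRE2: `substitute_once`, `append_bytes` and `sample_match` are hypothetical names, only single-digit `$n` references are handled, any other dollar is copied literally (a simplification), and the offsets for a(b)c matched in "=abc=" are written by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append n bytes at out[*pos], always leaving room for a final zero.
   Returns 0 on success, -1 if the bytes do not fit. */
static int append_bytes(char *out, size_t outlen, size_t *pos,
                        const char *src, size_t n)
{
    if (n >= outlen - *pos)
        return -1;
    memcpy(out + *pos, src, n);
    *pos += n;
    return 0;
}

/* Build: text before the match + expanded replacement + text after it.
   Returns 0 on success, -1 on a bad group reference or a full buffer. */
static int substitute_once(const char *subject, const size_t *ovector, int rc,
                           const char *replacement, char *out, size_t outlen)
{
    size_t pos = 0;
    if (rc < 1 || outlen == 0)
        return -1;
    if (append_bytes(out, outlen, &pos, subject, ovector[0]) != 0)
        return -1;
    for (const char *p = replacement; *p != '\0'; p++) {
        if (p[0] == '$' && p[1] >= '0' && p[1] <= '9') {
            int g = p[1] - '0';
            if (g >= rc)
                return -1;               /* group not set by this match */
            size_t s = ovector[2 * g], e = ovector[2 * g + 1];
            if (s == ~(size_t)0 || e < s)
                return -1;               /* unset or unusable offsets */
            if (append_bytes(out, outlen, &pos, subject + s, e - s) != 0)
                return -1;
            p++;                         /* skip the digit */
        } else if (append_bytes(out, outlen, &pos, p, 1) != 0) {
            return -1;
        }
    }
    const char *tail = subject + ovector[1];
    if (append_bytes(out, outlen, &pos, tail, strlen(tail)) != 0)
        return -1;
    out[pos] = '\0';
    return 0;
}

/* Offsets for a(b)c matched in "=abc=": whole match 1..4, group 1 at 2..3. */
static const size_t sample_match[4] = { 1, 4, 2, 3 };
```

Called with the subject "=abc=", `sample_match`, a match return value of 2 and the replacement "+$1$0$1+", the sketch produces "=+babcb+=", which agrees with the result stated above.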
3512 inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This
3521 replacing every matching substring. If this option is not set, only the first
3522 matching substring is replaced. The search for matches takes place in the
3523 original subject string (that is, previous replacements do not affect it).
3524 Iteration is implemented by advancing the \fIstartoffset\fP value for each
3525 search, which is always passed the entire subject string. If an offset limit is
3526 set in the match context, searching stops when that limit is reached.
3530 limit. Here is a \fBpcre2test\fP example:
3537 length, an attempt to find a non-empty match at the same offset is performed.
3538 If this is not successful, the offset is advanced by one character except when
3539 CRLF is a valid newline sequence and the next two characters are CR, LF. In
3540 this case, the offset is advanced by two characters.
3548 groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
3549 strings when inserted as described above. If this option is not set, an attempt
3554 replacement string. Without this option, only the dollar character is special,
3556 PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
3558 Firstly, backslash in a replacement string is interpreted as an escape
3569 \eu and \el force the next character (if it is a letter) to upper or lower
3578 the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
3582 The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
3583 flexibility to capture group substitution. The syntax is similar to that used
3590 default value. If group <n> is set, its value is inserted; if not, <string> is
3592 expanded and inserted when group <n> is set or unset, respectively. The first
3593 form is just a convenient shorthand for
3611 If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
3620 code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
3623 PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
3624 unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
3626 PCRE2_ERROR_UNSET is returned for an unset substring insertion (including an
3627 unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) when the simple
3628 (non-extended) syntax is used and PCRE2_SUBSTITUTE_UNSET_EMPTY is not set.
3630 PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough. If the
3631 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is
3632 needed is returned via \fIoutlengthptr\fP. Note that this does not happen by
3635 PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
3636 \fImatch_data\fP argument is NULL.
3638 PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
3644 subject, which can happen if \eK is used in an assertion).
3666 callout function for \fBpcre2_substitute()\fP. This information is passed in
3667 a match context. The callout function is called after each substitution has
3669 function is not called for simulated substitutions that happen as a result of
3672 The first argument of the callout function is a pointer to a substitute callout
3685 current version is 0. The version number will increase in future if more fields
3686 are added, but the intention is never to remove any of the existing fields.
3688 The \fIsubscount\fP field is the number of the current match. It is 1 for the
3694 are set in the ovector, and is always greater than zero.
3700 The second argument of the callout function is the value passed as
3702 callout function is interpreted as follows:
3704 If the value is zero, the replacement is accepted, and, if
3705 PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next
3706 match. If the value is not zero, the current replacement is not accepted. If
3707 the value is greater than zero, processing continues when
3708 PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or
3709 PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
3722 When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture
3728 only one of each set of identically-named groups participates. An example is
3737 the given name that is set. Only if none are set is PCRE2_ERROR_UNSET
3743 argument is the compiled pattern, and the second is the name. If the third and
3751 PCRE2_ERROR_NOSUBSTRING is returned if there are no entries for the given name.
3753 The format of the name table is described
3771 callout facility, which is described in the
3777 What you have to do is to insert a callout right at the end of the pattern.
3778 When your callout function is called, extract and save the current matched
3796 The function \fBpcre2_dfa_match()\fP is called to match a subject string
3799 This has different characteristics to the normal algorithm, and is not
3811 is used in a different way, and this is described below. The other common
3813 description is not repeated here.
3816 vector should contain at least 20 elements. It is used for keeping track of
3817 multiple paths through the pattern tree. More workspace is needed for patterns
3820 Here is an example of a simple call to \fBpcre2_dfa_match()\fP:
3844 description is not repeated here.
3850 details are slightly different. When PCRE2_PARTIAL_HARD is set for
3852 subject is reached and there is still at least one matching possibility that
3854 already been found. When PCRE2_PARTIAL_SOFT is set, the return code
3855 PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL if the end of the
3856 subject is reached, there have been no complete matches, but there is still at
3858 when the longest partial match was found is set as the first matching string in
3859 both cases. There is a more detailed discussion of partial and multi-segment
3870 works, this is necessarily the shortest possible match at the first possible
3875 When \fBpcre2_dfa_match()\fP returns a partial match, it is possible to call it
3877 match. The PCRE2_DFA_RESTART option requests this action; when it is set, the
3879 before because data about the match so far is left in them after a partial
3880 match. There is more discussion of this facility in the
3899 This is <something> <something else> <something further> no more
3907 On success, the yield of the function is a number greater than zero, which is
3919 is, the longest matching string is first. If there were too many matches to fit
3920 into the ovector, the yield of the function is zero, and the vector is filled
3925 pattern "a\ed+" is compiled as if it were "a\ed++". For DFA matching, this
3926 means that only one possible match is found. If you really do want multiple
3945 This return is given if \fBpcre2_dfa_match()\fP encounters an item in the
3951 This return is given if \fBpcre2_dfa_match()\fP encounters a condition item
3957 This return is given if \fBpcre2_dfa_match()\fP is called for a pattern that
3958 was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for DFA
3963 This return is given if \fBpcre2_dfa_match()\fP runs out of space in the
3968 When a recursion or subroutine call is processed, the matching function calls
3970 This error is given if the internal ovector is not large enough. This should be
3971 extremely rare, as a vector of size 1000 is used.
3975 When \fBpcre2_dfa_match()\fP is called with the \fBPCRE2_DFA_RESTART\fP option,
3978 fail, this error is given.