Lines Matching full:is

7 PCRE2 is a new API for PCRE, starting at release 10.0. This document contains a
278 backward compatibility. They should not be used in new code. The first is
279 replaced by \fBpcre2_set_depth_limit()\fP; the second is no longer needed and
309 patterns that can be processed by \fBpcre2_compile()\fP. This facility is
322 units, respectively. However, there is just one header file, \fBpcre2.h\fP.
342 example, PCRE2_UCHAR16 is usually defined as `uint16_t'. The SPTR types are
343 constant pointers to the equivalent UCHAR types, that is, they are pointers to
350 PCRE2_CODE_UNIT_WIDTH is not defined by default. An application must define it
356 including \fBpcre2.h\fP, and then use the real function names. Any code that is
357 to be included in an environment where the value of PCRE2_CODE_UNIT_WIDTH is
358 unknown should also use the real function names. (Unfortunately, it is not
361 If PCRE2_CODE_UNIT_WIDTH is not defined before including \fBpcre2.h\fP, a
378 PCRE2 has its own native API, which is described in this document. There are
399 sample program that demonstrates the simplest way of using them is provided in
401 of this program is given in the
417 Just-in-time (JIT) compiler support is an optional feature of PCRE2 that can be
422 support is not available.
428 JIT matching is automatically used by \fBpcre2_match()\fP if it is available,
429 unless the PCRE2_NO_JIT option is set. There is also a direct interface for JIT
437 A second matching function, \fBpcre2_dfa_match()\fP, which is not
438 Perl-compatible, is also provided. This uses a different algorithm for the
443 and disadvantages is given in the
447 documentation. There is no JIT support for \fBpcre2_dfa_match()\fP.
465 functions is called with a NULL argument, the function returns immediately
480 blocks of various sorts. In all cases, if one of these functions is called with
488 several places. These values are always of type PCRE2_SIZE, which is an
490 value that can be stored in such a type (that is ~(PCRE2_SIZE)0) is reserved
492 Therefore, the longest string that can be handled is one less than this
508 Each of the first three conventions is used by at least one operating system as
509 its standard newline sequence. When PCRE2 is built, a default can be specified.
510 If it is not, the default is set to LF, which is the Unix standard. However,
519 In the PCRE2 documentation the word "newline" is used to mean "the character or
522 metacharacters, the handling of #-comments in /x mode, and, when CRLF is a
524 non-anchored pattern. There is more detail about this in the
539 In a multithreaded application it is important to keep thread-specific data
541 itself is thread-safe: it contains no static or global variables. The API is
552 A pointer to the compiled form of a pattern is returned to the user when
553 \fBpcre2_compile()\fP is successful. The data in the compiled pattern is fixed,
554 and does not change when the pattern is matched. Therefore, it is thread-safe,
555 that is, the same compiled pattern can be used by more than one thread
558 just-in-time (JIT) optimization feature is being used, it needs separate memory
568 is somewhat tricky to do correctly. If you know that writing to a pointer is
582 The reason for checking the pointer a second time is as follows: Several
592 is not sufficient. The thread that is doing the compiling may be descheduled
610 If JIT is being used, but the JIT compilation is not being done immediately
611 (perhaps waiting to see if the pattern is used often enough), similar logic is
623 functions are called. A context is nothing more than a collection of parameters
625 in a context is a convenient way of passing them to a PCRE2 function without
653 directly. A context is just a block of memory that holds the parameter values.
655 NULL when a context pointer is required.
657 There are three different types of context: a general context that is relevant
666 library. The context is named `general' rather than specifically `memory'
669 general context. A general context is created by:
683 Whenever code in PCRE2 calls these functions, the final argument is the value
686 \fImalloc()\fP and \fIfree()\fP are used. (This is not currently useful, as
688 The \fIprivate_malloc()\fP function is used (if supplied) to obtain memory for
693 used. When the time comes to free the block, this function is called.
708 If this function is passed a NULL argument, it returns immediately without
716 A compile context is required if you want to provide an external function for
727 A compile context is also required if you are using custom memory management.
731 A compile context is created, copied, and freed by the following functions:
743 A compile context is created with default values for its parameters. These can
745 PCRE2_ERROR_BADDATA if invalid data is detected.
754 ending sequence. The value is used by the JIT compiler and by the two
764 argument is a general context. This function builds a set of character tables
789 This sets a maximum length, in code units, for any pattern string that is
790 compiled with this context. If the pattern is longer, an error is generated.
791 This facility is provided so that applications that accept patterns from
792 external sources can limit their size. The default is the largest number that a
793 PCRE2_SIZE variable can hold, which is effectively unlimited.
805 NUL character, that is a binary zero).
814 When a pattern is compiled with the PCRE2_EXTENDED or PCRE2_EXTENDED_MORE
816 comments starting with #. The value is saved with the compiled pattern for
825 This parameter adjusts the limit, set when PCRE2 is built (default 250), on the
835 There is at least one application that runs PCRE2 in threads with very limited
836 system stack, where running out of stack is to be avoided at all costs. The
837 parenthesis limit above cannot take account of how much stack is actually
839 that is called whenever \fBpcre2_compile()\fP starts to compile a parenthesized
844 nesting, and the second is user data that is set up by the last argument of
846 zero if all is well, or non-zero to force an error.
853 A match context is required if you want to:
865 A match context is created, copied, and freed by the following functions:
877 A match context is created with default values for its parameters. These can
879 PCRE2_ERROR_BADDATA if invalid data is detected.
914 advance in the subject string. The default value is PCRE2_UNSET. The
917 offset is not found. The \fBpcre2_substitute()\fP function makes no more
920 For example, if the pattern /abc/ is matched against "123abc" with an offset
921 limit less than 3, the result is PCRE2_ERROR_NOMATCH. A match can never be
923 \fBpcre2_dfa_match()\fP, or \fBpcre2_substitute()\fP is greater than the offset
927 calling \fBpcre2_compile()\fP so that when JIT is in use, different code can be
928 compiled. If a match is started with a non-default offset limit when
929 PCRE2_USE_OFFSET_LIMIT is not set, an error is generated.
934 newline that follows the start of matching in the subject. If this is set with
936 offset limit. In other words, whichever limit comes first is used.
953 documentation for more details). If the limit is reached, the negative error
954 code PCRE2_ERROR_HEAPLIMIT is returned. The default limit can be set when PCRE2
955 is built; if it is not, the default is set very large and is essentially
963 where ddd is a decimal number. However, such a setting is ignored unless ddd is
965 limit is set, less than the default.
975 For \fBpcre2_dfa_match()\fP, a vector on the system stack is used when
977 is not big enough is heap memory used. In this case, setting a value of zero
988 trees. The classic example is a pattern that uses nested unlimited repeats.
990 There is an internal counter in \fBpcre2_match()\fP that is incremented each
996 though the counting is done in a different way.
998 When \fBpcre2_match()\fP is called with a pattern that was successfully
999 processed by \fBpcre2_jit_compile()\fP, the way in which matching is executed
1000 is entirely different. However, there is still the possibility of runaway
1005 The default value for the limit can be set when PCRE2 is built; the default
1006 is 10 million, which handles all but the most extreme cases. A value
1012 where ddd is a decimal number. However, such a setting is ignored unless ddd is
1014 \fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
1022 Each time a nested backtracking point is passed, a new memory frame is used
1024 indirectly limits the amount of memory that is used in a match. However,
1030 The depth limit is not relevant, and is ignored, when matching is done using
1031 JIT compiled code. However, it is supported by \fBpcre2_dfa_match()\fP, which
1034 limits, indirectly, the amount of system stack that is used. It was more useful
1040 If the depth of internal recursive function calls is great enough, local
1042 depth limit also indirectly limits the amount of heap memory that is used. A
1044 using \fBpcre2_dfa_match()\fP, can use a great deal of memory. However, it is
1048 The default value for the depth limit can be set when PCRE2 is built; if it is
1049 not, the default is set to the same value as the default for the match limit.
1050 If the limit is exceeded, \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP
1056 where ddd is a decimal number. However, such a setting is ignored unless ddd is
1058 \fBpcre2_dfa_match()\fP or, if no such limit is set, less than the default.
1074 The first argument for \fBpcre2_config()\fP specifies which information is
1075 required. The second argument is a pointer to memory into which the information
1076 is placed. If NULL is passed, the function returns the amount of memory that is
1078 the value is in bytes; when requesting these values, \fIwhere\fP should point
1080 length is given in code units, not counting the terminating zero.
1082 When requesting information, the returned value from \fBpcre2_config()\fP is
1084 the value in the first argument is not recognized. The following information is
1089 The output is a uint32_t integer whose value indicates what character
1093 default can be overridden when a pattern is compiled.
1097 The output is a uint32_t integer whose lower bits indicate which code unit
1103 The output is a uint32_t integer that gives the default limit for the depth of
1110 The output is a uint32_t integer that gives, in kibibytes, the default limit
1117 The output is a uint32_t integer that is set to one if support for just-in-time
1118 compiling is available; otherwise it is set to zero.
1122 The \fIwhere\fP argument should point to a buffer that is at least 48 code
1124 \fBpcre2_config()\fP with \fIwhere\fP set to NULL.) The buffer is filled with a
1125 string that contains the name of the architecture for which the JIT compiler is
1127 is not available, PCRE2_ERROR_BADOPTION is returned, otherwise the number of
1128 code units used is returned. This is the length of the string, plus one unit
1133 The output is a uint32_t integer that contains the number of bytes used for
1134 internal linkage in compiled regular expressions. When PCRE2 is configured, the
1135 value can be set to 2, 3, or 4, with the default being 2. This is the value
1136 that is returned by \fBpcre2_config()\fP. However, when the 16-bit library is
1137 compiled, a value of 3 is rounded up to 4, and when the 32-bit library is
1138 compiled, internal linkages always use 4 bytes, so the configured value is not
1141 The default value of 2 for the 8-bit and 16-bit libraries is sufficient for all
1148 The output is a uint32_t integer that gives the default match limit for
1154 The output is a uint32_t integer whose value specifies the default character
1155 sequence that is recognized as meaning "newline". The values are:
1169 The output is a uint32_t integer that is set to one if the use of \eC was
1170 permanently disabled when PCRE2 was built; otherwise it is set to zero.
1174 The output is a uint32_t integer that gives the maximum depth of nesting
1175 of parentheses (of any kind) in a pattern. This limit is imposed to cap the
1176 amount of system stack used when a pattern is compiled. It is specified when
1177 PCRE2 is built; the default is 250. This limit does not take into account the
1183 This parameter is obsolete and should not be used in new code. The output is a
1184 uint32_t integer that is always set to zero.
1188 The output is a uint32_t integer that gives the length of PCRE2's character
1198 The \fIwhere\fP argument should point to a buffer that is at least 24 code
1201 without Unicode support, the buffer is filled with the text "Unicode not
1202 supported". Otherwise, the Unicode version string (for example, "8.0.0") is
1203 inserted. The number of code units used is returned. This is the length of the
1208 The output is a uint32_t integer that is set to one if Unicode support is
1209 available; otherwise it is set to zero. Unicode support implies UTF support.
1213 The \fIwhere\fP argument should point to a buffer that is at least 24 code
1215 \fBpcre2_config()\fP with \fIwhere\fP set to NULL.) The buffer is filled with
1216 the PCRE2 version string, zero-terminated. The number of code units used is
1217 returned. This is the length of the string plus one unit for the terminating
1238 The pattern is defined by a pointer to a string of code units and a length (in
1239 code units). If the pattern is zero-terminated, the length can be specified as
1243 If the compile context argument \fIccontext\fP is NULL, memory for the compiled
1244 pattern is obtained by calling \fBmalloc()\fP. Otherwise, it is obtained from
1246 free the memory by calling \fBpcre2_code_free()\fP when it is no longer needed.
1247 If \fBpcre2_code_free()\fP is called with a NULL argument, it returns
1257 the JIT information cannot be copied (because it is position-dependent).
1259 passed to \fBpcre2_jit_compile()\fP if required. If \fBpcre2_code_copy()\fP is
1272 tables are used throughout, so this behaviour is appropriate. Nevertheless,
1276 the new tables. The memory for the new tables is automatically freed when
1277 \fBpcre2_code_free()\fP is called for the new copy of the compiled code. If
1278 \fBpcre2_code_copy_with_tables()\fP is called with a NULL argument, it returns
1281 NOTE: When one of the matching functions is called, pointers to the compiled
1291 PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
1322 If \fIerrorcode\fP or \fIerroroffset\fP is NULL, \fBpcre2_compile()\fP returns
1330 that are used for invalid UTF strings when validity checking is in force. These
1336 documentation. There is no separate documentation for the positive error codes,
1346 is successful \fIerrorcode\fP is set to a value that returns the message "no
1349 The value returned in \fIerroroffset\fP is an indication of where in the
1350 pattern an error occurred. When there is no error, zero is returned. A non-zero
1351 value is not necessarily the furthest point in the pattern that was read. For
1352 example, after the error "lookbehind assertion is not fixed length", the error
1354 UTF-16 string, the offset is that of the first code unit of the failing
1358 cases, the offset passed back is the length of the pattern. Note that the
1359 offset is in code units, not characters, even in a UTF mode. It may sometimes
1370 PCRE2_ZERO_TERMINATED, /* the pattern is zero-terminated */
1386 If this bit is set, the pattern is forced to be "anchored", that is, it is
1387 constrained to match only at the first matching point in the string that is
1389 appropriate constructs in the pattern itself, which is the only way to do it in
1395 immediately follows an opening one is treated as a data character for the
1396 class. When PCRE2_ALLOW_EMPTY_CLASS is set, it terminates the class, which
1402 makes PCRE2's behaviour more like ECMAScript (aka JavaScript). When it is set:
1407 (2) \eu matches a lower case "u" character unless it is followed by four
1412 (3) \ex matches a lower case "x" character unless it is followed by two
1414 to match. By default, as in Perl, a hexadecimal number is always expected after
1430 In multiline mode (when PCRE2_MULTILINE is set), the circumflex metacharacter
1431 matches at the start of the subject (unless PCRE2_NOTBOL is set), and also
1440 (*MARK:NAME) is any sequence of characters that does not include a closing
1441 parenthesis. The name is not processed in any way, and it is not possible to
1443 option is set, normal backslash processing is applied to verb names and only an
1446 or PCRE2_EXTENDED_MORE option is set with PCRE2_ALT_VERBNAMES, unescaped
1447 whitespace in verb names is skipped and #-comments are recognized, exactly as
1452 If this bit is set, \fBpcre2_compile()\fP automatically inserts callout items,
1463 If this bit is set, letters in the pattern match both upper and lower case
1464 letters in the subject. It is equivalent to Perl's /i option, and it can be
1466 PCRE2_UCP is set, Unicode properties are used for all characters with more than
1471 one other case, a lookup table is used for speed. When neither PCRE2_UTF nor
1472 PCRE2_UCP is set, a lookup table is used for all code points less than 256, and
1478 If this bit is set, a dollar metacharacter in the pattern matches only at the
1481 newlines). The PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is
1482 set. There is no equivalent to this option in Perl, and no way to set it within
1487 If this bit is set, a dot metacharacter in the pattern matches any character,
1490 not match when the current position in the subject is at a newline. This option
1498 If this bit is set, names used to identify capture groups need not be unique.
1499 This can be helpful for certain types of pattern when it is known that only one
1509 If this bit is set, the end of any pattern match must be right at the end of
1522 achieved by appropriate constructs in the pattern itself, which is the only way
1526 to the first (that is, the longest) matched string. Other parallel matches,
1532 If this bit is set, most white space characters in the pattern are totally
1536 white space is permitted between an item and a following quantifier and between
1537 a quantifier and a following + that indicates possessiveness. PCRE2_EXTENDED is
1541 When PCRE2 is compiled without Unicode support, PCRE2_EXTENDED recognizes as
1543 flagged as white space in its low-character table. The table is normally
1553 When PCRE2 is compiled with Unicode support, in addition to these characters,
1557 separator). This set of characters is the same as recognized by Perl's /x
1564 complicated patterns. Note that the end of this type of comment is a literal
1569 the compile context that is passed to \fBpcre2_compile()\fP or by a special
1575 in the \fBpcre2pattern\fP documentation. A default is defined when PCRE2 is
1583 characters that are ignored outside a character class. PCRE2_EXTENDED_MORE is
1589 If this option is set, the start of an unanchored pattern match must be before
1591 though the matched text may continue over the newline. If \fIstartoffset\fP is
1592 non-zero, the limiting newline is not necessarily the first newline in the
1593 subject. For example, if the subject string is "abc\enxyz" (where \en
1595 PCRE2_FIRSTLINE if \fIstartoffset\fP is greater than 3. See also
1597 PCRE2_FIRSTLINE is set with an offset limit, a match must occur in the first
1599 first is used.
1603 If this option is set, all meta-characters in the pattern are disabled, and it
1605 expression engine is not the most efficient way of doing it. If you are doing a
1618 This facility is not supported for DFA matching. For details, see the
1626 If this option is set, a backreference to an unset capture group matches an
1628 A pattern such as (\e1)(a) succeeds when this option is set (assuming it can
1640 (except when PCRE2_DOLLAR_ENDONLY is set). Note, however, that unless
1641 PCRE2_DOTALL is set, the "any character" metacharacter (.) does not match at a
1642 newline. This behaviour (for ^, $, and dot) is the same as Perl.
1644 When PCRE2_MULTILINE is set, the "start of line" and "end of line"
1656 This option locks out the use of \eC in the pattern that is being compiled.
1660 external sources. Note that there is also a build-time option that permanently
1675 UTF-32, depending on which library is in use. In particular, it prevents the
1683 If this option is set, it disables the use of numbered capturing parentheses in
1684 the pattern. Any opening parenthesis that is not followed by ? behaves as if it
1686 they acquire numbers in the usual way). This is the same as Perl's /n option.
1687 Note that, when this option is set, references to capture groups
1693 If this option is set, it disables "auto-possessification", which is an
1698 search and run all the callouts, but it is mainly provided for testing
1703 If this option is set, it disables an optimization that is applied when .* is
1705 other branches also start with .* or with \eA or \eG or ^. The optimization is
1706 automatically disabled for .* if it is inside an atomic group or a capture
1707 group that is the subject of a backreference, or if the pattern contains
1708 (*PRUNE) or (*SKIP). When the optimization is not disabled, such a pattern is
1709 automatically anchored if PCRE2_DOTALL is set for all the .* items and
1710 PCRE2_MULTILINE is not set for any ^ items. Otherwise, the fact that any match
1711 must start either at the start of the subject or following a newline is
1716 This is an option whose main effect is at matching time. It does not change
1721 order to speed up the process. For example, if it is known that an unanchored
1725 such as (*COMMIT) at the start of a pattern is not considered until after a
1728 skipped if the pattern is never actually used. The start-up optimizations are
1729 in effect a pre-scan of the subject that takes place before the pattern is run.
1733 result is "no match", the callouts do occur, and that items such as (*COMMIT)
1742 When this is compiled, PCRE2 records the fact that a match must start with the
1743 character "A". Suppose the subject string is "DEFABC". The start-up
1747 match is run with PCRE2_NO_START_OPTIMIZE set, the initial scan along the
1748 subject string does not happen. The first match attempt is run starting from
1750 the overall result is "no match".
1753 subject, which is recorded when possible. Consider the pattern
1757 The minimum length for a match is two characters. If the subject is "XXBB", the
1759 is long enough. In the process, (*MARK:2) is encountered and remembered. When
1760 the match attempt fails, the next "B" is found, but there is only one character
1761 left, so there are no more attempts, and "no match" is returned with the "last
1762 mark seen" set to "2". If NO_START_OPTIMIZE is set, however, matches are tried
1764 (*MARK:1) is encountered, but there is no "B", so the "last mark seen" that is
1765 returned is "1". In this case, the optimizations do not affect the overall
1766 match result, which is still "no match", but they do affect the auxiliary
1767 information that is returned.
1771 When PCRE2_UTF is set, the validity of the pattern as a UTF string is
1790 document. If an invalid UTF sequence is found, \fBpcre2_compile()\fP returns a
1793 If you know that your pattern is a valid UTF string, and you want to skip this
1795 it is set, the effect of passing an invalid UTF string as a pattern is
1803 error that is given if an escape sequence for an invalid Unicode code point is
1812 However, this is possible only in UTF-8 and UTF-32 modes, because these values
1819 default, only ASCII characters are recognized, but if PCRE2_UCP is set, Unicode
1833 The second effect of PCRE2_UCP is to force the use of Unicode properties for
1835 even when PCRE2_UTF is not set. This makes it possible, for example, to process
1836 strings in the 16-bit UCS-2 code. This option is available only if PCRE2 has
1837 been compiled with Unicode support (which is the default).
1842 greedy by default, but become greedy if followed by "?". It is not compatible
1848 \fBpcre2_set_offset_limit()\fP is going to be used to set a non-default offset
1849 limit in a match context for matches that use this pattern. An error is
1850 generated if an offset limit is set without this option. For more details, see
1863 single-code-unit strings. It is available when PCRE2 is built to include
1864 Unicode support (which is the default). If Unicode support is not available,
1884 assertions, following Perl's lead. This option is provided to re-enable the
1886 case anybody is relying on it.
1890 This option applies when compiling a pattern in UTF-8 or UTF-32 mode. It is
1896 in a UTF-8 or UTF-32 string that is being checked for validity by PCRE2.
1905 If the extra option PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES is set, surrogate code
1908 characters if the matching function is called with PCRE2_NO_UTF_CHECK set.
1916 character code, where hhh.. is any number of hexadecimal digits.
1920 This is a dangerous option. Use with care. By default, an unrecognized escape
1922 detected by \fBpcre2_compile()\fP. Perl is somewhat inconsistent in handling
1923 such items: for example, \ej is treated as a literal "j", and non-hexadecimal
1925 Perl's warning switch is enabled. However, a malformed octal number after \eo{
1928 If the PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL extra option is passed to
1930 treated as single-character escapes. For example, \ej is a literal "j" and
1931 \ex{2z} is treated as the literal string "x{2z}". Setting this option means
1933 that a sequence such as [\eN{] is interpreted as a malformed attempt at
1934 [\eN{...}] and so is treated as [N{] whereas [\eN] gives an error because an
1935 unqualified \eN is a valid escape sequence but is not supported in a character
1936 class. To reiterate: this is a dangerous option. Use with great care.
1941 is expected to match a newline. If this option is set, \er in a pattern is
1948 This option is provided for use by the \fB-x\fP option of \fBpcre2grep\fP. It
1949 causes the pattern only to match complete lines. This is achieved by
1951 pattern and ")$" at the end. Thus, when PCRE2_MULTILINE is set, the matched
1957 This option is provided for use by the \fB-w\fP option of \fBpcre2grep\fP. It
1959 and the end. This is achieved by automatically inserting the code for "\eb(?:"
1961 used with PCRE2_LITERAL. However, it is ignored if PCRE2_EXTRA_MATCH_LINE is
1989 compiler is available, further processes a compiled pattern into machine code
1997 JIT compilation is a heavyweight optimization. It can take some time for
2020 When PCRE2 is built with Unicode support (the default), certain Unicode
2022 PCRE2_UCP option can be set when a pattern is compiled; this causes \ew and
2026 when PCRE2_UTF is not set.
2028 The use of locales with Unicode is discouraged. If you are handling characters
2034 recognize only ASCII characters. However, when PCRE2 is built, it is possible
2041 support is expected to die away.
2044 the relevant locale. The only argument to this function is a general context,
2045 which can be used to pass a custom memory allocator. If the argument is NULL,
2046 the system \fBmalloc()\fP is used. The result can be passed to
2060 The locale name "fr_FR" is used on Linux and other Unix-like systems; if you
2061 are using Windows, the name for the French locale is "french".
2063 The pointer that is passed (via the compile context) to \fBpcre2_compile()\fP
2069 It is the caller's responsibility to ensure that the memory containing the
2081 processor is 32-bit or 64-bit. A copy of the result of \fBpcre2_maketables()\fP
2086 return this value. Note that the \fBpcre2_dftables\fP program, which is part of
2110 The first argument for \fBpcre2_pattern_info()\fP is a pointer to the compiled
2111 pattern. The second argument specifies which piece of information is required,
2112 and the third argument is a pointer to a variable to receive the data. If the
2113 third argument is NULL, the first argument is ignored, and the function returns
2114 the size in bytes of the variable that is required for the information
2115 requested. Otherwise, the yield of the function is zero for success, or one of
2121 PCRE2_ERROR_UNSET the requested field is not set
2123 The "magic number" is placed at the start of each compiled pattern as a simple
2124 check against passing an arbitrary memory pointer. Here is a typical call of
2131 PCRE2_INFO_SIZE, /* what is required */
2149 For example, if the pattern /(*UTF)abc/ is compiled with the PCRE2_EXTENDED
2150 option, the result for PCRE2_INFO_ALLOPTIONS is PCRE2_EXTENDED and PCRE2_UTF.
2155 A pattern compiled without PCRE2_ANCHORED is automatically anchored by PCRE2 if
2156 the first significant item in every top-level branch is one of the following:
2158 ^ unless PCRE2_MULTILINE is set
2163 When .* is the first significant item, anchoring is possible only when all the
2166 .* is not in an atomic group
2168 .* is not in a capture group that is the subject
2170 PCRE2_DOTALL is in force for .*
2172 PCRE2_NO_DOTSTAR_ANCHOR is not set
2174 For patterns that are auto-anchored, the PCRE2_ANCHORED bit is set in the
2184 group is set in a conditional group such as (?(3)a|b) is also a backreference.
2185 Zero is returned if there are no backreferences.
2189 The output is a uint32_t integer whose value indicates what character sequences
2197 is not used, this is also the total number of capture groups. The third
2203 (*LIMIT_DEPTH=nnnn) at the start, the value is returned. The third argument
2206 limit will only be used during matching if it is less than the limit set or
2216 value 255 or above". If such a table was constructed, a pointer to it is
2217 returned. Otherwise NULL is returned. The third argument should point to a
2224 variable. If there is a fixed first value, for example, the letter "c" from a
2225 pattern such as (cat|cow|coyote), 1 is returned, and the value can be retrieved
2226 using PCRE2_INFO_FIRSTCODEUNIT. If there is no fixed first value, but it is
2228 newline in the subject, 2 is returned. Otherwise, and for anchored patterns, 0
2236 value is always less than 256. In the 16-bit library the value can be up to
2243 backtracking positions when the pattern is processed by \fBpcre2_match()\fP
2257 explicit match is either a literal CR or LF character, or \er or \en or one of
2263 (*LIMIT_HEAP=nnnn) at the start, the value is returned. The third argument
2266 limit will only be used during matching if it is less than the limit set or
2271 Returns 1 if the (?J) or (?-J) option setting is used in the pattern, otherwise
2283 Returns 1 if there is a rightmost literal code unit that must exist in any
2285 \fBuint32_t\fP variable. If there is no such value, 0 is returned. When 1 is
2287 PCRE2_INFO_LASTCODEUNIT. For anchored patterns, a last literal value is
2289 pattern /^a\ed+z\ed+/ the returned value is 1 (with "z" returned from
2290 PCRE2_INFO_LASTCODEUNIT), but for /^a\edz\ed/ the returned value is 0.
2303 recursive subroutine calls it is not always possible to determine whether or
2310 (*LIMIT_MATCH=nnnn) at the start, the value is returned. The third argument
2313 limit will only be used during matching if it is less than the limit set or
2326 Note that this information is useful for multi-segment matching only
2328 (?<=a(?<=ba)c) returns a maximum lookbehind of 2, but when it is processed, the
2332 PCRE2_INFO_MAXLOOKBEHIND is really only useful as a debugging tool. See the
2340 If a minimum length for matching subject strings was computed, its value is
2341 returned. Otherwise the returned value is 0. This value is not computed when
2342 PCRE2_NO_START_OPTIMIZE is set. The value is a number of characters, which in
2344 should point to a \fBuint32_t\fP variable. The value is a lower bound to the
2346 do actually match, but every string that does match is at least that long.
2356 substrings by name. It is also possible to extract the data directly, by first
2359 you need to use the name-to-number map, which is described by these three
2367 PCRE2_INFO_NAMETABLE returns a pointer to the first entry of the table. This is
2373 the parenthesis number. The rest of the entry is the corresponding name, zero
2376 The names are in alphabetical order. If (?| is used to create multiple capture
2386 page, the groups may be given the same name, but there is only one entry in the
2390 only if PCRE2_DUPNAMES is set. They appear in the table in the order in which
2391 they were found in the pattern. In the absence of (?| this is the order of
2392 increasing number; when (?| is used this is not necessarily the case because
2396 after compilation by the 8-bit library (assume PCRE2_EXTENDED is set, so white
2397 space - including newlines - is ignored):
2404 entry in the table is eight bytes long. The table is as follows, with
2413 name-to-number map, remember that the length of the entries is likely to be
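The name table layout described in these lines can be walked by hand. The following is a minimal sketch for the 8-bit library only, assuming the layout given in the full manual page: each entry is a fixed number of bytes, the first two bytes hold the group number (most significant byte first), and the rest is the zero-terminated name. The helper name_to_number() is hypothetical and is not a PCRE2 function; in a real program the table pointer, entry count and entry size would come from pcre2_pattern_info().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Look up a group name in an 8-bit name table.
   table      : pointer to the first entry (PCRE2_INFO_NAMETABLE)
   name_count : number of entries        (PCRE2_INFO_NAMECOUNT)
   entry_size : bytes per entry          (PCRE2_INFO_NAMEENTRYSIZE)
   Returns the group number, or -1 if the name is not in the table.
   (Hypothetical helper for illustration; not part of PCRE2.) */
int name_to_number(const uint8_t *table, uint32_t name_count,
                   uint32_t entry_size, const char *name)
{
    assert(table != NULL && name != NULL && entry_size >= 3);
    for (uint32_t i = 0; i < name_count; i++) {
        const uint8_t *entry = table + (size_t)i * entry_size;
        /* Bytes 0 and 1 are the group number; the name starts at byte 2. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return -1;
}
```

With PCRE2_DUPNAMES a name may appear in several entries; this sketch returns only the first one it finds. For most applications pcre2_substring_number_from_name() is the simpler route.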
2418 The output is one of the following \fBuint32_t\fP values:
2435 pattern itself. The value that is used when \fBpcre2_compile()\fP is getting
2454 be done by calling \fBpcre2_callout_enumerate()\fP. The first argument is a
2456 the third is arbitrary user data. The callback function is called for every
2457 callout in the pattern in the order in which they appear. Its first argument is
2458 a pointer to a callout enumeration block, and its second argument is the
2470 It is possible to save compiled patterns on disc or elsewhere, and reload them
2475 "serialized" form, which in the case of PCRE2 is really just a bytecode dump.
2499 Information about a successful or unsuccessful match is placed in a match
2500 data block, which is an opaque structure that is accessed by function calls. In
2502 string that define the matched parts of the subject. This is known as the
2508 argument is the number of pairs of offsets in the \fIovector\fP.
2510 When using \fBpcre2_match()\fP, one pair of offsets is required to identify the
2519 A minimum of 1 pair is imposed by \fBpcre2_match_data_create()\fP, so
2520 it is always possible to return the overall matched string in the case of
2522 \fBpcre2_dfa_match()\fP. The maximum number of pairs is 65535; if the first
2523 argument of \fBpcre2_match_data_create()\fP is greater than this, 65535 is
2526 The second argument of \fBpcre2_match_data_create()\fP is a pointer to a
2531 For \fBpcre2_match_data_create_from_pattern()\fP, the first argument is a
2532 pointer to a compiled pattern. The ovector is created to be exactly the right
2535 \fBpcre2_dfa_match()\fP. The second argument is again a pointer to a general
2536 context, but in this case if NULL is passed, the memory is obtained using the
2553 When a call of \fBpcre2_match()\fP fails, valid data is available in the match
2554 block only when the error is PCRE2_ERROR_NOMATCH, PCRE2_ERROR_PARTIAL, or one
2555 of the error codes for an invalid UTF string. Exactly what is available depends
2556 on the error, and is detailed below.
2558 When one of the matching functions is called, pointers to the compiled pattern
2564 PCRE2_COPY_MATCHED_SUBJECT option, which is described in the section entitled
2571 When a match data block itself is no longer needed, it should be freed by
2572 calling \fBpcre2_match_data_free()\fP. If this function is called with a NULL
2586 The function \fBpcre2_match()\fP is called to match a subject string against a
2587 compiled pattern, which is passed in the \fIcode\fP argument. You can call
2592 This function is the main matching facility of the library, and it operates in
2593 a Perl-like manner. For specialist use there is also an alternative matching
2594 function, which is described
2601 Here is an example of a simple call to \fBpcre2_match()\fP:
2613 If the subject string is zero-terminated, the length can be given as
2626 The subject string is passed to \fBpcre2_match()\fP as a pointer in
2629 That is, they are in bytes for the 8-bit library, 16-bit code units for the
2631 UTF processing is enabled. As a special case, if \fIsubject\fP is NULL and
2632 \fIlength\fP is zero, the subject is assumed to be an empty string. If
2633 \fIlength\fP is non-zero, an error occurs if \fIsubject\fP is NULL.
2635 If \fIstartoffset\fP is greater than the length of the subject,
2636 \fBpcre2_match()\fP returns PCRE2_ERROR_BADOFFSET. When the starting offset is
2643 A non-zero starting offset is useful when searching for another match in the
2652 the current position in the subject is not a word boundary.) When applied to
2654 occurrence. If \fBpcre2_match()\fP is called again with just the remainder of
2655 the subject, namely "issippi", it does not match, because \eB is always false
2656 at the start of the subject, which is deemed to be a word boundary. However, if
2657 \fBpcre2_match()\fP is passed the entire string again, but with
2659 is able to look behind the starting point to discover that it is preceded by a
2662 Finding all the matches in a subject is tricky when the pattern can match an
2663 empty string. It is possible to emulate Perl's /g behaviour by first trying the
2666 and trying an ordinary match again. There is some code that demonstrates how to
2673 character is CR followed by LF, advance the starting offset by two characters
2676 If a non-zero starting offset is passed when the pattern is anchored, a single
2677 attempt to match at the given offset is made. This can only succeed if the
2691 PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. Their action is described below.
2693 Setting PCRE2_ANCHORED or PCRE2_ENDANCHORED at match time is not supported by
2694 the just-in-time (JIT) compiler. If it is set, JIT matching is disabled and the
2695 interpretive code in \fBpcre2_match()\fP is run. Apart from PCRE2_NO_JIT
2708 By default, a pointer to the subject is remembered in the match data block so
2712 lifetime of the subject string is not guaranteed, it may be necessary to make a
2713 copy of the subject string, but it is wasteful to do this unless the match is
2714 successful. After a successful match, if PCRE2_COPY_MATCHED_SUBJECT is set, the
2715 subject is copied and the new pointer is remembered in the match data block
2717 the match block itself is used. The copy is automatically freed when
2718 \fBpcre2_match_data_free()\fP is called to free the match data block. It is also
2719 automatically freed if the match data block is re-used for another match
2724 If the PCRE2_ENDANCHORED option is set, any string that \fBpcre2_match()\fP
2730 This option specifies that the first character of the subject string is not the
2738 This option specifies that the end of the subject string is not the end of a
2747 An empty string is not considered to be a valid match if this option is set. If
2754 string at the start of the subject. With PCRE2_NOTEMPTY set, this match is not
2760 This is like PCRE2_NOTEMPTY, except that it locks out an empty string match
2761 only at the first matching position, that is, at the start of the subject plus
2762 the starting offset. An empty string match later in the subject is permitted.
2763 If the pattern is anchored, such a match can occur only if the pattern contains
2769 \fBpcre2_jit_compile()\fP, JIT is automatically used when \fBpcre2_match()\fP
2775 When PCRE2_UTF is set at compile time, the validity of the subject as a UTF
2776 string is checked unless PCRE2_NO_UTF_CHECK is passed to \fBpcre2_match()\fP or
2778 case is discussed in detail in the
2784 In the default case, if a non-zero starting offset is given, the check is
2786 matching, and there is a check that the starting offset points to the first
2794 The check is carried out before any other processing takes place, and a
2795 negative error code is returned if the check fails. There are several UTF error
2817 If you know that your subject is valid, and you want to skip this check for
2824 PCRE2_NO_UTF_CHECK is set at match time the effect of passing an invalid
2825 string as a subject, or an invalid value of \fIstartoffset\fP, is undefined.
2832 the end of the subject string is reached successfully, but there are not enough
2839 complete match can be found is PCRE2_ERROR_PARTIAL returned instead of
2841 caller is prepared to handle a partial match, but only if no complete match can
2844 If PCRE2_PARTIAL_HARD is set, it overrides PCRE2_PARTIAL_SOFT. In this case, if
2845 a partial match is found, \fBpcre2_match()\fP immediately returns
2847 words, when PCRE2_PARTIAL_HARD is set, a partial match is considered to be more
2850 There is a more detailed discussion of partial and multi-segment matching, with
2862 When PCRE2 is built, a default newline convention is set; this is usually the
2881 starting position is advanced after a match failure for an unanchored pattern.
2883 When PCRE2_NEWLINE_CRLF, PCRE2_NEWLINE_ANYCRLF, or PCRE2_NEWLINE_ANY is set as
2885 when the current starting position is at a CRLF sequence, and the pattern
2886 contains no explicit matches for CR or LF characters, the match position is
2889 The above rule is a compromise that makes the most common cases work as
2890 expected. For example, if the pattern is .+A (and the PCRE2_DOTALL option is
2896 An explicit match for CR or LF is either a literal appearance of one of those
2901 Notwithstanding the above, anomalous effects may still occur when CRLF is a
2918 book, this is called "capturing" in what follows, and the phrase "capture
2919 group" (Perl terminology) is used for a fragment of a pattern that picks out a
2938 called the \fBovector\fP, which contains the offsets of captured strings. It is
2948 Within the ovector, the first in each pair of values is set to the offset of
2949 the first code unit of a substring, and the second is set to the offset of the
2951 offsets, not character offsets. That is, they are byte offsets in the 8-bit
2956 of offsets (that is, \fIovector[0]\fP and \fIovector[1]\fP) are set. They
2965 pair is used for the first captured substring, and so on. The value returned by
2966 \fBpcre2_match()\fP is one more than the highest numbered pair that has been
2967 set. For example, if two substrings have been captured, the returned value is
2969 match is 1, indicating that just the first pair of offsets has been set.
2973 For example, if the pattern (?=ab\eK) is matched against "ab", the start and
2976 If a capture group is matched repeatedly within a single match operation, it is
2977 the last portion of the subject that it matched that is returned.
2979 If the ovector is too small to hold all the captured substring offsets, as much
2980 as possible is filled in, and the function returns a value of zero. If captured
2982 data block whose ovector is of minimum length (that is, one pair).
2984 It is possible for capture group number \fIn+1\fP to match some part of the
2986 "abc" is matched against the pattern (a|(z))(bc) the return from the function
2987 is 4, and groups 1 and 3 are matched, but 2 is not. When this happens, both
2992 also set to PCRE2_UNSET. For example, if the string "abc" is matched against
2994 function is 2, because the highest used capture group number is 1. The offsets
2995 for the second and third capture groups (assuming the vector is large
2999 pattern are never changed. That is, if a pattern contains \fIn\fP capturing
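The ovector rules in these lines (pairs of code unit offsets, a return value one more than the highest numbered pair set, and PCRE2_UNSET for groups that did not participate) can be illustrated without calling the library. The sketch below is an assumption-laden stand-in: copy_group() is a hypothetical helper, UNSET_OFFSET mirrors the documented value of PCRE2_UNSET (~(PCRE2_SIZE)0), and the ovector is filled in by hand rather than by pcre2_match().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for PCRE2_UNSET, documented as ~(PCRE2_SIZE)0. */
#define UNSET_OFFSET (~(size_t)0)

/* Copy capture group n out of subject into buf, zero-terminated.
   rc is the positive value returned by the match; ovector holds the pairs.
   Returns the substring length, or -1 if the group is unset, was not
   reached by this match, or does not fit in buf.
   (Hypothetical helper for illustration; not part of PCRE2.) */
long copy_group(const char *subject, const size_t *ovector, int rc,
                int n, char *buf, size_t buflen)
{
    assert(subject != NULL && ovector != NULL && buf != NULL);
    if (n < 0 || n >= rc) return -1;         /* beyond the highest pair set */
    size_t start = ovector[2 * n];           /* first code unit of substring */
    size_t end = ovector[2 * n + 1];         /* first code unit after it */
    if (start == UNSET_OFFSET || end == UNSET_OFFSET) return -1;
    if (end < start) return -1;              /* can happen with \K in a lookaround */
    size_t len = end - start;
    if (len + 1 > buflen) return -1;         /* need room for the trailing zero */
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (long)len;
}
```

For the example in the text, "abc" matched against (a|(z))(bc) returns 4 with offsets {0,3, 0,1, UNSET,UNSET, 1,3}, so groups 0, 1 and 3 yield "abc", "a" and "bc", while group 2 is reported as unset. The pcre2_substring_copy_bynumber() function performs this job in real code.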
3015 As well as the offsets in the ovector, other information about a match is
3017 appropriate circumstances. If they are called at other times, the result is
3025 the zero-terminated name, which is within the compiled pattern. If no name is
3026 available, NULL is returned. The length of the name (excluding the terminating
3027 zero) is stored in the code unit that precedes the name. You should use this
3031 After a successful match, the name that is returned is the last mark name
3034 contains (*MARK:A)(*PRUNE), the name "A" is returned. After a "no match" or a
3035 partial match, the last encountered name is returned. For example, consider
3040 When it matches "bc", the returned name is A. The B mark is "seen" in the first
3041 branch of the group, but it is not on the matching path. On the other hand,
3042 when this pattern fails to match "bx", the returned name is B.
3046 is removed from the pattern above, there is an initial check for the presence
3057 escape sequence. After a partial match, however, this value is always the same
3080 with them. The codes are given names in the header file. If UTF checking is in
3081 force and an invalid UTF subject string is detected, one of a number of
3082 UTF-specific negative error codes is returned. Details are given in the
3104 catch the case when it is passed a junk pointer. This is the error that is
3105 returned when the magic number is not present.
3109 This error is given when a compiled pattern is passed to a function in a
3111 the 8-bit library is passed to a 16-bit or 32-bit library function.
3130 This error is never generated by \fBpcre2_match()\fP itself. It is provided for
3153 This error is returned when a pattern that was successfully studied using JIT
3155 stack is not large enough. See the
3167 Heap memory is used to remember backtracking points. This error is given when
3169 error, PCRE2_ERROR_HEAPLIMIT, is given if the amount of memory needed exceeds
3170 the heap limit. PCRE2_ERROR_NOMEMORY is also returned if
3171 PCRE2_COPY_MATCHED_SUBJECT is set and memory allocation fails.
3180 This error is returned when \fBpcre2_match()\fP detects a recursion loop within
3186 matching is attempted.
3201 code unit buffer and its length in code units, into which the text message is
3202 placed. The message is returned in code units of the appropriate width for the
3203 library that is being used.
3205 The returned message is terminated with a trailing zero, and the function
3207 error number is unknown, the negative error code PCRE2_ERROR_BADDATA is
3208 returned. If the buffer is too small, the message is truncated (but still with
3209 a trailing zero), and the negative error code PCRE2_ERROR_NOMEMORY is returned.
3210 None of the messages are very long; a buffer size of 120 code units is ample.
3239 a binary zero is correctly extracted and has a further zero added on the end,
3240 but the result is not, of course, a C string.
3245 substring zero is available. An attempt to extract any other substring gives
3251 For example, if the pattern (?=ab\eK) is matched against "ab", the start and
3257 argument is a pointer to the match data block, the second is the group number,
3258 and the third is a pointer to a variable into which the length is placed. If
3270 This is updated to contain the actual number of code units used for the
3276 zero. When the substring is no longer needed, the memory should be freed by
3279 The return value from all these functions is zero for success, or a negative
3280 error code. If the pattern match failed, the match failure code is returned.
3281 If a substring number greater than zero is used after a partial match,
3282 PCRE2_ERROR_PARTIAL is returned. Other possible error codes are:
3291 There is no substring with that number in the pattern, that is, the number is
3297 pattern, is greater than the number of slots in the ovector, so the substring
3302 The substring did not participate in the match. For example, if the pattern is
3303 (abc)|(def) and the subject is "def", and the ovector contains at least two
3304 capturing slots, substring number 1 is unset.
3320 that is added to each of them. All this is done in a single block of memory
3321 that is obtained using the same memory allocation function that was used to get
3325 partial match, the error code PCRE2_ERROR_PARTIAL is returned.
3327 The address of the memory block is returned via \fIlistptr\fP, which is also
3328 the start of the list of string pointers. The end of the list is marked by a
3329 NULL pointer. The address of the list of lengths is returned via
3333 function is zero if all went well, or PCRE2_ERROR_NOMEMORY if the memory block
3334 could not be obtained. When the list is no longer needed, it should be freed by
3337 If this function encounters a substring that is unset, which can happen when
3370 the number of the capture group called "xxx" is 2. If the name is known to be
3372 calling \fBpcre2_substring_number_from_name()\fP. The first argument is the
3373 compiled pattern, and the second is the name. The yield of the function is the
3374 group number, PCRE2_ERROR_NOSUBSTRING if there is no group with that name, or
3375 PCRE2_ERROR_NOUNIQUESUBSTRING if there is more than one group with that name.
3380 "bynumber" functions, the only difference being that the second argument is a
3381 name instead of a number. If PCRE2_DUPNAMES is set and there are duplicate
3383 captured substring from the first named group that is set.
3385 If there are no groups with the given name, PCRE2_ERROR_NOSUBSTRING is
3387 number of slots in the ovector, PCRE2_ERROR_UNAVAILABLE is returned. If there
3388 is at least one group with a slot in the ovector, but no group is found to be
3389 set, PCRE2_ERROR_UNSET is returned.
3422 the \fIreplacement\fP string, whose length is supplied in \fBrlength\fP, which
3424 special case, if \fIreplacement\fP is NULL and \fIrlength\fP is zero, the
3425 replacement is assumed to be an empty string. If \fIrlength\fP is non-zero, an
3426 error occurs if \fIreplacement\fP is NULL.
3428 There is an option (see PCRE2_SUBSTITUTE_REPLACEMENT_ONLY below) to return just
3429 the replacement string(s). The default action is to perform just one
3430 replacement if the pattern matches, but there is an option that requests
3434 that were carried out. This may be zero if no match was found, and is never
3435 greater than one unless PCRE2_SUBSTITUTE_GLOBAL is set. A negative value is
3436 returned if an error is detected.
3447 data block is obtained and freed within this function, using memory management
3451 If \fImatch_data\fP is not NULL and PCRE2_SUBSTITUTE_MATCHED is not set, the
3452 provided block is used for all calls to \fBpcre2_match()\fP, and its contents
3459 One such option is PCRE2_SUBSTITUTE_MATCHED. When this is set, an external
3468 PCRE2_SUBSTITUTE_MATCHED is set. If PCRE2_SUBSTITUTE_GLOBAL is also set,
3469 \fBpcre2_match()\fP is called after the first substitution to check for further
3470 matches, but this is done using an internally obtained match data block, thus
3473 The \fIcode\fP argument is not used for matching before the first substitution
3474 when PCRE2_SUBSTITUTE_MATCHED is set, but it must be provided, even when
3475 PCRE2_SUBSTITUTE_GLOBAL is not set, because it contains information such as the
3478 The default action of \fBpcre2_substitute()\fP is to return a copy of the
3480 PCRE2_SUBSTITUTE_REPLACEMENT_ONLY is set, only the replacement substrings are
3491 function is successful, the value is updated to contain the length in code
3492 units of the new string, excluding the trailing zero that is automatically
3495 If the function is not successful, the value set via \fIoutlengthptr\fP depends
3496 on the type of error. For syntax errors in the replacement string, the value is
3498 errors, the value is PCRE2_UNSET by default. This includes the case of the
3499 output buffer being too small, unless PCRE2_SUBSTITUTE_OVERFLOW_LENGTH is set.
3501 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH changes what happens when the output buffer is
3502 too small. The default action is to return PCRE2_ERROR_NOMEMORY immediately. If
3503 this option is set, however, \fBpcre2_substitute()\fP continues to go through
3505 in order to compute the size of buffer that is needed. This value is passed
3509 Passing a buffer size of zero is a permitted way of finding out how much memory
3511 operation is carried out twice. Depending on the application, it may be more
3515 The replacement string, which is interpreted as a UTF string in UTF mode, is
3516 checked for UTF validity unless PCRE2_NO_UTF_CHECK is set. An invalid UTF
3519 If PCRE2_SUBSTITUTE_LITERAL is set, the replacement string is not interpreted
3520 in any way. By default, however, a dollar character is an escape character that
3532 For example, if the pattern a(b)c is matched with "=abc=" and the replacement
3533 string "+$1$0$1+", the result is "=+babcb+=".
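To make the "=+babcb+=" result concrete, here is a toy model of one simple-syntax substitution. It is only a sketch of the semantics and not how pcre2_substitute() is implemented: toy_substitute() is a hypothetical helper, it handles only "$$" and "$" followed by a single digit, and the ovector is supplied by hand instead of coming from a real match.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append len bytes to out, keeping room for a trailing zero. */
static int put(char *out, size_t outlen, size_t *n, const char *src, size_t len)
{
    if (len >= outlen - *n) return 0;
    memcpy(out + *n, src, len);
    *n += len;
    return 1;
}

/* Replace the matched span (ovector pair 0) of subject with the expanded
   replacement. Returns the output length, or -1 on a bad replacement,
   an unset or unknown group, or a buffer that is too small.
   (Hypothetical helper for illustration; not part of PCRE2.) */
long toy_substitute(const char *subject, const size_t *ovector, int rc,
                    const char *replacement, char *out, size_t outlen)
{
    size_t n = 0, slen = strlen(subject);
    assert(rc >= 1 && ovector[0] <= ovector[1] && ovector[1] <= slen);

    if (!put(out, outlen, &n, subject, ovector[0])) return -1;  /* before match */
    for (const char *p = replacement; *p != '\0'; p++) {
        if (*p != '$') {                       /* ordinary character */
            if (!put(out, outlen, &n, p, 1)) return -1;
            continue;
        }
        p++;
        if (*p == '$') {                       /* "$$" inserts one dollar */
            if (!put(out, outlen, &n, p, 1)) return -1;
            continue;
        }
        if (*p < '0' || *p > '9') return -1;   /* also catches a trailing '$' */
        int g = *p - '0';
        if (g >= rc || ovector[2 * g] == ~(size_t)0) return -1;
        if (!put(out, outlen, &n, subject + ovector[2 * g],
                 ovector[2 * g + 1] - ovector[2 * g])) return -1;
    }
    if (!put(out, outlen, &n, subject + ovector[1], slen - ovector[1])) return -1;
    out[n] = '\0';
    return (long)n;
}
```

For the subject "=abc=" and pattern a(b)c, the ovector is {1,4, 2,3} with a return value of 2; passing the replacement "+$1$0$1+" produces the string quoted in the text.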
3538 inserted is "A", but for (*MARK:A)(*PRUNE:B) the relevant name is "B". This
3547 replacing every matching substring. If this option is not set, only the first
3548 matching substring is replaced. The search for matches takes place in the
3549 original subject string (that is, previous replacements do not affect it).
3550 Iteration is implemented by advancing the \fIstartoffset\fP value for each
3551 search, which is always passed the entire subject string. If an offset limit is
3552 set in the match context, searching stops when that limit is reached.
3556 limit. Here is a \fBpcre2test\fP example:
3563 length, an attempt to find a non-empty match at the same offset is performed.
3564 If this is not successful, the offset is advanced by one character except when
3565 CRLF is a valid newline sequence and the next two characters are CR, LF. In
3566 this case, the offset is advanced by two characters.
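The offset-advance rule after an empty match can be written as a small helper. This is a sketch for 8-bit subjects only; next_start_offset() and its parameters are hypothetical names, and the UTF-8 branch (skipping continuation bytes so that a whole character is passed over) is an assumption about what "one character" means in UTF mode.

```c
#include <assert.h>
#include <stddef.h>

/* Compute the next starting offset once a retry at the same position
   (after an empty-string match) has failed. Offsets are in 8-bit code units.
   crlf_is_newline : nonzero if CRLF is a valid newline sequence
   utf8            : nonzero if the subject is UTF-8
   Returns length when offset is already at the end, in which case the
   caller should stop searching.
   (Hypothetical helper for illustration; not part of PCRE2.) */
size_t next_start_offset(const unsigned char *subject, size_t length,
                         size_t offset, int crlf_is_newline, int utf8)
{
    assert(subject != NULL && offset <= length);
    if (offset >= length) return length;
    /* Step over CR LF as a unit when it counts as a newline. */
    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n')
        return offset + 2;
    offset++;
    if (utf8) {
        /* Skip UTF-8 continuation bytes (10xxxxxx). */
        while (offset < length && (subject[offset] & 0xC0) == 0x80)
            offset++;
    }
    return offset;
}
```

A matching loop would call this only after the second attempt at the same offset has failed to produce a match; otherwise the next search simply starts at the end of the previous match.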
3574 groups when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) to be treated as empty
3575 strings when inserted as described above. If this option is not set, an attempt
3580 replacement string. Without this option, only the dollar character is special,
3582 PCRE2_SUBSTITUTE_EXTENDED is set, two things change:
3584 Firstly, backslash in a replacement string is interpreted as an escape
3595 \eu and \el force the next character (if it is a letter) to upper or lower
3604 the result of processing "\eUaa\eLBB\eEcc\eE" is "AAbbcc"; the final \eE has no
3608 The second effect of setting PCRE2_SUBSTITUTE_EXTENDED is to add more
3609 flexibility to capture group substitution. The syntax is similar to that used
3616 default value. If group <n> is set, its value is inserted; if not, <string> is
3618 expanded and inserted when group <n> is set or unset, respectively. The first
3619 form is just a convenient shorthand for
3637 If PCRE2_SUBSTITUTE_LITERAL is set, PCRE2_SUBSTITUTE_UNKNOWN_UNSET,
3646 code. Except for PCRE2_ERROR_NOMATCH (which is never returned), errors from
3649 PCRE2_ERROR_NOSUBSTRING is returned for a non-existent substring insertion,
3650 unless PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set.
3652 PCRE2_ERROR_UNSET is returned for an unset substring insertion (including an
3653 unknown substring when PCRE2_SUBSTITUTE_UNKNOWN_UNSET is set) when the simple
3654 (non-extended) syntax is used and PCRE2_SUBSTITUTE_UNSET_EMPTY is not set.
3656 PCRE2_ERROR_NOMEMORY is returned if the output buffer is not big enough. If the
3657 PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set, the size of buffer that is
3658 needed is returned via \fIoutlengthptr\fP. Note that this does not happen by
3661 PCRE2_ERROR_NULL is returned if PCRE2_SUBSTITUTE_MATCHED is set but the
3662 \fImatch_data\fP argument is NULL or if the \fIsubject\fP or \fIreplacement\fP
3663 arguments are NULL. For backward compatibility reasons an exception is made for
3664 the \fIreplacement\fP argument if the \fIrlength\fP argument is also 0.
3666 PCRE2_ERROR_BADREPLACEMENT is used for miscellaneous syntax errors in the
3672 subject, which can happen if \eK is used in an assertion).
3694 callout function for \fBpcre2_substitute()\fP. This information is passed in
3695 a match context. The callout function is called after each substitution has
3697 function is not called for simulated substitutions that happen as a result of
3700 The first argument of the callout function is a pointer to a substitute callout
3713 current version is 0. The version number will increase in future if more fields
3714 are added, but the intention is never to remove any of the existing fields.
3716 The \fIsubscount\fP field is the number of the current match. It is 1 for the
3722 are set in the ovector, and is always greater than zero.
3728 The second argument of the callout function is the value passed as
3730 callout function is interpreted as follows:
3732 If the value is zero, the replacement is accepted, and, if
3733 PCRE2_SUBSTITUTE_GLOBAL is set, processing continues with a search for the next
3734 match. If the value is not zero, the current replacement is not accepted. If
3735 the value is greater than zero, processing continues when
3736 PCRE2_SUBSTITUTE_GLOBAL is set. Otherwise (the value is less than zero or
3737 PCRE2_SUBSTITUTE_GLOBAL is not set), the rest of the input is copied to the
3750 When a pattern is compiled with the PCRE2_DUPNAMES option, names for capture
3756 only one of each set of identically-named groups participates. An example is
3765 the given name that is set. Only if none are set is PCRE2_ERROR_UNSET
3771 argument is the compiled pattern, and the second is the name. If the third and
3779 PCRE2_ERROR_NOSUBSTRING is returned if there are no entries for the given name.
3781 The format of the name table is described
3799 callout facility, which is described in the
3805 What you have to do is to insert a callout right at the end of the pattern.
3806 When your callout function is called, extract and save the current matched
3824 The function \fBpcre2_dfa_match()\fP is called to match a subject string
3828 characteristics to the normal algorithm, and is not compatible with Perl. Some
3840 is used in a different way, and this is described below. The other common
3842 description is not repeated here.
3845 vector should contain at least 20 elements. It is used for keeping track of
3846 multiple paths through the pattern tree. More workspace is needed for patterns
3849 Here is an example of a simple call to \fBpcre2_dfa_match()\fP:
3873 description is not repeated here.
3879 details are slightly different. When PCRE2_PARTIAL_HARD is set for
3881 subject is reached and there is still at least one matching possibility that
3883 already been found. When PCRE2_PARTIAL_SOFT is set, the return code
3884 PCRE2_ERROR_NOMATCH is converted into PCRE2_ERROR_PARTIAL if the end of the
3885 subject is reached, there have been no complete matches, but there is still at
3887 when the longest partial match was found is set as the first matching string in
3888 both cases. There is a more detailed discussion of partial and multi-segment
3899 works, this is necessarily the shortest possible match at the first possible
3904 When \fBpcre2_dfa_match()\fP returns a partial match, it is possible to call it
3906 match. The PCRE2_DFA_RESTART option requests this action; when it is set, the
3908 before because data about the match so far is left in them after a partial
3909 match. There is more discussion of this facility in the
3928 This is <something> <something else> <something further> no more
3936 On success, the yield of the function is a number greater than zero, which is
3948 is, the longest matching string is first. If there were too many matches to fit
3949 into the ovector, the yield of the function is zero, and the vector is filled
3954 pattern "a\ed+" is compiled as if it were "a\ed++". For DFA matching, this
3955 means that only one possible match is found. If you really do want multiple
3974 This return is given if \fBpcre2_dfa_match()\fP encounters an item in the
3980 This return is given if \fBpcre2_dfa_match()\fP encounters a condition item
3986 This return is given if \fBpcre2_dfa_match()\fP is called for a pattern that
3987 was compiled with PCRE2_MATCH_INVALID_UTF. This is not supported for DFA
3992 This return is given if \fBpcre2_dfa_match()\fP runs out of space in the
3997 When a recursion or subroutine call is processed, the matching function calls
3999 This error is given if the internal ovector is not large enough. This should be
4000 extremely rare, as a vector of size 1000 is used.
4004 When \fBpcre2_dfa_match()\fP is called with the \fBPCRE2_DFA_RESTART\fP option,
4007 fail, this error is given.