<html>
<head>
<title>pcresyntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresyntax man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC7" href="#SEC7">SCRIPT NAMES FOR \p AND \P</a>
<li><a name="TOC8" href="#SEC8">CHARACTER CLASSES</a>
<li><a name="TOC9" href="#SEC9">QUANTIFIERS</a>
<li><a name="TOC10" href="#SEC10">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC11" href="#SEC11">MATCH POINT RESET</a>
<li><a name="TOC12" href="#SEC12">ALTERNATION</a>
<li><a name="TOC13" href="#SEC13">CAPTURING</a>
<li><a name="TOC14" href="#SEC14">ATOMIC GROUPS</a>
<li><a name="TOC15" href="#SEC15">COMMENT</a>
<li><a name="TOC16" href="#SEC16">OPTION SETTING</a>
<li><a name="TOC17" href="#SEC17">NEWLINE CONVENTION</a>
<li><a name="TOC18" href="#SEC18">WHAT \R MATCHES</a>
<li><a name="TOC19" href="#SEC19">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC20" href="#SEC20">BACKREFERENCES</a>
<li><a name="TOC21" href="#SEC21">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC22" href="#SEC22">CONDITIONAL PATTERNS</a>
<li><a name="TOC23" href="#SEC23">BACKTRACKING CONTROL</a>
<li><a name="TOC24" href="#SEC24">CALLOUTS</a>
<li><a name="TOC25" href="#SEC25">SEE ALSO</a>
<li><a
name="TOC26" href="#SEC26">AUTHOR</a>
<li><a name="TOC27" href="#SEC27">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE are described in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. This document contains a quick-reference summary of the syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</PRE>
</P>
<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
<P>
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any ASCII character
  \e         escape (hex 1B)
  \f         form feed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \0dd       character with octal code 0dd
  \ddd       character with octal code ddd, or backreference
  \o{ddd..}  character with octal code ddd..
  \xhh       character with hex code hh
  \x{hhh..}  character with hex code hhh..
</pre>
Note that \0dd is always an octal code, and that \8 and \9 are the literal
characters "8" and "9".
</P>
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .
             any character except newline;
               in dotall mode, any character whatsoever
  \C         one data unit, even in UTF mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal white space character
  \H         a character that is not a horizontal white space character
  \N         a character that is not a newline
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a white space character
  \S         a character that is not a white space character
  \v         a vertical white space character
  \V         a character that is not a vertical white space character
  \w         a "word" character
  \W         a "non-word" character
  \X         a Unicode extended grapheme cluster
</pre>
By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
happening, \s and \w may also match characters with code points in the range
128-255. If the PCRE_UCP option is set, the behaviour of these escape sequences
is changed to use Unicode properties and they match many more characters.
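</P>
<P>
The difference between the default ASCII-only behaviour and PCRE_UCP can be
illustrated by analogy. The sketch below uses Python's standard re module, which
is not PCRE but offers the same two behaviours: with re.ASCII, \d, \s, and \w
match only ASCII characters, and without it they use Unicode properties. The
helper name and the sample strings are assumptions made for illustration, and
nothing here describes how PCRE itself implements either mode.

```python
import re

# Analogy only: Python's "re" module is not PCRE.
# re.ASCII restricts \d, \s, \w to ASCII (like PCRE's default behaviour);
# without the flag, str patterns use Unicode properties (like PCRE_UCP).

def count_matches(escape: str, subject: str, ascii_only: bool) -> int:
    """Count the characters in subject matched by a one-character escape."""
    flags = re.ASCII if ascii_only else 0
    return len(re.findall(escape, subject, flags))

# U+0663 is ARABIC-INDIC DIGIT THREE; U+00E9 is a lower case letter (e acute).
digits = "5\u0663"
word = "caf\u00e9"

ascii_digits = count_matches(r"\d", digits, ascii_only=True)     # only "5"
unicode_digits = count_matches(r"\d", digits, ascii_only=False)  # both digits
ascii_word = count_matches(r"\w", word, ascii_only=True)         # "c", "a", "f"
unicode_word = count_matches(r"\w", word, ascii_only=False)      # all four

print(ascii_digits, unicode_digits, ascii_word, unicode_word)
```

In the ASCII-only runs the non-ASCII digit and letter are not counted; in the
Unicode runs they are.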
</P>
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  L&amp;         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">PCRE SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
  Xuc        Universally-named character: one that can be
               represented by a Universal Character Name
  Xwd        Perl word: property Xan or underscore
</pre>
Perl and POSIX space are now the same. Perl added VT to its space character set
at release 5.18 and PCRE changed at release 8.34.
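</P>
<P>
The definitions of Xan, Xsp/Xps, and Xwd given above can be written out
directly as predicates on a single character. The following is a minimal sketch
in Python using the standard unicodedata module; the function names are
hypothetical, and the results depend on the Unicode tables shipped with the
Python interpreter, which may differ from those built into a given PCRE release.

```python
import unicodedata

# Sketch of the definitions in the table above, not PCRE's own code.
_SPACE_CONTROLS = "\t\n\v\f\r"  # tab, NL, VT, FF, CR

def is_xan(ch: str) -> bool:
    """Xan: general category L (letter) or N (number)."""
    return unicodedata.category(ch)[0] in "LN"

def is_xsp(ch: str) -> bool:
    """Xsp and Xps: property Z, or one of tab, NL, VT, FF, CR."""
    return unicodedata.category(ch)[0] == "Z" or ch in _SPACE_CONTROLS

def is_xwd(ch: str) -> bool:
    """Xwd: property Xan, or underscore."""
    return is_xan(ch) or ch == "_"
```

Each function expects a one-character string. VT is included in the space set,
matching the statement above that Perl space and POSIX space are now the same.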
</P>
<br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
<P>
Arabic, Armenian, Avestan, Balinese, Bamum, Bassa_Vah, Batak, Bengali,
Bopomofo, Brahmi, Braille, Buginese, Buhid, Canadian_Aboriginal, Carian,
Caucasian_Albanian, Chakma, Cham, Cherokee, Common, Coptic, Cuneiform,
Cypriot, Cyrillic, Deseret, Devanagari, Duployan, Egyptian_Hieroglyphs,
Elbasan, Ethiopic, Georgian, Glagolitic, Gothic, Grantha, Greek, Gujarati,
Gurmukhi, Han, Hangul, Hanunoo, Hebrew, Hiragana, Imperial_Aramaic,
Inherited, Inscriptional_Pahlavi, Inscriptional_Parthian, Javanese, Kaithi,
Kannada, Katakana, Kayah_Li, Kharoshthi, Khmer, Khojki, Khudawadi, Lao,
Latin, Lepcha, Limbu, Linear_A, Linear_B, Lisu, Lycian, Lydian, Mahajani,
Malayalam, Mandaic, Manichaean, Meetei_Mayek, Mende_Kikakui,
Meroitic_Cursive, Meroitic_Hieroglyphs, Miao, Modi, Mongolian, Mro, Myanmar,
Nabataean, New_Tai_Lue, Nko, Ogham, Ol_Chiki, Old_Italic, Old_North_Arabian,
Old_Permic, Old_Persian, Old_South_Arabian, Old_Turkic, Oriya, Osmanya,
Pahawh_Hmong, Palmyrene, Pau_Cin_Hau, Phags_Pa, Phoenician, Psalter_Pahlavi,
Rejang, Runic, Samaritan, Saurashtra, Sharada, Shavian, Siddham, Sinhala,
Sora_Sompeng, Sundanese, Syloti_Nagri, Syriac, Tagalog, Tagbanwa, Tai_Le,
Tai_Tham, Tai_Viet, Takri, Tamil, Telugu, Thaana, Thai, Tibetan, Tifinagh,
Tirhuta, Ugaritic, Vai, Warang_Citi, Yi.
</P>
<br><a name="SEC8" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]
              negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE_UCP is set. You can use
\Q...\E inside a character class.
</P>
<br><a name="SEC9" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</PRE>
</P>
<br><a name="SEC10" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
               also after internal newline in multiline mode
  \A          start of subject
  $           end of subject
               also before newline at end of subject
               also before internal newline in multiline mode
  \Z          end of subject
               also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
<br><a name="SEC11" href="#TOC1">MATCH POINT RESET</a><br>
<P>
<pre>
  \K          reset start of match
</pre>
\K is honoured in positive assertions, but ignored in negative ones.
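</P>
<P>
To show what resetting the start of the match means in practice, the sketch
below emulates a pattern of the form prefix\Krest in Python. Python's standard
re module does not support \K, so this is an emulation of the effect only: the
prefix must match, but just the part after it is reported. The function name
and sample strings are hypothetical, and backreferences by number inside either
part are not handled.

```python
import re

def search_after_reset(prefix: str, rest: str, subject: str):
    """Emulate prefix\\Krest: match both parts, report only the 'rest' part."""
    # The group wrapping 'rest' comes after any groups inside 'prefix'.
    group_index = re.compile(prefix).groups + 1
    m = re.search("(?:" + prefix + ")(" + rest + ")", subject)
    if m is None:
        return None
    return m.start(group_index), m.end(group_index), m.group(group_index)

result = search_after_reset("foo", "bar", "xxfoobar")
print(result)
```

The reported span starts at "bar", even though "foo" had to be matched first.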
</P>
<br><a name="SEC12" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
<br><a name="SEC13" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)           capturing group
  (?&lt;name&gt;...)    named capturing group (Perl)
  (?'name'...)    named capturing group (Perl)
  (?P&lt;name&gt;...)   named capturing group (Python)
  (?:...)         non-capturing group
  (?|...)         non-capturing group; reset group numbers for
                   capturing groups in each alternative
</PRE>
</P>
<br><a name="SEC14" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&gt;...)         atomic, non-capturing group
</PRE>
</P>
<br><a name="SEC15" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC16" href="#TOC1">OPTION SETTING</a><br>
<P>
<pre>
  (?i)            caseless
  (?J)            allow duplicate names
  (?m)            multiline
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended (ignore white space)
  (?-...)         unset option(s)
</pre>
The following are recognized only at the very start of a pattern or after one
of the newline or \R options with similar syntax. More than one of them may
appear.
<pre>
  (*LIMIT_MATCH=d)      set the match limit to d (decimal number)
  (*LIMIT_RECURSION=d)  set the recursion limit to d (decimal number)
  (*NO_AUTO_POSSESS)    no auto-possessification (PCRE_NO_AUTO_POSSESS)
  (*NO_START_OPT)       no start-match optimization (PCRE_NO_START_OPTIMIZE)
  (*UTF8)               set UTF-8 mode: 8-bit library (PCRE_UTF8)
  (*UTF16)              set UTF-16 mode: 16-bit library (PCRE_UTF16)
  (*UTF32)              set UTF-32 mode: 32-bit library (PCRE_UTF32)
  (*UTF)                set appropriate UTF mode for the library in use
  (*UCP)                set PCRE_UCP (use Unicode properties for \d etc)
</pre>
Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
limits set by the caller of pcre_exec(), not increase them.
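</P>
<P>
The "reduce only" rule for LIMIT_MATCH can be modelled in a few lines. The
sketch below, in Python, scans the start-of-pattern items listed in this
document and returns the limit that would apply. It is a model of the rule as
stated, not PCRE's implementation: the function name and the numbers in the
usage are hypothetical, and treating several LIMIT_MATCH items as "smallest
wins" is an assumption.

```python
import re

# Start-of-pattern item names taken from the tables in this document.
_ITEM_NAMES = (
    "LIMIT_MATCH", "LIMIT_RECURSION", "NO_AUTO_POSSESS", "NO_START_OPT",
    "UTF8", "UTF16", "UTF32", "UTF", "UCP",
    "CR", "LF", "CRLF", "ANYCRLF", "ANY", "BSR_ANYCRLF", "BSR_UNICODE",
)
_ITEM = re.compile(r"\(\*(" + "|".join(_ITEM_NAMES) + r")(?:=(\d+))?\)")

def effective_match_limit(pattern: str, caller_limit: int) -> int:
    """Return the match limit after applying leading (*LIMIT_MATCH=d) items."""
    limit = caller_limit
    pos = 0
    while True:
        item = _ITEM.match(pattern, pos)
        if item is None:
            break  # first non-item text ends the start-of-pattern region
        name, value = item.group(1), item.group(2)
        if name == "LIMIT_MATCH" and value is not None:
            # A pattern may lower the caller's limit but never raise it.
            limit = min(limit, int(value))
        pos = item.end()
    return limit

lowered = effective_match_limit("(*LIMIT_MATCH=500)abc", 1000)
unchanged = effective_match_limit("(*LIMIT_MATCH=5000)abc", 1000)
print(lowered, unchanged)
```

A request for a larger limit than the caller's is simply ignored, and an item
that does not appear at the very start of the pattern has no effect.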
</P>
<br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
<P>
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
<pre>
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC18" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
<pre>
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC19" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)         positive look ahead
  (?!...)         negative look ahead
  (?&lt;=...)        positive look behind
  (?&lt;!...)        negative look behind
</pre>
Each top-level branch of a look behind must be of a fixed length.
</P>
<br><a name="SEC20" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
  \g{-n}          relative reference by number
  \k&lt;name&gt;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
</PRE>
</P>
<br><a name="SEC21" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)            recurse whole pattern
  (?n)            call subpattern by absolute number
  (?+n)           call subpattern by relative number
  (?-n)           call subpattern by relative number
  (?&amp;name)        call subpattern by name (Perl)
  (?P&gt;name)       call subpattern by name (Python)
  \g&lt;name&gt;        call subpattern by name (Oniguruma)
  \g'name'        call subpattern by name (Oniguruma)
  \g&lt;n&gt;           call subpattern by absolute number (Oniguruma)
  \g'n'           call subpattern by absolute number
                    (Oniguruma)
  \g&lt;+n&gt;          call subpattern by relative number (PCRE extension)
  \g'+n'          call subpattern by relative number (PCRE extension)
  \g&lt;-n&gt;          call subpattern by relative number (PCRE extension)
  \g'-n'          call subpattern by relative number (PCRE extension)
</PRE>
</P>
<br><a name="SEC22" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)...        absolute reference condition
  (?(+n)...       relative reference condition
  (?(-n)...       relative reference condition
  (?(&lt;name&gt;)...   named reference condition (Perl)
  (?('name')...   named reference condition (Perl)
  (?(name)...     named reference condition (PCRE)
  (?(R)...        overall recursion condition
  (?(Rn)...       specific group recursion condition
  (?(R&amp;name)...   specific recursion condition
  (?(DEFINE)...   define subpattern for reference
  (?(assert)...   assertion condition
</PRE>
</P>
<br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
The following act immediately when they are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*PRUNE:NAME)   equivalent to (*MARK:NAME)(*PRUNE)
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
  (*THEN:NAME)    equivalent to (*MARK:NAME)(*THEN)
</PRE>
</P>
<br><a name="SEC24" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)      callout
  (?Cn)     callout with data n
</PRE>
</P>
<br><a name="SEC25" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
<b>pcrematching</b>(3), <b>pcre</b>(3).
</P>
<br><a name="SEC26" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QH, England.
<br>
</P>
<br><a name="SEC27" href="#TOC1">REVISION</a><br>
<P>
Last updated: 08 January 2014
<br>
Copyright © 1997-2014 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>