1# This set of tests is for UTF-8 support and Unicode property support, with 2# relevance only for the 8-bit library. 3 4# The next 4 patterns have UTF-8 errors 5 6/[�]/utf 7Failed: error -8 at offset 1: UTF-8 error: byte 2 top bits not 0x80 8 9/�/utf 10Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end 11 12/���xxx/utf 13Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80 14 15/Â��������/utf 16Failed: error -22 at offset 2: UTF-8 error: isolated byte with 0x80 bit set 17 18# Now test subjects 19 20/badutf/utf 21\= Expect UTF-8 errors 22 X\xdf 23Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1 24 XX\xef 25Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 26 XXX\xef\x80 27Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3 28 X\xf7 29Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1 30 XX\xf7\x80 31Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 32 XXX\xf7\x80\x80 33Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3 34 \xfb 35Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 36 \xfb\x80 37Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 38 \xfb\x80\x80 39Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 40 \xfb\x80\x80\x80 41Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 42 \xfd 43Failed: error -7: UTF-8 error: 5 bytes missing at end at offset 0 44 \xfd\x80 45Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 46 \xfd\x80\x80 47Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 48 \xfd\x80\x80\x80 49Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 50 \xfd\x80\x80\x80\x80 51Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 52 \xdf\x7f 53Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 54 \xef\x7f\x80 55Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 56 \xef\x80\x7f 57Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 58 \xf7\x7f\x80\x80 59Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 60 \xf7\x80\x7f\x80 61Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 62 \xf7\x80\x80\x7f 63Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0 64 \xfb\x7f\x80\x80\x80 65Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 66 \xfb\x80\x7f\x80\x80 67Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 68 \xfb\x80\x80\x7f\x80 69Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0 70 \xfb\x80\x80\x80\x7f 71Failed: error -11: UTF-8 error: byte 5 top bits not 0x80 at offset 0 72 \xfd\x7f\x80\x80\x80\x80 73Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0 74 \xfd\x80\x7f\x80\x80\x80 75Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0 76 \xfd\x80\x80\x7f\x80\x80 77Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0 78 \xfd\x80\x80\x80\x7f\x80 79Failed: error -11: UTF-8 error: byte 5 top bits not 0x80 at offset 0 80 \xfd\x80\x80\x80\x80\x7f 81Failed: error -12: UTF-8 error: byte 6 top bits not 0x80 at offset 0 82 \xed\xa0\x80 83Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 84 \xc0\x8f 85Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 0 86 \xe0\x80\x8f 87Failed: error -18: UTF-8 error: overlong 3-byte sequence at offset 0 88 \xf0\x80\x80\x8f 89Failed: error -19: UTF-8 error: overlong 4-byte sequence at offset 0 90 \xf8\x80\x80\x80\x8f 91Failed: error -20: UTF-8 error: overlong 5-byte sequence at offset 0 92 \xfc\x80\x80\x80\x80\x8f 93Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0 94 \x80 95Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 0 96 \xfe 97Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 98 \xff 99Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 100 101/badutf/utf 102\= Expect UTF-8 errors 103 XX\xfb\x80\x80\x80\x80 104Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 2 105 XX\xfd\x80\x80\x80\x80\x80 106Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 2 107 XX\xf7\xbf\xbf\xbf 108Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2 109 110/shortutf/utf 111\= Expect UTF-8 errors 112 XX\xdf\=ph 113Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2 114 XX\xef\=ph 115Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2 116 XX\xef\x80\=ph 117Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2 118 \xf7\=ph 119Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 120 \xf7\x80\=ph 121Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 122 \xf7\x80\x80\=ph 123Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 124 \xfb\=ph 125Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 126 \xfb\x80\=ph 127Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 128 \xfb\x80\x80\=ph 129Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 130 \xfb\x80\x80\x80\=ph 131Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 132 \xfd\=ph 133Failed: error -7: UTF-8 error: 5 bytes missing at end at offset 0 134 \xfd\x80\=ph 135Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0 136 \xfd\x80\x80\=ph 137Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0 138 \xfd\x80\x80\x80\=ph 139Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0 140 \xfd\x80\x80\x80\x80\=ph 141Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0 142 143/anything/utf 144\= Expect UTF-8 errors 145 X\xc0\x80 146Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 1 147 XX\xc1\x8f 148Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 2 149 XXX\xe0\x9f\x80 150Failed: error -18: UTF-8 error: overlong 3-byte sequence at offset 3 151 \xf0\x8f\x80\x80 152Failed: error -19: UTF-8 error: overlong 4-byte sequence at offset 0 153 \xf8\x87\x80\x80\x80 154Failed: error -20: UTF-8 error: overlong 5-byte sequence at offset 0 155 \xfc\x83\x80\x80\x80\x80 156Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0 157 \xfe\x80\x80\x80\x80\x80 158Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 159 \xff\x80\x80\x80\x80\x80 160Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0 161 \xf8\x88\x80\x80\x80 162Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0 163 \xf9\x87\x80\x80\x80 164Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0 165 \xfc\x84\x80\x80\x80\x80 166Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 167 \xfd\x83\x80\x80\x80\x80 168Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 169\= Expect no match 170 \xc3\x8f 171No match 172 \xe0\xaf\x80 173No match 174 \xe1\x80\x80 175No match 176 \xf0\x9f\x80\x80 177No match 178 \xf1\x8f\x80\x80 179No match 180 \xf8\x88\x80\x80\x80\=no_utf_check 181No match 182 \xf9\x87\x80\x80\x80\=no_utf_check 183No match 184 \xfc\x84\x80\x80\x80\x80\=no_utf_check 185No match 186 \xfd\x83\x80\x80\x80\x80\=no_utf_check 187No match 188 189# Similar tests with offsets 190 191/badutf/utf 192\= Expect UTF-8 errors 193 X\xdfabcd 194Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 195 X\xdfabcd\=offset=1 196Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 197\= Expect no match 198 X\xdfabcd\=offset=2 199No match 200 201/(?<=x)badutf/utf 202\= Expect UTF-8 errors 203 X\xdfabcd 204Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 205 X\xdfabcd\=offset=1 206Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 207 X\xdfabcd\=offset=2 208Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 209 X\xdfabcd\xdf\=offset=3 210Failed: error -3: UTF-8 error: 1 byte missing at end at offset 6 211\= Expect no match 212 X\xdfabcd\=offset=3 213No match 214 215/(?<=xx)badutf/utf 216\= Expect UTF-8 errors 217 X\xdfabcd 218Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 219 X\xdfabcd\=offset=1 220Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 221 X\xdfabcd\=offset=2 222Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 223 X\xdfabcd\=offset=3 224Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 225 226/(?<=xxxx)badutf/utf 227\= Expect UTF-8 errors 228 X\xdfabcd 229Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 230 X\xdfabcd\=offset=1 231Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 232 X\xdfabcd\=offset=2 233Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 234 X\xdfabcd\=offset=3 235Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1 236 X\xdfabc\xdf\=offset=6 237Failed: error -3: UTF-8 error: 1 byte missing at end at offset 5 238 X\xdfabc\xdf\=offset=7 239Failed: error -33: bad offset value 240\= Expect no match 241 X\xdfabcd\=offset=6 242No match 243 244/\x{100}/IB,utf 245------------------------------------------------------------------ 246 Bra 247 \x{100} 248 Ket 249 End 250------------------------------------------------------------------ 251Capture group count = 0 252Options: utf 253First code unit = \xc4 254Last code unit = \x80 255Subject length lower bound = 1 256 257/\x{1000}/IB,utf 258------------------------------------------------------------------ 259 Bra 260 \x{1000} 261 Ket 262 End 263------------------------------------------------------------------ 264Capture group count = 0 265Options: utf 266First code unit = \xe1 267Last code unit = \x80 268Subject length lower bound = 1 269 270/\x{10000}/IB,utf 271------------------------------------------------------------------ 272 Bra 273 \x{10000} 274 Ket 275 End 276------------------------------------------------------------------ 277Capture group count = 0 278Options: utf 279First code unit = \xf0 280Last code unit = \x80 281Subject length lower bound = 1 282 283/\x{100000}/IB,utf 284------------------------------------------------------------------ 285 Bra 286 \x{100000} 287 Ket 288 End 289------------------------------------------------------------------ 290Capture group count = 0 291Options: utf 292First code unit = \xf4 293Last code unit = \x80 294Subject length lower bound = 1 295 296/\x{10ffff}/IB,utf 297------------------------------------------------------------------ 298 Bra 299 \x{10ffff} 300 Ket 301 End 302------------------------------------------------------------------ 303Capture group count = 0 304Options: utf 305First code unit = \xf4 306Last code unit = \xbf 307Subject length lower bound = 1 308 309/[\x{ff}]/IB,utf 310------------------------------------------------------------------ 311 Bra 312 \x{ff} 313 Ket 314 End 315------------------------------------------------------------------ 316Capture group count = 0 317Options: utf 318First code unit = \xc3 319Last code unit = \xbf 320Subject length lower bound = 1 321 322/[\x{100}]/IB,utf 323------------------------------------------------------------------ 324 Bra 325 \x{100} 326 Ket 327 End 328------------------------------------------------------------------ 329Capture group count = 0 330Options: utf 331First code unit = \xc4 332Last code unit = \x80 333Subject length lower bound = 1 334 335/\x80/IB,utf 336------------------------------------------------------------------ 337 Bra 338 \x{80} 339 Ket 340 End 341------------------------------------------------------------------ 342Capture group count = 0 343Options: utf 344First code unit = \xc2 345Last code unit = \x80 346Subject length lower bound = 1 347 348/\xff/IB,utf 349------------------------------------------------------------------ 350 Bra 351 \x{ff} 352 Ket 353 End 354------------------------------------------------------------------ 355Capture group count = 0 356Options: utf 357First code unit = \xc3 358Last code unit = \xbf 359Subject length lower bound = 1 360 361/\x{D55c}\x{ad6d}\x{C5B4}/IB,utf 362------------------------------------------------------------------ 363 Bra 364 \x{d55c}\x{ad6d}\x{c5b4} 365 Ket 366 End 367------------------------------------------------------------------ 368Capture group count = 0 369Options: utf 370First code unit = \xed 371Last code unit = \xb4 372Subject length lower bound = 3 373 \x{D55c}\x{ad6d}\x{C5B4} 374 0: \x{d55c}\x{ad6d}\x{c5b4} 375 376/\x{65e5}\x{672c}\x{8a9e}/IB,utf 377------------------------------------------------------------------ 378 Bra 379 \x{65e5}\x{672c}\x{8a9e} 380 Ket 381 End 382------------------------------------------------------------------ 383Capture group count = 0 384Options: utf 385First code unit = \xe6 386Last code unit = \x9e 387Subject length lower bound = 3 388 \x{65e5}\x{672c}\x{8a9e} 389 0: \x{65e5}\x{672c}\x{8a9e} 390 391/\x{80}/IB,utf 392------------------------------------------------------------------ 393 Bra 394 \x{80} 395 Ket 396 End 397------------------------------------------------------------------ 398Capture group count = 0 399Options: utf 400First code unit = \xc2 401Last code unit = \x80 402Subject length lower bound = 1 403 404/\x{084}/IB,utf 405------------------------------------------------------------------ 406 Bra 407 \x{84} 408 Ket 409 End 410------------------------------------------------------------------ 411Capture group count = 0 412Options: utf 413First code unit = \xc2 414Last code unit = \x84 415Subject length lower bound = 1 416 417/\x{104}/IB,utf 418------------------------------------------------------------------ 419 Bra 420 \x{104} 421 Ket 422 End 423------------------------------------------------------------------ 424Capture group count = 0 425Options: utf 426First code unit = \xc4 427Last code unit = \x84 428Subject length lower bound = 1 429 430/\x{861}/IB,utf 431------------------------------------------------------------------ 432 Bra 433 \x{861} 434 Ket 435 End 436------------------------------------------------------------------ 437Capture group count = 0 438Options: utf 439First code unit = \xe0 440Last code unit = \xa1 441Subject length lower bound = 1 442 443/\x{212ab}/IB,utf 444------------------------------------------------------------------ 445 Bra 446 \x{212ab} 447 Ket 448 End 449------------------------------------------------------------------ 450Capture group count = 0 451Options: utf 452First code unit = \xf0 453Last code unit = \xab 454Subject length lower bound = 1 455 456/[^ab\xC0-\xF0]/IB,utf 457------------------------------------------------------------------ 458 Bra 459 [\x00-`c-\xbf\xf1-\xff] (neg) 460 Ket 461 End 462------------------------------------------------------------------ 463Capture group count = 0 464Options: utf 465Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a 466 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 467 \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 468 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y 469 Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f 470 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 471 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf 472 \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee 473 \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd 474 \xfe \xff 475Subject length lower bound = 1 476 \x{f1} 477 0: \x{f1} 478 \x{bf} 479 0: \x{bf} 480 \x{100} 481 0: \x{100} 482 \x{1000} 483 0: \x{1000} 484\= Expect no match 485 \x{c0} 486No match 487 \x{f0} 488No match 489 490/Ā{3,4}/IB,utf 491------------------------------------------------------------------ 492 Bra 493 \x{100}{3} 494 \x{100}?+ 495 Ket 496 End 497------------------------------------------------------------------ 498Capture group count = 0 499Options: utf 500First code unit = \xc4 501Last code unit = \x80 502Subject length lower bound = 3 503 \x{100}\x{100}\x{100}\x{100\x{100} 504 0: \x{100}\x{100}\x{100} 505 506/(\x{100}+|x)/IB,utf 507------------------------------------------------------------------ 508 Bra 509 CBra 1 510 \x{100}++ 511 Alt 512 x 513 Ket 514 Ket 515 End 516------------------------------------------------------------------ 517Capture group count = 1 518Options: utf 519Starting code units: x \xc4 520Subject length lower bound = 1 521 522/(\x{100}*a|x)/IB,utf 523------------------------------------------------------------------ 524 Bra 525 CBra 1 526 \x{100}*+ 527 a 528 Alt 529 x 530 Ket 531 Ket 532 End 533------------------------------------------------------------------ 534Capture group count = 1 535Options: utf 536Starting code units: a x \xc4 537Subject length lower bound = 1 538 539/(\x{100}{0,2}a|x)/IB,utf 540------------------------------------------------------------------ 541 Bra 542 CBra 1 543 \x{100}{0,2}+ 544 a 545 Alt 546 x 547 Ket 548 Ket 549 End 550------------------------------------------------------------------ 551Capture group count = 1 552Options: utf 553Starting code units: a x \xc4 554Subject length lower bound = 1 555 556/(\x{100}{1,2}a|x)/IB,utf 557------------------------------------------------------------------ 558 Bra 559 CBra 1 560 \x{100} 561 \x{100}{0,1}+ 562 a 563 Alt 564 x 565 Ket 566 Ket 567 End 568------------------------------------------------------------------ 569Capture group count = 1 570Options: utf 571Starting code units: x \xc4 572Subject length lower bound = 1 573 574/\x{100}/IB,utf 575------------------------------------------------------------------ 576 Bra 577 \x{100} 578 Ket 579 End 580------------------------------------------------------------------ 581Capture group count = 0 582Options: utf 583First code unit = \xc4 584Last code unit = \x80 585Subject length lower bound = 1 586 587/a\x{100}\x{101}*/IB,utf 588------------------------------------------------------------------ 589 Bra 590 a\x{100} 591 \x{101}*+ 592 Ket 593 End 594------------------------------------------------------------------ 595Capture group count = 0 596Options: utf 597First code unit = 'a' 598Last code unit = \x80 599Subject length lower bound = 2 600 601/a\x{100}\x{101}+/IB,utf 602------------------------------------------------------------------ 603 Bra 604 a\x{100} 605 \x{101}++ 606 Ket 607 End 608------------------------------------------------------------------ 609Capture group count = 0 610Options: utf 611First code unit = 'a' 612Last code unit = \x81 613Subject length lower bound = 3 614 615/[^\x{c4}]/IB 616------------------------------------------------------------------ 617 Bra 618 [^\x{c4}] 619 Ket 620 End 621------------------------------------------------------------------ 622Capture group count = 0 623Subject length lower bound = 1 624 625/[\x{100}]/IB,utf 626------------------------------------------------------------------ 627 Bra 628 \x{100} 629 Ket 630 End 631------------------------------------------------------------------ 632Capture group count = 0 633Options: utf 634First code unit = \xc4 635Last code unit = \x80 636Subject length lower bound = 1 637 \x{100} 638 0: \x{100} 639 Z\x{100} 640 0: \x{100} 641 \x{100}Z 642 0: \x{100} 643 644/[\xff]/IB,utf 645------------------------------------------------------------------ 646 Bra 647 \x{ff} 648 Ket 649 End 650------------------------------------------------------------------ 651Capture group count = 0 652Options: utf 653First code unit = \xc3 654Last code unit = \xbf 655Subject length lower bound = 1 656 >\x{ff}< 657 0: \x{ff} 658 659/[^\xff]/IB,utf 660------------------------------------------------------------------ 661 Bra 662 [^\x{ff}] 663 Ket 664 End 665------------------------------------------------------------------ 666Capture group count = 0 667Options: utf 668Subject length lower bound = 1 669 670/\x{100}abc(xyz(?1))/IB,utf 671------------------------------------------------------------------ 672 Bra 673 \x{100}abc 674 CBra 1 675 xyz 676 Recurse 677 Ket 678 Ket 679 End 680------------------------------------------------------------------ 681Capture group count = 1 682Options: utf 683First code unit = \xc4 684Last code unit = 'z' 685Subject length lower bound = 7 686 687/\777/I,utf 688Capture group count = 0 689Options: utf 690First code unit = \xc7 691Last code unit = \xbf 692Subject length lower bound = 1 693 \x{1ff} 694 0: \x{1ff} 695 \777 696 0: \x{1ff} 697 698/\x{100}+\x{200}/IB,utf 699------------------------------------------------------------------ 700 Bra 701 \x{100}++ 702 \x{200} 703 Ket 704 End 705------------------------------------------------------------------ 706Capture group count = 0 707Options: utf 708First code unit = \xc4 709Last code unit = \x80 710Subject length lower bound = 2 711 712/\x{100}+X/IB,utf 713------------------------------------------------------------------ 714 Bra 715 \x{100}++ 716 X 717 Ket 718 End 719------------------------------------------------------------------ 720Capture group count = 0 721Options: utf 722First code unit = \xc4 723Last code unit = 'X' 724Subject length lower bound = 2 725 726/^[\QĀ\E-\QŐ\E/B,utf 727Failed: error 106 at offset 15: missing terminating ] for character class 728 729# This tests the stricter UTF-8 check according to RFC 3629. 730 731/X/utf 732\= Expect UTF-8 errors 733 \x{d800} 734Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 735 \x{da00} 736Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 737 \x{dfff} 738Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0 739 \x{110000} 740Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 0 741 \x{2000000} 742Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0 743 \x{7fffffff} 744Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 745\= Expect no match 746 \x{d800}\=no_utf_check 747No match 748 \x{da00}\=no_utf_check 749No match 750 \x{dfff}\=no_utf_check 751No match 752 \x{110000}\=no_utf_check 753No match 754 \x{2000000}\=no_utf_check 755No match 756 \x{7fffffff}\=no_utf_check 757No match 758 759/(*UTF8)\x{1234}/ 760 abcd\x{1234}pqr 761 0: \x{1234} 762 763/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I 764Capture group count = 0 765Compile options: <none> 766Overall options: utf 767\R matches any Unicode newline 768Forced newline is CRLF 769First code unit = 'a' 770Last code unit = 'b' 771Subject length lower bound = 3 772 773/\h/I,utf 774Capture group count = 0 775Options: utf 776Starting code units: \x09 \x20 \xc2 \xe1 \xe2 \xe3 777Subject length lower bound = 1 778 ABC\x{09} 779 0: \x{09} 780 ABC\x{20} 781 0: 782 ABC\x{a0} 783 0: \x{a0} 784 ABC\x{1680} 785 0: \x{1680} 786 ABC\x{180e} 787 0: \x{180e} 788 ABC\x{2000} 789 0: \x{2000} 790 ABC\x{202f} 791 0: \x{202f} 792 ABC\x{205f} 793 0: \x{205f} 794 ABC\x{3000} 795 0: \x{3000} 796 797/\v/I,utf 798Capture group count = 0 799Options: utf 800Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 801Subject length lower bound = 1 802 ABC\x{0a} 803 0: \x{0a} 804 ABC\x{0b} 805 0: \x{0b} 806 ABC\x{0c} 807 0: \x{0c} 808 ABC\x{0d} 809 0: \x{0d} 810 ABC\x{85} 811 0: \x{85} 812 ABC\x{2028} 813 0: \x{2028} 814 815/\h*A/I,utf 816Capture group count = 0 817Options: utf 818Starting code units: \x09 \x20 A \xc2 \xe1 \xe2 \xe3 819Last code unit = 'A' 820Subject length lower bound = 1 821 CDBABC 822 0: A 823 824/\v+A/I,utf 825Capture group count = 0 826Options: utf 827Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 828Last code unit = 'A' 829Subject length lower bound = 2 830 831/\s?xxx\s/I,utf 832Capture group count = 0 833Options: utf 834Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x 835Last code unit = 'x' 836Subject length lower bound = 4 837 838/\sxxx\s/I,utf,tables=2 839Capture group count = 0 840Options: utf 841Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc2 842Last code unit = 'x' 843Subject length lower bound = 5 844 AB\x{85}xxx\x{a0}XYZ 845 0: \x{85}xxx\x{a0} 846 AB\x{a0}xxx\x{85}XYZ 847 0: \x{a0}xxx\x{85} 848 849/\S \S/I,utf,tables=2 850Capture group count = 0 851Options: utf 852Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f 853 \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e 854 \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C 855 D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h 856 i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 857 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 858 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 859 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 860 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff 861Last code unit = ' ' 862Subject length lower bound = 3 863 \x{a2} \x{84} 864 0: \x{a2} \x{84} 865 A Z 866 0: A Z 867 868/a+/utf 869 a\x{123}aa\=offset=1 870 0: aa 871 a\x{123}aa\=offset=3 872 0: aa 873 a\x{123}aa\=offset=4 874 0: a 875\= Expect bad offset value 876 a\x{123}aa\=offset=6 877Failed: error -33: bad offset value 878\= Expect bad UTF-8 offset 879 a\x{123}aa\=offset=2 880Error -36 (bad UTF-8 offset) 881\= Expect no match 882 a\x{123}aa\=offset=5 883No match 884 885/\x{1234}+/Ii,utf 886Capture group count = 0 887Options: caseless utf 888Starting code units: \xe1 889Subject length lower bound = 1 890 891/\x{1234}+?/Ii,utf 892Capture group count = 0 893Options: caseless utf 894Starting code units: \xe1 895Subject length lower bound = 1 896 897/\x{1234}++/Ii,utf 898Capture group count = 0 899Options: caseless utf 900Starting code units: \xe1 901Subject length lower bound = 1 902 903/\x{1234}{2}/Ii,utf 904Capture group count = 0 905Options: caseless utf 906Starting code units: \xe1 907Subject length lower bound = 2 908 909/[^\x{c4}]/IB,utf 910------------------------------------------------------------------ 911 Bra 912 [^\x{c4}] 913 Ket 914 End 915------------------------------------------------------------------ 916Capture group count = 0 917Options: utf 918Subject length lower bound = 1 919 920/X+\x{200}/IB,utf 921------------------------------------------------------------------ 922 Bra 923 X++ 924 \x{200} 925 Ket 926 End 927------------------------------------------------------------------ 928Capture group count = 0 929Options: utf 930First code unit = 'X' 931Last code unit = \x80 932Subject length lower bound = 2 933 934/\R/I,utf 935Capture group count = 0 936Options: utf 937Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2 938Subject length lower bound = 1 939 940/\777/IB,utf 941------------------------------------------------------------------ 942 Bra 943 \x{1ff} 944 Ket 945 End 946------------------------------------------------------------------ 947Capture group count = 0 948Options: utf 949First code unit = \xc7 950Last code unit = \xbf 951Subject length lower bound = 1 952 953/\w+\x{C4}/B,utf 954------------------------------------------------------------------ 955 Bra 956 \w++ 957 \x{c4} 958 Ket 959 End 960------------------------------------------------------------------ 961 a\x{C4}\x{C4} 962 0: a\x{c4} 963 964/\w+\x{C4}/B,utf,tables=2 965------------------------------------------------------------------ 966 Bra 967 \w+ 968 \x{c4} 969 Ket 970 End 971------------------------------------------------------------------ 972 a\x{C4}\x{C4} 973 0: a\x{c4}\x{c4} 974 975/\W+\x{C4}/B,utf 976------------------------------------------------------------------ 977 Bra 978 \W+ 979 \x{c4} 980 Ket 981 End 982------------------------------------------------------------------ 983 !\x{C4} 984 0: !\x{c4} 985 986/\W+\x{C4}/B,utf,tables=2 987------------------------------------------------------------------ 988 Bra 989 \W++ 990 \x{c4} 991 Ket 992 End 993------------------------------------------------------------------ 994 !\x{C4} 995 0: !\x{c4} 996 997/\W+\x{A1}/B,utf 998------------------------------------------------------------------ 999 Bra 1000 \W+ 1001 \x{a1} 1002 Ket 1003 End 1004------------------------------------------------------------------ 1005 !\x{A1} 1006 0: !\x{a1} 1007 1008/\W+\x{A1}/B,utf,tables=2 1009------------------------------------------------------------------ 1010 Bra 1011 \W+ 1012 \x{a1} 1013 Ket 1014 End 1015------------------------------------------------------------------ 1016 !\x{A1} 1017 0: !\x{a1} 1018 1019/X\s+\x{A0}/B,utf 1020------------------------------------------------------------------ 1021 Bra 1022 X 1023 \s++ 1024 \x{a0} 1025 Ket 1026 End 1027------------------------------------------------------------------ 1028 X\x20\x{A0}\x{A0} 1029 0: X \x{a0} 1030 1031/X\s+\x{A0}/B,utf,tables=2 1032------------------------------------------------------------------ 1033 Bra 1034 X 1035 \s+ 1036 \x{a0} 1037 Ket 1038 End 1039------------------------------------------------------------------ 1040 X\x20\x{A0}\x{A0} 1041 0: X \x{a0}\x{a0} 1042 1043/\S+\x{A0}/B,utf 1044------------------------------------------------------------------ 1045 Bra 1046 \S+ 1047 \x{a0} 1048 Ket 1049 End 1050------------------------------------------------------------------ 1051 X\x{A0}\x{A0} 1052 0: X\x{a0}\x{a0} 1053 1054/\S+\x{A0}/B,utf,tables=2 1055------------------------------------------------------------------ 1056 Bra 1057 \S++ 1058 \x{a0} 1059 Ket 1060 End 1061------------------------------------------------------------------ 1062 X\x{A0}\x{A0} 1063 0: X\x{a0} 1064 1065/\x{a0}+\s!/B,utf 1066------------------------------------------------------------------ 1067 Bra 1068 \x{a0}++ 1069 \s 1070 ! 1071 Ket 1072 End 1073------------------------------------------------------------------ 1074 \x{a0}\x20! 1075 0: \x{a0} ! 1076 1077/\x{a0}+\s!/B,utf,tables=2 1078------------------------------------------------------------------ 1079 Bra 1080 \x{a0}+ 1081 \s 1082 ! 1083 Ket 1084 End 1085------------------------------------------------------------------ 1086 \x{a0}\x20! 1087 0: \x{a0} ! 1088 1089/A/utf 1090 \x{ff000041} 1091** Character \x{ff000041} is greater than 0x7fffffff and so cannot be converted to UTF-8 1092 \x{7f000041} 1093Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0 1094 1095/(*UTF8)abc/never_utf 1096Failed: error 174 at offset 7: using UTF is disabled by the application 1097 1098/abc/utf,never_utf 1099Failed: error 174 at offset 0: using UTF is disabled by the application 1100 1101/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf 1102------------------------------------------------------------------ 1103 Bra 1104 /i A\x{391}\x{10427}\x{ff3a}\x{1fb0} 1105 Ket 1106 End 1107------------------------------------------------------------------ 1108Capture group count = 0 1109Options: caseless utf 1110First code unit = 'A' (caseless) 1111Subject length lower bound = 5 1112 1113/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf 1114------------------------------------------------------------------ 1115 Bra 1116 A\x{391}\x{10427}\x{ff3a}\x{1fb0} 1117 Ket 1118 End 1119------------------------------------------------------------------ 1120Capture group count = 0 1121Options: utf 1122First code unit = 'A' 1123Last code unit = \xb0 1124Subject length lower bound = 5 1125 1126/AB\x{1fb0}/IB,utf 1127------------------------------------------------------------------ 1128 Bra 1129 AB\x{1fb0} 1130 Ket 1131 End 1132------------------------------------------------------------------ 1133Capture group count = 0 1134Options: utf 1135First code unit = 'A' 1136Last code unit = \xb0 1137Subject length lower bound = 3 1138 1139/AB\x{1fb0}/IBi,utf 1140------------------------------------------------------------------ 1141 Bra 1142 /i AB\x{1fb0} 1143 Ket 1144 End 1145------------------------------------------------------------------ 1146Capture group count = 0 1147Options: caseless utf 1148First code unit = 'A' (caseless) 1149Last code unit = 'B' (caseless) 1150Subject length lower bound = 3 1151 1152/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf 1153Capture group count = 0 1154Options: caseless utf 1155Starting code units: \xd0 \xd1 1156Subject length lower bound = 17 1157 \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} 1158 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f} 1159 \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} 1160 0: \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f} 1161 1162/[ⱥ]/Bi,utf 1163------------------------------------------------------------------ 1164 Bra 1165 /i \x{2c65} 1166 Ket 1167 End 1168------------------------------------------------------------------ 1169 1170/[^ⱥ]/Bi,utf 1171------------------------------------------------------------------ 1172 Bra 1173 /i [^\x{2c65}] 1174 Ket 1175 End 1176------------------------------------------------------------------ 1177 1178/\h/I 1179Capture group count = 0 1180Starting code units: \x09 \x20 \xa0 1181Subject length lower bound = 1 1182 1183/\v/I 1184Capture group count = 0 1185Starting code units: \x0a \x0b \x0c \x0d \x85 1186Subject length lower bound = 1 1187 1188/\R/I 1189Capture group count = 0 1190Starting code units: \x0a \x0b \x0c \x0d \x85 1191Subject length lower bound = 1 1192 1193/[[:blank:]]/B,ucp 1194------------------------------------------------------------------ 1195 Bra 1196 [\x09 \xa0] 1197 Ket 1198 End 1199------------------------------------------------------------------ 1200 1201/\x{212a}+/Ii,utf 1202Capture group count = 0 1203Options: caseless utf 1204Starting code units: K k \xe2 1205Subject length lower bound = 1 1206 KKkk\x{212a} 1207 0: KKkk\x{212a} 1208 1209/s+/Ii,utf 1210Capture group count = 0 1211Options: caseless utf 1212Starting code units: S s \xc5 1213Subject length lower bound = 1 1214 SSss\x{17f} 1215 0: SSss\x{17f} 1216 1217/\x{100}*A/IB,utf 1218------------------------------------------------------------------ 1219 Bra 1220 \x{100}*+ 1221 A 1222 Ket 1223 End 1224------------------------------------------------------------------ 1225Capture group count = 0 1226Options: utf 1227Starting code units: A \xc4 1228Last code unit = 'A' 1229Subject length lower bound = 1 1230 A 1231 0: A 1232 1233/\x{100}*\d(?R)/IB,utf 1234------------------------------------------------------------------ 1235 Bra 1236 \x{100}*+ 1237 \d 1238 Recurse 1239 Ket 1240 End 1241------------------------------------------------------------------ 1242Capture group count = 0 1243Options: utf 1244Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4 1245Subject length lower bound = 1 1246 1247/[Z\x{100}]/IB,utf 1248------------------------------------------------------------------ 1249 Bra 1250 [Z\x{100}] 1251 Ket 1252 End 1253------------------------------------------------------------------ 1254Capture group count = 0 1255Options: utf 1256Starting code units: Z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd 1257 \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc 1258 \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb 1259 \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa 1260 \xfb \xfc \xfd \xfe \xff 1261Subject length lower bound = 1 1262 Z\x{100} 1263 0: Z 1264 \x{100} 1265 0: \x{100} 1266 \x{100}Z 1267 0: \x{100} 1268 1269/[z-\x{100}]/IB,utf 1270------------------------------------------------------------------ 1271 Bra 1272 [z-\xff\x{100}] 1273 Ket 1274 End 1275------------------------------------------------------------------ 1276Capture group count = 0 1277Options: utf 1278Starting code units: z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 1279 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 1280 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 1281 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 1282 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff 1283Subject length lower bound = 1 1284 1285/[z\Qa-d]Ā\E]/IB,utf 1286------------------------------------------------------------------ 1287 Bra 1288 [\-\]adz\x{100}] 1289 Ket 1290 End 1291------------------------------------------------------------------ 1292Capture group count = 0 1293Options: utf 1294Starting code units: - ] a d z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc 1295 \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb 1296 \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea 1297 \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 1298 \xfa \xfb \xfc \xfd \xfe \xff 1299Subject length lower bound = 1 1300 \x{100} 1301 0: \x{100} 1302 Ā 1303 0: \x{100} 1304 1305/[ab\x{100}]abc(xyz(?1))/IB,utf 1306------------------------------------------------------------------ 1307 Bra 1308 [ab\x{100}] 1309 abc 1310 CBra 1 1311 xyz 1312 Recurse 1313 Ket 1314 Ket 1315 End 1316------------------------------------------------------------------ 1317Capture group count = 1 1318Options: utf 1319Starting code units: a b \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd 1320 \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc 1321 \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb 1322 \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa 1323 \xfb \xfc \xfd \xfe \xff 1324Last code unit = 'z' 1325Subject length lower bound = 7 1326 1327/\x{100}*\s/IB,utf 1328------------------------------------------------------------------ 1329 Bra 1330 \x{100}*+ 1331 \s 1332 Ket 1333 End 1334------------------------------------------------------------------ 1335Capture group count = 0 1336Options: utf 1337Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc4 1338Subject length lower bound = 1 1339 1340/\x{100}*\d/IB,utf 1341------------------------------------------------------------------ 1342 Bra 1343 \x{100}*+ 1344 \d 1345 Ket 1346 End 1347------------------------------------------------------------------ 1348Capture group count = 0 1349Options: utf 1350Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4 1351Subject length lower bound = 1 1352 1353/\x{100}*\w/IB,utf 1354------------------------------------------------------------------ 1355 Bra 1356 \x{100}*+ 1357 \w 1358 Ket 1359 End 1360------------------------------------------------------------------ 1361Capture group count = 0 1362Options: utf 1363Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P 1364 Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z 1365 \xc4 1366Subject length lower bound = 1 1367 1368/\x{100}*\D/IB,utf 1369------------------------------------------------------------------ 1370 Bra 1371 \x{100}* 1372 \D 1373 Ket 1374 End 1375------------------------------------------------------------------ 1376Capture group count = 0 1377Options: utf 1378Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a 1379 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 1380 \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > 1381 ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c 1382 d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 1383 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 1384 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 1385 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef 1386 \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe 1387 \xff 1388Subject length lower bound = 1 1389 1390/\x{100}*\S/IB,utf 1391------------------------------------------------------------------ 1392 Bra 1393 \x{100}* 1394 \S 1395 Ket 1396 End 1397------------------------------------------------------------------ 1398Capture group count = 0 1399Options: utf 1400Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f 1401 \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e 1402 \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C 1403 D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h 1404 i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 1405 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 1406 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 1407 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 1408 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff 1409Subject length lower bound = 1 1410 1411/\x{100}*\W/IB,utf 1412------------------------------------------------------------------ 1413 Bra 1414 \x{100}* 1415 \W 1416 Ket 1417 End 1418------------------------------------------------------------------ 1419Capture group count = 0 1420Options: utf 1421Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a 1422 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 1423 \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = > 1424 ? @ [ \ ] ^ ` { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 1425 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 1426 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 1427 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 1428 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff 1429Subject length lower bound = 1 1430 1431/[\x{105}-\x{109}]/IBi,utf 1432------------------------------------------------------------------ 1433 Bra 1434 [\x{104}-\x{109}] 1435 Ket 1436 End 1437------------------------------------------------------------------ 1438Capture group count = 0 1439Options: caseless utf 1440Starting code units: \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce 1441 \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd 1442 \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec 1443 \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb 1444 \xfc \xfd \xfe \xff 1445Subject length lower bound = 1 1446 \x{104} 1447 0: \x{104} 1448 \x{105} 1449 0: \x{105} 1450 \x{109} 1451 0: \x{109} 1452\= Expect no match 1453 \x{100} 1454No match 1455 \x{10a} 1456No match 1457 1458/[z-\x{100}]/IBi,utf 1459------------------------------------------------------------------ 1460 Bra 1461 [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] 1462 Ket 1463 End 1464------------------------------------------------------------------ 1465Capture group count = 0 1466Options: caseless utf 1467Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 1468 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 1469 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 1470 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 1471 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff 1472Subject length lower bound = 1 1473 Z 1474 0: Z 1475 z 1476 0: z 1477 \x{39c} 1478 0: \x{39c} 1479 \x{178} 1480 0: \x{178} 1481 | 1482 0: | 1483 \x{80} 1484 0: \x{80} 1485 \x{ff} 1486 0: \x{ff} 1487 \x{100} 1488 0: \x{100} 1489 \x{101} 1490 0: \x{101} 1491\= Expect no match 1492 \x{102} 1493No match 1494 Y 1495No match 1496 y 1497No match 1498 1499/[z-\x{100}]/IBi,utf 1500------------------------------------------------------------------ 1501 Bra 1502 [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}] 1503 Ket 1504 End 1505------------------------------------------------------------------ 1506Capture group count = 0 1507Options: caseless utf 1508Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 1509 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 1510 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 1511 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 1512 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff 1513Subject length lower bound = 1 1514 1515/\x{3a3}B/IBi,utf 1516------------------------------------------------------------------ 1517 Bra 1518 clist 03a3 03c2 03c3 1519 /i B 1520 Ket 1521 End 1522------------------------------------------------------------------ 1523Capture group count = 0 1524Options: caseless utf 1525Starting code units: \xce \xcf 1526Last code unit = 'B' (caseless) 1527Subject length lower bound = 2 1528 1529/abc/utf,replace=� 1530 abc 1531Failed: error -3: UTF-8 error: 1 byte missing at end 1532 1533/(?<=(a)(?-1))x/I,utf 1534Capture group count = 1 1535Max lookbehind = 2 1536Options: utf 1537First code unit = 'x' 1538Subject length lower bound = 1 1539 a\x80zx\=offset=3 1540Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 1 1541 1542/[\W\p{Any}]/B 1543------------------------------------------------------------------ 1544 Bra 1545 [\x00-/:-@[-^`{-\xff\p{Any}] 1546 Ket 1547 End 1548------------------------------------------------------------------ 1549 abc 1550 0: a 1551 123 1552 0: 1 1553 1554/[\W\pL]/B 1555------------------------------------------------------------------ 1556 Bra 1557 [\x00-/:-@[-^`{-\xff\p{L}] 1558 Ket 1559 End 1560------------------------------------------------------------------ 1561 abc 1562 0: a 1563\= Expect no match 1564 123 1565No match 1566 1567/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':ƿ)/utf 1568Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN) 1569 1570/[\s[:^ascii:]]/B,ucp 1571------------------------------------------------------------------ 1572 Bra 1573 [\x80-\xff\p{Xsp}] 1574 Ket 1575 End 1576------------------------------------------------------------------ 1577 1578# A special extra option allows excaped surrogate code points in 8-bit mode, 1579# but subjects containing them must not be UTF-checked. 1580 1581/\x{d800}/I,utf,allow_surrogate_escapes 1582Capture group count = 0 1583Options: utf 1584Extra options: allow_surrogate_escapes 1585First code unit = \xed 1586Last code unit = \x80 1587Subject length lower bound = 1 1588 \x{d800}\=no_utf_check 1589 0: \x{d800} 1590 1591/\udfff\o{157401}/utf,alt_bsux,allow_surrogate_escapes 1592 \x{dfff}\x{df01}\=no_utf_check 1593 0: \x{dfff}\x{df01} 1594 1595# This has different starting code units in 8-bit mode. 1596 1597/^[^ab]/IB,utf 1598------------------------------------------------------------------ 1599 Bra 1600 ^ 1601 [\x00-`c-\xff] (neg) 1602 Ket 1603 End 1604------------------------------------------------------------------ 1605Capture group count = 0 1606Compile options: utf 1607Overall options: anchored utf 1608Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a 1609 \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 1610 \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 1611 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y 1612 Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f 1613 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 1614 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf 1615 \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee 1616 \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd 1617 \xfe \xff 1618Subject length lower bound = 1 1619 c 1620 0: c 1621 \x{ff} 1622 0: \x{ff} 1623 \x{100} 1624 0: \x{100} 1625\= Expect no match 1626 aaa 1627No match 1628 1629# Offsets are different in 8-bit mode. 1630 1631/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout 1632 123abcáyzabcdef789abcሴqr 1633 1(2) Old 6 6 "" New 6 8 "<>" 1634 2(2) Old 13 13 "" New 15 17 "<>" 1635 3(2) Old 13 16 "def" New 17 22 "<def>" 1636 4(2) Old 22 22 "" New 28 30 "<>" 1637 4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr 1638 1639# Check name length with non-ASCII characters 1640 1641/(?'ABáC678901234567890123456789012'...)/utf 1642 1643/(?'ABáC6789012345678901234567890123'...)/utf 1644Failed: error 148 at offset 36: subpattern name is too long (maximum 32 code units) 1645 1646/(?'ABZC6789012345678901234567890123'...)/utf 1647 1648/(?(n/utf 1649Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?) 1650 1651/(?(á/utf 1652Failed: error 142 at offset 5: syntax error in subpattern name (missing terminator?) 1653 1654# End of testinput10 1655