# This set of tests is for UTF-8 support and Unicode property support, with
# relevance only for the 8-bit library.

# The next 4 patterns have UTF-8 errors

/[�]/utf
Failed: error -8 at offset 1: UTF-8 error: byte 2 top bits not 0x80

/�/utf
Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end

/���xxx/utf
Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80

/��������/utf
Failed: error -22 at offset 2: UTF-8 error: isolated byte with 0x80 bit set

# Now test subjects

/badutf/utf
\= Expect UTF-8 errors
    X\xdf
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1
    XX\xef
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
    XXX\xef\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
    X\xf7
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1
    XX\xf7\x80
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
    XXX\xf7\x80\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
    \xfb
Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0
    \xfb\x80
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0
    \xfb\x80\x80
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0
    \xfb\x80\x80\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0
    \xfd
Failed: error -7: UTF-8 error: 5 bytes missing at end at offset 0
    \xfd\x80
Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0
    \xfd\x80\x80
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0
    \xfd\x80\x80\x80
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0
    \xfd\x80\x80\x80\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0
    \xdf\x7f
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0
    \xef\x7f\x80
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0
    \xef\x80\x7f
Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0
    \xf7\x7f\x80\x80
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0
    \xf7\x80\x7f\x80
Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0
    \xf7\x80\x80\x7f
Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0
    \xfb\x7f\x80\x80\x80
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0
    \xfb\x80\x7f\x80\x80
Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0
    \xfb\x80\x80\x7f\x80
Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0
    \xfb\x80\x80\x80\x7f
Failed: error -11: UTF-8 error: byte 5 top bits not 0x80 at offset 0
    \xfd\x7f\x80\x80\x80\x80
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 0
    \xfd\x80\x7f\x80\x80\x80
Failed: error -9: UTF-8 error: byte 3 top bits not 0x80 at offset 0
    \xfd\x80\x80\x7f\x80\x80
Failed: error -10: UTF-8 error: byte 4 top bits not 0x80 at offset 0
    \xfd\x80\x80\x80\x7f\x80
Failed: error -11: UTF-8 error: byte 5 top bits not 0x80 at offset 0
    \xfd\x80\x80\x80\x80\x7f
Failed: error -12: UTF-8 error: byte 6 top bits not 0x80 at offset 0
    \xed\xa0\x80
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0
    \xc0\x8f
Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 0
    \xe0\x80\x8f
Failed: error -18: UTF-8 error: overlong 3-byte sequence at offset 0
    \xf0\x80\x80\x8f
Failed: error -19: UTF-8 error: overlong 4-byte sequence at offset 0
    \xf8\x80\x80\x80\x8f
Failed: error -20: UTF-8 error: overlong 5-byte sequence at offset 0
    \xfc\x80\x80\x80\x80\x8f
Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0
    \x80
Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 0
    \xfe
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0
    \xff
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0

/badutf/utf
\= Expect UTF-8 errors
    XX\xfb\x80\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 2
    XX\xfd\x80\x80\x80\x80\x80
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 2
    XX\xf7\xbf\xbf\xbf
Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2

/shortutf/utf
\= Expect UTF-8 errors
    XX\xdf\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
    XX\xef\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
    XX\xef\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
    \xf7\=ph
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0
    \xf7\x80\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0
    \xf7\x80\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0
    \xfb\=ph
Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0
    \xfb\x80\=ph
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0
    \xfb\x80\x80\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0
    \xfb\x80\x80\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0
    \xfd\=ph
Failed: error -7: UTF-8 error: 5 bytes missing at end at offset 0
    \xfd\x80\=ph
Failed: error -6: UTF-8 error: 4 bytes missing at end at offset 0
    \xfd\x80\x80\=ph
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0
    \xfd\x80\x80\x80\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0
    \xfd\x80\x80\x80\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0

/anything/utf
\= Expect UTF-8 errors
    X\xc0\x80
Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 1
    XX\xc1\x8f
Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 2
    XXX\xe0\x9f\x80
Failed: error -18: UTF-8 error: overlong 3-byte sequence at offset 3
    \xf0\x8f\x80\x80
Failed: error -19: UTF-8 error: overlong 4-byte sequence at offset 0
    \xf8\x87\x80\x80\x80
Failed: error -20: UTF-8 error: overlong 5-byte sequence at offset 0
    \xfc\x83\x80\x80\x80\x80
Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0
    \xfe\x80\x80\x80\x80\x80
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0
    \xff\x80\x80\x80\x80\x80
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0
    \xf8\x88\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0
    \xf9\x87\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0
    \xfc\x84\x80\x80\x80\x80
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0
    \xfd\x83\x80\x80\x80\x80
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0
\= Expect no match
    \xc3\x8f
No match
    \xe0\xaf\x80
No match
    \xe1\x80\x80
No match
    \xf0\x9f\x80\x80
No match
    \xf1\x8f\x80\x80
No match
    \xf8\x88\x80\x80\x80\=no_utf_check
No match
    \xf9\x87\x80\x80\x80\=no_utf_check
No match
    \xfc\x84\x80\x80\x80\x80\=no_utf_check
No match
    \xfd\x83\x80\x80\x80\x80\=no_utf_check
No match

# Similar tests with offsets

/badutf/utf
\= Expect UTF-8 errors
    X\xdfabcd
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabcd\=offset=1
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
\= Expect no match
    X\xdfabcd\=offset=2
No match

/(?<=x)badutf/utf
\= Expect UTF-8 errors
    X\xdfabcd
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabcd\=offset=1
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabcd\=offset=2
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabcd\xdf\=offset=3
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 6
\= Expect no match
    X\xdfabcd\=offset=3
No match

/(?<=xx)badutf/utf
\= Expect UTF-8 errors
    X\xdfabcd
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabcd\=offset=1
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabcd\=offset=2
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabcd\=offset=3
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1

/(?<=xxxx)badutf/utf
\= Expect UTF-8 errors
    X\xdfabcd
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabcd\=offset=1
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabcd\=offset=2
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabcd\=offset=3
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
    X\xdfabc\xdf\=offset=6
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 5
    X\xdfabc\xdf\=offset=7
Failed: error -33: bad offset value
\= Expect no match
    X\xdfabcd\=offset=6
No match

/\x{100}/IB,utf
------------------------------------------------------------------
        Bra
        \x{100}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 1

/\x{1000}/IB,utf
------------------------------------------------------------------
        Bra
        \x{1000}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xe1
Last code unit = \x80
Subject length lower bound = 1

/\x{10000}/IB,utf
------------------------------------------------------------------
        Bra
        \x{10000}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xf0
Last code unit = \x80
Subject length lower bound = 1

/\x{100000}/IB,utf
------------------------------------------------------------------
        Bra
        \x{100000}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xf4
Last code unit = \x80
Subject length lower bound = 1

/\x{10ffff}/IB,utf
------------------------------------------------------------------
        Bra
        \x{10ffff}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xf4
Last code unit = \xbf
Subject length lower bound = 1

/[\x{ff}]/IB,utf
------------------------------------------------------------------
        Bra
        \x{ff}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc3
Last code unit = \xbf
Subject length lower bound = 1

/[\x{100}]/IB,utf
------------------------------------------------------------------
        Bra
        \x{100}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 1

/\x80/IB,utf
------------------------------------------------------------------
        Bra
        \x{80}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc2
Last code unit = \x80
Subject length lower bound = 1

/\xff/IB,utf
------------------------------------------------------------------
        Bra
        \x{ff}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc3
Last code unit = \xbf
Subject length lower bound = 1

/\x{D55c}\x{ad6d}\x{C5B4}/IB,utf
------------------------------------------------------------------
        Bra
        \x{d55c}\x{ad6d}\x{c5b4}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xed
Last code unit = \xb4
Subject length lower bound = 3
    \x{D55c}\x{ad6d}\x{C5B4}
 0: \x{d55c}\x{ad6d}\x{c5b4}

/\x{65e5}\x{672c}\x{8a9e}/IB,utf
------------------------------------------------------------------
        Bra
        \x{65e5}\x{672c}\x{8a9e}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xe6
Last code unit = \x9e
Subject length lower bound = 3
    \x{65e5}\x{672c}\x{8a9e}
 0: \x{65e5}\x{672c}\x{8a9e}

/\x{80}/IB,utf
------------------------------------------------------------------
        Bra
        \x{80}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc2
Last code unit = \x80
Subject length lower bound = 1

/\x{084}/IB,utf
------------------------------------------------------------------
        Bra
        \x{84}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc2
Last code unit = \x84
Subject length lower bound = 1

/\x{104}/IB,utf
------------------------------------------------------------------
        Bra
        \x{104}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x84
Subject length lower bound = 1

/\x{861}/IB,utf
------------------------------------------------------------------
        Bra
        \x{861}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xe0
Last code unit = \xa1
Subject length lower bound = 1

/\x{212ab}/IB,utf
------------------------------------------------------------------
        Bra
        \x{212ab}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xf0
Last code unit = \xab
Subject length lower bound = 1

/[^ab\xC0-\xF0]/IB,utf
------------------------------------------------------------------
        Bra
        [\x00-`c-\xbf\xf1-\xff] (neg)
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
  \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
  \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
  5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
  Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f
  \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0
  \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf
  \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee
  \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd
  \xfe \xff
Subject length lower bound = 1
    \x{f1}
 0: \x{f1}
    \x{bf}
 0: \x{bf}
    \x{100}
 0: \x{100}
    \x{1000}
 0: \x{1000}
\= Expect no match
    \x{c0}
No match
    \x{f0}
No match

/Ā{3,4}/IB,utf
------------------------------------------------------------------
        Bra
        \x{100}{3}
        \x{100}?+
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 3
  \x{100}\x{100}\x{100}\x{100\x{100}
 0: \x{100}\x{100}\x{100}

/(\x{100}+|x)/IB,utf
------------------------------------------------------------------
        Bra
        CBra 1
        \x{100}++
        Alt
        x
        Ket
        Ket
        End
------------------------------------------------------------------
Capture group count = 1
Options: utf
Starting code units: x \xc4
Subject length lower bound = 1

/(\x{100}*a|x)/IB,utf
------------------------------------------------------------------
        Bra
        CBra 1
        \x{100}*+
        a
        Alt
        x
        Ket
        Ket
        End
------------------------------------------------------------------
Capture group count = 1
Options: utf
Starting code units: a x \xc4
Subject length lower bound = 1

/(\x{100}{0,2}a|x)/IB,utf
------------------------------------------------------------------
        Bra
        CBra 1
        \x{100}{0,2}+
        a
        Alt
        x
        Ket
        Ket
        End
------------------------------------------------------------------
Capture group count = 1
Options: utf
Starting code units: a x \xc4
Subject length lower bound = 1

/(\x{100}{1,2}a|x)/IB,utf
------------------------------------------------------------------
        Bra
        CBra 1
        \x{100}
        \x{100}{0,1}+
        a
        Alt
        x
        Ket
        Ket
        End
------------------------------------------------------------------
Capture group count = 1
Options: utf
Starting code units: x \xc4
Subject length lower bound = 1

/\x{100}/IB,utf
------------------------------------------------------------------
        Bra
        \x{100}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 1

/a\x{100}\x{101}*/IB,utf
------------------------------------------------------------------
        Bra
        a\x{100}
        \x{101}*+
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = 'a'
Last code unit = \x80
Subject length lower bound = 2

/a\x{100}\x{101}+/IB,utf
------------------------------------------------------------------
        Bra
        a\x{100}
        \x{101}++
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = 'a'
Last code unit = \x81
Subject length lower bound = 3

/[^\x{c4}]/IB
------------------------------------------------------------------
        Bra
        [^\x{c4}]
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Subject length lower bound = 1

/[\x{100}]/IB,utf
------------------------------------------------------------------
        Bra
        \x{100}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 1
    \x{100}
 0: \x{100}
    Z\x{100}
 0: \x{100}
    \x{100}Z
 0: \x{100}

/[\xff]/IB,utf
------------------------------------------------------------------
        Bra
        \x{ff}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc3
Last code unit = \xbf
Subject length lower bound = 1
    >\x{ff}<
 0: \x{ff}

/[^\xff]/IB,utf
------------------------------------------------------------------
        Bra
        [^\x{ff}]
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
Subject length lower bound = 1

/\x{100}abc(xyz(?1))/IB,utf
------------------------------------------------------------------
        Bra
        \x{100}abc
        CBra 1
        xyz
        Recurse
        Ket
        Ket
        End
------------------------------------------------------------------
Capture group count = 1
Options: utf
First code unit = \xc4
Last code unit = 'z'
Subject length lower bound = 7

/\777/I,utf
Capture group count = 0
Options: utf
First code unit = \xc7
Last code unit = \xbf
Subject length lower bound = 1
  \x{1ff}
 0: \x{1ff}
  \777
 0: \x{1ff}

/\x{100}+\x{200}/IB,utf
------------------------------------------------------------------
        Bra
        \x{100}++
        \x{200}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = \x80
Subject length lower bound = 2

/\x{100}+X/IB,utf
------------------------------------------------------------------
        Bra
        \x{100}++
        X
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc4
Last code unit = 'X'
Subject length lower bound = 2

/^[\QĀ\E-\QŐ\E/B,utf
Failed: error 106 at offset 15: missing terminating ] for character class

# This tests the stricter UTF-8 check according to RFC 3629.

/X/utf
\= Expect UTF-8 errors
    \x{d800}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0
    \x{da00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0
    \x{dfff}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0
    \x{110000}
Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 0
    \x{2000000}
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0
    \x{7fffffff}
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0
\= Expect no match
    \x{d800}\=no_utf_check
No match
    \x{da00}\=no_utf_check
No match
    \x{dfff}\=no_utf_check
No match
    \x{110000}\=no_utf_check
No match
    \x{2000000}\=no_utf_check
No match
    \x{7fffffff}\=no_utf_check
No match

/(*UTF8)\x{1234}/
    abcd\x{1234}pqr
 0: \x{1234}

/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I
Capture group count = 0
Compile options: <none>
Overall options: utf
\R matches any Unicode newline
Forced newline is CRLF
First code unit = 'a'
Last code unit = 'b'
Subject length lower bound = 3

/\h/I,utf
Capture group count = 0
Options: utf
Starting code units: \x09 \x20 \xc2 \xe1 \xe2 \xe3
Subject length lower bound = 1
    ABC\x{09}
 0: \x{09}
    ABC\x{20}
 0:
    ABC\x{a0}
 0: \x{a0}
    ABC\x{1680}
 0: \x{1680}
    ABC\x{180e}
 0: \x{180e}
    ABC\x{2000}
 0: \x{2000}
    ABC\x{202f}
 0: \x{202f}
    ABC\x{205f}
 0: \x{205f}
    ABC\x{3000}
 0: \x{3000}

/\v/I,utf
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
Subject length lower bound = 1
    ABC\x{0a}
 0: \x{0a}
    ABC\x{0b}
 0: \x{0b}
    ABC\x{0c}
 0: \x{0c}
    ABC\x{0d}
 0: \x{0d}
    ABC\x{85}
 0: \x{85}
    ABC\x{2028}
 0: \x{2028}

/\h*A/I,utf
Capture group count = 0
Options: utf
Starting code units: \x09 \x20 A \xc2 \xe1 \xe2 \xe3
Last code unit = 'A'
Subject length lower bound = 1
    CDBABC
 0: A

/\v+A/I,utf
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
Last code unit = 'A'
Subject length lower bound = 2

/\s?xxx\s/I,utf
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 x
Last code unit = 'x'
Subject length lower bound = 4

/\sxxx\s/I,utf,tables=2
Capture group count = 0
Options: utf
Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc2
Last code unit = 'x'
Subject length lower bound = 5
    AB\x{85}xxx\x{a0}XYZ
 0: \x{85}xxx\x{a0}
    AB\x{a0}xxx\x{85}XYZ
 0: \x{a0}xxx\x{85}

/\S \S/I,utf,tables=2
Capture group count = 0
Options: utf
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
  \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
  \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
  D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
  i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4
  \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3
  \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2
  \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1
  \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
Last code unit = ' '
Subject length lower bound = 3
    \x{a2} \x{84}
 0: \x{a2} \x{84}
    A Z
 0: A Z

/a+/utf
    a\x{123}aa\=offset=1
 0: aa
    a\x{123}aa\=offset=3
 0: aa
    a\x{123}aa\=offset=4
 0: a
\= Expect bad offset value
    a\x{123}aa\=offset=6
Failed: error -33: bad offset value
\= Expect bad UTF-8 offset
    a\x{123}aa\=offset=2
Error -36 (bad UTF-8 offset)
\= Expect no match
    a\x{123}aa\=offset=5
No match

/\x{1234}+/Ii,utf
Capture group count = 0
Options: caseless utf
Starting code units: \xe1
Subject length lower bound = 1

/\x{1234}+?/Ii,utf
Capture group count = 0
Options: caseless utf
Starting code units: \xe1
Subject length lower bound = 1

/\x{1234}++/Ii,utf
Capture group count = 0
Options: caseless utf
Starting code units: \xe1
Subject length lower bound = 1

/\x{1234}{2}/Ii,utf
Capture group count = 0
Options: caseless utf
Starting code units: \xe1
Subject length lower bound = 2

/[^\x{c4}]/IB,utf
------------------------------------------------------------------
        Bra
        [^\x{c4}]
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
Subject length lower bound = 1

/X+\x{200}/IB,utf
------------------------------------------------------------------
        Bra
        X++
        \x{200}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = 'X'
Last code unit = \x80
Subject length lower bound = 2

/\R/I,utf
Capture group count = 0
Options: utf
Starting code units: \x0a \x0b \x0c \x0d \xc2 \xe2
Subject length lower bound = 1

/\777/IB,utf
------------------------------------------------------------------
        Bra
        \x{1ff}
        Ket
        End
------------------------------------------------------------------
Capture group count = 0
Options: utf
First code unit = \xc7
Last code unit = \xbf
Subject length lower bound = 1

/\w+\x{C4}/B,utf
------------------------------------------------------------------
        Bra
        \w++
        \x{c4}
        Ket
        End
------------------------------------------------------------------
    a\x{C4}\x{C4}
 0: a\x{c4}

/\w+\x{C4}/B,utf,tables=2
------------------------------------------------------------------
        Bra
        \w+
        \x{c4}
        Ket
        End
------------------------------------------------------------------
    a\x{C4}\x{C4}
 0: a\x{c4}\x{c4}

/\W+\x{C4}/B,utf
------------------------------------------------------------------
        Bra
        \W+
        \x{c4}
        Ket
        End
------------------------------------------------------------------
    !\x{C4}
 0: !\x{c4}

/\W+\x{C4}/B,utf,tables=2
------------------------------------------------------------------
        Bra
        \W++
        \x{c4}
        Ket
        End
------------------------------------------------------------------
    !\x{C4}
 0: !\x{c4}

/\W+\x{A1}/B,utf
------------------------------------------------------------------
        Bra
        \W+
        \x{a1}
        Ket
        End
------------------------------------------------------------------
    !\x{A1}
 0: !\x{a1}

/\W+\x{A1}/B,utf,tables=2
------------------------------------------------------------------
        Bra
        \W+
        \x{a1}
        Ket
        End
------------------------------------------------------------------
    !\x{A1}
 0: !\x{a1}

/X\s+\x{A0}/B,utf
------------------------------------------------------------------
        Bra
        X
        \s++
        \x{a0}
        Ket
        End
------------------------------------------------------------------
    X\x20\x{A0}\x{A0}
 0: X \x{a0}

/X\s+\x{A0}/B,utf,tables=2
------------------------------------------------------------------
        Bra
        X
        \s+
        \x{a0}
        Ket
        End
------------------------------------------------------------------
    X\x20\x{A0}\x{A0}
 0: X \x{a0}\x{a0}

/\S+\x{A0}/B,utf
------------------------------------------------------------------
        Bra
        \S+
        \x{a0}
        Ket
        End
------------------------------------------------------------------
    X\x{A0}\x{A0}
 0: X\x{a0}\x{a0}

/\S+\x{A0}/B,utf,tables=2
------------------------------------------------------------------
        Bra
        \S++
        \x{a0}
        Ket
        End
------------------------------------------------------------------
    X\x{A0}\x{A0}
 0: X\x{a0}

/\x{a0}+\s!/B,utf
------------------------------------------------------------------
        Bra
        \x{a0}++
        \s
        !
        Ket
        End
------------------------------------------------------------------
    \x{a0}\x20!
 0: \x{a0} !

/\x{a0}+\s!/B,utf,tables=2
------------------------------------------------------------------
        Bra
        \x{a0}+
        \s
        !
        Ket
        End
------------------------------------------------------------------
    \x{a0}\x20!
 0: \x{a0} !

/A/utf
  \x{ff000041}
** Character \x{ff000041} is greater than 0x7fffffff and so cannot be converted to UTF-8
  \x{7f000041}
1093Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0
1094
1095/(*UTF8)abc/never_utf
1096Failed: error 174 at offset 7: using UTF is disabled by the application
1097
1098/abc/utf,never_utf
1099Failed: error 174 at offset 0: using UTF is disabled by the application
1100
1101/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IBi,utf
1102------------------------------------------------------------------
1103        Bra
1104     /i A\x{391}\x{10427}\x{ff3a}\x{1fb0}
1105        Ket
1106        End
1107------------------------------------------------------------------
1108Capture group count = 0
1109Options: caseless utf
1110First code unit = 'A' (caseless)
1111Subject length lower bound = 5
1112
1113/A\x{391}\x{10427}\x{ff3a}\x{1fb0}/IB,utf
1114------------------------------------------------------------------
1115        Bra
1116        A\x{391}\x{10427}\x{ff3a}\x{1fb0}
1117        Ket
1118        End
1119------------------------------------------------------------------
1120Capture group count = 0
1121Options: utf
1122First code unit = 'A'
1123Last code unit = \xb0
1124Subject length lower bound = 5
1125
1126/AB\x{1fb0}/IB,utf
1127------------------------------------------------------------------
1128        Bra
1129        AB\x{1fb0}
1130        Ket
1131        End
1132------------------------------------------------------------------
1133Capture group count = 0
1134Options: utf
1135First code unit = 'A'
1136Last code unit = \xb0
1137Subject length lower bound = 3
1138
1139/AB\x{1fb0}/IBi,utf
1140------------------------------------------------------------------
1141        Bra
1142     /i AB\x{1fb0}
1143        Ket
1144        End
1145------------------------------------------------------------------
1146Capture group count = 0
1147Options: caseless utf
1148First code unit = 'A' (caseless)
1149Last code unit = 'B' (caseless)
1150Subject length lower bound = 3
1151
1152/\x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}/Ii,utf
1153Capture group count = 0
1154Options: caseless utf
1155Starting code units: \xd0 \xd1
1156Subject length lower bound = 17
1157    \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
1158 0: \x{401}\x{420}\x{421}\x{422}\x{423}\x{424}\x{425}\x{426}\x{427}\x{428}\x{429}\x{42a}\x{42b}\x{42c}\x{42d}\x{42e}\x{42f}
1159    \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f}
1160 0: \x{451}\x{440}\x{441}\x{442}\x{443}\x{444}\x{445}\x{446}\x{447}\x{448}\x{449}\x{44a}\x{44b}\x{44c}\x{44d}\x{44e}\x{44f}
1161
1162/[ⱥ]/Bi,utf
1163------------------------------------------------------------------
1164        Bra
1165     /i \x{2c65}
1166        Ket
1167        End
1168------------------------------------------------------------------
1169
1170/[^ⱥ]/Bi,utf
1171------------------------------------------------------------------
1172        Bra
1173     /i [^\x{2c65}]
1174        Ket
1175        End
1176------------------------------------------------------------------
1177
1178/\h/I
1179Capture group count = 0
1180Starting code units: \x09 \x20 \xa0
1181Subject length lower bound = 1
1182
1183/\v/I
1184Capture group count = 0
1185Starting code units: \x0a \x0b \x0c \x0d \x85
1186Subject length lower bound = 1
1187
1188/\R/I
1189Capture group count = 0
1190Starting code units: \x0a \x0b \x0c \x0d \x85
1191Subject length lower bound = 1
1192
1193/[[:blank:]]/B,ucp
1194------------------------------------------------------------------
1195        Bra
1196        [\x09 \xa0]
1197        Ket
1198        End
1199------------------------------------------------------------------
1200
1201/\x{212a}+/Ii,utf
1202Capture group count = 0
1203Options: caseless utf
1204Starting code units: K k \xe2
1205Subject length lower bound = 1
1206    KKkk\x{212a}
1207 0: KKkk\x{212a}
1208
1209/s+/Ii,utf
1210Capture group count = 0
1211Options: caseless utf
1212Starting code units: S s \xc5
1213Subject length lower bound = 1
1214    SSss\x{17f}
1215 0: SSss\x{17f}
1216
1217/\x{100}*A/IB,utf
1218------------------------------------------------------------------
1219        Bra
1220        \x{100}*+
1221        A
1222        Ket
1223        End
1224------------------------------------------------------------------
1225Capture group count = 0
1226Options: utf
1227Starting code units: A \xc4
1228Last code unit = 'A'
1229Subject length lower bound = 1
1230    A
1231 0: A
1232
1233/\x{100}*\d(?R)/IB,utf
1234------------------------------------------------------------------
1235        Bra
1236        \x{100}*+
1237        \d
1238        Recurse
1239        Ket
1240        End
1241------------------------------------------------------------------
1242Capture group count = 0
1243Options: utf
1244Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4
1245Subject length lower bound = 1
1246
1247/[Z\x{100}]/IB,utf
1248------------------------------------------------------------------
1249        Bra
1250        [Z\x{100}]
1251        Ket
1252        End
1253------------------------------------------------------------------
1254Capture group count = 0
1255Options: utf
1256Starting code units: Z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd
1257  \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc
1258  \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb
1259  \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa
1260  \xfb \xfc \xfd \xfe \xff
1261Subject length lower bound = 1
1262    Z\x{100}
1263 0: Z
1264    \x{100}
1265 0: \x{100}
1266    \x{100}Z
1267 0: \x{100}
1268
1269/[z-\x{100}]/IB,utf
1270------------------------------------------------------------------
1271        Bra
1272        [z-\xff\x{100}]
1273        Ket
1274        End
1275------------------------------------------------------------------
1276Capture group count = 0
1277Options: utf
1278Starting code units: z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9
1279  \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8
1280  \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7
1281  \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6
1282  \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
1283Subject length lower bound = 1
1284
1285/[z\Qa-d]Ā\E]/IB,utf
1286------------------------------------------------------------------
1287        Bra
1288        [\-\]adz\x{100}]
1289        Ket
1290        End
1291------------------------------------------------------------------
1292Capture group count = 0
1293Options: utf
1294Starting code units: - ] a d z \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc
1295  \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb
1296  \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea
1297  \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9
1298  \xfa \xfb \xfc \xfd \xfe \xff
1299Subject length lower bound = 1
1300    \x{100}
1301 0: \x{100}
1302    Ā
1303 0: \x{100}
1304
1305/[ab\x{100}]abc(xyz(?1))/IB,utf
1306------------------------------------------------------------------
1307        Bra
1308        [ab\x{100}]
1309        abc
1310        CBra 1
1311        xyz
1312        Recurse
1313        Ket
1314        Ket
1315        End
1316------------------------------------------------------------------
1317Capture group count = 1
1318Options: utf
1319Starting code units: a b \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd
1320  \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc
1321  \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb
1322  \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa
1323  \xfb \xfc \xfd \xfe \xff
1324Last code unit = 'z'
1325Subject length lower bound = 7
1326
1327/\x{100}*\s/IB,utf
1328------------------------------------------------------------------
1329        Bra
1330        \x{100}*+
1331        \s
1332        Ket
1333        End
1334------------------------------------------------------------------
1335Capture group count = 0
1336Options: utf
1337Starting code units: \x09 \x0a \x0b \x0c \x0d \x20 \xc4
1338Subject length lower bound = 1
1339
1340/\x{100}*\d/IB,utf
1341------------------------------------------------------------------
1342        Bra
1343        \x{100}*+
1344        \d
1345        Ket
1346        End
1347------------------------------------------------------------------
1348Capture group count = 0
1349Options: utf
1350Starting code units: 0 1 2 3 4 5 6 7 8 9 \xc4
1351Subject length lower bound = 1
1352
1353/\x{100}*\w/IB,utf
1354------------------------------------------------------------------
1355        Bra
1356        \x{100}*+
1357        \w
1358        Ket
1359        End
1360------------------------------------------------------------------
1361Capture group count = 0
1362Options: utf
1363Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
1364  Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
1365  \xc4
1366Subject length lower bound = 1
1367
1368/\x{100}*\D/IB,utf
1369------------------------------------------------------------------
1370        Bra
1371        \x{100}*
1372        \D
1373        Ket
1374        End
1375------------------------------------------------------------------
1376Capture group count = 0
1377Options: utf
1378Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
1379  \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
1380  \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = >
1381  ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c
1382  d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2
1383  \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1
1384  \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0
1385  \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef
1386  \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe
1387  \xff
1388Subject length lower bound = 1
1389
1390/\x{100}*\S/IB,utf
1391------------------------------------------------------------------
1392        Bra
1393        \x{100}*
1394        \S
1395        Ket
1396        End
1397------------------------------------------------------------------
1398Capture group count = 0
1399Options: utf
1400Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x0e \x0f
1401  \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19 \x1a \x1b \x1c \x1d \x1e
1402  \x1f ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C
1403  D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h
1404  i j k l m n o p q r s t u v w x y z { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4
1405  \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3
1406  \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2
1407  \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1
1408  \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
1409Subject length lower bound = 1
1410
1411/\x{100}*\W/IB,utf
1412------------------------------------------------------------------
1413        Bra
1414        \x{100}*
1415        \W
1416        Ket
1417        End
1418------------------------------------------------------------------
1419Capture group count = 0
1420Options: utf
1421Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
1422  \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
1423  \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / : ; < = >
1424  ? @ [ \ ] ^ ` { | } ~ \x7f \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9
1425  \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8
1426  \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7
1427  \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6
1428  \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
1429Subject length lower bound = 1
1430
1431/[\x{105}-\x{109}]/IBi,utf
1432------------------------------------------------------------------
1433        Bra
1434        [\x{104}-\x{109}]
1435        Ket
1436        End
1437------------------------------------------------------------------
1438Capture group count = 0
1439Options: caseless utf
1440Starting code units: \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce
1441  \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd
1442  \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec
1443  \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb
1444  \xfc \xfd \xfe \xff
1445Subject length lower bound = 1
1446    \x{104}
1447 0: \x{104}
1448    \x{105}
1449 0: \x{105}
1450    \x{109}
1451 0: \x{109}
1452\= Expect no match
1453    \x{100}
1454No match
1455    \x{10a}
1456No match
1457
1458/[z-\x{100}]/IBi,utf
1459------------------------------------------------------------------
1460        Bra
1461        [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}]
1462        Ket
1463        End
1464------------------------------------------------------------------
1465Capture group count = 0
1466Options: caseless utf
1467Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8
1468  \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7
1469  \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6
1470  \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5
1471  \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
1472Subject length lower bound = 1
1473    Z
1474 0: Z
1475    z
1476 0: z
1477    \x{39c}
1478 0: \x{39c}
1479    \x{178}
1480 0: \x{178}
1481    |
1482 0: |
1483    \x{80}
1484 0: \x{80}
1485    \x{ff}
1486 0: \x{ff}
1487    \x{100}
1488 0: \x{100}
1489    \x{101}
1490 0: \x{101}
1491\= Expect no match
1492    \x{102}
1493No match
1494    Y
1495No match
1496    y
1497No match
1498
1499/[z-\x{100}]/IBi,utf
1500------------------------------------------------------------------
1501        Bra
1502        [Zz-\xff\x{39c}\x{3bc}\x{212b}\x{1e9e}\x{212b}\x{178}\x{100}-\x{101}]
1503        Ket
1504        End
1505------------------------------------------------------------------
1506Capture group count = 0
1507Options: caseless utf
1508Starting code units: Z z { | } ~ \x7f \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8
1509  \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7
1510  \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6
1511  \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5
1512  \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
1513Subject length lower bound = 1
1514
1515/\x{3a3}B/IBi,utf
1516------------------------------------------------------------------
1517        Bra
1518        clist 03a3 03c2 03c3
1519     /i B
1520        Ket
1521        End
1522------------------------------------------------------------------
1523Capture group count = 0
1524Options: caseless utf
1525Starting code units: \xce \xcf
1526Last code unit = 'B' (caseless)
1527Subject length lower bound = 2
1528
1529/abc/utf,replace=�
1530    abc
1531Failed: error -3: UTF-8 error: 1 byte missing at end
1532
1533/(?<=(a)(?-1))x/I,utf
1534Capture group count = 1
1535Max lookbehind = 2
1536Options: utf
1537First code unit = 'x'
1538Subject length lower bound = 1
1539    a\x80zx\=offset=3
1540Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 1
1541
1542/[\W\p{Any}]/B
1543------------------------------------------------------------------
1544        Bra
1545        [\x00-/:-@[-^`{-\xff\p{Any}]
1546        Ket
1547        End
1548------------------------------------------------------------------
1549    abc
1550 0: a
1551    123
1552 0: 1
1553
1554/[\W\pL]/B
1555------------------------------------------------------------------
1556        Bra
1557        [\x00-/:-@[-^`{-\xff\p{L}]
1558        Ket
1559        End
1560------------------------------------------------------------------
1561    abc
1562 0: a
1563\= Expect no match
1564    123
1565No match
1566
1567/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':ƿ)/utf
1568Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
1569
1570/[\s[:^ascii:]]/B,ucp
1571------------------------------------------------------------------
1572        Bra
1573        [\x80-\xff\p{Xsp}]
1574        Ket
1575        End
1576------------------------------------------------------------------
1577
1578# A special extra option allows escaped surrogate code points in 8-bit mode,
1579# but subjects containing them must not be UTF-checked.
1580
1581/\x{d800}/I,utf,allow_surrogate_escapes
1582Capture group count = 0
1583Options: utf
1584Extra options: allow_surrogate_escapes
1585First code unit = \xed
1586Last code unit = \x80
1587Subject length lower bound = 1
1588    \x{d800}\=no_utf_check
1589 0: \x{d800}
1590
1591/\udfff\o{157401}/utf,alt_bsux,allow_surrogate_escapes
1592    \x{dfff}\x{df01}\=no_utf_check
1593 0: \x{dfff}\x{df01}
1594
1595# This has different starting code units in 8-bit mode.
1596
1597/^[^ab]/IB,utf
1598------------------------------------------------------------------
1599        Bra
1600        ^
1601        [\x00-`c-\xff] (neg)
1602        Ket
1603        End
1604------------------------------------------------------------------
1605Capture group count = 0
1606Compile options: utf
1607Overall options: anchored utf
1608Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
1609  \x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18 \x19
1610  \x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4
1611  5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
1612  Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ \x7f
1613  \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9 \xca \xcb \xcc \xcd \xce \xcf \xd0
1614  \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8 \xd9 \xda \xdb \xdc \xdd \xde \xdf
1615  \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7 \xe8 \xe9 \xea \xeb \xec \xed \xee
1616  \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6 \xf7 \xf8 \xf9 \xfa \xfb \xfc \xfd
1617  \xfe \xff
1618Subject length lower bound = 1
1619    c
1620 0: c
1621    \x{ff}
1622 0: \x{ff}
1623    \x{100}
1624 0: \x{100}
1625\= Expect no match
1626    aaa
1627No match
1628
1629# Offsets are different in 8-bit mode.
1630
1631/(?<=abc)(|def)/g,utf,replace=<$0>,substitute_callout
1632    123abcáyzabcdef789abcሴqr
1633 1(2) Old 6 6 "" New 6 8 "<>"
1634 2(2) Old 13 13 "" New 15 17 "<>"
1635 3(2) Old 13 16 "def" New 17 22 "<def>"
1636 4(2) Old 22 22 "" New 28 30 "<>"
1637 4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
1638
1639# Check name length with non-ASCII characters
1640
1641/(?'ABáC678901234567890123456789012'...)/utf
1642
1643/(?'ABáC6789012345678901234567890123'...)/utf
1644Failed: error 148 at offset 36: subpattern name is too long (maximum 32 code units)
1645
1646/(?'ABZC6789012345678901234567890123'...)/utf
1647
1648/(?(n/utf
1649Failed: error 142 at offset 4: syntax error in subpattern name (missing terminator?)
1650
1651/(?(á/utf
1652Failed: error 142 at offset 5: syntax error in subpattern name (missing terminator?)
1653
1654# End of testinput10
1655