Searched full:utf (Results 1 – 25 of 6490) sorted by relevance

/third_party/pcre2/pcre2/testdata/
testinput26
6 /^\p{sc=Latin}/utf
9 /^\p{Script=Latn}/utf
13 /^\p{Latin}/utf
16 /^\p{scx=Latn}/utf
20 /^\p{Latin}/utf
23 /^\p{sc=Latin}/utf
27 /^\p{Latin}/utf
31 /^\p{sc=Greek}/utf
34 /^\p{Script=Grek}/utf
38 /^\p{Greek}/utf
[all …]
testoutput26
6 /^\p{sc=Latin}/utf
10 /^\p{Script=Latn}/utf
15 /^\p{Latin}/utf
19 /^\p{scx=Latn}/utf
24 /^\p{Latin}/utf
28 /^\p{sc=Latin}/utf
33 /^\p{Latin}/utf
38 /^\p{sc=Greek}/utf
42 /^\p{Script=Grek}/utf
47 /^\p{Greek}/utf
[all …]
testinput10
1 # This set of tests is for UTF-8 support and Unicode property support, with
4 # The next 5 patterns have UTF-8 errors
6 /[�]/utf
8 /�/utf
10 /���xxx/utf
12 /��������/utf
18 /badutf/utf
19 \= Expect UTF-8 errors
60 /badutf/utf
61 \= Expect UTF-8 errors
[all …]
testinput12
1 # This set of tests is for UTF-16 and UTF-32 support, including Unicode
5 /���xxx/IB,utf,no_utf_check
7 /abc/utf
12 /\x{ffff}/IB,utf
14 /\x{10000}/IB,utf
16 /\x{100}/IB,utf
18 /\x{1000}/IB,utf
20 /\x{10000}/IB,utf
22 /\x{100000}/IB,utf
24 /\x{10ffff}/IB,utf
[all …]
testinput7
1 # This set of tests checks UTF and Unicode property support with the DFA
8 /\x{100}ab/utf
11 /a\x{100}*b/utf
16 /a\x{100}+b/utf
22 /\bX/utf
29 /\BX/utf
36 /X\b/utf
43 /X\B/utf
50 /[^a]/utf
54 /^[abc\x{123}\x{400}-\x{402}]{2,3}\d/utf
[all …]
testinput4
1 # This set of tests is for UTF support, including Unicode properties. The
13 /a.b/utf
20 /a(.{3})b/utf
31 /a(.*?)(.)/utf
37 /a(.*)(.)/utf
43 /a(.)(.)/utf
49 /a(.?)(.)/utf
55 /a(.??)(.)/utf
58 /a(.{3})b/utf
66 /a(.{3,})b/utf
[all …]
testinput5
1 # This set of tests checks the API, internals, and non-Perl stuff for UTF
18 /^[\p{Arabic}]/utf
21 /^[[:graph:]]+$/utf,ucp
29 /^[[:print:]]+$/utf,ucp
37 /^[[:^graph:]]+$/utf,ucp
41 /^[[:^print:]]+$/utf,ucp
50 /^>[[:blank:]]*/utf,ucp
53 /^A\s+Z/utf,ucp
56 /^A[\s]+Z/utf,ucp
60 /^[[:graph:]]+$/utf,ucp
[all …]
testoutput10
1 # This set of tests is for UTF-8 support and Unicode property support, with
4 # The next 5 patterns have UTF-8 errors
6 /[�]/utf
7 Failed: error -8 at offset 1: UTF-8 error: byte 2 top bits not 0x80
9 /�/utf
10 Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end
12 /���xxx/utf
13 Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80
15 /��������/utf
16 Failed: error -22 at offset 2: UTF-8 error: isolated byte with 0x80 bit set
[all …]
testoutput4
1 # This set of tests is for UTF support, including Unicode properties. The
13 /a.b/utf
24 /a(.{3})b/utf
46 /a(.*?)(.)/utf
58 /a(.*)(.)/utf
70 /a(.)(.)/utf
82 /a(.?)(.)/utf
94 /a(.??)(.)/utf
100 /a(.{3})b/utf
116 /a(.{3,})b/utf
[all …]
testinput22
2 # for DFA matching in UTF mode, so this test is not run with -dfa. The output
6 /ab\Cde/utf,info
9 # This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and
12 /(?<=ab\Cde)X/utf
19 /\C+\X \X+\C/Bx,utf
24 /utf
28 /\C(\W?ſ)'?{{/utf
32 /X(\C{3})/utf
37 /X(\C{4})/utf
42 /X\C*/utf
[all …]
/third_party/icu/icu4j/main/tests/core/src/com/ibm/icu/dev/test/charsetdet/
CharsetDetectionTests.xml
1 <?xml version="1.0" encoding="UTF-8"?>
9 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window…
25 … <test-case id="IUC10-da-Q" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE windows-1252/da">
41 <test-case id="IUC10-da" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/da">
57 <test-case id="IUC10-de" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/de">
73 <!-- No UTF-8 in this test because there are no non-ASCII characters. -->
74 <test-case id="IUC10-en" encodings="UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/en">
90 <test-case id="IUC10-es" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/es">
106 <test-case id="IUC10-fr" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/fr">
123 <test-case id="IUC10-he" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-8-I/he">
[all …]
/third_party/icu/ohos_icu4j/src/main/tests/resources/ohos/global/icu/dev/test/charsetdet/
CharsetDetectionTests.xml
1 <?xml version="1.0" encoding="UTF-8"?>
9 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window…
25 … <test-case id="IUC10-da-Q" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE windows-1252/da">
41 <test-case id="IUC10-da" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/da">
57 <test-case id="IUC10-de" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/de">
73 <!-- No UTF-8 in this test because there are no non-ASCII characters. -->
74 <test-case id="IUC10-en" encodings="UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/en">
90 <test-case id="IUC10-es" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/es">
106 <test-case id="IUC10-fr" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/fr">
123 <test-case id="IUC10-he" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-8-I/he">
[all …]
/third_party/icu/icu4c/source/test/testdata/
csdetest.xml
1 <?xml version="1.0" encoding="UTF-8"?>
8 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window…
24 … <test-case id="IUC10-da-Q" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE windows-1252/da">
40 <test-case id="IUC10-da" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/da">
56 <test-case id="IUC10-de" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/de">
72 <!-- No UTF-8 in this test because there are no non-ASCII characters. -->
73 <test-case id="IUC10-en" encodings="UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/en">
89 <test-case id="IUC10-es" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/es">
105 <test-case id="IUC10-fr" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/fr">
122 <test-case id="IUC10-he" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-8-I/he">
[all …]
/third_party/icu/docs/userguide/strings/
utf-8.md
3 title: UTF-8
12 # UTF-8
15 UTF-16, except for conversion from bytes to strings (via InputStreamReader or
18 While most of ICU works with UTF-16 strings and uses data structures optimized
19 for UTF-16, there are APIs that facilitate working with UTF-8, or are optimized
20 for UTF-8, or work with Unicode code points (21-bit integer values) regardless
22 UTF-16 and UTF-8.
24 For UTF-8 strings, ICU normally uses `(const) char *` pointers and `int32_t`
25 lengths, normally with semantics parallel to UTF-16 handling. (Input length=-1
31 ## Conversion Between UTF-8 and UTF-16
[all …]
/third_party/node/test/parallel/
test-whatwg-encoding-custom-textdecoder-fatal.js
14 { encoding: 'utf-8', input: [0xFF], name: 'invalid code' },
15 { encoding: 'utf-8', input: [0xC0], name: 'ends early' },
16 { encoding: 'utf-8', input: [0xE0], name: 'ends early 2' },
17 { encoding: 'utf-8', input: [0xC0, 0x00], name: 'invalid trail' },
18 { encoding: 'utf-8', input: [0xC0, 0xC0], name: 'invalid trail 2' },
19 { encoding: 'utf-8', input: [0xE0, 0x00], name: 'invalid trail 3' },
20 { encoding: 'utf-8', input: [0xE0, 0xC0], name: 'invalid trail 4' },
21 { encoding: 'utf-8', input: [0xE0, 0x80, 0x00], name: 'invalid trail 5' },
22 { encoding: 'utf-8', input: [0xE0, 0x80, 0xC0], name: 'invalid trail 6' },
23 { encoding: 'utf-8', input: [0xFC, 0x80, 0x80, 0x80, 0x80, 0x80],
[all …]
test-whatwg-encoding-custom-textdecoder.js
19 // Test TextDecoder, UTF-8, fatal: false, ignoreBOM: false
21 ['unicode-1-1-utf-8', 'utf8', 'utf-8'].forEach((i) => {
23 assert.strictEqual(dec.encoding, 'utf-8');
28 ['unicode-1-1-utf-8', 'utf8', 'utf-8'].forEach((i) => {
37 // Test TextDecoder, UTF-8, fatal: false, ignoreBOM: true
39 ['unicode-1-1-utf-8', 'utf8', 'utf-8'].forEach((i) => {
45 ['unicode-1-1-utf-8', 'utf8', 'utf-8'].forEach((i) => {
54 // Test TextDecoder, UTF-8, fatal: true, ignoreBOM: false
56 ['unicode-1-1-utf-8', 'utf8', 'utf-8'].forEach((i) => {
63 'for encoding utf-8'
[all …]
/third_party/python/Lib/
locale.py
390 if encoding in ('ISO8859-15', 'UTF-8'):
507 elif code == 'UTF-8':
508 # On macOS "LC_CTYPE=UTF-8" is a valid locale setting
509 # for getting UTF-8 handling for text.
510 return None, 'UTF-8'
638 # On Android langinfo.h and CODESET are missing, and UTF-8 is
640 return 'UTF-8'
642 return 'UTF-8'
661 return 'UTF-8'
733 'utf_8': 'UTF-8',
[all …]
/third_party/node/test/fixtures/wpt/encoding/
utf-32.html
2 <meta charset=utf-8>
3 <title>Character Decoding: UTF-32 (not supported) subresource of UTF-8 document</title>
9 // Since UTF-32 is not supported:
10 // * HTML resources will use the parent encoding (UTF-8)
11 // * XML resources will default to UTF-8
12 // ... except for the UTF-32LE-with-BOM case, where the UTF-32
13 // BOM will be mistaken for a UTF-16LE BOM (FF FE 00 00), in which
14 // case it will be interpreted as UTF-16LE.
17 {file: 'resources/utf-32-big-endian-bom.html',
18 characterSet: 'UTF-8',
[all …]
textdecoder-fatal.any.js
4 { encoding: 'utf-8', input: [0xFF], name: 'invalid code' },
5 { encoding: 'utf-8', input: [0xC0], name: 'ends early' },
6 { encoding: 'utf-8', input: [0xE0], name: 'ends early 2' },
7 { encoding: 'utf-8', input: [0xC0, 0x00], name: 'invalid trail' },
8 { encoding: 'utf-8', input: [0xC0, 0xC0], name: 'invalid trail 2' },
9 { encoding: 'utf-8', input: [0xE0, 0x00], name: 'invalid trail 3' },
10 { encoding: 'utf-8', input: [0xE0, 0xC0], name: 'invalid trail 4' },
11 { encoding: 'utf-8', input: [0xE0, 0x80, 0x00], name: 'invalid trail 5' },
12 { encoding: 'utf-8', input: [0xE0, 0x80, 0xC0], name: 'invalid trail 6' },
13 { encoding: 'utf-8', input: [0xFC, 0x80, 0x80, 0x80, 0x80, 0x80], name: '> 0x10FFFF' },
[all …]
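The fatal-mode cases in the hits above are byte sequences that a strict UTF-8 decoder is expected to reject. As a rough illustration only, the same inputs can be fed to Python's `bytes.decode`, which raises `UnicodeDecodeError` in its default strict mode. This is not the `TextDecoder` API the test file exercises, and the helper name `is_valid_utf8` is hypothetical.

```python
# Byte sequences copied from the hits above; each one is malformed UTF-8.
BAD_INPUTS = [
    ([0xFF], "invalid code"),
    ([0xC0], "ends early"),
    ([0xE0], "ends early 2"),
    ([0xC0, 0x00], "invalid trail"),
    ([0xC0, 0xC0], "invalid trail 2"),
    ([0xE0, 0x00], "invalid trail 3"),
    ([0xE0, 0xC0], "invalid trail 4"),
    ([0xE0, 0x80, 0x00], "invalid trail 5"),
    ([0xE0, 0x80, 0xC0], "invalid trail 6"),
    ([0xFC, 0x80, 0x80, 0x80, 0x80, 0x80], "> 0x10FFFF"),
]


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


# Map each case name to whether Python accepted the bytes.
results = {name: is_valid_utf8(bytes(seq)) for seq, name in BAD_INPUTS}
```

Passing `errors="replace"` to `decode` instead would substitute U+FFFD for the bad bytes rather than raising, which is loosely comparable to a non-fatal decoder.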
utf-32-from-win1252.html
3 <title>Character Decoding: UTF-32 (not supported) subresource of windows-1252 document</title>
9 // Since UTF-32 is not supported:
11 // * XML resources will default to UTF-8
12 // ... except for the UTF-32LE-with-BOM case, where the UTF-32
13 // BOM will be mistaken for a UTF-16LE BOM (FF FE 00 00), in which
14 // case it will be interpreted as UTF-16LE.
17 {file: 'resources/utf-32-big-endian-bom.html',
21 {file: 'resources/utf-32-big-endian-bom.xml',
22 characterSet: 'UTF-8',
25 {file: 'resources/utf-32-big-endian-nobom.html',
[all …]
unsupported-encodings.any.js
4 // Attempting to decode '<' as UTF-7 (+AD4) ends up as '+AD4'.
5 ['UTF-7', 'utf-7'].forEach(label => {
10 // UTF-32 will be detected as UTF-16LE if leading BOM, or UTF-8 otherwise (due to XMLHttpRequest).
11 ['UTF-32', 'utf-32', 'UTF-32LE', 'utf-32le'].forEach(label => {
15 `${label} with BOM should decode as UTF-16LE`);
20 `${label} with no BOM should decode as UTF-8`);;
22 ['UTF-32be', 'utf-32be'].forEach(label => {
26 `${label} with no BOM should decode as UTF-8`);
31 `${label} with BOM should decode as UTF-8`);
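The comments in these hits say that a UTF-32LE BOM (FF FE 00 00) is mistaken for a UTF-16LE BOM, while UTF-32BE input falls back to UTF-8. A minimal Python sketch of why: a sniffer that checks only the UTF-8 and UTF-16 BOMs sees FF FE at the start of the UTF-32LE BOM. The function `sniff_bom` is a hypothetical simplification, not the algorithm used by the tests or by any browser.

```python
import codecs


def sniff_bom(data: bytes) -> str:
    """Hypothetical sniffer: checks UTF-8 and UTF-16 BOMs only.

    UTF-32 BOMs are deliberately not recognised, to mirror the
    behaviour described in the comments above.
    """
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be"
    return "utf-8"  # assumed fallback when no BOM matches


# UTF-32LE BOM is FF FE 00 00, so it begins with the UTF-16LE BOM FF FE.
utf32le = codecs.BOM_UTF32_LE + "A".encode("utf-32-le")
# UTF-32BE BOM is 00 00 FE FF, which matches none of the checked BOMs.
utf32be = codecs.BOM_UTF32_BE + "A".encode("utf-32-be")

le_guess = sniff_bom(utf32le)
be_guess = sniff_bom(utf32be)
```

Here `le_guess` comes out as `"utf-16le"` and `be_guess` as `"utf-8"`, matching the outcomes the test descriptions expect.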
/third_party/pcre2/pcre2/doc/
pcre2unicode.3
4 .SH "UNICODE AND UTF SUPPORT"
10 strings of text in UTF-8, UTF-16, and UTF-32 format (depending on the code unit
14 There are two ways of telling PCRE2 to switch to UTF mode, where characters may
20 with the PCRE2_UTF option, or the pattern may start with the sequence (*UTF).
23 to UTF mode.
32 In UTF mode, both the pattern and any subject strings that are matched against
33 it are treated as UTF strings instead of strings of individual one-code-unit
61 .SH "WIDE CHARACTERS AND UTF MODES"
70 specifying a Unicode character by code point in a UTF mode. It is not allowed
71 in non-UTF mode.
[all …]
/third_party/PyYAML/tests/lib/
test_input_output.py
7 data = file.read().decode('utf-8')
13 for input in [data.encode('utf-8'),
14 codecs.BOM_UTF8+data.encode('utf-8'),
15 codecs.BOM_UTF16_BE+data.encode('utf-16-be'),
16 codecs.BOM_UTF16_LE+data.encode('utf-16-le')]:
28 data = file.read().decode('utf-8')
29 for input in [data.encode('utf-16-be'),
30 data.encode('utf-16-le'),
31 codecs.BOM_UTF8+data.encode('utf-16-be'),
32 codecs.BOM_UTF8+data.encode('utf-16-le')]:
[all …]
/third_party/pcre2/pcre2/doc/html/
pcre2unicode.html
16 UNICODE AND UTF SUPPORT
22 strings of text in UTF-8, UTF-16, and UTF-32 format (depending on the code unit
27 There are two ways of telling PCRE2 to switch to UTF mode, where characters may
31 with the PCRE2_UTF option, or the pattern may start with the sequence (*UTF).
34 to UTF mode.
42 In UTF mode, both the pattern and any subject strings that are matched against
43 it are treated as UTF strings instead of strings of individual one-code-unit
67 WIDE CHARACTERS AND UTF MODES
77 specifying a Unicode character by code point in a UTF mode. It is not allowed
78 in non-UTF mode.
[all …]
/third_party/icu/docs/userguide/
unicode.md
203 1. UTF-16, the default encoding form, maps a character code point to either one
206 2. UTF-8 is a byte-based encoding that offers backwards compatibility with
210 3. UTF-32 is the simplest, but most memory-intensive encoding form: It uses one
216 ICU uses UTF-16 internally. ICU 2.0 fully supports supplementary characters
221 text. UTF-8 is itself both an encoding form, and an encoding scheme because it is
222 byte-based. For each of UTF-16 and UTF-32, there are two variants defined: one
226 UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE.
228 > :point_right: *The names "UTF-16" and "UTF-32" are ambiguous. Depending on context, they refer
235 ## Overview of UTF-16
244 DFFF<sub>16</sub>. Every Unicode code point has only one possible UTF-16 encoding with
[all …]
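The unicode.md hits above describe three encoding forms: UTF-16 (one or two 16-bit units per code point), UTF-8 (byte-based), and UTF-32 (one 32-bit unit per code point). A small Python sketch, using only the standard codecs, counts the code units each form needs for a few sample characters. The helper name `code_units` and the sample characters are illustrative choices, not taken from the ICU guide.

```python
def code_units(ch: str) -> dict:
    """Hypothetical helper: code units needed for one character per form."""
    return {
        # UTF-8 uses 8-bit code units, so the byte count is the unit count.
        "utf-8": len(ch.encode("utf-8")),
        # The explicit -le codecs emit no BOM; divide bytes by unit size.
        "utf-16": len(ch.encode("utf-16-le")) // 2,
        "utf-32": len(ch.encode("utf-32-le")) // 4,
    }


# ASCII, Latin-1 supplement, BMP symbol, and a supplementary-plane character.
samples = {ch: code_units(ch) for ch in ("A", "\u00e9", "\u20ac", "\U0001F600")}
```

Only the supplementary character (U+1F600) needs two UTF-16 units, a surrogate pair, while every sample needs exactly one UTF-32 unit.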
