| /third_party/pcre2/pcre2/testdata/ |
| D | testinput26 | 6 /^\p{sc=Latin}/utf 9 /^\p{Script=Latn}/utf 13 /^\p{Latin}/utf 16 /^\p{scx=Latn}/utf 20 /^\p{Latin}/utf 23 /^\p{sc=Latin}/utf 27 /^\p{Latin}/utf 31 /^\p{sc=Greek}/utf 34 /^\p{Script=Grek}/utf 38 /^\p{Greek}/utf [all …]
|
| D | testoutput26 | 6 /^\p{sc=Latin}/utf 10 /^\p{Script=Latn}/utf 15 /^\p{Latin}/utf 19 /^\p{scx=Latn}/utf 24 /^\p{Latin}/utf 28 /^\p{sc=Latin}/utf 33 /^\p{Latin}/utf 38 /^\p{sc=Greek}/utf 42 /^\p{Script=Grek}/utf 47 /^\p{Greek}/utf [all …]
|
| D | testinput10 | 1 # This set of tests is for UTF-8 support and Unicode property support, with 4 # The next 5 patterns have UTF-8 errors 6 /[�]/utf 8 /�/utf 10 /���xxx/utf 12 /Â��������/utf 18 /badutf/utf 19 \= Expect UTF-8 errors 60 /badutf/utf 61 \= Expect UTF-8 errors [all …]
|
| D | testinput12 | 1 # This set of tests is for UTF-16 and UTF-32 support, including Unicode 5 /���xxx/IB,utf,no_utf_check 7 /abc/utf 12 /\x{ffff}/IB,utf 14 /\x{10000}/IB,utf 16 /\x{100}/IB,utf 18 /\x{1000}/IB,utf 20 /\x{10000}/IB,utf 22 /\x{100000}/IB,utf 24 /\x{10ffff}/IB,utf [all …]
|
| D | testinput7 | 1 # This set of tests checks UTF and Unicode property support with the DFA 8 /\x{100}ab/utf 11 /a\x{100}*b/utf 16 /a\x{100}+b/utf 22 /\bX/utf 29 /\BX/utf 36 /X\b/utf 43 /X\B/utf 50 /[^a]/utf 54 /^[abc\x{123}\x{400}-\x{402}]{2,3}\d/utf [all …]
|
| D | testinput4 | 1 # This set of tests is for UTF support, including Unicode properties. The 13 /a.b/utf 20 /a(.{3})b/utf 31 /a(.*?)(.)/utf 37 /a(.*)(.)/utf 43 /a(.)(.)/utf 49 /a(.?)(.)/utf 55 /a(.??)(.)/utf 58 /a(.{3})b/utf 66 /a(.{3,})b/utf [all …]
|
| D | testinput5 | 1 # This set of tests checks the API, internals, and non-Perl stuff for UTF 18 /^[\p{Arabic}]/utf 21 /^[[:graph:]]+$/utf,ucp 29 /^[[:print:]]+$/utf,ucp 37 /^[[:^graph:]]+$/utf,ucp 41 /^[[:^print:]]+$/utf,ucp 50 /^>[[:blank:]]*/utf,ucp 53 /^A\s+Z/utf,ucp 56 /^A[\s]+Z/utf,ucp 60 /^[[:graph:]]+$/utf,ucp [all …]
|
| D | testoutput10 | 1 # This set of tests is for UTF-8 support and Unicode property support, with 4 # The next 5 patterns have UTF-8 errors 6 /[�]/utf 7 Failed: error -8 at offset 1: UTF-8 error: byte 2 top bits not 0x80 9 /�/utf 10 Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end 12 /���xxx/utf 13 Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80 15 /Â��������/utf 16 Failed: error -22 at offset 2: UTF-8 error: isolated byte with 0x80 bit set [all …]
|
| D | testoutput4 | 1 # This set of tests is for UTF support, including Unicode properties. The 13 /a.b/utf 24 /a(.{3})b/utf 46 /a(.*?)(.)/utf 58 /a(.*)(.)/utf 70 /a(.)(.)/utf 82 /a(.?)(.)/utf 94 /a(.??)(.)/utf 100 /a(.{3})b/utf 116 /a(.{3,})b/utf [all …]
|
| D | testinput22 | 2 # for DFA matching in UTF mode, so this test is not run with -dfa. The output 6 /ab\Cde/utf,info 9 # This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and 12 /(?<=ab\Cde)X/utf 19 /\C+\X \X+\C/Bx,utf 24 /utf 28 /\C(\W?ſ)'?{{/utf 32 /X(\C{3})/utf 37 /X(\C{4})/utf 42 /X\C*/utf [all …]
|
| /third_party/icu/icu4j/main/tests/core/src/com/ibm/icu/dev/test/charsetdet/ |
| D | CharsetDetectionTests.xml | 1 <?xml version="1.0" encoding="UTF-8"?> 9 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window… 25 … <test-case id="IUC10-da-Q" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE windows-1252/da"> 41 <test-case id="IUC10-da" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/da"> 57 <test-case id="IUC10-de" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/de"> 73 <!-- No UTF-8 in this test because there are no non-ASCII characters. --> 74 <test-case id="IUC10-en" encodings="UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/en"> 90 <test-case id="IUC10-es" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/es"> 106 <test-case id="IUC10-fr" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/fr"> 123 <test-case id="IUC10-he" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-8-I/he"> [all …]
|
| /third_party/icu/ohos_icu4j/src/main/tests/resources/ohos/global/icu/dev/test/charsetdet/ |
| D | CharsetDetectionTests.xml | 1 <?xml version="1.0" encoding="UTF-8"?> 9 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window… 25 … <test-case id="IUC10-da-Q" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE windows-1252/da"> 41 <test-case id="IUC10-da" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/da"> 57 <test-case id="IUC10-de" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/de"> 73 <!-- No UTF-8 in this test because there are no non-ASCII characters. --> 74 <test-case id="IUC10-en" encodings="UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/en"> 90 <test-case id="IUC10-es" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/es"> 106 <test-case id="IUC10-fr" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/fr"> 123 <test-case id="IUC10-he" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-8-I/he"> [all …]
|
| /third_party/icu/icu4c/source/test/testdata/ |
| D | csdetest.xml | 1 <?xml version="1.0" encoding="UTF-8"?> 8 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window… 24 … <test-case id="IUC10-da-Q" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE windows-1252/da"> 40 <test-case id="IUC10-da" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/da"> 56 <test-case id="IUC10-de" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/de"> 72 <!-- No UTF-8 in this test because there are no non-ASCII characters. --> 73 <test-case id="IUC10-en" encodings="UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/en"> 89 <test-case id="IUC10-es" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/es"> 105 <test-case id="IUC10-fr" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-1/fr"> 122 <test-case id="IUC10-he" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-8-I/he"> [all …]
|
| /third_party/icu/docs/userguide/strings/ |
| D | utf-8.md | 3 title: UTF-8 12 # UTF-8 15 UTF-16, except for conversion from bytes to strings (via InputStreamReader or 18 While most of ICU works with UTF-16 strings and uses data structures optimized 19 for UTF-16, there are APIs that facilitate working with UTF-8, or are optimized 20 for UTF-8, or work with Unicode code points (21-bit integer values) regardless 22 UTF-16 and UTF-8. 24 For UTF-8 strings, ICU normally uses `(const) char *` pointers and `int32_t` 25 lengths, normally with semantics parallel to UTF-16 handling. (Input length=-1 31 ## Conversion Between UTF-8 and UTF-16 [all …]
|
| /third_party/node/test/parallel/ |
| D | test-whatwg-encoding-custom-textdecoder-fatal.js | 14 { encoding: 'utf-8', input: [0xFF], name: 'invalid code' }, 15 { encoding: 'utf-8', input: [0xC0], name: 'ends early' }, 16 { encoding: 'utf-8', input: [0xE0], name: 'ends early 2' }, 17 { encoding: 'utf-8', input: [0xC0, 0x00], name: 'invalid trail' }, 18 { encoding: 'utf-8', input: [0xC0, 0xC0], name: 'invalid trail 2' }, 19 { encoding: 'utf-8', input: [0xE0, 0x00], name: 'invalid trail 3' }, 20 { encoding: 'utf-8', input: [0xE0, 0xC0], name: 'invalid trail 4' }, 21 { encoding: 'utf-8', input: [0xE0, 0x80, 0x00], name: 'invalid trail 5' }, 22 { encoding: 'utf-8', input: [0xE0, 0x80, 0xC0], name: 'invalid trail 6' }, 23 { encoding: 'utf-8', input: [0xFC, 0x80, 0x80, 0x80, 0x80, 0x80], [all …]
|
| D | test-whatwg-encoding-custom-textdecoder.js | 19 // Test TextDecoder, UTF-8, fatal: false, ignoreBOM: false 21 ['unicode-1-1-utf-8', 'utf8', 'utf-8'].forEach((i) => { 23 assert.strictEqual(dec.encoding, 'utf-8'); 28 ['unicode-1-1-utf-8', 'utf8', 'utf-8'].forEach((i) => { 37 // Test TextDecoder, UTF-8, fatal: false, ignoreBOM: true 39 ['unicode-1-1-utf-8', 'utf8', 'utf-8'].forEach((i) => { 45 ['unicode-1-1-utf-8', 'utf8', 'utf-8'].forEach((i) => { 54 // Test TextDecoder, UTF-8, fatal: true, ignoreBOM: false 56 ['unicode-1-1-utf-8', 'utf8', 'utf-8'].forEach((i) => { 63 'for encoding utf-8' [all …]
|
| /third_party/python/Lib/ |
| D | locale.py | 390 if encoding in ('ISO8859-15', 'UTF-8'): 507 elif code == 'UTF-8': 508 # On macOS "LC_CTYPE=UTF-8" is a valid locale setting 509 # for getting UTF-8 handling for text. 510 return None, 'UTF-8' 638 # On Android langinfo.h and CODESET are missing, and UTF-8 is 640 return 'UTF-8' 642 return 'UTF-8' 661 return 'UTF-8' 733 'utf_8': 'UTF-8', [all …]
|
| /third_party/node/test/fixtures/wpt/encoding/ |
| D | utf-32.html | 2 <meta charset=utf-8> 3 <title>Character Decoding: UTF-32 (not supported) subresource of UTF-8 document</title> 9 // Since UTF-32 is not supported: 10 // * HTML resources will use the parent encoding (UTF-8) 11 // * XML resources will default to UTF-8 12 // ... except for the UTF-32LE-with-BOM case, where the UTF-32 13 // BOM will be mistaken for a UTF-16LE BOM (FF FE 00 00), in which 14 // case it will be interpreted as UTF-16LE. 17 {file: 'resources/utf-32-big-endian-bom.html', 18 characterSet: 'UTF-8', [all …]
|
| D | textdecoder-fatal.any.js | 4 { encoding: 'utf-8', input: [0xFF], name: 'invalid code' }, 5 { encoding: 'utf-8', input: [0xC0], name: 'ends early' }, 6 { encoding: 'utf-8', input: [0xE0], name: 'ends early 2' }, 7 { encoding: 'utf-8', input: [0xC0, 0x00], name: 'invalid trail' }, 8 { encoding: 'utf-8', input: [0xC0, 0xC0], name: 'invalid trail 2' }, 9 { encoding: 'utf-8', input: [0xE0, 0x00], name: 'invalid trail 3' }, 10 { encoding: 'utf-8', input: [0xE0, 0xC0], name: 'invalid trail 4' }, 11 { encoding: 'utf-8', input: [0xE0, 0x80, 0x00], name: 'invalid trail 5' }, 12 { encoding: 'utf-8', input: [0xE0, 0x80, 0xC0], name: 'invalid trail 6' }, 13 { encoding: 'utf-8', input: [0xFC, 0x80, 0x80, 0x80, 0x80, 0x80], name: '> 0x10FFFF' }, [all …]
|
| D | utf-32-from-win1252.html | 3 <title>Character Decoding: UTF-32 (not supported) subresource of windows-1252 document</title> 9 // Since UTF-32 is not supported: 11 // * XML resources will default to UTF-8 12 // ... except for the UTF-32LE-with-BOM case, where the UTF-32 13 // BOM will be mistaken for a UTF-16LE BOM (FF FE 00 00), in which 14 // case it will be interpreted as UTF-16LE. 17 {file: 'resources/utf-32-big-endian-bom.html', 21 {file: 'resources/utf-32-big-endian-bom.xml', 22 characterSet: 'UTF-8', 25 {file: 'resources/utf-32-big-endian-nobom.html', [all …]
|
| D | unsupported-encodings.any.js | 4 // Attempting to decode '<' as UTF-7 (+AD4) ends up as '+AD4'. 5 ['UTF-7', 'utf-7'].forEach(label => { 10 // UTF-32 will be detected as UTF-16LE if leading BOM, or UTF-8 otherwise (due to XMLHttpRequest). 11 ['UTF-32', 'utf-32', 'UTF-32LE', 'utf-32le'].forEach(label => { 15 `${label} with BOM should decode as UTF-16LE`); 20 `${label} with no BOM should decode as UTF-8`);; 22 ['UTF-32be', 'utf-32be'].forEach(label => { 26 `${label} with no BOM should decode as UTF-8`); 31 `${label} with BOM should decode as UTF-8`);
|
| /third_party/pcre2/pcre2/doc/ |
| D | pcre2unicode.3 | 4 .SH "UNICODE AND UTF SUPPORT" 10 strings of text in UTF-8, UTF-16, and UTF-32 format (depending on the code unit 14 There are two ways of telling PCRE2 to switch to UTF mode, where characters may 20 with the PCRE2_UTF option, or the pattern may start with the sequence (*UTF). 23 to UTF mode. 32 In UTF mode, both the pattern and any subject strings that are matched against 33 it are treated as UTF strings instead of strings of individual one-code-unit 61 .SH "WIDE CHARACTERS AND UTF MODES" 70 specifying a Unicode character by code point in a UTF mode. It is not allowed 71 in non-UTF mode. [all …]
|
| /third_party/PyYAML/tests/lib/ |
| D | test_input_output.py | 7 data = file.read().decode('utf-8') 13 for input in [data.encode('utf-8'), 14 codecs.BOM_UTF8+data.encode('utf-8'), 15 codecs.BOM_UTF16_BE+data.encode('utf-16-be'), 16 codecs.BOM_UTF16_LE+data.encode('utf-16-le')]: 28 data = file.read().decode('utf-8') 29 for input in [data.encode('utf-16-be'), 30 data.encode('utf-16-le'), 31 codecs.BOM_UTF8+data.encode('utf-16-be'), 32 codecs.BOM_UTF8+data.encode('utf-16-le')]: [all …]
|
| /third_party/pcre2/pcre2/doc/html/ |
| D | pcre2unicode.html | 16 UNICODE AND UTF SUPPORT 22 strings of text in UTF-8, UTF-16, and UTF-32 format (depending on the code unit 27 There are two ways of telling PCRE2 to switch to UTF mode, where characters may 31 with the PCRE2_UTF option, or the pattern may start with the sequence (*UTF). 34 to UTF mode. 42 In UTF mode, both the pattern and any subject strings that are matched against 43 it are treated as UTF strings instead of strings of individual one-code-unit 67 WIDE CHARACTERS AND UTF MODES 77 specifying a Unicode character by code point in a UTF mode. It is not allowed 78 in non-UTF mode. [all …]
|
| /third_party/icu/docs/userguide/ |
| D | unicode.md | 203 1. UTF-16, the default encoding form, maps a character code point to either one 206 2. UTF-8 is a byte-based encoding that offers backwards compatibility with 210 3. UTF-32 is the simplest, but most memory-intensive encoding form: It uses one 216 ICU uses UTF-16 internally. ICU 2.0 fully supports supplementary characters 221 text. UTF-8 is itself both an encoding form, and an encoding scheme because it is 222 byte-based. For each of UTF-16 and UTF-32, there are two variants defined: one 226 UTF-16BE, UTF-16LE, UTF-32BE, and UTF-32LE. 228 > :point_right: *The names "UTF-16" and "UTF-32" are ambiguous. Depending on context, they refer 235 ## Overview of UTF-16 244 DFFF<sub>16</sub>. Every Unicode code point has only one possible UTF-16 encoding with [all …]
|