| /third_party/python/Lib/ |
| D | locale.py | 3 The module provides low-level access to the C lib's locale APIs and adds high 25 # Yuck: LC_MESSAGES is non-standard: can't tell whether it exists before 34 """ strcoll(string,string) -> int. 37 return (a > b) - (a < b) 40 """ strxfrm(string) -> string. 41 Returns a string that behaves for cmp locale-aware. 64 """ localeconv() -> dict. 65 Returns numeric and monetary locale-specific parameters. 88 """ setlocale(integer,string=None) -> string. 125 # if grouping is -1, we are done [all …]
|
| /third_party/icu/docs/userguide/strings/ |
| D | utf-8.md | 1 --- 3 title: UTF-8 6 --- 7 <!-- 10 --> 12 # UTF-8 chapter 15 UTF-16, except for conversion from bytes to strings (via InputStreamReader or 18 While most of ICU works with UTF-16 strings and uses data structures optimized 19 for UTF-16, there are APIs that facilitate working with UTF-8, or are optimized 20 for UTF-8, or work with Unicode code points (21-bit integer values) regardless [all …]
|
| /third_party/lzma/CPP/Common/ |
| D | UTFConvert.h | 49 if (NonUtf) s.Add_OptSpaced("non-UTF8"); in PrintStatus() 84 if (allowReduced == false) - all UTF-8 character sequences must be finished. 85 if (allowReduced == true) - it allows truncated last character-Utf8-sequence 100 it processes SINGLE-SURROGATE-8 as valid Unicode point. 101 it converts SINGLE-SURROGATE-8 to SINGLE-SURROGATE-16 102 Note: some sequencies of two SINGLE-SURROGATE-8 points 103 will generate correct SURROGATE-16-PAIR, and 104 that SURROGATE-16-PAIR later will be converted to correct 105 UTF8-SURROGATE-21 point. So we don't restore original 106 STR-8 sequence in that case. [all …]
|
| /third_party/icu/ohos_icu4j/src/main/tests/resources/ohos/global/icu/dev/test/charsetdet/ |
| D | CharsetDetectionTests.xml | 1 <?xml version="1.0" encoding="UTF-8"?> 3 <!-- Copyright (C) 2016 and later: Unicode, Inc. and others. --> 4 <!-- License & terms of use: http://www.unicode.org/copyright.html#License --> 5 <!-- Copyright (c) 2005-2015 IBM Corporation and others. All rights reserved --> 6 <!-- See individual test cases for their specific copyright. --> 8 <charset-detection-tests> 9 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window… 10 <!-- Copyright © 1991-2005 Unicode, Inc. All rights reserved. --> 15 تسجّل الآن لحضور المؤتمر الدولي العاشر ليونيكود, الذي سيعقد في 10-12 آذار 1997 بمدينة ماينتس, 23 </test-case> [all …]
|
| /third_party/icu/icu4j/main/tests/core/src/com/ibm/icu/dev/test/charsetdet/ |
| D | CharsetDetectionTests.xml | 1 <?xml version="1.0" encoding="UTF-8"?> 3 <!-- Copyright (C) 2016 and later: Unicode, Inc. and others. --> 4 <!-- License & terms of use: http://www.unicode.org/copyright.html --> 5 <!-- Copyright (c) 2005-2015 IBM Corporation and others. All rights reserved --> 6 <!-- See individual test cases for their specific copyright. --> 8 <charset-detection-tests> 9 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window… 10 <!-- Copyright © 1991-2005 Unicode, Inc. All rights reserved. --> 15 تسجّل الآن لحضور المؤتمر الدولي العاشر ليونيكود, الذي سيعقد في 10-12 آذار 1997 بمدينة ماينتس, 23 </test-case> [all …]
|
| /third_party/icu/icu4c/source/test/testdata/ |
| D | csdetest.xml | 1 <?xml version="1.0" encoding="UTF-8"?> 3 <!-- Copyright (C) 2016 and later: Unicode, Inc. and others. License & terms of use: http://www.uni… 4 <!-- Copyright (c) 2005-2013 IBM Corporation and others. All rights reserved --> 5 <!-- See individual test cases for their specific copyright. --> 7 <charset-detection-tests> 8 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window… 9 <!-- Copyright © 1991-2005 Unicode, Inc. All rights reserved. --> 14 تسجّل الآن لحضور المؤتمر الدولي العاشر ليونيكود, الذي سيعقد في 10-12 آذار 1997 بمدينة ماينتس, 22 </test-case> 24 … <test-case id="IUC10-da-Q" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE windows-1252/da"> [all …]
|
| /third_party/pcre2/pcre2/testdata/ |
| D | testoutput10 | 1 # This set of tests is for UTF-8 support and Unicode property support, with 2 # relevance only for the 8-bit library. 6 # The next 5 patterns have UTF-8 errors 8 /[�]/utf 9 Failed: error -8 at offset 1: UTF-8 error: byte 2 top bits not 0x80 11 /�/utf 12 Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end 14 /���xxx/utf 15 Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80 17 /Â��������/utf [all …]
|
| D | testinput10 | 1 # This set of tests is for UTF-8 support and Unicode property support, with 2 # relevance only for the 8-bit library. 6 # The next 5 patterns have UTF-8 errors 8 /[�]/utf 10 /�/utf 12 /���xxx/utf 14 /Â��������/utf 20 /badutf/utf 21 \= Expect UTF-8 errors 62 /badutf/utf [all …]
|
| D | testoutput14-8 | 1 # These test special UTF and UCP features of DFA matching. The output is 6 # ---------------------------------------------------- 8 # non-DFA matching. 10 /X/utf 12 Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 14 Error -36 (bad UTF-8 offset) 18 Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 22 Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 26 Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 30 Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2 [all …]
|
| /third_party/protobuf/java/core/src/test/java/com/google/protobuf/ |
| D | CheckUtf8Test.java | 1 // Protocol Buffers - Google's data interchange format 3 // https://developers.google.com/protocol-buffers/ 42 * UTF-8 checks. 67 fail("Expected IllegalArgumentException for non UTF-8 byte string."); in testBuildRequiredStringWithBadUtf8() 69 assertEquals("Byte string is not UTF-8.", exception.getMessage()); in testBuildRequiredStringWithBadUtf8() 76 fail("Expected IllegalArgumentException for non UTF-8 byte string."); in testBuildOptionalStringWithBadUtf8() 78 assertEquals("Byte string is not UTF-8.", exception.getMessage()); in testBuildOptionalStringWithBadUtf8() 85 fail("Expected IllegalArgumentException for non UTF-8 byte string."); in testBuildRepeatedStringWithBadUtf8() 87 assertEquals("Byte string is not UTF-8.", exception.getMessage()); in testBuildRepeatedStringWithBadUtf8() 100 fail("Expected IllegalArgumentException for non UTF-8 byte string."); in testBuildRequiredStringWithBadUtf8Size() [all …]
|
| /third_party/python/Lib/test/test_email/ |
| D | test__encoded_words.py | 62 _ew.decode('=?utf-8?X?somevalue?=') 64 def _test(self, source, result, charset='us-ascii', lang='', defects=[]): 72 self._test('=?us-ascii?q?foo?=', 'foo') 75 self._test('=?us-ascii?b?dmk=?=', 'vi') 78 self._test('=?us-ascii?Q?foo?=', 'foo') 81 self._test('=?us-ascii?B?dmk=?=', 'vi') 84 self._test('=?latin-1?q?=20F=fcr=20Elise=20?=', ' Für Elise ', 'latin-1') 87 self._test(b'=?us-ascii?q?=20\xACfoo?='.decode('us-ascii', 93 self._test(b'=?us-ascii?b?dm\xACk?='.decode('us-ascii', 101 self._test('=?us-ascii?b?dm\x01k===?=', [all …]
|
| /third_party/python/Lib/test/ |
| D | test_utf8_mode.py | 2 Test the implementation of the PEP 540: the UTF-8 Mode. 46 out = self.get_output('-c', code, LC_ALL=loc) 52 out = self.get_output('-X', 'utf8', '-c', code) 55 # undocumented but accepted syntax: -X utf8=1 56 out = self.get_output('-X', 'utf8=1', '-c', code) 59 out = self.get_output('-X', 'utf8=0', '-c', code) 63 # PYTHONLEGACYWINDOWSFSENCODING disables the UTF-8 Mode 64 # and has the priority over -X utf8 65 out = self.get_output('-X', 'utf8', '-c', code, 72 out = self.get_output('-c', code, PYTHONUTF8='1') [all …]
|
| D | test_c_locale_coercion.py | 1 # Tests the attempted automatic coercion of the C locale to a UTF-8 locale 26 TARGET_LOCALES = ["C.UTF-8", "C.utf8", "UTF-8"] 31 # Android defaults to using UTF-8 for all system interfaces 32 EXPECTED_C_LOCALE_STREAM_ENCODING = "utf-8" 33 EXPECTED_C_LOCALE_FS_ENCODING = "utf-8" 41 # AIX uses iso8859-1 in the C locale, other *nix platforms use ASCII 42 EXPECTED_C_LOCALE_STREAM_ENCODING = "iso8859-1" 43 EXPECTED_C_LOCALE_FS_ENCODING = "iso8859-1" 45 # FS encoding is UTF-8 on macOS 46 EXPECTED_C_LOCALE_FS_ENCODING = "utf-8" [all …]
|
| D | test_locale.py | 19 tlocs = ("en_US.UTF-8", "en_US.ISO8859-1", "en_US") 28 tlocs = ("en_US.UTF-8", "en_US.ISO8859-1", 29 "en_US.US-ASCII", "en_US") 104 'negative_sign': '-', 115 # and a non-ASCII currency symbol. 130 'negative_sign': '-', 171 self._test_format("%f", -42, grouping=1, out='-42.000000') 172 self._test_format("%+f", -42, grouping=1, out='-42.000000') 175 self._test_format("%20.f", -42, grouping=1, out='-42'.rjust(20)) 177 self._test_format("%+10.f", -4200, grouping=1, [all …]
|
| /third_party/icu/icu4j/perf-tests/ |
| D | normperf.pl | 5 # * Copyright (C) 2002-2007 International Business Machines Corporation and * 15 #--------------------------------------------------------------------- 39 [ "TestNames_SerbianSH.txt", "UTF-8", "b"], 40 # [ "arabic.txt", "UTF-8", "b"], 41 # [ "french.txt", "UTF-8", "b"], 42 # [ "greek.txt", "UTF-8", "b"], 43 # [ "hebrew.txt", "UTF-8", "b"], 44 # [ "hindi.txt" , "UTF-8", "b"], 45 # [ "japanese.txt", "UTF-8", "b"], 46 # [ "korean.txt", "UTF-8", "b"], [all …]
|
| /third_party/PyYAML/tests/lib/ |
| D | test_input_output.py | 7 data = file.read().decode('utf-8') 13 for input in [data.encode('utf-8'), 14 codecs.BOM_UTF8+data.encode('utf-8'), 15 codecs.BOM_UTF16_BE+data.encode('utf-16-be'), 16 codecs.BOM_UTF16_LE+data.encode('utf-16-le')]: 28 data = file.read().decode('utf-8') 29 for input in [data.encode('utf-16-be'), 30 data.encode('utf-16-le'), 31 codecs.BOM_UTF8+data.encode('utf-16-be'), 32 codecs.BOM_UTF8+data.encode('utf-16-le')]: [all …]
|
| /third_party/pcre2/pcre2/src/ |
| D | pcre2_error.c | 2 * Perl-Compatible Regular Expressions * 9 Original API code Copyright (c) 1997-2012 University of Cambridge 10 New API code Copyright (c) 2016-2024 University of Cambridge 12 ----------------------------------------------------------------------------- 38 ----------------------------------------------------------------------------- 51 /* The texts of compile-time error messages. Compile-time error numbers start 58 pcre2_get_error_message() counts through to the one it wants - this isn't a 79 "unrecognized character after (? or (?-\0" 84 "reference to non-existent subpattern\0" 85 "pattern passed as NULL with non-zero length\0" [all …]
|
| /third_party/rust/crates/regex/regex-capi/ |
| D | README.md | 19 -------- 20 There are readable examples in the `ctest` and `examples` sub-directories. 23 [Rust and Cargo installed](https://www.rust-lang.org/downloads.html) 27 $ git clone git://github.com/rust-lang/regex 28 $ cd regex/regex-capi/examples 35 ----------- 45 https://github.com/rust-lang/regex/blob/master/PERFORMANCE.md 49 ------------- 50 All regular expressions must be valid UTF-8. 53 approximation, haystacks should be UTF-8. In fact, UTF-8 (and, one [all …]
|
| /third_party/mindspore/mindspore-src/source/tests/ut/python/dataset/ |
| D | test_save_op.py | 1 # Copyright 2020-2022 Huawei Technologies Co., Ltd 7 # http://www.apache.org/licenses/LICENSE-2.0 49 file_name = os.environ.get('PYTEST_CURRENT_TEST').split(':')[-1].split(' ')[0] 50 data = [{"image1": bytes("image1 bytes abcddddd", encoding='UTF-8'), 51 "image2": bytes("image1 bytes def", encoding='UTF-8'), 52 "image3": bytes("image1 bytes ghixxxxxxxxxx", encoding='UTF-8'), 53 "image4": bytes("image1 bytes jklzz", encoding='UTF-8'), 54 "image5": bytes("image1 bytes mno", encoding='UTF-8')}, 55 {"image1": bytes("image2 bytes abca", encoding='UTF-8'), 56 "image2": bytes("image2 bytes defbb", encoding='UTF-8'), [all …]
|
| /third_party/icu/docs/userguide/icu/ |
| D | unicode.md | 1 --- 6 --- 7 <!-- 10 --> 16 {: .no_toc .text-delta } 21 --- 41 Go to the [online ICU demos](https://icu4c-demos.unicode.org/icu-bin/icudemos) to 42 see how a Unicode-based server application can handle text in many languages and 47 Representing text-format data in computers is a matter of defining a set of 67 graphic, displayable characters. It was designed to represent English-language [all …]
|
| /third_party/musl/libc-test/src/functionalext/supplement/ctype/ |
| D | isalnum_l.c | 7 * http://www.apache.org/licenses/LICENSE-2.0 26 * @tc.desc : Verify isalnum_l process success when using the en-US.UTF-8 character set. 33 locale_t m_locale = newlocale(LC_ALL_MASK, "en_US.UTF-8", NULL); in isalnum_l_0100() 58 * @tc.desc : Verify isalnum_l process success when using the en-US.UTF-8 character set. 65 locale_t m_locale = newlocale(LC_ALL_MASK, "en_US.UTF-8", NULL); in isalnum_l_0200() 90 * @tc.desc : Verify isalnum_l process fail when using the en-US.UTF-8 character set. 97 locale_t m_locale = newlocale(LC_ALL_MASK, "en_US.UTF-8", NULL); in isalnum_l_0300() 122 * @tc.desc : Verify isalnum_l process fail when using the en-US.UTF-8 character set. 128 locale_t m_locale = newlocale(LC_ALL_MASK, "en_US.UTF-8", NULL); in isalnum_l_0400() 151 * @tc.desc : Verify isalnum_l process success when using the en-US.UTF-8 character set. [all …]
|
| /third_party/tex-hyphen/hyph-utf8/source/generic/hyph-utf8/lib/tex/hyphen/ |
| D | packages.yml | 3 Hyphenation patterns for Finnish in T1 and UTF-8 encodings. 5 while the newer ones (fi-x-school) implements the simpler rules taught at Finnish school. 8 description: |- 9 Hyphenation patterns for German in T1/EC and UTF-8 encodings, 11 The package includes the latest patterns from dehyph-exptl 13 however 8-bit engines still load old versions of patterns 14 for 'german' and 'ngerman' for backward-compatibility reasons. 27 description: |- 29 spelling in LGR and UTF-8 encodings. Patterns in UTF-8 use two code 40 description: |- [all …]
|
| /third_party/jerryscript/jerry-core/lit/ |
| D | lit-globals.h | 7 * http://www.apache.org/licenses/LICENSE-2.0 22 * ECMAScript standard defines terms "code unit" and "character" as 16-bit unsigned value 23 …* used to represent 16-bit unit of text, this is the same as code unit in UTF-16 (See ECMA-262 5.1… 26 …* than 16 bits: 0x0 - 0x10FFFFF). One code point could be represented with one ore two 16-bit code… 32 …* Internally JerryScript engine uses UTF-8 representation of strings to reduce memory overhead. Un… 33 * occupies from one to four bytes in UTF-8 representation. 35 * Unicode scalar value | Bytes in UTF-8 | Bytes in UTF-16 37 * ---------------------------------------------------------------------- 38 * 0x0 - 0x7F | 1 byte | 2 bytes 39 * 0x80 - 0x7FF | 2 bytes | 2 bytes [all …]
|
| /third_party/pcre2/pcre2/ |
| D | RunTest | 5 # selected, depending on which build-time options were used. 8 # JIT, unless "-nojit" is given on the command line. There are also two tests 9 # for JIT-specific features, one to be run when JIT support is available 10 # (unless "-nojit" is specified), and one when it is not. 12 # Whichever of the 8-, 16- and 32-bit libraries exist are tested. It is also 13 # possible to select which to test by giving "-8", "-16" or "-32" on the 16 # As well as "-nojit", "-8", "-16", and "-32", arguments for this script are 17 # individual test numbers, ranges of tests such as 3-6 or 3- (meaning 3 to the 18 # end), or a number preceded by ~ to exclude a test. For example, "3-15 ~10" 35 # Other arguments can be one of the words "-valgrind", "-valgrind-log", or [all …]
|
| /third_party/musl/libc-test/src/common/ |
| D | utf8.c | 9 setlocale(LC_CTYPE, "C.UTF-8") || in t_setutf8() 10 setlocale(LC_CTYPE, "POSIX.UTF-8") || in t_setutf8() 11 setlocale(LC_CTYPE, "en_US.UTF-8") || in t_setutf8() 12 setlocale(LC_CTYPE, "en_GB.UTF-8") || in t_setutf8() 13 setlocale(LC_CTYPE, "en.UTF-8") || in t_setutf8() 14 setlocale(LC_CTYPE, "UTF-8") || in t_setutf8() 17 if (strcmp(nl_langinfo(CODESET), "UTF-8")) in t_setutf8() 18 return t_error("cannot set UTF-8 locale for test (codeset=%s)\n", nl_langinfo(CODESET)); in t_setutf8()
|