Searched +full:utf +full:- +full:8 (Results 1 – 25 of 1119) sorted by relevance

/third_party/python/Lib/
locale.py
3 The module provides low-level access to the C lib's locale APIs and adds high
25 # Yuck: LC_MESSAGES is non-standard: can't tell whether it exists before
34 """ strcoll(string,string) -> int.
37 return (a > b) - (a < b)
40 """ strxfrm(string) -> string.
41 Returns a string that behaves for cmp locale-aware.
64 """ localeconv() -> dict.
65 Returns numeric and monetary locale-specific parameters.
88 """ setlocale(integer,string=None) -> string.
125 # if grouping is -1, we are done
[all …]
/third_party/icu/docs/userguide/strings/
utf-8.md
1 ---
3 title: UTF-8
6 ---
7 <!--
10 -->
12 # UTF-8 chapter
15 UTF-16, except for conversion from bytes to strings (via InputStreamReader or
18 While most of ICU works with UTF-16 strings and uses data structures optimized
19 for UTF-16, there are APIs that facilitate working with UTF-8, or are optimized
20 for UTF-8, or work with Unicode code points (21-bit integer values) regardless
[all …]
/third_party/lzma/CPP/Common/
UTFConvert.h
49 if (NonUtf) s.Add_OptSpaced("non-UTF8"); in PrintStatus()
84 if (allowReduced == false) - all UTF-8 character sequences must be finished.
85 if (allowReduced == true) - it allows truncated last character-Utf8-sequence
100 it processes SINGLE-SURROGATE-8 as valid Unicode point.
101 it converts SINGLE-SURROGATE-8 to SINGLE-SURROGATE-16
102 Note: some sequencies of two SINGLE-SURROGATE-8 points
103 will generate correct SURROGATE-16-PAIR, and
104 that SURROGATE-16-PAIR later will be converted to correct
105 UTF8-SURROGATE-21 point. So we don't restore original
106 STR-8 sequence in that case.
[all …]
/third_party/icu/ohos_icu4j/src/main/tests/resources/ohos/global/icu/dev/test/charsetdet/
CharsetDetectionTests.xml
1 <?xml version="1.0" encoding="UTF-8"?>
3 <!-- Copyright (C) 2016 and later: Unicode, Inc. and others. -->
4 <!-- License & terms of use: http://www.unicode.org/copyright.html#License -->
5 <!-- Copyright (c) 2005-2015 IBM Corporation and others. All rights reserved -->
6 <!-- See individual test cases for their specific copyright. -->
8 <charset-detection-tests>
9 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window…
10 <!-- Copyright © 1991-2005 Unicode, Inc. All rights reserved. -->
15 تسجّل الآن لحضور المؤتمر الدولي العاشر ليونيكود, الذي سيعقد في 10-12 آذار 1997 بمدينة ماينتس,
23 </test-case>
[all …]
/third_party/icu/icu4j/main/tests/core/src/com/ibm/icu/dev/test/charsetdet/
CharsetDetectionTests.xml
1 <?xml version="1.0" encoding="UTF-8"?>
3 <!-- Copyright (C) 2016 and later: Unicode, Inc. and others. -->
4 <!-- License & terms of use: http://www.unicode.org/copyright.html -->
5 <!-- Copyright (c) 2005-2015 IBM Corporation and others. All rights reserved -->
6 <!-- See individual test cases for their specific copyright. -->
8 <charset-detection-tests>
9 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window…
10 <!-- Copyright © 1991-2005 Unicode, Inc. All rights reserved. -->
15 تسجّل الآن لحضور المؤتمر الدولي العاشر ليونيكود, الذي سيعقد في 10-12 آذار 1997 بمدينة ماينتس,
23 </test-case>
[all …]
/third_party/icu/icu4c/source/test/testdata/
csdetest.xml
1 <?xml version="1.0" encoding="UTF-8"?>
3 <!-- Copyright (C) 2016 and later: Unicode, Inc. and others. License & terms of use: http://www.uni…
4 <!-- Copyright (c) 2005-2013 IBM Corporation and others. All rights reserved -->
5 <!-- See individual test cases for their specific copyright. -->
7 <charset-detection-tests>
8 …<test-case id="IUC10-ar" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE ISO-8859-6/ar window…
9 <!-- Copyright © 1991-2005 Unicode, Inc. All rights reserved. -->
14 تسجّل الآن لحضور المؤتمر الدولي العاشر ليونيكود, الذي سيعقد في 10-12 آذار 1997 بمدينة ماينتس,
22 </test-case>
24 … <test-case id="IUC10-da-Q" encodings="UTF-8 UTF-16LE UTF-16BE UTF-32BE UTF-32LE windows-1252/da">
[all …]
/third_party/pcre2/pcre2/testdata/
testoutput10
1 # This set of tests is for UTF-8 support and Unicode property support, with
2 # relevance only for the 8-bit library.
6 # The next 5 patterns have UTF-8 errors
8 /[�]/utf
9 Failed: error -8 at offset 1: UTF-8 error: byte 2 top bits not 0x80
11 /�/utf
12 Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end
14 /���xxx/utf
15 Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80
17 /��������/utf
[all …]
testinput10
1 # This set of tests is for UTF-8 support and Unicode property support, with
2 # relevance only for the 8-bit library.
6 # The next 5 patterns have UTF-8 errors
8 /[�]/utf
10 /�/utf
12 /���xxx/utf
14 /��������/utf
20 /badutf/utf
21 \= Expect UTF-8 errors
62 /badutf/utf
[all …]
testoutput14-8
1 # These test special UTF and UCP features of DFA matching. The output is
6 # ----------------------------------------------------
8 # non-DFA matching.
10 /X/utf
12 Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
14 Error -36 (bad UTF-8 offset)
18 Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
22 Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
26 Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
30 Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
[all …]
/third_party/protobuf/java/core/src/test/java/com/google/protobuf/
CheckUtf8Test.java
1 // Protocol Buffers - Google's data interchange format
3 // https://developers.google.com/protocol-buffers/
42 * UTF-8 checks.
67 fail("Expected IllegalArgumentException for non UTF-8 byte string."); in testBuildRequiredStringWithBadUtf8()
69 assertEquals("Byte string is not UTF-8.", exception.getMessage()); in testBuildRequiredStringWithBadUtf8()
76 fail("Expected IllegalArgumentException for non UTF-8 byte string."); in testBuildOptionalStringWithBadUtf8()
78 assertEquals("Byte string is not UTF-8.", exception.getMessage()); in testBuildOptionalStringWithBadUtf8()
85 fail("Expected IllegalArgumentException for non UTF-8 byte string."); in testBuildRepeatedStringWithBadUtf8()
87 assertEquals("Byte string is not UTF-8.", exception.getMessage()); in testBuildRepeatedStringWithBadUtf8()
100 fail("Expected IllegalArgumentException for non UTF-8 byte string."); in testBuildRequiredStringWithBadUtf8Size()
[all …]
/third_party/python/Lib/test/test_email/
test__encoded_words.py
62 _ew.decode('=?utf-8?X?somevalue?=')
64 def _test(self, source, result, charset='us-ascii', lang='', defects=[]):
72 self._test('=?us-ascii?q?foo?=', 'foo')
75 self._test('=?us-ascii?b?dmk=?=', 'vi')
78 self._test('=?us-ascii?Q?foo?=', 'foo')
81 self._test('=?us-ascii?B?dmk=?=', 'vi')
84 self._test('=?latin-1?q?=20F=fcr=20Elise=20?=', ' Für Elise ', 'latin-1')
87 self._test(b'=?us-ascii?q?=20\xACfoo?='.decode('us-ascii',
93 self._test(b'=?us-ascii?b?dm\xACk?='.decode('us-ascii',
101 self._test('=?us-ascii?b?dm\x01k===?=',
[all …]
/third_party/python/Lib/test/
test_utf8_mode.py
2 Test the implementation of the PEP 540: the UTF-8 Mode.
46 out = self.get_output('-c', code, LC_ALL=loc)
52 out = self.get_output('-X', 'utf8', '-c', code)
55 # undocumented but accepted syntax: -X utf8=1
56 out = self.get_output('-X', 'utf8=1', '-c', code)
59 out = self.get_output('-X', 'utf8=0', '-c', code)
63 # PYTHONLEGACYWINDOWSFSENCODING disables the UTF-8 Mode
64 # and has the priority over -X utf8
65 out = self.get_output('-X', 'utf8', '-c', code,
72 out = self.get_output('-c', code, PYTHONUTF8='1')
[all …]
test_c_locale_coercion.py
1 # Tests the attempted automatic coercion of the C locale to a UTF-8 locale
26 TARGET_LOCALES = ["C.UTF-8", "C.utf8", "UTF-8"]
31 # Android defaults to using UTF-8 for all system interfaces
32 EXPECTED_C_LOCALE_STREAM_ENCODING = "utf-8"
33 EXPECTED_C_LOCALE_FS_ENCODING = "utf-8"
41 # AIX uses iso8859-1 in the C locale, other *nix platforms use ASCII
42 EXPECTED_C_LOCALE_STREAM_ENCODING = "iso8859-1"
43 EXPECTED_C_LOCALE_FS_ENCODING = "iso8859-1"
45 # FS encoding is UTF-8 on macOS
46 EXPECTED_C_LOCALE_FS_ENCODING = "utf-8"
[all …]
test_locale.py
19 tlocs = ("en_US.UTF-8", "en_US.ISO8859-1", "en_US")
28 tlocs = ("en_US.UTF-8", "en_US.ISO8859-1",
29 "en_US.US-ASCII", "en_US")
104 'negative_sign': '-',
115 # and a non-ASCII currency symbol.
130 'negative_sign': '-',
171 self._test_format("%f", -42, grouping=1, out='-42.000000')
172 self._test_format("%+f", -42, grouping=1, out='-42.000000')
175 self._test_format("%20.f", -42, grouping=1, out='-42'.rjust(20))
177 self._test_format("%+10.f", -4200, grouping=1,
[all …]
/third_party/icu/icu4j/perf-tests/
normperf.pl
5 # * Copyright (C) 2002-2007 International Business Machines Corporation and *
15 #---------------------------------------------------------------------
39 [ "TestNames_SerbianSH.txt", "UTF-8", "b"],
40 # [ "arabic.txt", "UTF-8", "b"],
41 # [ "french.txt", "UTF-8", "b"],
42 # [ "greek.txt", "UTF-8", "b"],
43 # [ "hebrew.txt", "UTF-8", "b"],
44 # [ "hindi.txt" , "UTF-8", "b"],
45 # [ "japanese.txt", "UTF-8", "b"],
46 # [ "korean.txt", "UTF-8", "b"],
[all …]
/third_party/PyYAML/tests/lib/
test_input_output.py
7 data = file.read().decode('utf-8')
13 for input in [data.encode('utf-8'),
14 codecs.BOM_UTF8+data.encode('utf-8'),
15 codecs.BOM_UTF16_BE+data.encode('utf-16-be'),
16 codecs.BOM_UTF16_LE+data.encode('utf-16-le')]:
28 data = file.read().decode('utf-8')
29 for input in [data.encode('utf-16-be'),
30 data.encode('utf-16-le'),
31 codecs.BOM_UTF8+data.encode('utf-16-be'),
32 codecs.BOM_UTF8+data.encode('utf-16-le')]:
[all …]
/third_party/pcre2/pcre2/src/
pcre2_error.c
2 * Perl-Compatible Regular Expressions *
9 Original API code Copyright (c) 1997-2012 University of Cambridge
10 New API code Copyright (c) 2016-2024 University of Cambridge
12 -----------------------------------------------------------------------------
38 -----------------------------------------------------------------------------
51 /* The texts of compile-time error messages. Compile-time error numbers start
58 pcre2_get_error_message() counts through to the one it wants - this isn't a
79 "unrecognized character after (? or (?-\0"
84 "reference to non-existent subpattern\0"
85 "pattern passed as NULL with non-zero length\0"
[all …]
/third_party/rust/crates/regex/regex-capi/
README.md
19 --------
20 There are readable examples in the `ctest` and `examples` sub-directories.
23 [Rust and Cargo installed](https://www.rust-lang.org/downloads.html)
27 $ git clone git://github.com/rust-lang/regex
28 $ cd regex/regex-capi/examples
35 -----------
45 https://github.com/rust-lang/regex/blob/master/PERFORMANCE.md
49 -------------
50 All regular expressions must be valid UTF-8.
53 approximation, haystacks should be UTF-8. In fact, UTF-8 (and, one
[all …]
/third_party/mindspore/mindspore-src/source/tests/ut/python/dataset/
test_save_op.py
1 # Copyright 2020-2022 Huawei Technologies Co., Ltd
7 # http://www.apache.org/licenses/LICENSE-2.0
49 file_name = os.environ.get('PYTEST_CURRENT_TEST').split(':')[-1].split(' ')[0]
50 data = [{"image1": bytes("image1 bytes abcddddd", encoding='UTF-8'),
51 "image2": bytes("image1 bytes def", encoding='UTF-8'),
52 "image3": bytes("image1 bytes ghixxxxxxxxxx", encoding='UTF-8'),
53 "image4": bytes("image1 bytes jklzz", encoding='UTF-8'),
54 "image5": bytes("image1 bytes mno", encoding='UTF-8')},
55 {"image1": bytes("image2 bytes abca", encoding='UTF-8'),
56 "image2": bytes("image2 bytes defbb", encoding='UTF-8'),
[all …]
/third_party/icu/docs/userguide/icu/
unicode.md
1 ---
6 ---
7 <!--
10 -->
16 {: .no_toc .text-delta }
21 ---
41 Go to the [online ICU demos](https://icu4c-demos.unicode.org/icu-bin/icudemos) to
42 see how a Unicode-based server application can handle text in many languages and
47 Representing text-format data in computers is a matter of defining a set of
67 graphic, displayable characters. It was designed to represent English-language
[all …]
/third_party/musl/libc-test/src/functionalext/supplement/ctype/
isalnum_l.c
7 * http://www.apache.org/licenses/LICENSE-2.0
26 * @tc.desc : Verify isalnum_l process success when using the en-US.UTF-8 character set.
33 locale_t m_locale = newlocale(LC_ALL_MASK, "en_US.UTF-8", NULL); in isalnum_l_0100()
58 * @tc.desc : Verify isalnum_l process success when using the en-US.UTF-8 character set.
65 locale_t m_locale = newlocale(LC_ALL_MASK, "en_US.UTF-8", NULL); in isalnum_l_0200()
90 * @tc.desc : Verify isalnum_l process fail when using the en-US.UTF-8 character set.
97 locale_t m_locale = newlocale(LC_ALL_MASK, "en_US.UTF-8", NULL); in isalnum_l_0300()
122 * @tc.desc : Verify isalnum_l process fail when using the en-US.UTF-8 character set.
128 locale_t m_locale = newlocale(LC_ALL_MASK, "en_US.UTF-8", NULL); in isalnum_l_0400()
151 * @tc.desc : Verify isalnum_l process success when using the en-US.UTF-8 character set.
[all …]
/third_party/tex-hyphen/hyph-utf8/source/generic/hyph-utf8/lib/tex/hyphen/
packages.yml
3 Hyphenation patterns for Finnish in T1 and UTF-8 encodings.
5 while the newer ones (fi-x-school) implements the simpler rules taught at Finnish school.
8 description: |-
9 Hyphenation patterns for German in T1/EC and UTF-8 encodings,
11 The package includes the latest patterns from dehyph-exptl
13 however 8-bit engines still load old versions of patterns
14 for 'german' and 'ngerman' for backward-compatibility reasons.
27 description: |-
29 spelling in LGR and UTF-8 encodings. Patterns in UTF-8 use two code
40 description: |-
[all …]
/third_party/jerryscript/jerry-core/lit/
lit-globals.h
7 * http://www.apache.org/licenses/LICENSE-2.0
22 * ECMAScript standard defines terms "code unit" and "character" as 16-bit unsigned value
23 …* used to represent 16-bit unit of text, this is the same as code unit in UTF-16 (See ECMA-262 5.1…
26 …* than 16 bits: 0x0 - 0x10FFFFF). One code point could be represented with one ore two 16-bit code…
32 …* Internally JerryScript engine uses UTF-8 representation of strings to reduce memory overhead. Un…
33 * occupies from one to four bytes in UTF-8 representation.
35 * Unicode scalar value | Bytes in UTF-8 | Bytes in UTF-16
37 * ----------------------------------------------------------------------
38 * 0x0 - 0x7F | 1 byte | 2 bytes
39 * 0x80 - 0x7FF | 2 bytes | 2 bytes
[all …]
/third_party/pcre2/pcre2/
RunTest
5 # selected, depending on which build-time options were used.
8 # JIT, unless "-nojit" is given on the command line. There are also two tests
9 # for JIT-specific features, one to be run when JIT support is available
10 # (unless "-nojit" is specified), and one when it is not.
12 # Whichever of the 8-, 16- and 32-bit libraries exist are tested. It is also
13 # possible to select which to test by giving "-8", "-16" or "-32" on the
16 # As well as "-nojit", "-8", "-16", and "-32", arguments for this script are
17 # individual test numbers, ranges of tests such as 3-6 or 3- (meaning 3 to the
18 # end), or a number preceded by ~ to exclude a test. For example, "3-15 ~10"
35 # Other arguments can be one of the words "-valgrind", "-valgrind-log", or
[all …]
/third_party/musl/libc-test/src/common/
utf8.c
9 setlocale(LC_CTYPE, "C.UTF-8") || in t_setutf8()
10 setlocale(LC_CTYPE, "POSIX.UTF-8") || in t_setutf8()
11 setlocale(LC_CTYPE, "en_US.UTF-8") || in t_setutf8()
12 setlocale(LC_CTYPE, "en_GB.UTF-8") || in t_setutf8()
13 setlocale(LC_CTYPE, "en.UTF-8") || in t_setutf8()
14 setlocale(LC_CTYPE, "UTF-8") || in t_setutf8()
17 if (strcmp(nl_langinfo(CODESET), "UTF-8")) in t_setutf8()
18 return t_error("cannot set UTF-8 locale for test (codeset=%s)\n", nl_langinfo(CODESET)); in t_setutf8()
