Lines Matching +full:performance +full:- +full:rules
1 ---
6 ---
7 <!--
10 -->
16 {: .no_toc .text-delta }
21 ---
40 custom set of rules (a tailoring).
52 function is supported for cloning collators in a thread-safe fashion.
68 length is required in a separate argument. If -1 is passed for the length,
86 If there is a locale with a keyword, like "de-u-co-phonebk" or "de@collation=phonebook", the
98 strings should be in UTF-16 format, and that all the required conversion should
101 or UTF-8 directly.
105 UTF-8 and UTF-16BE (useful when processing data from a big endian platform on an
107 collation APIs has a performance impact. It should be used in situations when it
108 is not desirable to convert whole strings before the operation - such as when
115 describe each ordering as a set of rules for calculating numeric values for each
133 ------ | -------- | -----------
148 One possible encoding of a Collation Element is a 32-bit value consisting of
149 a 16-bit primary weight, an 8-bit secondary weight,
150 2 case bits, and a 6-bit tertiary weight.
158 ------- | --------------------------
165 In this example, the letter "a" has a 16-bit primary weight of 1900 (hex), an
166 8-bit secondary weight of 05 (hex), and a combined 8-bit case-tertiary weight of
192 An ICU sort key is a pre-processed sequence of bytes generated from a Unicode
202 <!-- TODO: (diagram was missing in Google Sites already)
203 The diagram below represents an uncompressed sort key in ICU for ease of understanding. -->
219 If you assume the worst case and use too-big buffers, a lot of space will
220 be wasted. However, if you use too-small buffers, you will lose performance if
226 [Collation Example](examples#using-large-buffers-to-manage-sort-keys)
229 Here are some rules of thumb; please do not rely on them. If you are looking
266 Having sort keys for strings allows for easy creation of bounds - sort keys that
270 Two kinds of upper bounds can be generated - the first one will match only
281 elements one at a time. It can be used to implement language-sensitive text
282 search algorithms like Boyer-Moore.
293 -------- | ---------------------------
301 -------- | ---------------------------
315 <http://icu-project.org/docs/papers/efficient_text_searching_in_java.html>).
321 performance of the Collation Service. This section describes these
322 attributes and, where possible, their performance impact. Performance
328 method differently. To be precise in the discussion of performance, this section
377 --------------------- | --- | --- | ---
378 A-ring | Y | Y |
382 A-ring + grave | Y | |
385 A-ring + cedilla | | Y |
398 Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings),
414 The case-first attribute emphasizes the case property of
415 letters by reordering the tertiary weights as either uppercase-first or
416 lowercase-first. This difference gets the most significant bit in the weight.
426 The case-first attribute does not affect the performance substantially.
448 can be changed at run time using the `UCOL_HIRAGANA_QUATERNARY_MODE` attribute.
455 non-ignorable. Special APIs are used to set the variable top. It can
458 ## Performance section in Collation Service Architecture
461 are used to enhance the performance:
473 5. Using a single, shared copy of UCA in memory for the read-only default sort
474 order. Only small tailoring tables are kept in memory for locale-specific
479 7. Making the sort order be data-driven.
481 In general, the best performance from the Collation Service is expected by
486 multi-threading.)
491 Generating the sort keys of two strings is about 5-10
498 ### Performance and Storage Implications of Attributes argument
523 (It is the default only for Canadian French ("fr-CA").)
526 performance. The only noticeable one is normalization, which can cost 10%-40% in
527 performance.
530 it. Shifting can reduce the storage by about 10%-20%; case level + primary-only
534 10%-15%. (The Identical Level also increases the length, but this option is not
541 > The performance and storage may vary, depending on the particular computer,
555 1. The run-time executable
563 The version information of Collator is a 32-bit integer. If a new version of ICU