utext.md - OpenGrok cross reference for /third_party/icu/docs/userguide/strings/utext.md

23 1.  UTF-8 (`char*`) strings
24 2.  UTF-16 (`UChar*` or `UnicodeString`) strings
34 1.  UTF-32 format.
35 2.  Text that is stored in discontiguous chunks in memory, or in application-specific representatio…
36 3.  Text that is in a non-Unicode code page
47     format that is already supported by UText (such as UTF-8). The application
71 accessing characters in a text-storage object. This class has methods for
79 1.  UText can conveniently operate on text stored in formats other than UTF-16.
94 ## Example: Counting the Words in a UTF-8 String
97 of words in a nul-terminated UTF-8 string. The use of UText only adds two lines
98 of code over what a similar function operating on normal UTF-16 strings would
112     ut = utext_openUTF8(ut, utf8String, -1, &status);
132 [utext.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/utext.h)
142 …--------------------------------|-----------------------------------------------------------------…
143 …a UText over a standard ICU (`UChar *`) string. The string consists of a UTF-16 array in memory, e…
145 | `utext_openConstUnicodeString` | Open a UText over a read-only `UnicodeString`. Disallows UText A…
147 | `utext_openUTF8` | Open a UText over a UTF-8 encoded C string. May be either nul-terminated or ha…
168 Here is code for stack-allocating a UText:
176 is non-null, the supplied UText will be used; if it is null, a new UText will be
200 Here is an example of a function that iterates over an array of UTF-8 strings,
216         ut = utext_openUTF8(ut, strings[i], -1, &status);
238     Unicode values are always returned. UTF-16 surrogate values from a surrogate
239     pair, like bytes from a UTF-8 sequence, are not separately visible.
241     storage, in whatever form it has. If the underlying storage is UTF-8, the
242     indexes will be UTF-8 byte indexes, not UTF-16 offsets.
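
The difference between native indexes and UTF-16 offsets can be made concrete without calling ICU at all. The sketch below is a minimal illustration, assuming well-formed UTF-8 input; `CodePointStart` and `codePointStarts` are hypothetical names invented for this example and are not part of the UText API. For each character it records the UTF-8 byte index (what a UTF-8-backed UText would use as its native index) next to the offset the same character would have in a UTF-16 string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type, not part of ICU.
struct CodePointStart {
    std::size_t utf8Index;   // byte index: the "native index" for UTF-8 storage
    std::size_t utf16Index;  // offset the same character would have in UTF-16
};

// Walks a UTF-8 string (assumed well-formed) one code point at a time and
// records where each code point starts in both indexing schemes.
std::vector<CodePointStart> codePointStarts(const std::string &utf8) {
    std::vector<CodePointStart> starts;
    std::size_t i = 0;    // UTF-8 byte index
    std::size_t u16 = 0;  // running UTF-16 code unit count
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        // Sequence length is determined by the lead byte.
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        assert(i + len <= utf8.size());  // precondition: well-formed input
        starts.push_back({i, u16});
        i += len;
        // A 4-byte UTF-8 sequence is a supplementary character, which takes
        // two UTF-16 code units (a surrogate pair); everything else takes one.
        u16 += (len == 4) ? 2 : 1;
    }
    return starts;
}
```

For the text `a`, `é`, U+1F600, `z`, the UTF-8 indexes are 0, 1, 3, 7 while the UTF-16 offsets are 0, 1, 2, 4, so an index obtained from a UTF-8-backed UText cannot be reused as an offset into a UTF-16 copy of the same text.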
246     through the N<sup>th</sup> positions of a multi-byte or multi-code-unit character, the
251 5.  Iteration uses post-increment and pre-decrement conventions. That is,
263 |-------------------------|------------------------------------------------------------------------…
264 …ength of the text string in terms of the underlying native storage – bytes for UTF-8, for example |
275 | `utext_extract` | Retrieve a range of text, placing it into a UTF-16 buffer. |
284 |---------------------|----------------------------------------------------------------------------…
299     detected by the implementation. The application code must be structured to
304 UText instances may be cloned. The clone function,
317 A *shallow* clone creates a new UText that maintains its own iteration state,
318 but does not clone the underlying text itself.
320 A *deep* clone copies the underlying text in addition to the UText state. This
323 deep clone, so checking for error status returns from `utext_clone()` is
329 functions accessing the same non-const UText is not supported. If concurrent
332 modified, a shallow clone is sufficient.
347 1.  A pointer to a *Text Chunk*, which is a UTF-16 buffer containing a section
351     is UTF-16, the chunk description can refer directly to the original text
352     data. For non-UTF-16 sources, the chunk will refer to a side buffer
353     containing some range of the text that has been converted to UTF-16 format.
354 2.  The iteration position, as a UTF-16 offset within the chunk.
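
The idea of a UTF-16 side buffer with offsets that map back to native indexes can be sketched in plain C++. This is a simplified model under stated assumptions, not the ICU implementation: `Chunk`, `makeChunk`, and `mapOffsetToNative` are hypothetical names, the whole input is converted as a single chunk, the input is assumed to be well-formed UTF-8, and both halves of a surrogate pair are mapped to the start of their character.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified model of a text chunk. The real UText struct in
// utext.h carries more state than this.
struct Chunk {
    std::u16string units;                  // side buffer holding UTF-16
    std::vector<std::size_t> nativeIndex;  // nativeIndex[k]: UTF-8 byte index of the
                                           // character containing unit k, plus one
                                           // trailing entry for the end of the chunk
};

// Converts well-formed UTF-8 into a UTF-16 side buffer, remembering the
// native (byte) index that each UTF-16 code unit came from.
Chunk makeChunk(const std::string &utf8) {
    Chunk c;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        assert(i + len <= utf8.size());  // precondition: well-formed input
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        std::uint32_t cp = (len == 1) ? lead : (lead & (0xFFu >> (len + 1)));
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3Fu);
        }
        if (cp >= 0x10000) {  // supplementary character: emit a surrogate pair
            cp -= 0x10000;
            c.units.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            c.units.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
            c.nativeIndex.push_back(i);
            c.nativeIndex.push_back(i);
        } else {
            c.units.push_back(static_cast<char16_t>(cp));
            c.nativeIndex.push_back(i);
        }
        i += len;
    }
    c.nativeIndex.push_back(i);  // native index just past the end of the chunk
    return c;
}

// Translates a UTF-16 offset within the chunk to a native (UTF-8) index.
std::size_t mapOffsetToNative(const Chunk &c, std::size_t offset) {
    assert(offset < c.nativeIndex.size());
    return c.nativeIndex[offset];
}
```

Storing one native index per code unit trades memory for a constant-time lookup. When the underlying text is already UTF-16, the two indexes coincide and no such table is needed.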
368 |----------------------------|---------------------------------------------------------------------…
371 | `UTextClone` | Clone the UText. |
372 | `UTextExtract` | Extract a range of text into a caller-supplied buffer. |
373 | `UTextReplace` | Replace a range of text with a caller-supplied replacement. May expand or shrink…
375 | `UTextMapOffsetToNative` | Within the current text chunk, translate a UTF-16 buffer offset to an …
376 | `UTextMapNativeIndexToUTF16` | Translate an absolute native index to a UTF-16 buffer offset withi…
380 read-only, no implementation for Replace or Copy is required. If the text is in
381 UTF-16 format, no implementation of the native to UTF-16 index conversions is
386 [utext.h](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/unicode/utext.h)
388 [utext.cpp](https://github.com/unicode-org/icu/blob/main/icu4c/source/common/utext.cpp).