---
layout: default
title: CharacterIterator
nav_order: 3
parent: Chars and Strings
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# CharacterIterator Class

## Overview

CharacterIterator is the abstract base class that defines a protocol for
accessing characters in a text-storage object. This class has methods for
iterating forward and backward over Unicode characters to return either the
individual Unicode characters or their corresponding index values.

With CharacterIterator, ICU can iterate over text independently of how that
text is stored. The text can be stored locally or remotely in a string, a file,
a database, or some other form; the CharacterIterator methods make the text
appear as if it were local.

A CharacterIterator keeps track of its current position (index) in the text
and can do the following:

1. Move forward or backward one Unicode character at a time

2. Jump to a new location using absolute or relative positioning

3. Move to the beginning or end of its range

4. Return a character or the index of a character

The iteration can be restricted to a sub-range of the text. A large block of
text can be iterated as a whole, or it can be broken into smaller blocks for
the purpose of iteration.

> :point_right: **Note**: *CharacterIterator is different from
[Normalizer](../transforms/normalization/index) in that CharacterIterator
walks through the Unicode characters without interpretation.*

Prior to ICU release 1.6, the CharacterIterator class allowed access to only a
single UChar (one UTF-16 code unit) at a time and did not support
variable-width encoding. Single-UChar access makes it difficult to support
supplementary characters, which occupy two code units (a surrogate pair) in
UTF-16. Beginning with ICU release 1.6, the CharacterIterator class efficiently
supports UTF-16 text and provides new APIs that return UTF-32 values.
The API names for the UTF-16 and UTF-32 variants differ: the UTF-32 APIs
include "32" in their names. For example, CharacterIterator::current() returns
a code unit, and CharacterIterator::current32() returns a code point.

## Base class inherited by CharacterIterator

The class
[ForwardCharacterIterator](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classForwardCharacterIterator.html)
is a superclass of the CharacterIterator class. This superclass provides
methods for forward iteration only, for both UTF-16 and UTF-32 access, and is
based on an efficient forward iteration mechanism. When you need to iterate
over text that does not allow random access, the ForwardCharacterIterator
superclass is the most efficient choice. For example, it suits text that a
character converter decodes one character at a time with the
[ucnv_getNextUChar()](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ucnv_8h.html)
function.

## Subclasses of CharacterIterator provided by ICU

ICU provides the following concrete subclasses of the CharacterIterator class:

1. The [UCharCharacterIterator](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classUCharCharacterIterator.html)
   subclass iterates over a `UChar[]` array.

2. The [StringCharacterIterator](https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/classStringCharacterIterator.html)
   subclass extends `UCharCharacterIterator` and iterates over the contents
   of a `UnicodeString`.

## Usage

To use the methods specified in the CharacterIterator class, do one of the
following:

1. Make a subclass that inherits from the CharacterIterator class

2. Use the StringCharacterIterator subclass

3. Use the UCharCharacterIterator subclass

CharacterIterator objects keep track of their current position within the
text that is iterated over.
The CharacterIterator class uses an index, similar to a cursor, that is
initialized to the beginning of the text and advances according to the
operations applied to the object. The current index can move between two
positions (a start and a limit) that are set together with the text. The limit
is one greater than the index of the last UChar (code unit) in the iteration
range.

### Forward iteration

For efficiency, ICU can iterate over text using post-increment semantics,
called Forward Iteration. Forward Iteration reads the character at the current
index position and moves the index forward: it leaves the index behind the
character it read and returns that character. Use the nextPostInc() or
next32PostInc() calls together with hasNext() to perform Forward Iteration.
These calls are the only character access methods provided by
ForwardCharacterIterator. An iteration loop can be started with the
setToStart(), firstPostInc(), or first32PostInc() calls. (The setToStart() call
is implied after instantiating the iterator or setting the text.)

The less efficient forward iteration mechanism, available for compatibility
with Java™, provides pre-increment semantics. With these methods, the current
character is skipped, and then the following character is read and returned.
This is less efficient for a variable-width encoding because the width of each
character is determined twice: once to read it and once to skip it the next
time the method is called. The pre-increment methods are next() and next32().
An iteration loop must start with the first() or first32() call to get the
first character.

### Backward iteration

Backward Iteration has pre-decrement semantics, which are the exact opposite of
the post-increment Forward Iteration.
The iterator reads the character that precedes the current index, returns that
character, and leaves the index at the beginning of it. The methods used for
Backward Iteration are previous() and previous32(), together with
hasPrevious(). An iteration loop can be started with the setToEnd(), last(),
or last32() calls.

### Direct index manipulation

The index can be set and moved directly, without iteration, to start iterating
at an arbitrary position, skip some characters, or reset the index to an
earlier position. The index can also be set to one past the last code unit of
the text, for backward iteration.

The setIndex() and setIndex32() calls set the index to a new position and
return the character at that new position. The setIndex32() call ensures that
the new position is at the beginning of a character (on its first code unit).
Since the character at the new position is returned, these functions can be
used with both pre-increment and post-increment iteration semantics.
Similarly, the current() and current32() calls return the character at the
current index without modifying the index. The current32() call retrieves the
complete character whether or not the index is on its first code unit.

The index and the iteration boundaries can be retrieved with separate
functions, and they always satisfy startIndex() <= getIndex() <= endIndex().

The setToStart() and setToEnd() calls set the index to the start or the end of
the iteration range without accessing the text. Therefore, these calls are an
efficient way to start a forward (post-increment) or backward iteration.

The most general functions for manipulating the index position are the move()
and move32() calls. They move the index forward or backward by a given delta,
relative to the current position, the start of the iteration range, or its end
(the origin argument is CharacterIterator::kCurrent, kStart, or kEnd).
The move() and move32() calls return the new index rather than a character,
and move() does not access the text at all, which makes them best suited for
skipping part of the text. The move32() call skips complete code points, like
the next32PostInc() call and the other UChar32-access methods, while move()
counts UChar code units.

### Access to the iteration text

The CharacterIterator class provides the following access methods for the
entire text under iteration:

1. getText() sets a caller-provided UnicodeString to the text

2. getLength() returns only the length of the text

This text (and its length) may include more than the actual iteration range,
because the start and end indexes need not be the start and end of the entire
text. The text and the iteration range are set in the implementing subclasses.

## Additional Sample Code

C/C++: See
[icu4c/source/samples/citer/](https://github.com/unicode-org/icu/blob/master/icu4c/source/samples/citer/)
in the ICU source distribution for code samples.