/*
 *******************************************************************************
 * Copyright (C) 1996-2016, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.text;

/**
 * Interface that defines an API for forward-only iteration
 * on text objects.
 * This is a minimal interface for iteration without random access
 * or backwards iteration. It is especially useful for wrapping
 * streams with converters into an object for collation or
 * normalization.
 *
 * <p>Characters can be accessed in two ways: as code units or as
 * code points.
 * Unicode code points are 21-bit integers and are the scalar values
 * of Unicode characters. ICU uses the type <code>int</code> for them.
 * Unicode code units are the storage units of a given
 * Unicode/UCS Transformation Format (a character encoding scheme).
 * With UTF-16, all code points can be represented with either one
 * or two code units ("surrogates").
 * String storage is typically based on code units, while properties
 * of characters are typically determined using code point values.
 * Some processes may be designed to work with sequences of code units,
 * or it may be known that all characters that are important to an
 * algorithm can be represented with single code units.
 * Other processes will need to use the code point access functions.
 *
 * <p>UForwardCharacterIterator provides next() to access
 * a code unit and advance an internal position into the text object,
 * similar to <code>return text[position++]</code>.<br>
 * It provides nextCodePoint() to access a code point and advance an internal
 * position.
 *
 * <p>nextCodePoint() assumes that the current position is that of
 * the beginning of a code point, i.e., of its first code unit.
 * After nextCodePoint(), this will be true again.
 * In general, access to code units and code points in the same
 * iteration loop should not be mixed. In UTF-16, if the current position
 * is on a second code unit (low surrogate), then only that code unit
 * is returned even by nextCodePoint().
 *
 * Usage:
 * <code>
 *  public void function1(UForwardCharacterIterator it) {
 *     int c;
 *     while((c=it.next())!=UForwardCharacterIterator.DONE) {
 *         // use c
 *     }
 *  }
 * </code>
 * @stable ICU 2.4
 *
 */

public interface UForwardCharacterIterator {

    /**
     * Indicator that we have reached the end of the UTF-16 text.
     * @stable ICU 2.4
     */
    public static final int DONE = -1;

    /**
     * Returns the UTF-16 code unit at index, and increments to the next
     * code unit (post-increment semantics). If index is out of
     * range, DONE is returned, and the iterator is reset to the limit
     * of the text.
     * @return the next UTF-16 code unit, or DONE if the index is at the limit
     *         of the text.
     * @stable ICU 2.4
     */
    public int next();

    /**
     * Returns the code point at index, and increments to the next code
     * point (post-increment semantics). If index does not point to a
     * valid surrogate pair, the behavior is the same as
     * <code>next()</code>. Otherwise the iterator is incremented past
     * the surrogate pair, and the code point represented by the pair
     * is returned.
     * @return the next code point in text, or DONE if the index is at
     *         the limit of the text.
     * @stable ICU 2.4
     */
    public int nextCodePoint();

}
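
/*
Illustrative sketch only, not part of ICU: one way the contract documented
above could be satisfied by a String-backed iterator. The names
StringForwardIterator and ForwardIterationDemo are hypothetical. The sketch
redeclares a package-private copy of the interface so that it compiles on its
own, and it relies only on java.lang.String.codePointAt and
java.lang.Character.charCount.

```java
// Local stand-in for the interface above, so this sketch is self-contained.
interface UForwardCharacterIterator {
    int DONE = -1;
    int next();
    int nextCodePoint();
}

// Hypothetical String-backed implementation (an assumption, not ICU code).
final class StringForwardIterator implements UForwardCharacterIterator {
    private final String text;
    private int pos = 0; // index of the next UTF-16 code unit to return

    StringForwardIterator(String text) {
        this.text = text;
    }

    // Returns one UTF-16 code unit and advances by one (post-increment).
    @Override
    public int next() {
        if (pos >= text.length()) {
            return DONE; // position stays at the limit of the text
        }
        return text.charAt(pos++);
    }

    // Returns one code point and advances past it. String.codePointAt
    // returns a lone surrogate unchanged, which matches the documented
    // fallback to next() when there is no valid surrogate pair.
    @Override
    public int nextCodePoint() {
        if (pos >= text.length()) {
            return DONE;
        }
        int cp = text.codePointAt(pos);
        pos += Character.charCount(cp); // 1 for BMP, 2 for supplementary
        return cp;
    }
}

final class ForwardIterationDemo {
    // Drains the iterator by code point and formats each value as hex.
    static String hexCodePoints(UForwardCharacterIterator it) {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = it.nextCodePoint()) != UForwardCharacterIterator.DONE) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (stored as the pair D83D DE00), 'b'
        System.out.println(hexCodePoints(new StringForwardIterator("a\uD83D\uDE00b")));
        // prints "61 1f600 62"
    }
}
```

Mixing the two accessors is what the class comment warns against: calling
next() once on the pair above leaves the position on the low surrogate, and a
following nextCodePoint() then returns only 0xDE00.
*/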