/*
 *******************************************************************************
 * Copyright (C) 1996-2016, International Business Machines Corporation and    *
 * others. All Rights Reserved.                                                *
 *******************************************************************************
 */
package com.ibm.icu.text;

/**
 * Interface that defines an API for forward-only iteration
 * on text objects.
 * This is a minimal interface for iteration without random access
 * or backwards iteration. It is especially useful for wrapping
 * streams with converters into an object for collation or
 * normalization.
 *
 * <p>Characters can be accessed in two ways: as code units or as
 * code points.
 * Unicode code points are 21-bit integers and are the scalar values
 * of Unicode characters. ICU uses the type <code>int</code> for them.
 * Unicode code units are the storage units of a given
 * Unicode/UCS Transformation Format (a character encoding scheme).
 * With UTF-16, all code points can be represented with either one
 * code unit or two code units (a surrogate pair).
 * String storage is typically based on code units, while properties
 * of characters are typically determined using code point values.
 * Some processes may be designed to work with sequences of code units,
 * or it may be known that all characters that are important to an
 * algorithm can be represented with single code units.
 * Other processes will need to use the code point access functions.
 *
 * <p>UForwardCharacterIterator provides next() to access
 * a code unit and advance an internal position into the text object,
 * similar to <code>return text[position++]</code>.<br>
 * It provides nextCodePoint() to access a code point and advance an internal
 * position.
 *
 * <p>nextCodePoint() assumes that the current position is that of
 * the beginning of a code point, i.e., of its first code unit.
 * After nextCodePoint(), this will be true again.
 * In general, access to code units and code points in the same
 * iteration loop should not be mixed. In UTF-16, if the current position
 * is on a second code unit (a low surrogate), then only that code unit
 * is returned, even by nextCodePoint().
 *
 * <p>Usage:
 * <pre>
 *  public void function1(UForwardCharacterIterator it) {
 *      int c;
 *      while ((c = it.next()) != UForwardCharacterIterator.DONE) {
 *          // use c
 *      }
 *  }
 * </pre>
 * @stable ICU 2.4
 *
 */

public interface UForwardCharacterIterator {

    /**
     * Indicator that we have reached the end of the UTF-16 text.
     * @stable ICU 2.4
     */
    public static final int DONE = -1;
    /**
     * Returns the UTF-16 code unit at the current index, and increments to
     * the next code unit (post-increment semantics).  If index is out of
     * range, DONE is returned, and the iterator is reset to the limit
     * of the text.
     * @return the next UTF-16 code unit, or DONE if the index is at the limit
     *         of the text.
     * @stable ICU 2.4
     */
    public int next();

    /**
     * Returns the code point at the current index, and increments to the
     * next code point (post-increment semantics).  If index does not point
     * to a valid surrogate pair, the behavior is the same as
     * <code>next()</code>.  Otherwise the iterator is incremented past
     * the surrogate pair, and the code point represented by the pair
     * is returned.
     * @return the next code point in the text, or DONE if the index is at
     *         the limit of the text.
     * @stable ICU 2.4
     */
    public int nextCodePoint();

}
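
// A minimal sketch of how the two access styles documented above differ on the
// same text, assuming a String-backed iterator. ForwardIterationSketch, its
// nested ForwardIterator stand-in (which only mirrors DONE, next() and
// nextCodePoint() so that the example compiles on its own) and
// StringForwardIterator are hypothetical names used for illustration; none of
// them is part of ICU.

```java
final class ForwardIterationSketch {

    /** Local stand-in (assumption) mirroring the members of the interface above. */
    interface ForwardIterator {
        int DONE = -1;
        int next();
        int nextCodePoint();
    }

    /** Hypothetical String-backed implementation, for illustration only. */
    static final class StringForwardIterator implements ForwardIterator {
        private final String text;
        private int position;

        StringForwardIterator(String text) {
            this.text = text;
        }

        @Override
        public int next() {
            if (position >= text.length()) {
                return DONE;
            }
            // Post-increment semantics: return the code unit, then advance.
            return text.charAt(position++);
        }

        @Override
        public int nextCodePoint() {
            if (position >= text.length()) {
                return DONE;
            }
            // codePointAt combines a valid surrogate pair into one code point;
            // for anything else (including an unpaired surrogate) it returns
            // the single code unit, which matches the behavior of next().
            int codePoint = text.codePointAt(position);
            position += Character.charCount(codePoint);
            return codePoint;
        }
    }

    /** Counts the UTF-16 code units seen when iterating with next(). */
    static int countCodeUnits(String s) {
        ForwardIterator it = new StringForwardIterator(s);
        int count = 0;
        while (it.next() != ForwardIterator.DONE) {
            count++;
        }
        return count;
    }

    /** Counts the code points seen when iterating with nextCodePoint(). */
    static int countCodePoints(String s) {
        ForwardIterator it = new StringForwardIterator(s);
        int count = 0;
        while (it.nextCodePoint() != ForwardIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 stored as a surrogate pair, then 'b'.
        String sample = "a\uD83D\uDE00b";
        System.out.println(countCodeUnits(sample));  // prints 4
        System.out.println(countCodePoints(sample)); // prints 3
    }
}
```

// Each loop above uses only one access style, in line with the advice not to
// mix code unit and code point access within the same iteration loop.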