---
layout: default
title: API Details
nav_order: 6
parent: Collation
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Collation API Details
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

This section describes some of the usage conventions for the ICU Collation
Service API.

## Collator Instantiation

To use the Collation Service, you must instantiate a `Collator`. The
Collator defines the properties and behavior of the sort ordering. The Collator
can be repeatedly referenced until all collation activities have been performed.
The Collator can then be closed and removed.

### Instantiating the Predefined Collators

ICU comes with a large set of predefined collators that are suited for
specific locales. Most of the ICU locales have a predefined collator. In the worst
case, the CLDR default set of rules,
which is mostly equivalent to the UCA default ordering (DUCET), is used.
The default sort order itself is designed to work well for many languages.
(For example, there are no tailorings for the standard sort orders for
English, German, French, etc.)

To instantiate a predefined collator, use the APIs `ucol_open`, `Collator::createInstance` and
`Collator.getInstance` in C, C++ and Java, respectively. The C API takes a locale ID
(or language tag) string argument, C++ takes a Locale object, and Java takes a
Locale or ULocale.

For some languages, multiple collation types are available; for example,
"de-u-co-phonebk" / "de@collation=phonebook". They can be enumerated via
`Collator::getKeywordValuesForLocale()`. See also the list of available collation
tailorings in the online [ICU Collation
Demo](http://demo.icu-project.org/icu-bin/collation.html).

Starting with ICU 54, collation attributes can be specified via locale keywords
as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in
language tag syntax ("el-u-kf-upper"). Keywords and values are case-insensitive.

See the [LDML Collation spec, Collation
Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings),
and the [data
file](https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml) listing
the valid collation keywords and their values. (The deprecated attributes
kh/colHiraganaQuaternary and vt/variableTop are not supported.)

For the [old locale extension
syntax](http://www.unicode.org/reports/tr35/tr35.html#Old_Locale_Extension_Syntax),
the data file's alias names are used (first alias, if defined, otherwise the
name): "de@collation=phonebook;colCaseLevel=yes;kv=space"

For the language tag syntax, the non-alias names are used, and "true" values can
be omitted: "de-u-co-phonebk-kc-kv-space"
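As a rough illustration of how the language tag form decomposes into collation keywords, the following sketch parses the tag with the JDK's `java.util.Locale`. It uses the JDK class rather than ICU's `ULocale` so that it runs without ICU4J on the classpath; that substitution, and the names `CollationTagSketch` and `keyword`, are assumptions of this example. It only reads the keywords back and does not create a collator.

```java
import java.util.Locale;

final class CollationTagSketch {
    /**
     * Returns the value of a Unicode extension key such as "co" or "kv",
     * or null if the tag does not define that key.
     */
    static String keyword(String languageTag, String key) {
        Locale locale = Locale.forLanguageTag(languageTag);
        return locale.getUnicodeLocaleType(key);
    }

    public static void main(String[] args) {
        String tag = "de-u-co-phonebk-kc-kv-space";
        // List every keyword carried in the "-u-" extension of the tag.
        for (String key : Locale.forLanguageTag(tag).getUnicodeLocaleKeys()) {
            System.out.println(key + " = \"" + keyword(tag, key) + "\"");
        }
    }
}
```

For the tag above, `keyword(tag, "co")` yields `phonebk` and `keyword(tag, "kv")` yields `space`. According to the `Locale` Javadoc, a key given without a value (here `kc`, whose "true" was omitted) is reported as an empty string.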

The following examples demonstrate the instantiation of a predefined collator.

**C:**

```c
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
if(U_SUCCESS(status)) {
    /* close the collator */
    ucol_close(coll);
}
```

**C++:**

```c++
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en", "US"), status);
if(U_SUCCESS(status)) {
    // close the collator
    delete coll;
}
```

**Java:**

```java
Collator col = null;
try {
    col = Collator.getInstance(Locale.US);
} catch (Exception e) {
    System.err.println("English collation creation failed.");
    e.printStackTrace();
}
```

### Instantiating Collators Using Custom Rules

If the ICU predefined collators are not appropriate for your intended usage, you
can define your own set of rules and instantiate a collator that uses them. For more
details, please see [the section on collation customization](customization/index).

The following examples demonstrate the instantiation of a collator from custom rules.

**C:**

```c
UErrorCode status = U_ZERO_ERROR;
U_STRING_DECL(rules, "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E", 52);
UCollator *coll;

U_STRING_INIT(rules, "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E", 52);
coll = ucol_openRules(rules, -1, UCOL_ON, UCOL_DEFAULT_STRENGTH, NULL, &status);
if(U_SUCCESS(status)) {
    /* close the collator */
    ucol_close(coll);
}
```

**C++:**

```c++
UErrorCode status = U_ZERO_ERROR;
UnicodeString rules(u"&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E");
Collator *coll = new RuleBasedCollator(rules, status);
if(U_SUCCESS(status)) {
    // close the collator
    delete coll;
}
```

**Java:**

```java
RuleBasedCollator coll = null;
String ruleset = "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E";
try {
    coll = new RuleBasedCollator(ruleset);
} catch (Exception e) {
    System.err.println("Customized collation creation failed.");
    e.printStackTrace();
}
```

## Compare

Two of the most used functions in the ICU collation API, `ucol_strcoll` and `ucol_getSortKey`, have counterparts in both the Win32 and ANSI APIs:

ICU C             | ICU C++                     | ICU Java                   | ANSI/POSIX | WIN32
----------------- | --------------------------- | -------------------------- | ---------- | -----
`ucol_strcoll`    | `Collator::compare`         | `Collator.compare`         | `strcoll`  | `CompareString`
`ucol_getSortKey` | `Collator::getSortKey`      | `Collator.getCollationKey` | `strxfrm`  | `LCMapString`
&nbsp;            | `Collator::getCollationKey` | &nbsp;                     | &nbsp;     | &nbsp;

For more sophisticated usage, such as user-controlled language-sensitive text
searching, an iterating interface to collation is provided. Please refer to the
section below on `CollationElementIterator` for more details.

The `ucol_strcoll` function compares one pair of strings at a time. Comparing two
strings is much faster than calculating sort keys for both of them. However, if
comparisons are done repeatedly on a very large number of strings, generating
and storing sort keys can improve performance. In all other cases (such as a quick
sort or bubble sort of a
moderately sized list of strings), comparing strings works very well.

The C API used for comparing two strings is `ucol_strcoll`. It requires two
`UChar *` strings and their lengths as parameters, as well as a pointer to a valid
`UCollator` instance. The result is a `UCollationResult` constant, which can be one
of `UCOL_LESS`, `UCOL_EQUAL` or `UCOL_GREATER`.

The C++ API offers the method `Collator::compare` with several overloads.
Acceptable input arguments are `UChar *` strings with their lengths, or `UnicodeString`
instances. The result is a member of the `UCollationResult` or `EComparisonResult` enums.

The Java API provides the method `Collator.compare` with one overload. Acceptable
input arguments are Strings or Objects. The result is an int value, which is
less than zero if source is less than target, zero if source and target are
equal, or greater than zero if source is greater than target.

There are also several convenience functions and methods returning a boolean
value, such as `ucol_greater`, `ucol_greaterOrEqual`, `ucol_equal` (in C),
`Collator::greater`, `Collator::greaterOrEqual`, `Collator::equals` (in C++) and
`Collator.equals` (in Java).
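The sign convention and the boolean convenience method can be tried out with a short sketch. It is written against the JDK's `java.text.Collator` so that it runs without ICU4J on the classpath; ICU4J's `com.ibm.icu.text.Collator` offers methods with the same names (`compare`, `equals`, `setStrength`), but treating the two as interchangeable here is an assumption of this example, as are the names `CompareSketch` and `sign`.

```java
import java.text.Collator;
import java.util.Locale;

final class CompareSketch {
    /** Reduces the int returned by Collator.compare to -1, 0 or 1. */
    static int sign(Collator coll, String source, String target) {
        return Integer.signum(coll.compare(source, target));
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(Locale.US);

        // Collation order puts "apple" before "Banana"; binary order does not.
        System.out.println("collator: " + sign(coll, "apple", "Banana"));
        System.out.println("binary:   " + Integer.signum("apple".compareTo("Banana")));

        // At primary strength, case differences are ignored.
        coll.setStrength(Collator.PRIMARY);
        System.out.println("equal at primary strength: " + coll.equals("abc", "ABC"));
    }
}
```

Because only the sign of the result is specified, callers should test it with `< 0`, `== 0` or `> 0` rather than comparing against `-1` or `1`.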

### Examples

**C:**

```c
UChar *s[] = { /* list of Unicode strings */ };
uint32_t listSize = sizeof(s)/sizeof(s[0]);
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
uint32_t i, j;
if(U_SUCCESS(status)) {
  /* bubble sort into ascending collation order */
  for(i=listSize; i>1; i--) {
    for(j=0; j+1<i; j++) {
      if(ucol_strcoll(coll, s[j], -1, s[j+1], -1) == UCOL_GREATER) {
        UChar *tmp = s[j];   /* swap the two pointers */
        s[j] = s[j+1];
        s[j+1] = tmp;
      }
    }
  }
  ucol_close(coll);
}
```

**C++:**

```c++
UnicodeString s[] = { /* list of Unicode strings */ };
uint32_t listSize = sizeof(s)/sizeof(s[0]);
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en", "US"), status);
if(U_SUCCESS(status)) {
  // bubble sort into ascending collation order
  for(uint32_t i=listSize; i>1; i--) {
    for(uint32_t j=0; j+1<i; j++) {
      if(coll->compare(s[j], s[j+1], status) == UCOL_GREATER) {
        std::swap(s[j], s[j+1]);   // requires <utility>
      }
    }
  }
  delete coll;
}
```

**Java:**

```java
String[] s = { /* list of Unicode strings */ };
try {
    Collator coll = Collator.getInstance(Locale.US);
    // bubble sort into ascending collation order
    for (int i = s.length; i > 1; i--) {
        for (int j = 0; j + 1 < i; j++) {
            if (coll.compare(s[j], s[j+1]) > 0) {
                String tmp = s[j];   // swap the two references
                s[j] = s[j+1];
                s[j+1] = tmp;
            }
        }
    }
} catch (Exception e) {
    System.err.println("English collation creation failed.");
    e.printStackTrace();
}
```

## GetSortKey

The C API provides the `ucol_getSortKey` function, which requires (apart from a
pointer to a valid `UCollator` instance) a pointer to the source `UChar` string together with
its length. It also requires a pointer to a receiving buffer and its length.

The C++ API provides the `Collator::getSortKey` method with parameters similar to
the C version. It also provides `Collator::getCollationKey`, which produces a
`CollationKey` object instance (a wrapper around a sort key).

The Java API provides only the `Collator.getCollationKey` method, which produces a
`CollationKey` object instance (a wrapper around a sort key).

Sort keys are generally only useful in databases or other circumstances where
function calls are extremely expensive. See [Sortkeys vs
Comparison](concepts#sortkeys-vs-comparison).

### Sort Key Features

ICU writes sort keys as sequences of bytes.

Each sort key ends with one 00 byte and does not contain any other 00 byte. The
terminating 00 byte is included in the length of the sort key as returned by the
API (unlike any other ICU API where terminating NUL bytes or characters are not
counted as part of the length).

Sort key byte sequences must be compared with an unsigned-byte comparison, as
with `strcmp()`.

Comparing the sort keys of two strings from the same collator yields the same
ordering as using the collator to compare the two strings directly. That is:
`strcmp(coll.getSortKey(str1), coll.getSortKey(str2))` is equivalent to
`coll.compare(str1, str2)`.
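The equivalence can be checked with a small sketch. Java's `byte` is signed, so the unsigned comparison has to be spelled out. The code uses the JDK's `java.text.Collator` and `CollationKey.toByteArray()` as stand-ins so that it runs without ICU4J; the JDK's key bytes are not ICU sort keys (for instance, nothing here relies on a terminating 00 byte), and the names `SortKeySketch`, `compareUnsigned` and `keysAgree` are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

final class SortKeySketch {
    /** Lexicographic comparison of two byte arrays, treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = Byte.toUnsignedInt(a[i]) - Byte.toUnsignedInt(b[i]);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared bytes are equal: the shorter key sorts first.
        return a.length - b.length;
    }

    /** True if comparing the key bytes orders the pair the same way the collator does. */
    static boolean keysAgree(Collator coll, String s1, String s2) {
        byte[] k1 = coll.getCollationKey(s1).toByteArray();
        byte[] k2 = coll.getCollationKey(s2).toByteArray();
        return Integer.signum(compareUnsigned(k1, k2))
                == Integer.signum(coll.compare(s1, s2));
    }

    public static void main(String[] args) {
        Collator coll = Collator.getInstance(Locale.US);
        System.out.println(keysAgree(coll, "apple", "Banana"));
        System.out.println(keysAgree(coll, "abc", "ABC"));
    }
}
```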

Sort keys from different collators (different locale or strength or any other
attributes/settings) are not comparable.

Sort keys can be "merged" as described in [UTS #10 Merging Sort
Keys](http://www.unicode.org/reports/tr10/#Merging_Sort_Keys), via
`ucol_mergeSortkeys()` or Java `CollationKey.merge()`.

*   Since CLDR 1.9/ICU 4.6, the same effect can be achieved by concatenating
    strings with U+FFFE between them. The concatenation has the same sort order
    as the merged sort keys.
*   However, it is not guaranteed that the sort key of the concatenated strings
    is the same as the merged result of the individual sort keys. (That is,
    merge(getSortKey(str1), getSortKey(str2)) may differ from getSortKey(str1 +
    '\\uFFFE' + str2).)
*   In particular, a future version of ICU is likely to generate shorter sort
    keys when concatenating strings with U+FFFE between them (by using
    compression across the U+FFFE weights).
*   *The recommended way to achieve "merged" sorting is via strings with
    U+FFFE.*

Any further analysis or parsing of sort keys is not supported.

Sort keys will change from one ICU version to another; therefore, if sort keys
are stored in a database or other persistent storage, then each upgrade requires
their regeneration.

*   The details of the underlying data change with every Unicode and CLDR
    version.
*   Sort keys are also subject to enhancements and bug fixes in the builder and
    implementation code.
*   On the other hand, the sort *order* is much more stable. It is subject to
    deliberate changes to the default Unicode collation order, which is kept
    quite stable, and subject to deliberate changes in CLDR data as new data is
    added and feedback on existing data is taken into account.

Implementation notes: (Not supported as permanent constraints on sort keys)

Byte 02 was unique as a merge separator for some versions of ICU before version
ICU 53. Since ICU 53, 02 is also used in regular collation weights where there
is no conflict (to expand the number of available short weights).

Byte 01 has been unique as a level separator. This is not strictly necessary for
non-primary levels. (A level's compressible "common" weight as its level
separator would yield shorter sort keys.) However, the current implementation of
`ucol_mergeSortkeys()` relies on it. (Also, test code currently examines sort keys
for finding the strength of a comparison difference.) This may change in the
future, especially if `ucol_mergeSortkeys()` were to become deprecated.

Level separators are likely to be equivalent to single-byte weights (possibly
compressible): Multi-byte level separators would noticeably lengthen sort keys
for short strings.

The byte values used in several ICU versions for sort keys and collation
elements are documented in the [“Special Byte Values” design
doc](http://site.icu-project.org/design/collation/bytes) on the ICU site.

### Sort Key Output Buffer

`ucol_getSortKey()` can operate in 'preflighting' mode, which returns the amount
of memory needed to store the resulting sort key. This mode is automatically
activated if the output buffer size passed is set to zero. Should the sort key
become longer than the buffer provided, the function again slips into preflighting
mode. The overall performance is poorer than if the function is called with a
zero output buffer. If the size of the sort key returned is greater than the
size of the buffer provided, the content of the result buffer is undefined. In
that case, the result buffer could be reallocated to its proper size and the
sort key generator function can be used again.

The best way to generate a series of sort keys is to do the following:

1.  Create a big temporary buffer on the stack. Typically, this buffer is
    allocated only once, and reused with every sort key generated. There is no
    need to keep it as small as possible. A recommended size for the temporary
    buffer is four times the length of the longest string processed.

2.  Start the loop. Call `ucol_getSortKey()` to find out how big the sort key
    buffer should be, and fill in the temporary buffer at the same time.

3.  If the temporary buffer is too small, allocate or reallocate more space.
    Fill in the sort key values in the overflow buffer.

4.  Allocate the sort key buffer with the size returned by `ucol_getSortKey()` and
    call `memcpy` to copy the sort key content from the temp buffer to the sort
    key buffer.

5.  Loop back to step 2 until you are done.

6.  Delete the overflow buffer if you created one.

### Example

```c
#include <stdlib.h>
#include "unicode/ucol.h"

void GetSortKeys(const UCollator *coll, const UChar *const *source, uint32_t arrayLength)
{
  uint8_t buffer[1000];            /* stack buffer, reused for every sort key */
  uint8_t *currBuffer = buffer;
  int32_t bufferLen = sizeof(buffer);
  int32_t expectedLen = 0;
  uint32_t i;

  for (i = 0; i < arrayLength; ++i) {
    expectedLen = ucol_getSortKey(coll, source[i], -1, currBuffer, bufferLen);
    if (expectedLen > bufferLen) {
      /* too small: switch to (or grow) a heap buffer, then generate the key again */
      uint8_t *newBuffer = (currBuffer == buffer)
          ? (uint8_t *)malloc(expectedLen)
          : (uint8_t *)realloc(currBuffer, expectedLen);
      if (newBuffer == NULL) {
        break;                     /* out of memory */
      }
      currBuffer = newBuffer;
      bufferLen = expectedLen;
      expectedLen = ucol_getSortKey(coll, source[i], -1, currBuffer, bufferLen);
    }
    /* processSortKey() stands for whatever the application does with each key */
    processSortKey(i, currBuffer, expectedLen);
  }

  if (currBuffer != buffer) {
    free(currBuffer);
  }
}
```

> :point_right: **Note** Although the API allows you to call
> `ucol_getSortKey` with `NULL` to see what the
> sort key length is, it is strongly recommended that you NOT determine the length
> first, then allocate and fill the sort key buffer. If you do, it requires twice
> the processing since computing the length has to do the same calculation as
> actually getting the sort key. Instead, the example shown above uses a stack buffer.

### Using Iterators for String Comparison

ICU4C's `ucol_strcollIter` API allows for comparing two strings that are supplied
as character iterators (`UCharIterator`). This is useful when you need to compare
differently encoded strings using `strcoll`. In that case, converting the strings
first would probably be wasteful, since `strcoll` usually gives the result
before whole strings are processed. This API is implemented only as a C function
in ICU4C. There are no equivalent C++ or ICU4J functions.

```c
...
/* we are arriving with two char*: utf8Source and utf8Target, with their
 * lengths in utf8SourceLen and utf8TargetLen
 */
    UCharIterator sIter, tIter;
    uiter_setUTF8(&sIter, utf8Source, utf8SourceLen);
    uiter_setUTF8(&tIter, utf8Target, utf8TargetLen);
    compareResultUTF8 = ucol_strcollIter(myCollation, &sIter, &tIter, &status);
...
```

### Obtaining Partial Sort Keys

When using different sort algorithms, such as radix sort, sometimes it is useful
to process strings only as much as needed to feed into the sorting algorithm.
For that purpose, ICU provides the `ucol_nextSortKeyPart` API, which also takes
character iterators. This API allows for iterating over subsequent pieces of an
uncompressed sort key. Between calls to the API you need to save a 64-bit state.
Following is an example of simulating a string compare function using the partial
sort key API. Your usage model is bound to look much different.

```c
static UCollationResult compareUsingPartials(UCollator *coll,
                                             const UChar source[], int32_t sLen,
                                             const UChar target[], int32_t tLen,
                                             int32_t pieceSize, UErrorCode *status) {
  int32_t partialSKResult = 0;
  UCharIterator sIter, tIter;
  uint32_t sState[2], tState[2];   /* the 64-bit state saved between calls */
  int32_t sSize = pieceSize, tSize = pieceSize;
  uint8_t sBuf[16384], tBuf[16384];
  if(pieceSize > 16384) {
    *status = U_BUFFER_OVERFLOW_ERROR;
    return UCOL_EQUAL;
  }
  *status = U_ZERO_ERROR;
  sState[0] = 0; sState[1] = 0;
  tState[0] = 0; tState[1] = 0;
  while(sSize == pieceSize && tSize == pieceSize && partialSKResult == 0) {
    uiter_setString(&sIter, source, sLen);
    uiter_setString(&tIter, target, tLen);
    sSize = ucol_nextSortKeyPart(coll, &sIter, sState, sBuf, pieceSize, status);
    tSize = ucol_nextSortKeyPart(coll, &tIter, tState, tBuf, pieceSize, status);
    partialSKResult = memcmp(sBuf, tBuf, pieceSize);
  }

  if(partialSKResult < 0) {
    return UCOL_LESS;
  } else if(partialSKResult > 0) {
    return UCOL_GREATER;
  } else {
    return UCOL_EQUAL;
  }
}
```

### Other Examples

A longer example is presented in the 'Examples' section. Here is an illustration
of the usage model.

**C:**

```c
#include <stdlib.h>
#include <string.h>
#include "unicode/ucol.h"
#include "unicode/ustring.h"

#define MAX_KEY_SIZE 100
#define MAX_BUFFER_SIZE 10000
#define MAX_LIST_LENGTH 5

/* qsort callback: sort keys are 00-terminated, so strcmp orders them correctly */
static int compareKeys(const void *a, const void *b) {
  return strcmp((const char *)a, (const char *)b);
}

void sortKeysExample(void) {
  const char *text[MAX_LIST_LENGTH] = {
    "Quick",
    "fox",
    "Moving",
    "trucks",
    "riddle"
  };
  UChar s[MAX_LIST_LENGTH][20];
  uint8_t temp[MAX_BUFFER_SIZE];
  uint8_t keys[MAX_LIST_LENGTH][MAX_KEY_SIZE];
  int32_t i, length;
  UErrorCode status = U_ZERO_ERROR;
  UCollator *coll;

  for (i = 0; i < MAX_LIST_LENGTH; i++) {
    u_uastrcpy(s[i], text[i]);
  }
  coll = ucol_open("en_US", &status);
  if (U_SUCCESS(status)) {
    for (i = 0; i < MAX_LIST_LENGTH; i++) {
      length = ucol_getSortKey(coll, s[i], -1, temp, MAX_BUFFER_SIZE);
      /* Keys for these short strings are expected to fit; longer input needs
       * the overflow handling shown under "Sort Key Output Buffer". */
      if (length > 0 && length <= MAX_KEY_SIZE) {
        memcpy(keys[i], temp, length);
      } else {
        keys[i][0] = 0;  /* empty key as a placeholder */
      }
    }
    qsort(keys, MAX_LIST_LENGTH, MAX_KEY_SIZE, compareKeys);
    ucol_close(coll);
  }
}
```

**C++:**

```c++
#define MAX_LIST_LENGTH 5
const UnicodeString s [] = {
  "Quick",
  "fox",
  "Moving",
  "trucks",
  "riddle"
};
CollationKey keys[MAX_LIST_LENGTH];
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en", "US"), status);
if(U_SUCCESS(status)) {
  for(int32_t i=0; i<MAX_LIST_LENGTH; i++) {
    coll->getCollationKey(s[i], keys[i], status);
  }
  // std::sort (from <algorithm>) copies CollationKey objects safely, unlike qsort
  std::sort(keys, keys + MAX_LIST_LENGTH,
            [&status](const CollationKey &a, const CollationKey &b) {
              return a.compareTo(b, status) == UCOL_LESS;
            });
  delete coll;
}
```

**Java:**

```java
String[] s = {
  "Quick",
  "fox",
  "Moving",
  "trucks",
  "riddle"
};
CollationKey[] keys = new CollationKey[s.length];
try {
    Collator coll = Collator.getInstance(Locale.US);
    for (int i = 0; i < s.length; i++) {
        keys[i] = coll.getCollationKey(s[i]);
    }

    Arrays.sort(keys);
} catch (Exception e) {
    System.err.println("Error creating English collator");
    e.printStackTrace();
}
```

## Collation ElementIterator

A collation element iterator can only be used in one direction. This is
established at the time of the first call to retrieve a collation element. Once
`ucol_next` (C), `CollationElementIterator::next` (C++) or
`CollationElementIterator.next` (Java) has been invoked,
`ucol_previous` (C),
`CollationElementIterator::previous` (C++) or `CollationElementIterator.previous`
(Java) should not be used (and vice versa). The direction can be changed
immediately after `ucol_first`, `ucol_last`, `ucol_reset` (in C),
`CollationElementIterator::first`, `CollationElementIterator::last`,
`CollationElementIterator::reset` (in C++) or `CollationElementIterator.first`,
`CollationElementIterator.last`, `CollationElementIterator.reset` (in Java) is
called, or when the iterator reaches the end of the string while traversing it.

When `ucol_next` is called at the end of the string buffer, `UCOL_NULLORDER` is
returned, and any subsequent call to `ucol_next` returns it again. The same applies to
`ucol_previous`.

An example of how iterators are used is the Boyer-Moore search implementation,
which can be found in the samples section.

### API Example

**C:**

```c
UErrorCode         status = U_ZERO_ERROR;
UCollator          *coll = ucol_open("en_US", &status);
UChar              text[20];
UCollationElements *collelemitr;
int32_t            collelem;

u_uastrcpy(text, "text");
collelemitr = ucol_openElements(coll, text, -1, &status);
if (U_SUCCESS(status)) {
  do {
    collelem = ucol_next(collelemitr, &status);
  } while (collelem != UCOL_NULLORDER);
}

ucol_closeElements(collelemitr);
ucol_close(coll);
```

**C++:**

```c++
UErrorCode    status = U_ZERO_ERROR;
Collator      *coll = Collator::createInstance(Locale::getUS(), status);
UnicodeString text("text");
// createCollationElementIterator() is declared on RuleBasedCollator
RuleBasedCollator *rbc = dynamic_cast<RuleBasedCollator *>(coll);
if (U_SUCCESS(status) && rbc != nullptr) {
  CollationElementIterator *collelemitr = rbc->createCollationElementIterator(text);
  int32_t collelem = 0;
  do {
    collelem = collelemitr->next(status);
  } while (collelem != CollationElementIterator::NULLORDER);
  delete collelemitr;
}
delete coll;
```

**Java:**

```java
try {
    RuleBasedCollator coll = (RuleBasedCollator)Collator.getInstance(Locale.US);
    String text = "text";
    CollationElementIterator collelemitr = coll.getCollationElementIterator(text);
    int collelem = 0;
    do {
        collelem = collelemitr.next();
    } while (collelem != CollationElementIterator.NULLORDER);
} catch (Exception e) {
    System.err.println("Error in collation iteration");
    e.printStackTrace();
}
```
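The loops above retrieve each collation element but do nothing with it. As a sketch of what a caller might do, the following code extracts the primary weight of an element and counts the elements of a string. It is written against the JDK's `java.text.CollationElementIterator` so that it runs without ICU4J; the ICU4J class offers `next()`, `NULLORDER` and the static `primaryOrder(int)` under the same names, but the weights themselves differ between the two implementations. The names `ElementSketch`, `firstPrimary` and `countElements` are hypothetical.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

final class ElementSketch {
    /** Returns the primary weight of the first collation element of text. */
    static int firstPrimary(RuleBasedCollator coll, String text) {
        CollationElementIterator it = coll.getCollationElementIterator(text);
        int element = it.next();
        return CollationElementIterator.primaryOrder(element);
    }

    /** Counts collation elements by calling next() until NULLORDER is returned. */
    static int countElements(RuleBasedCollator coll, String text) {
        CollationElementIterator it = coll.getCollationElementIterator(text);
        int count = 0;
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        RuleBasedCollator coll = (RuleBasedCollator) Collator.getInstance(Locale.US);
        // Letters that differ only in case share a primary weight.
        System.out.println(firstPrimary(coll, "a") == firstPrimary(coll, "A"));
        System.out.println(countElements(coll, "text"));
    }
}
```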

## Setting and Getting Attributes

The general attribute setting APIs are `ucol_setAttribute` (in C) and
`Collator::setAttribute` (in C++). These APIs take an attribute name and an
attribute value. If the name and the value pass a syntax and range check, the
property of the collator is changed. If the name and value do not pass a syntax
and range check, however, the state is not changed and the error code variable
is set to an error condition. The Java version does not provide general
attribute setting APIs; instead, each attribute has its own setter API of
the form `RuleBasedCollator.setATTRIBUTE_NAME(arguments)`.

The attribute getting APIs are `ucol_getAttribute` (C) and `Collator::getAttribute`
(C++). Both APIs require an attribute name as an argument and return an
attribute value if a valid attribute name was supplied. If a valid attribute
name was not supplied, however, they return an undefined result and set the
error code. As with the setter APIs, the Java version provides no generic getter
API. Each attribute has its own getter API of the form
`RuleBasedCollator.getATTRIBUTE_NAME()` in the Java version.
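The per-attribute setter and getter style of the Java API can be sketched as follows. The code uses the JDK's `java.text.Collator`, which has `setStrength`/`getStrength` and `setDecomposition`/`getDecomposition`, so that it runs without ICU4J. ICU4J's collator has methods with these names plus further attribute setters that are not shown; the names `AttributeSketch` and `withStrength` are hypothetical.

```java
import java.text.Collator;
import java.util.Locale;

final class AttributeSketch {
    /** Returns a US English collator configured through per-attribute setters. */
    static Collator withStrength(int strength) {
        Collator coll = Collator.getInstance(Locale.US);
        coll.setStrength(strength);                               // one setter per attribute
        coll.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return coll;
    }

    public static void main(String[] args) {
        Collator secondary = withStrength(Collator.SECONDARY);
        // Each attribute is read back through its own getter.
        System.out.println(secondary.getStrength() == Collator.SECONDARY);
        // Case is a tertiary difference, so it is ignored at secondary strength.
        System.out.println(secondary.compare("resume", "Resume"));
    }
}
```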

## References

1.  Ken Whistler, Markus Scherer: "Unicode Technical Standard #10, Unicode Collation
    Algorithm" (<http://www.unicode.org/reports/tr10/>)

2.  ICU Design doc: "Collation v2" (<http://site.icu-project.org/design/collation/v2>)

3.  Mark Davis: "ICU Collation Design Document"
    (<https://htmlpreview.github.io/?https://github.com/unicode-org/icu-docs/blob/master/design/collation/ICU_collation_design.htm>)

4.  The Unicode Standard, chapter 5, "Implementation guidelines"
    (<http://www.unicode.org/uni2book/ch05.pdf>)

5.  Laura Werner: "Efficient text searching in Java: Finding the right string in
    any language"
    (<http://icu-project.org/docs/papers/efficient_text_searching_in_java.html>)

6.  Mark Davis, Martin Dürst: "Unicode Standard Annex #15: Unicode Normalization
    Forms" (<http://www.unicode.org/reports/tr15/>)