---
layout: default
title: API Details
nav_order: 6
parent: Collation
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Collation API Details
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

This section describes some of the usage conventions for the ICU Collation
Service API.

## Collator Instantiation

To use the Collation Service, you must instantiate a `Collator`. The
Collator defines the properties and behavior of the sort ordering. The Collator
can be repeatedly referenced until all collation activities have been performed.
The Collator can then be closed and removed.

### Instantiating the Predefined Collators

ICU comes with a large set of predefined collators that are suited for
specific locales. Most of the ICU locales have a predefined collator. In the worst
case, the CLDR default set of rules,
which is mostly equivalent to the UCA default ordering (DUCET), is used.
The default sort order itself is designed to work well for many languages.
(For example, there are no tailorings for the standard sort orders for
English, German, French, etc.)

To instantiate a predefined collator, use the APIs `ucol_open`, `createInstance` and
`getInstance` in C, C++ and Java, respectively. The C API takes a locale ID
(or language tag) string argument, C++ takes a Locale object, and Java takes a
Locale or ULocale.

For some languages, multiple collation types are available; for example,
"de-u-co-phonebk" / "de@collation=phonebook". They can be enumerated via
`Collator::getKeywordValuesForLocale()`. See also the list of available collation
tailorings in the online [ICU Collation
Demo](http://demo.icu-project.org/icu-bin/collation.html).

Starting with ICU 54, collation attributes can be specified via locale keywords
as well, in the old locale extension syntax ("el@colCaseFirst=upper") or in
language tag syntax ("el-u-kf-upper"). Keywords and values are case-insensitive.

See the [LDML Collation spec, Collation
Settings](http://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Settings),
and the [data
file](https://github.com/unicode-org/cldr/blob/master/common/bcp47/collation.xml) listing
the valid collation keywords and their values. (The deprecated attributes
kh/colHiraganaQuaternary and vt/variableTop are not supported.)

For the [old locale extension
syntax](http://www.unicode.org/reports/tr35/tr35.html#Old_Locale_Extension_Syntax),
the data file's alias names are used (first alias, if defined, otherwise the
name): "de@collation=phonebook;colCaseLevel=yes;kv=space"

For the language tag syntax, the non-alias names are used, and "true" values can
be omitted: "de-u-co-phonebk-kc-kv-space"

This example demonstrates the instantiation of a collator.

**C:**

```c
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
if(U_SUCCESS(status)) {
  /* close the collator */
  ucol_close(coll);
}
```

**C++:**

```c++
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en", "US"), status);
if(U_SUCCESS(status)) {
  // close the collator
  delete coll;
}
```

**Java:**

```java
Collator col = null;
try {
    col = Collator.getInstance(Locale.US);
} catch (Exception e) {
    System.err.println("English collation creation failed.");
    e.printStackTrace();
}
```

### Instantiating Collators Using Custom Rules

If the ICU predefined collators are not appropriate for your intended usage, you
can define your own set of rules and instantiate a collator that uses them.
For more
details, please see [the section on collation customization](customization/index).

This example demonstrates the instantiation of a collator from custom rules.

**C:**

```c
UErrorCode status = U_ZERO_ERROR;
U_STRING_DECL(rules, "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E", 52);
UCollator *coll;

U_STRING_INIT(rules, "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E", 52);
coll = ucol_openRules(rules, -1, UCOL_ON, UCOL_DEFAULT_STRENGTH, NULL, &status);
if(U_SUCCESS(status)) {
  /* close the collator */
  ucol_close(coll);
}
```

**C++:**

```c++
UErrorCode status = U_ZERO_ERROR;
UnicodeString rules(u"&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E");
Collator *coll = new RuleBasedCollator(rules, status);
if(U_SUCCESS(status)) {
  // close the collator
  delete coll;
}
```

**Java:**

```java
RuleBasedCollator coll = null;
String ruleset = "&9 < a, A < b, B < c, C; ch, cH, Ch, CH < d, D, e, E";
try {
    coll = new RuleBasedCollator(ruleset);
} catch (Exception e) {
    System.err.println("Customized collation creation failed.");
    e.printStackTrace();
}
```

## Compare

Two of the most used functions in the ICU collation API, `ucol_strcoll` and `ucol_getSortKey`, have counterparts in both the Win32 and ANSI APIs:

ICU C             | ICU C++                     | ICU Java                   | ANSI/POSIX | WIN32
----------------- | --------------------------- | -------------------------- | ---------- | -----
`ucol_strcoll`    | `Collator::compare`         | `Collator.compare`         | `strcoll`  | `CompareString`
`ucol_getSortKey` | `Collator::getSortKey`      | `Collator.getCollationKey` | `strxfrm`  | `LCMapString`
                  | `Collator::getCollationKey` |                            |            |

For more sophisticated usage, such as user-controlled language-sensitive text
searching, an iterating interface to collation is provided. Please refer to the
section below on `CollationElementIterator` for more details.

The `ucol_strcoll` function compares one pair of strings at a time. Comparing two
strings is much faster than calculating sort keys for both of them. However, if
comparisons must be done repeatedly on a very large number of strings, generating
and storing sort keys can improve performance. In all other cases (such as quick
sort or bubble sort of a
moderately-sized list of strings), comparing strings works very well.

The C API used for comparing two strings is `ucol_strcoll`. It requires two
`UChar *` strings and their lengths as parameters, as well as a pointer to a valid
`UCollator` instance. The result is a `UCollationResult` constant, which can be one
of `UCOL_LESS`, `UCOL_EQUAL` or `UCOL_GREATER`.

The C++ API offers the method `Collator::compare` with several overloads.
Acceptable input arguments are `UChar *` with length of strings, or `UnicodeString`
instances. The result is a member of the `UCollationResult` or `EComparisonResult` enums.

The Java API provides the method `Collator.compare` with one overload. Acceptable
input arguments are Strings or Objects. The result is an int value, which is
less than zero if source is less than target, zero if source and target are
equal, or greater than zero if source is greater than target.

There are also several convenience functions and methods returning a boolean
value, such as `ucol_greater`, `ucol_greaterOrEqual`, `ucol_equal` (in C),
`Collator::greater`, `Collator::greaterOrEqual`, `Collator::equals` (in C++) and
`Collator.equals` (in Java).
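
The trade-off between comparing strings directly and caching sort keys can be sketched with the ANSI counterparts from the table above, `strcoll` and `strxfrm`, so that the sketch builds without ICU. This is an analogy under stated assumptions, not ICU code: the `KeyedString` struct and the helpers `make_key`, `by_key`, and `sort_with_cached_keys` are hypothetical names introduced here, and the process is assumed to run in its default C library locale. In ICU code, `ucol_getSortKey` would take the place of `strxfrm`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pairing of a string with its cached, transformed key. */
typedef struct {
    const char *text;  /* original string (not owned) */
    char *key;         /* strxfrm output (owned, NUL-terminated) */
} KeyedString;

/* Transform once: try a stack buffer first, fall back to the heap buffer. */
static char *make_key(const char *text) {
    char stackBuf[256];
    size_t len = strxfrm(stackBuf, text, sizeof stackBuf);
    char *key = (char *)malloc(len + 1);
    if (key == NULL) {
        return NULL;
    }
    if (len < sizeof stackBuf) {
        memcpy(key, stackBuf, len + 1);   /* key fit in the stack buffer */
    } else {
        strxfrm(key, text, len + 1);      /* too long: transform again */
    }
    return key;
}

/* qsort comparator: a plain byte comparison of the cached keys. */
static int by_key(const void *a, const void *b) {
    const KeyedString *ka = (const KeyedString *)a;
    const KeyedString *kb = (const KeyedString *)b;
    return strcmp(ka->key, kb->key);
}

/* Sorts `strings` in place; returns 1 on success, 0 if allocation failed. */
static int sort_with_cached_keys(const char **strings, size_t count) {
    KeyedString *items;
    size_t i, made = 0;
    int ok = 1;

    assert(strings != NULL);
    if (count == 0) {
        return 1;
    }
    items = (KeyedString *)malloc(count * sizeof *items);
    if (items == NULL) {
        return 0;
    }
    for (i = 0; ok && i < count; i++) {
        items[i].text = strings[i];
        items[i].key = make_key(strings[i]);
        if (items[i].key == NULL) {
            ok = 0;
        } else {
            made++;
        }
    }
    if (ok) {
        qsort(items, count, sizeof *items, by_key);
        for (i = 0; i < count; i++) {
            strings[i] = items[i].text;
        }
    }
    for (i = 0; i < made; i++) {
        free(items[i].key);
    }
    free(items);
    return ok;
}
```

A caller would pass an array of `const char *` and its element count. Each string is transformed exactly once, so the sort itself performs only byte comparisons; for a short list, calling the comparison function directly inside the sort would avoid the allocations altogether.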

### Examples

**C:**

```c
UChar *s [] = { /* list of Unicode strings */ };
uint32_t listSize = sizeof(s)/sizeof(s[0]);
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
uint32_t i, j;
if(U_SUCCESS(status)) {
  /* bubble sort; swapping on UCOL_LESS yields descending order
     (test for UCOL_GREATER instead to sort ascending) */
  for(i=listSize-1; i>=1; i--) {
    for(j=0; j<i; j++) {
      if(ucol_strcoll(coll, s[j], -1, s[j+1], -1) == UCOL_LESS) {
        UChar *tmp = s[j];
        s[j] = s[j+1];
        s[j+1] = tmp;
      }
    }
  }
  ucol_close(coll);
}
```

**C++:**

```c++
UnicodeString s [] = { /* list of Unicode strings */ };
uint32_t listSize = sizeof(s)/sizeof(s[0]);
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en", "US"), status);
uint32_t i, j;
if(U_SUCCESS(status)) {
  // bubble sort; swapping on UCOL_LESS yields descending order
  for(i=listSize-1; i>=1; i--) {
    for(j=0; j<i; j++) {
      if(coll->compare(s[j], s[j+1]) == UCOL_LESS) {
        std::swap(s[j], s[j+1]);  // requires <utility>
      }
    }
  }
  delete coll;
}
```

**Java:**

```java
String s [] = { /* list of Unicode strings */ };
try {
    Collator coll = Collator.getInstance(Locale.US);
    for (int i = s.length - 1; i >= 1; i--) {
        for (int j = 0; j < i; j++) {
            // compare() only guarantees the sign of the result
            if (coll.compare(s[j], s[j+1]) < 0) {
                String tmp = s[j];
                s[j] = s[j+1];
                s[j+1] = tmp;
            }
        }
    }
} catch (Exception e) {
    System.err.println("English collation creation failed.");
    e.printStackTrace();
}
```

## GetSortKey

The C API provides the `ucol_getSortKey` function, which requires (apart from a
pointer to a valid `UCollator` instance) a pointer to the source `UChar` string
together with its length. It also requires a pointer to a receiving buffer and
its length.

The C++ API provides the `Collator::getSortKey` method with similar parameters as
the C version. It also provides `Collator::getCollationKey`, which produces a
`CollationKey` object instance (a wrapper around a sort key).

The Java API provides only the `Collator.getCollationKey` method, which produces a
`CollationKey` object instance (a wrapper around a sort key).

Sort keys are generally only useful in databases or other circumstances where
function calls are extremely expensive. See [Sortkeys vs
Comparison](concepts#sortkeys-vs-comparison).

### Sort Key Features

ICU writes sort keys as sequences of bytes.

Each sort key ends with one 00 byte and does not contain any other 00 byte. The
terminating 00 byte is included in the length of the sort key as returned by the
API (unlike any other ICU API where terminating NUL bytes or characters are not
counted as part of the length).

Sort key byte sequences must be compared with an unsigned-byte comparison, as
with `strcmp()`.

Comparing the sort keys of two strings from the same collator yields the same
ordering as using the collator to compare the two strings directly. That is:
`strcmp(coll.getSortKey(str1), coll.getSortKey(str2))` is equivalent to
`coll.compare(str1, str2)`.

Sort keys from different collators (different locale or strength or any other
attributes/settings) are not comparable.

Sort keys can be "merged" as described in [UTS #10 Merging Sort
Keys](http://www.unicode.org/reports/tr10/#Merging_Sort_Keys), via
`ucol_mergeSortkeys()` or Java `CollationKey.merge()`.

* Since CLDR 1.9/ICU 4.6, the same effect can be achieved by concatenating
  strings with U+FFFE between them. The concatenation has the same sort order
  as the merged sort keys.
* However, it is not guaranteed that the sort key of the concatenated strings
  is the same as the merged result of the individual sort keys. (That is,
  merge(getSortKey(str1), getSortKey(str2)) may differ from getSortKey(str1 +
  '\\uFFFE' + str2).)
* In particular, a future version of ICU is likely to generate shorter sort
  keys when concatenating strings with U+FFFE between them (by using
  compression across the U+FFFE weights).
* *The recommended way to achieve "merged" sorting is via strings with
  U+FFFE.*

Any further analysis or parsing of sort keys is not supported.

Sort keys will change from one ICU version to another; therefore, if sort keys
are stored in a database or other persistent storage, then each upgrade requires
their regeneration.

* The details of the underlying data change with every Unicode and CLDR
  version.
* Sort keys are also subject to enhancements and bug fixes in the builder and
  implementation code.
* On the other hand, the sort *order* is much more stable. It is subject to
  deliberate changes to the default Unicode collation order, which is kept
  quite stable, and subject to deliberate changes in CLDR data as new data is
  added and feedback on existing data is taken into account.

Implementation notes: (Not supported as permanent constraints on sort keys)

Byte 02 was unique as a merge separator for some versions of ICU before version
ICU 53. Since ICU 53, 02 is also used in regular collation weights where there
is no conflict (to expand the number of available short weights).

Byte 01 has been unique as a level separator. This is not strictly necessary for
non-primary levels. (A level's compressible "common" weight as its level
separator would yield shorter sort keys.) However, the current implementation of
`ucol_mergeSortkeys()` relies on it. (Also, test code currently examines sort keys
for finding the strength of a comparison difference.) This may change in the
future, especially if `ucol_mergeSortkeys()` were to become deprecated.

Level separators are likely to be equivalent to single-byte weights (possibly
compressible): Multi-byte level separators would noticeably lengthen sort keys
for short strings.

The byte values used in several ICU versions for sort keys and collation
elements are documented in the ["Special Byte Values" design
doc](http://site.icu-project.org/design/collation/bytes) on the ICU site.

### Sort Key Output Buffer

`ucol_getSortKey()` can operate in 'preflighting' mode, which returns the amount
of memory needed to store the resulting sort key. This mode is automatically
activated if the output buffer size passed is set to zero. Should the sort key
become longer than the buffer provided, the function again slips into preflighting
mode. In that case, the overall performance is poorer than if the function had
been called with a zero-size output buffer from the start. If the size of the
sort key returned is greater than the size of the buffer provided, the content of
the result buffer is undefined. In that case, the result buffer can be
reallocated to its proper size and the sort key generator function can be called
again.

The best way to generate a series of sort keys is to do the following:

1. Create a big temporary buffer on the stack. Typically, this buffer is
   allocated only once, and reused with every sort key generated. There is no
   need to keep it as small as possible. A recommended size for the temporary
   buffer is four times the length of the longest string processed.

2. Start the loop. Call `ucol_getSortKey()` to find out how big the sort key
   buffer should be, and fill in the temporary buffer at the same time.

3. If the temporary buffer is too small, allocate or reallocate more space.
   Fill in the sort key values in the overflow buffer.

4.
   Allocate the sort key buffer with the size returned by `ucol_getSortKey()` and
   call `memcpy` to copy the sort key content from the temp buffer to the sort
   key buffer.

5. Loop back to step 2 until you are done.

6. Delete the overflow buffer if you created one.

### Example

```c
/* processSortKey() stands for whatever the application does with each key. */
void GetSortKeys(const UCollator* coll, const UChar* const *source,
                 uint32_t arrayLength)
{
  uint8_t buffer[1000];           /* stack buffer, reused for every key */
  uint8_t* currBuffer = buffer;
  int32_t bufferLen = sizeof(buffer);
  int32_t keyLen = 0;

  for (uint32_t i = 0; i < arrayLength; ++i) {
    keyLen = ucol_getSortKey(coll, source[i], -1, currBuffer, bufferLen);
    if (keyLen > bufferLen) {
      /* The key did not fit: switch to (or grow) a heap overflow buffer. */
      uint8_t* grown = (currBuffer == buffer)
          ? (uint8_t*)malloc(keyLen)
          : (uint8_t*)realloc(currBuffer, keyLen);
      if (grown == NULL) {
        break;                    /* out of memory; clean up below */
      }
      currBuffer = grown;
      bufferLen = keyLen;
      keyLen = ucol_getSortKey(coll, source[i], -1, currBuffer, bufferLen);
    }
    processSortKey(i, currBuffer, keyLen);
  }

  if (currBuffer != buffer && currBuffer != NULL) {
    free(currBuffer);
  }
}
```

> :point_right: **Note** Although the API allows you to call
> `ucol_getSortKey` with `NULL` to see what the
> sort key length is, it is strongly recommended that you NOT determine the length
> first, then allocate and fill the sort key buffer. If you do, it requires twice
> the processing since computing the length has to do the same calculation as
> actually getting the sort key. Instead, the example shown above uses a stack buffer.

### Using Iterators for String Comparison

ICU4C's `ucol_strcollIter` API allows for comparing two strings that are supplied
as character iterators (`UCharIterator`). This is useful when you need to compare
differently encoded strings using `strcoll`.
In that case, converting the strings
first would probably be wasteful, since `strcoll` usually gives the result
before the whole strings are processed. This API is implemented only as a C function
in ICU4C. There are no equivalent C++ or ICU4J functions.

```c
...
/* we are arriving with two char*: utf8Source and utf8Target, with their
 * lengths in utf8SourceLen and utf8TargetLen
 */
UCharIterator sIter, tIter;
uiter_setUTF8(&sIter, utf8Source, utf8SourceLen);
uiter_setUTF8(&tIter, utf8Target, utf8TargetLen);
compareResultUTF8 = ucol_strcollIter(myCollation, &sIter, &tIter, &status);
...
```

### Obtaining Partial Sort Keys

When using different sort algorithms, such as radix sort, sometimes it is useful
to process strings only as much as needed to feed into the sorting algorithm.
For that purpose, ICU provides the `ucol_nextSortKeyPart` API, which also takes
character iterators. This API allows for iterating over subsequent pieces of an
uncompressed sort key. Between calls to the API you need to save a 64-bit state.
Following is an example of simulating a string compare function using the partial
sort key API. Your usage model is bound to look much different.

```c
static UCollationResult compareUsingPartials(UCollator *coll,
                                             const UChar source[], int32_t sLen,
                                             const UChar target[], int32_t tLen,
                                             int32_t pieceSize, UErrorCode *status) {
  int32_t partialSKResult = 0;
  UCharIterator sIter, tIter;
  uint32_t sState[2], tState[2];
  int32_t sSize = pieceSize, tSize = pieceSize;
  uint8_t sBuf[16384], tBuf[16384];
  if(pieceSize > 16384) {
    *status = U_BUFFER_OVERFLOW_ERROR;
    return UCOL_EQUAL;
  }
  *status = U_ZERO_ERROR;
  sState[0] = 0; sState[1] = 0;
  tState[0] = 0; tState[1] = 0;
  while(sSize == pieceSize && tSize == pieceSize && partialSKResult == 0) {
    /* zero both buffers so a short final piece is not compared against stale bytes */
    memset(sBuf, 0, pieceSize);
    memset(tBuf, 0, pieceSize);
    uiter_setString(&sIter, source, sLen);
    uiter_setString(&tIter, target, tLen);
    sSize = ucol_nextSortKeyPart(coll, &sIter, sState, sBuf, pieceSize, status);
    tSize = ucol_nextSortKeyPart(coll, &tIter, tState, tBuf, pieceSize, status);
    partialSKResult = memcmp(sBuf, tBuf, pieceSize);
  }

  if(partialSKResult < 0) {
    return UCOL_LESS;
  } else if(partialSKResult > 0) {
    return UCOL_GREATER;
  } else {
    return UCOL_EQUAL;
  }
}
```

### Other Examples

A longer example is presented in the 'Examples' section. Here is an illustration
of the usage model.

**C:**

```c
#define MAX_KEY_SIZE 100
#define MAX_BUFFER_SIZE 10000
#define MAX_LIST_LENGTH 5

/* qsort comparator: sort keys are 00-terminated byte strings */
static int compareKeys(const void *a, const void *b) {
  return strcmp((const char *)a, (const char *)b);
}

const char *text[] = {
  "Quick",
  "fox",
  "Moving",
  "trucks",
  "riddle"
};
UChar s[MAX_LIST_LENGTH][20];
int i;
int32_t length, expectedLen;
uint8_t temp[MAX_BUFFER_SIZE];
uint8_t *temp2 = temp;
uint8_t keys[MAX_LIST_LENGTH][MAX_KEY_SIZE];
UErrorCode status = U_ZERO_ERROR;

length = MAX_BUFFER_SIZE;
for(i = 0; i < MAX_LIST_LENGTH; i++) {
  u_uastrcpy(s[i], text[i]);
}
UCollator *coll = ucol_open("en_US", &status);
if(U_SUCCESS(status)) {
  for(i = 0; i < MAX_LIST_LENGTH; i++) {
    expectedLen = ucol_getSortKey(coll, s[i], -1, temp2, length);
    if(expectedLen > length) {
      if(temp2 == temp) {
        temp2 = (uint8_t *)malloc(expectedLen);
      } else {
        temp2 = (uint8_t *)realloc(temp2, expectedLen);
      }
      length = expectedLen;
      ucol_getSortKey(coll, s[i], -1, temp2, length);
    }
    /* assumes each key fits in MAX_KEY_SIZE bytes, true for these short strings */
    memcpy(keys[i], temp2, expectedLen);
  }
  qsort(keys, MAX_LIST_LENGTH, MAX_KEY_SIZE, compareKeys);
  ucol_close(coll);
}
if(temp2 != temp) {
  free(temp2);
}
```

**C++:**

```c++
#define MAX_LIST_LENGTH 5
const UnicodeString s [] = {
  "Quick",
  "fox",
  "Moving",
  "trucks",
  "riddle"
};
CollationKey keys[MAX_LIST_LENGTH];
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale("en_US"), status);
uint32_t i;
if(U_SUCCESS(status)) {
  for(i=0; i<MAX_LIST_LENGTH; i++) {
    coll->getCollationKey(s[i], keys[i], status);
  }
  // std::sort requires <algorithm>
  std::sort(keys, keys + MAX_LIST_LENGTH,
            [](const CollationKey &a, const CollationKey &b) {
              UErrorCode ec = U_ZERO_ERROR;
              return a.compareTo(b, ec) == UCOL_LESS;
            });
  delete coll;
}
```

**Java:**

```java
String s [] = {
    "Quick",
    "fox",
    "Moving",
    "trucks",
    "riddle"
};
CollationKey keys[] = new CollationKey[s.length];
try {
    Collator coll = Collator.getInstance(Locale.US);
    for
        (int i = 0; i < s.length; i++) {
        keys[i] = coll.getCollationKey(s[i]);
    }

    Arrays.sort(keys);
}
catch (Exception e) {
    System.err.println("Error creating English collator");
    e.printStackTrace();
}
```

## Collation ElementIterator

A collation element iterator can only be used in one direction. This is
established at the time of the first call to retrieve a collation element. Once
`ucol_next` (C), `CollationElementIterator::next` (C++) or
`CollationElementIterator.next` (Java) has been invoked,
`ucol_previous` (C),
`CollationElementIterator::previous` (C++) or `CollationElementIterator.previous`
(Java) should not be used (and vice versa). The direction can be changed
immediately after `ucol_first`, `ucol_last`, `ucol_reset` (in C),
`CollationElementIterator::first`, `CollationElementIterator::last`,
`CollationElementIterator::reset` (in C++) or `CollationElementIterator.first`,
`CollationElementIterator.last`, `CollationElementIterator.reset` (in Java) is
called, or when the iterator reaches the end of the string while traversing it.

When `ucol_next` is called at the end of the string buffer, `UCOL_NULLORDER` is
always returned with any subsequent calls to `ucol_next`. The same applies to
`ucol_previous`.

An example of how iterators are used is the Boyer-Moore search implementation,
which can be found in the samples section.

### API Example

**C:**

```c
UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
UChar text[20];
UCollationElements *collelemitr;
int32_t collelem;

u_uastrcpy(text, "text");
collelemitr = ucol_openElements(coll, text, -1, &status);
collelem = 0;
do {
  collelem = ucol_next(collelemitr, &status);
} while (collelem != UCOL_NULLORDER);

ucol_closeElements(collelemitr);
ucol_close(coll);
```

**C++:**

```c++
UErrorCode status = U_ZERO_ERROR;
Collator *coll = Collator::createInstance(Locale::getUS(), status);
// createCollationElementIterator() is declared on RuleBasedCollator
RuleBasedCollator *rbc = dynamic_cast<RuleBasedCollator *>(coll);
if (U_SUCCESS(status) && rbc != nullptr) {
  UnicodeString text("text");
  CollationElementIterator *collelemitr = rbc->createCollationElementIterator(text);
  int32_t collelem = 0;
  do {
    collelem = collelemitr->next(status);
  } while (U_SUCCESS(status) && collelem != CollationElementIterator::NULLORDER);

  delete collelemitr;
}
delete coll;
```

**Java:**

```java
try {
    RuleBasedCollator coll = (RuleBasedCollator)Collator.getInstance(Locale.US);
    String text = "text";
    CollationElementIterator collelemitr = coll.getCollationElementIterator(text);
    int collelem = 0;
    do {
        collelem = collelemitr.next();
    } while (collelem != CollationElementIterator.NULLORDER);
} catch (Exception e) {
    System.err.println("Error in collation iteration");
    e.printStackTrace();
}
```

## Setting and Getting Attributes

The general attribute setting APIs are `ucol_setAttribute` (in C) and
`Collator::setAttribute` (in C++). These APIs take an attribute name and an
attribute value. If the name and the value pass a syntax and range check, the
property of the collator is changed. If the name and value do not pass a syntax
and range check, however, the state is not changed and the error code variable
is set to an error condition.
The Java version does not provide general
attribute setting APIs; instead, each attribute has its own setter API of
the form `RuleBasedCollator.setATTRIBUTE_NAME(arguments)`.

The attribute getting APIs are `ucol_getAttribute` (C) and `Collator::getAttribute`
(C++). Both APIs require an attribute name as an argument and return an
attribute value if a valid attribute name was supplied. If a valid attribute
name was not supplied, however, they return an undefined result and set the
error code. As with the setter APIs, the Java version provides no generic getter
API. Instead, each attribute has its own getter API of the form
`RuleBasedCollator.getATTRIBUTE_NAME()`.

## References

1. Ken Whistler, Markus Scherer: "Unicode Technical Standard #10, Unicode Collation
   Algorithm" (<http://www.unicode.org/reports/tr10/>)

2. ICU Design doc: "Collation v2" (<http://site.icu-project.org/design/collation/v2>)

3. Mark Davis: "ICU Collation Design Document"
   (<https://htmlpreview.github.io/?https://github.com/unicode-org/icu-docs/blob/master/design/collation/ICU_collation_design.htm>)

4. The Unicode Standard, chapter 5, "Implementation guidelines"
   (<http://www.unicode.org/uni2book/ch05.pdf>)

5. Laura Werner: "Efficient text searching in Java: Finding the right string in
   any language"
   (<http://icu-project.org/docs/papers/efficient_text_searching_in_java.html>)

6. Mark Davis, Martin Dürst: "Unicode Standard Annex #15: Unicode Normalization
   Forms" (<http://www.unicode.org/reports/tr15/>).