---
layout: default
title: Collation FAQ
nav_order: 5
parent: Collation
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Collation FAQ
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Q. Should I turn Full Normalization on all the time?

**A.** You can if you want, but you don't typically need to. Normalization for
most characters is already built into ICU's collation by default. Everything
that can be done without affecting performance is already there, and it works
for most languages. The normalization parameter in ICU therefore only controls
whether full normalization is invoked.

The outlying cases are languages that use multiple accents (non-spacing marks)
on the same base letter, such as Vietnamese or Arabic. In those cases, full
normalization needs to be turned on. If you use the right locale (or language)
when creating a collator in ICU, then full normalization will be turned on or
off according to what the language typically requires.

## Q. Are there any cases where I would want to override the Full Normalization setting?

**A.** You only need to worry about that parameter in very unusual cases, such
as sorting a list of names according to English conventions when the list
contains, for example, some Vietnamese names. One way to check for such a
situation is to open a collator for each of the languages you expect to find,
and see whether any of them has the full normalization flag set.

## Q. How can collation rules mimic word sorting?

**A.** Word sort is a way of sorting where certain punctuation characters are
completely ignored, while others are considered.
The word sort example below ignores hyphens and apostrophes:

Word Sort | String Sort
--------- | -----------
billet | bill's
bills | billet
bill's | bills
cannot | can't
cant | cannot
can't | cant
con | co-op
coop | con
co-op | coop

This specific behavior can be mimicked using a tailoring that makes these
characters completely ignorable. In this case, an appropriate rule would be
`"&\\u0000 = '' = '-'"`.

Please note that we don't consider such a solution correct, since different
languages have different word elements. Instead, one should use shifted mode
for comparison.
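
To make the two columns of the table above concrete, here is a minimal sketch in plain Python. It does not use ICU: code point order stands in for string sort, and a sort key that strips hyphens and apostrophes stands in for treating those characters as completely ignorable. The names `IGNORED` and `word_key`, and the tiebreak between strings that differ only in punctuation, are assumptions made for this illustration.

```python
# Illustration only: plain Python, not ICU collation.

IGNORED = "-'"  # characters treated as ignorable in this sketch


def word_key(s):
    """Sort key that skips the ignorable characters."""
    stripped = "".join(ch for ch in s if ch not in IGNORED)
    # Tiebreak (an assumption, chosen to reproduce the table): when two
    # strings are equal after stripping, the one without punctuation
    # sorts first.
    return (stripped, s != stripped)


words = ["co-op", "bills", "can't", "billet", "coop",
         "cannot", "bill's", "con", "cant"]

# "String sort": punctuation takes part in the comparison.
string_sorted = sorted(words)

# "Word sort": hyphens and apostrophes are skipped.
word_sorted = sorted(words, key=word_key)

print(string_sorted)
# ["bill's", 'billet', 'bills', "can't", 'cannot', 'cant', 'co-op', 'con', 'coop']
print(word_sorted)
# ['billet', 'bills', "bill's", 'cannot', 'cant', "can't", 'con', 'coop', 'co-op']
```

The sketch hard-codes which characters are ignored, which is the same limitation noted above for the tailoring rule: the set of characters that should be skipped is not the same for every language.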