---
layout: default
title: Case Mappings
nav_order: 1
parent: Transforms
---
<!--
© 2020 and later: Unicode, Inc. and others.
License & terms of use: http://www.unicode.org/copyright.html
-->

# Case Mappings
{: .no_toc }

## Contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

## Overview

Case mapping handles the mapping between uppercase, lowercase, and title case
characters for a given language. Case is a normative property of characters
in specific alphabets (e.g. Latin, Greek, Cyrillic, Armenian, and Georgian)
whereby characters are considered to be variants of a single letter. ICU refers
to these variants, which may differ markedly in shape and size, as uppercase
letters (also known as capital or majuscule) and lowercase letters (also known
as small or minuscule). Alphabets with case differences are called bicameral and
alphabets without case differences are called unicameral.

Due to the inclusion of certain composite characters for compatibility, such as
the Latin capital letter 'DZ' (\\u01F1 'Ǳ'), there is a third case called title
case. Title case is used to capitalize the first character of a word, as in the
Latin capital letter 'D' with small letter 'z' (\\u01F2 'ǲ'). The term "title
case" can also refer to words whose first letter is an uppercase or title case
letter and whose remaining letters are lowercase. However, not all words in
the title of a document or first words in a sentence will be title case. The use
of title case words is language dependent. For example, in English, "Taming of
the Shrew" would be the appropriate capitalization and not "Taming Of The
Shrew".
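
The three case forms of the 'DZ' digraph can be inspected with a short script.
This is a minimal sketch that uses Python's built-in `str` methods and the
standard `unicodedata` module rather than the ICU API, on the assumption that
both follow the same Unicode Character Database mappings. The helper name
`case_forms` is hypothetical.

```python
import unicodedata


def case_forms(ch: str) -> dict:
    """Return the upper, title, and lower case forms of a single character.

    Hypothetical helper for illustration only. It relies on Python's
    built-in Unicode case mappings, not on ICU.
    """
    return {
        "upper": ch.upper(),
        "title": ch.title(),
        "lower": ch.lower(),
    }


if __name__ == "__main__":
    # U+01F1, U+01F2, and U+01F3 are the upper, title, and lower case
    # forms of the same digraph.
    for ch in ("\u01F1", "\u01F2", "\u01F3"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
        for kind, mapped in case_forms(ch).items():
            print(f"  {kind}: U+{ord(mapped):04X}")
```

Each of the three characters maps to the same set of upper, title, and lower
case forms, which is what makes title case a distinct third case here.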

> :point_right: **Note**: *As of Unicode 11, Georgian now has Mkhedruli (lowercase) and Mtavruli
(uppercase) which form case pairs, but are not used in title case.*

Sample code is available in the ICU source code library at
[icu/source/samples/ustring/ustring.cpp](https://github.com/unicode-org/icu/blob/master/icu4c/source/samples/ustring/ustring.cpp).

Please refer to the following sections in [The Unicode Standard](http://www.unicode.org/versions/latest/)
for more information about case mapping:

* 3.13 Default Case Algorithms
* 4.2 Case
* 5.18 Case Mappings

## Simple (Single-Character) Case Mapping

The general case mapping in ICU is not language based; it is a generic 1-to-1
character map.

A character is considered to have a lowercase, uppercase, or title case
equivalent if there is a respective "simple" case mapping specified for the
character in the [Unicode Character Database](http://www.unicode.org/ucd/) (UnicodeData.txt).
If a character has no mapping equivalent, the result is the character itself.

The APIs provided for the general case mapping, located in `uchar.h`, handle
only single characters of type `UChar32` and return only single characters. To
convert a string without language-specific behavior, use the APIs in either
`unistr.h` or `ustring.h` with the root locale (an empty string `""` in
`ustring.h`). Note that a `NULL` locale argument in `ustring.h` selects the
default locale rather than a language-neutral mapping.

## Full (Language-Specific) Case Mapping

There are different case mappings for different locales. For instance, unlike
in English, the Latin small letter 'i' in Turkish has as its uppercase
equivalent the Latin capital letter 'I' with dot above (\\u0130 'İ').

Similar to the simple case mapping API, a character is considered to have a
lowercase, uppercase, or title case equivalent if there is a respective mapping
specified for the character in the Unicode Character Database (UnicodeData.txt,
supplemented by SpecialCasing.txt for full and language-specific mappings).
If a character has no mapping equivalent, the result is the character itself.

To convert a string using language-specific case mappings, use the APIs in
`ustring.h` and `unistr.h` with the intended locale argument.

ICU implements full Unicode string case mappings.

**In general:**

* **case mapping can change the number of code points and/or code units of a
  string,**
* **is language-sensitive (results may differ depending on language), and**
* **is context-sensitive (a character in the input string may map differently
  depending on surrounding characters).**

## Case Folding

Case folding maps strings to a canonical form where case differences are erased.
Using the case folding API, ICU supports fast matches without regard to case in
lookups, since only binary comparison is required.

The CaseFolding.txt file in the Unicode Character Database is used for
performing locale-independent case folding. This text file is generated from the
case mappings in the Unicode Character Database, using both the single-character
and the multi-character mappings. The mappings in CaseFolding.txt transform all
characters having different case forms into a common form. To compare two
strings for case-insensitive matching, you can fold each string and then
use a binary comparison. There are also functions to compare two strings
case-insensitively using the same case folding data.

Unicode case folding is not context-sensitive. It is also not
language-sensitive, although there is a flag for whether to apply special
mappings for use with Turkic (Turkish/Azerbaijani) text data.

The case folding APIs are located in:

1. `uchar.h` for single character folding

2. `ustring.h` and `unistr.h` for character string folding.
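
The fold-then-compare approach described in this section can be sketched as
follows. This is an illustration that uses Python's built-in `str.casefold()`,
which applies default (non-Turkic) Unicode case folding, and not the ICU
functions in the headers listed above. The function name `caseless_equal` is
hypothetical.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings without regard to case.

    Each string is case-folded first, then the folded forms are compared
    with ordinary binary equality. This uses Python's locale-independent
    str.casefold(), not ICU, and has no Turkic option.
    """
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    # Full case folding may change the string length: U+00DF folds to "ss".
    print(caseless_equal("Straße", "STRASSE"))

    # Plain lowercasing is not sufficient for the same pair of strings.
    print("Straße".lower() == "STRASSE".lower())

    # Default folding maps "I" to "i", regardless of language.
    print(caseless_equal("TITLE", "title"))
```

Folding both inputs once and storing the folded keys is what allows later
lookups to use binary comparison only.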