# Character Processing

## Use Cases

Character rules vary greatly across languages, which often makes it difficult to extract the expected information from text. Character processing lets you handle text in different languages with the same logic, regardless of each language's rules.

## How to Develop


### Character Type Identification Using Character Attributes

Character attributes let you determine a character's type (for example, digit, letter, or space), check whether the character belongs to a right-to-left (RTL) language, and check whether it is an ideographic character (as used in Chinese, Japanese, or Korean).

You can implement these functions by using APIs of the **Unicode** class. For example, you can use [isDigit](../reference/apis-localization-kit/js-apis-i18n.md#isdigit9) to check whether a character is a digit. The development procedure is as follows:

1. Import the **i18n** module.

   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Check a character attribute, for example, whether the character is a digit.

   ```ts
   let ch: string = '1';
   let isDigit: boolean = i18n.Unicode.isDigit(ch);
   ```

3. Obtain the character type. The following code snippet obtains the general category of the character.

   ```ts
   let unicodeType: string = i18n.Unicode.getType(ch);
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Check whether the input character is a digit.
let isDigit: boolean = i18n.Unicode.isDigit('1'); // isDigit = true

// Check whether a character belongs to an RTL language.
let isRTL: boolean = i18n.Unicode.isRTL('a'); // isRTL = false

// Check whether a character is an ideographic character.
let isIdeograph: boolean = i18n.Unicode.isIdeograph('华'); // isIdeograph = true

// Obtain the character type.
let unicodeType: string = i18n.Unicode.getType('a'); // unicodeType = 'U_LOWERCASE_LETTER'
```


### Transliteration

Transliteration converts text from one writing system (script) into another while preserving its pronunciation. Unlike translation, it changes only the script, not the language. You can implement this function by using the [transform](../reference/apis-localization-kit/js-apis-i18n.md#transform9) API of the **Transliterator** class. The development procedure is as follows:

> **NOTE**
> This module can convert Chinese characters into pinyin. However, if the text contains polyphonic characters (characters with more than one pronunciation), some of them may not be converted into the correct pinyin.

1. Import the **i18n** module.

   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Obtain the list of available transliterator IDs, and create a **Transliterator** object.

   ```ts
   let ids: string[] = i18n.Transliterator.getAvailableIDs(); // Obtain the list of available transliterator IDs.
   let transliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Any-Latn'); // Pass in a valid transliterator ID to create a Transliterator object.
   ```

3. Transliterate text.

   ```ts
   let transliteratedText: string = transliterator.transform('中国'); // Transliterate the text content.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Transliterate the text into Latin script.
let transliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Any-Latn');
let text: string = '中国';
let transliteratedText: string = transliterator.transform(text); // transliteratedText = 'zhōng guó'

// Transliterate Chinese text and remove the tone marks.
let toneLessTransliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Any-Latn;Latin-Ascii');
transliteratedText = toneLessTransliterator.transform('中国'); // transliteratedText = 'zhong guo'

// Transliterate Chinese surnames using their surname-specific pronunciation.
let nameTransliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Han-Latin/Names');
transliteratedText = nameTransliterator.transform('单老师'); // transliteratedText = 'shàn lǎo shī'
transliteratedText = nameTransliterator.transform('长孙无忌'); // transliteratedText = 'zhǎng sūn wú jì'

// Obtain the list of available transliterator IDs.
let ids: string[] = i18n.Transliterator.getAvailableIDs(); // ids = ['ASCII-Latin', 'Accents-Any', ...]
```


### Text Normalization

Text normalization converts text into a specified normalization form, so that equivalent character sequences share a consistent representation. You can implement this function by using the [normalize](../reference/apis-localization-kit/js-apis-i18n.md#normalize10) API of the **Normalizer** class. The development procedure is as follows:

1. Import the **i18n** module.

   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Normalizer** object based on the specified text normalization mode. The text normalization mode can be NFC, NFD, NFKC, or NFKD. For details, see [Unicode Normalization Forms](https://www.unicode.org/reports/tr15/#Norm_Forms).

   ```ts
   let normalizer: i18n.Normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC);
   ```

3. Normalize the text.

   ```ts
   let normalizedText: string = normalizer.normalize('\u1E9B\u0323'); // Normalize the text.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Normalize the text in NFC mode.
let normalizer: i18n.Normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC);
let normalizedText: string = normalizer.normalize('\u1E9B\u0323'); // normalizedText = 'ẛ̣'
```


### Line Break Point Acquisition

You can use APIs of the [BreakIterator](../reference/apis-localization-kit/js-apis-i18n.md#breakiterator8) class to obtain line break points of the text for the specified locale. The development procedure is as follows:

1. Import the **i18n** module.

   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **BreakIterator** object for the specified locale. The object calculates the line break points in the text according to the rules of that locale.

   ```ts
   let iterator: i18n.BreakIterator = i18n.getLineInstance('en-GB');
   ```

3. Set the text to be processed.

   ```ts
   iterator.setLineBreakText('Apple is my favorite fruit.'); // Set the text to be processed.
   let breakText: string = iterator.getLineBreakText(); // Obtain the text being processed by the BreakIterator object.
   ```

4. Obtain the position of a line break point.

   ```ts
   let currentPos: number = iterator.current(); // Obtain the current position of the BreakIterator object in the text.
   let firstPos: number = iterator.first(); // Move to the first line break point, which is always at the beginning of the text (firstPos = 0).
   // Move by the specified number of line break points (default: 1). A positive value moves toward the end of the text,
   // and a negative value moves toward the beginning. nextPos is the new position, or -1 if the move goes beyond the text.
   let nextPos: number = iterator.next(1);
   let isBoundary: boolean = iterator.isBoundary(9); // Check whether the specified offset is a line break point.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Create a BreakIterator object to obtain line break points.
let iterator: i18n.BreakIterator = i18n.getLineInstance('en-GB');

// Set the text to be processed.
iterator.setLineBreakText('Apple is my favorite fruit.');

// Move the BreakIterator object to the beginning of the text.
let firstPos: number = iterator.first(); // firstPos = 0

// Move the BreakIterator object toward the end of the text by two line break points.
let nextPos: number = iterator.next(2); // nextPos = 9

// Check whether a certain position is a line break point.
let isBoundary: boolean = iterator.isBoundary(9); // isBoundary = true

// Obtain the text processed by the BreakIterator object.
let breakText: string = iterator.getLineBreakText(); // breakText = 'Apple is my favorite fruit.'
```

### File Path Mirroring

File path mirroring adapts a file path for display in an RTL language, so that the path appears mirrored in that language. You can implement this function by using the [getUnicodeWrappedFilePath](../reference/apis-localization-kit/js-apis-i18n.md#getunicodewrappedfilepath18) API of the **I18NUtil** class. The development procedure is as follows:

1. Import the **i18n** and **intl** modules.

   ```ts
   import { i18n, intl } from '@kit.LocalizationKit';
   ```

2. Perform file path mirroring.

   ```ts
   // The delimiter and locale parameters are optional.
   let mirrorPath: string = i18n.I18NUtil.getUnicodeWrappedFilePath('data/out/tmp', '/', new intl.Locale('ar'));
   ```


**Development Example**
```ts
// Import the required modules.
import { BusinessError } from '@kit.BasicServicesKit';
import { i18n, intl } from '@kit.LocalizationKit';

try {
  let path: string = 'data/out/tmp';
  let delimiter: string = '/';

  // Mirror the file path for an RTL locale (Arabic).
  let locale: intl.Locale = new intl.Locale('ar');
  // mirrorPath = 'tmp/out/data/'
  let mirrorPath: string = i18n.I18NUtil.getUnicodeWrappedFilePath(path, delimiter, locale);

  // The file path is not mirrored for a non-RTL locale (Chinese).
  let localeZh: intl.Locale = new intl.Locale('zh');
  // unMirrorPath = '/data/out/tmp'
  let unMirrorPath: string = i18n.I18NUtil.getUnicodeWrappedFilePath(path, delimiter, localeZh);
} catch (error) {
  let err: BusinessError = error as BusinessError;
  console.error(`call I18NUtil.getUnicodeWrappedFilePath failed, error code: ${err.code}, message: ${err.message}.`);
}
```
<!--RP1--><!--RP1End-->

<!--no_check-->
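The Text Normalization section uses `i18n.NormalizerMode.NFC` on the input `'\u1E9B\u0323'`. As a cross-check that does not require LocalizationKit, the following minimal TypeScript sketch applies the standard ECMAScript `String.prototype.normalize` method to the same input in all four forms (NFC, NFD, NFKC, NFKD). It is not the **Normalizer** API. The helpers `normalizeAll` and `toCodePoints` are hypothetical names introduced only for this illustration, and it is assumed, not confirmed, that `i18n.Normalizer` returns the same results. The expected code points follow the example given in Unicode Normalization Forms (UAX #15).

```typescript
// Uses the standard String.prototype.normalize, NOT the LocalizationKit
// i18n.Normalizer; it only illustrates the four normalization forms.
type NormalizationForm = 'NFC' | 'NFD' | 'NFKC' | 'NFKD';

const FORMS: NormalizationForm[] = ['NFC', 'NFD', 'NFKC', 'NFKD'];

// Hypothetical helper: render a string as code points, e.g. "U+1E9B U+0323".
function toCodePoints(text: string): string {
  return Array.from(text)
    .map((ch) => 'U+' + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, '0'))
    .join(' ');
}

// Hypothetical helper: normalize the same input into every form.
function normalizeAll(text: string): Record<NormalizationForm, string> {
  const result = {} as Record<NormalizationForm, string>;
  for (const form of FORMS) {
    result[form] = text.normalize(form);
  }
  return result;
}

// Long s with dot above (U+1E9B) followed by combining dot below (U+0323).
const sample: string = '\u1E9B\u0323';
const forms = normalizeAll(sample);
for (const form of FORMS) {
  console.log(`${form}: ${toCodePoints(forms[form])}`);
}
// NFC: U+1E9B U+0323
// NFD: U+017F U+0323 U+0307
// NFKC: U+1E69
// NFKD: U+0073 U+0323 U+0307
```

Printing code points rather than the strings themselves makes the differences visible, because the NFC and NFD results usually render identically on screen.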