# Character Processing

## Use Cases

Character rules vary greatly across languages, which makes it difficult to extract the expected information from text with a single set of logic. Character processing lets you handle text with similar logic under different language rules.

## How to Develop

### Character Type Identification Using Character Attributes

Character attributes are used to determine the character type, for example, digit, letter, or space, and to check whether a character belongs to a right-to-left (RTL) language or is an ideographic character (for example, Chinese, Japanese, or Korean).

You can implement these functions by using APIs of the **Unicode** class. For example, you can use [isDigit](../reference/apis-localization-kit/js-apis-i18n.md#isdigit9) to check whether a character is a digit. The development procedure is as follows:

1. Import the **i18n** module.

   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Obtain the character attribute.

   ```ts
   let char: string = '1'; // Character to check.
   let isDigit: boolean = i18n.Unicode.isDigit(char);
   ```

3. Obtain the character type. The following code snippet obtains the general category of a character as an example.

   ```ts
   let unicodeType: string = i18n.Unicode.getType(char);
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Check whether the input character is a digit.
let isDigit: boolean = i18n.Unicode.isDigit('1'); // isDigit = true

// Check whether a character belongs to an RTL language.
let isRTL: boolean = i18n.Unicode.isRTL('a'); // isRTL = false

// Check whether a character is an ideographic character.
let isIdeograph: boolean = i18n.Unicode.isIdeograph('华'); // isIdeograph = true

// Obtain the character type.
let unicodeType: string = i18n.Unicode.getType('a'); // unicodeType = 'U_LOWERCASE_LETTER'
```
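
To make the attribute checks above more concrete, the following sketch approximates them in plain TypeScript with standard Unicode property escapes. It does not use the **Unicode** class, and the helper names (`isDigitChar`, `isIdeographChar`, `isRtlChar`) are hypothetical. The RTL check is a deliberate simplification that only recognizes Arabic and Hebrew script characters.

```ts
// Illustrative only: approximates the character attribute checks with
// standard Unicode property escapes instead of the i18n.Unicode APIs.
// All names below are hypothetical.

// The patterns are built with the RegExp constructor so that they are
// evaluated at run time.
const DIGIT_PATTERN: RegExp = new RegExp('^\\p{Nd}$', 'u');
const IDEOGRAPH_PATTERN: RegExp = new RegExp('^\\p{Ideographic}$', 'u');
// Assumption: treat Arabic and Hebrew script characters as RTL. A complete
// check would consult the Unicode bidirectional class instead.
const RTL_PATTERN: RegExp = new RegExp('^[\\p{Script=Arabic}\\p{Script=Hebrew}]$', 'u');

// Returns true if the single character is a decimal digit.
function isDigitChar(char: string): boolean {
  return DIGIT_PATTERN.test(char);
}

// Returns true if the single character is an ideographic character.
function isIdeographChar(char: string): boolean {
  return IDEOGRAPH_PATTERN.test(char);
}

// Returns true if the single character belongs to an Arabic or Hebrew script.
function isRtlChar(char: string): boolean {
  return RTL_PATTERN.test(char);
}

const digitResult: boolean = isDigitChar('1'); // true
const ideographResult: boolean = isIdeographChar('华'); // true
const rtlResult: boolean = isRtlChar('a'); // false
```

In an application, prefer the **Unicode** class APIs, which cover the full set of character attributes.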


### Transliteration

Transliteration converts text written in one writing system or alphabet into text written in another writing system or alphabet with the same pronunciation. It is distinct from translation. You can implement this function by using the [transform](../reference/apis-localization-kit/js-apis-i18n.md#transform9) API of the **Transliterator** class. The development procedure is as follows:

> **NOTE**
> This module can convert Chinese characters into pinyin. However, if the Chinese text contains polyphonic characters, some of them may not be converted into pinyin with the correct pronunciation.

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Obtain the list of available transliterator IDs, and create a **Transliterator** object.
   ```ts
   let ids: string[] = i18n.Transliterator.getAvailableIDs(); // Obtain the list of available transliterator IDs.
   let id: string = 'Any-Latn'; // A valid transliterator ID.
   let transliterator: i18n.Transliterator = i18n.Transliterator.getInstance(id); // Pass in the transliterator ID to create a Transliterator object.
   ```

3. Transliterate text.
   ```ts
   let text: string = '中国'; // Text to transliterate.
   let translatedText: string = transliterator.transform(text); // Transliterate the text content.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Transliterate the text into Latin script.
let transliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Any-Latn');
let text: string = '中国';
let translatedText: string = transliterator.transform(text); // translatedText = 'zhōng guó'

// Transliterate Chinese text and remove tone marks.
let toneLessTransliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Any-Latn;Latin-Ascii');
translatedText = toneLessTransliterator.transform('中国'); // translatedText = 'zhong guo'

// Transliterate Chinese names using surname pronunciations.
let nameTransliterator: i18n.Transliterator = i18n.Transliterator.getInstance('Han-Latin/Names');
translatedText = nameTransliterator.transform('单老师'); // translatedText = 'shàn lǎo shī'
translatedText = nameTransliterator.transform('长孙无忌'); // translatedText = 'zhǎng sūn wú jì'

// Obtain the list of available transliterator IDs.
let ids: string[] = i18n.Transliterator.getAvailableIDs(); // ids = ['ASCII-Latin', 'Accents-Any', ...]
```
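
The tone removal step in the example can also be understood without a transliterator. The following sketch, written in plain TypeScript, strips tone marks from pinyin that has already been transliterated by decomposing the text and dropping the combining marks. It is a swapped-in technique for illustration, not the way the **Transliterator** class works internally, and `stripToneMarks` is a hypothetical helper.

```ts
// Illustrative only: removes tone marks from already transliterated pinyin
// with standard string normalization. stripToneMarks is a hypothetical helper.
function stripToneMarks(pinyin: string): string {
  // NFD splits each accented letter into a base letter plus combining marks.
  const decomposed: string = pinyin.normalize('NFD');
  // U+0300 to U+036F is the Combining Diacritical Marks block.
  return decomposed.replace(/[\u0300-\u036f]/g, '');
}

const plainPinyin: string = stripToneMarks('zhōng guó'); // plainPinyin = 'zhong guo'
```

Note that this approach removes every combining mark in that block, so a letter such as ü also loses its diaeresis and becomes u.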


### Text Normalization

Text normalization converts text into a specified Unicode normalization form so that equivalent character sequences have the same representation. You can implement this function by using the [normalize](../reference/apis-localization-kit/js-apis-i18n.md#normalize10) API of the **Normalizer** class. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **Normalizer** object based on the specified text normalization mode. The text normalization mode can be NFC, NFD, NFKC, or NFKD. For details, see [Unicode Normalization Forms](https://www.unicode.org/reports/tr15/#Norm_Forms).
   ```ts
   let mode: i18n.NormalizerMode = i18n.NormalizerMode.NFC; // Text normalization mode.
   let normalizer: i18n.Normalizer = i18n.Normalizer.getInstance(mode);
   ```

3. Normalize the text.
   ```ts
   let text: string = '\u1E9B\u0323'; // Text to normalize.
   let normalizedText: string = normalizer.normalize(text); // Normalize the text.
   ```

**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Normalize the text in NFC mode.
let normalizer: i18n.Normalizer = i18n.Normalizer.getInstance(i18n.NormalizerMode.NFC);
let normalizedText: string = normalizer.normalize('\u1E9B\u0323'); // normalizedText = 'ẛ̣'
```
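
To show how the four normalization modes differ, the following sketch normalizes the same input in each form and prints the resulting code units. It uses the standard ECMAScript `String.prototype.normalize` method in place of the **Normalizer** class so that it runs in any TypeScript environment. The helper names `toCodeUnits` and `normalizeAllForms` are hypothetical.

```ts
// Illustrative only: uses the standard String.prototype.normalize method,
// not i18n.Normalizer, to compare the four Unicode normalization forms.
const NORMALIZATION_FORMS: string[] = ['NFC', 'NFD', 'NFKC', 'NFKD'];

// Formats each UTF-16 code unit of the text as four uppercase hex digits.
function toCodeUnits(text: string): string {
  const units: string[] = [];
  for (let i = 0; i < text.length; i++) {
    units.push(('0000' + text.charCodeAt(i).toString(16).toUpperCase()).slice(-4));
  }
  return units.join(' ');
}

// Normalizes the text in every form and maps the form name to its code units.
function normalizeAllForms(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const form of NORMALIZATION_FORMS) {
    result[form] = toCodeUnits(text.normalize(form));
  }
  return result;
}

const forms: Record<string, string> = normalizeAllForms('\u1E9B\u0323');
// forms['NFC']  = '1E9B 0323'
// forms['NFD']  = '017F 0323 0307'
// forms['NFKC'] = '1E69'
// forms['NFKD'] = '0073 0323 0307'
```

NFC and NFD preserve the long s (U+017F), whereas the compatibility forms NFKC and NFKD replace it with a regular s (U+0073).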


### Line Break Point Acquisition

You can use APIs of the [BreakIterator](../reference/apis-localization-kit/js-apis-i18n.md#breakiterator8) class to obtain the line break points of text for a specified locale. The development procedure is as follows:

1. Import the **i18n** module.
   ```ts
   import { i18n } from '@kit.LocalizationKit';
   ```

2. Create a **BreakIterator** object for the specified locale. The object calculates the line break points in the text according to the rules of that locale.

   ```ts
   let locale: string = 'en-GB'; // Locale whose line break rules are used.
   let iterator: i18n.BreakIterator = i18n.getLineInstance(locale);
   ```

3. Set the text to be processed.
   ```ts
   let text: string = 'Apple is my favorite fruit.'; // Text to be processed.
   iterator.setLineBreakText(text); // Set the text to be processed.
   let breakText: string = iterator.getLineBreakText(); // View the text being processed by the BreakIterator object.
   ```

4. Obtain the position of a line break point.
   ```ts
   let currentPos: number = iterator.current(); // Obtain the position of the BreakIterator object in the text.
   // Move the BreakIterator object to the first line break point, which is always at the beginning of the text, that is, firstPos = 0.
   let firstPos: number = iterator.first();
   // Move the BreakIterator object by the specified number of line break points. A positive number moves the object toward the end of the text, and a negative number moves it toward the beginning. The default value is 1. nextPos indicates the position after the movement. If the object is moved out of the text length range, -1 is returned.
   let nextPos: number = iterator.next(2);
   let isBoundary: boolean = iterator.isBoundary(9); // Check whether the specified offset is a line break point.
   ```


**Development Example**
```ts
// Import the i18n module.
import { i18n } from '@kit.LocalizationKit';

// Create a BreakIterator object to obtain line break points.
let iterator: i18n.BreakIterator = i18n.getLineInstance('en-GB');

// Set the text to be processed.
iterator.setLineBreakText('Apple is my favorite fruit.');

// Move the BreakIterator object to the beginning of the text.
let firstPos: number = iterator.first(); // firstPos = 0

// Move the BreakIterator object toward the end of the text by two line break points.
let nextPos: number = iterator.next(2); // nextPos = 9

// Check whether a certain position is a line break point.
let isBoundary: boolean = iterator.isBoundary(9); // isBoundary = true

// Obtain the text processed by BreakIterator.
let breakText: string = iterator.getLineBreakText(); // breakText = 'Apple is my favorite fruit.'
```
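
A common follow-up is to collect every line break point rather than move a fixed number of steps. The following sketch shows one way to write that loop, relying only on the behavior described above: `first()` returns the start of the text, and `next()` returns -1 once the iterator moves out of range. The `LineBreakCursor` interface, the `collectBreakPoints` helper, and the `SpaceBreakCursor` stand-in are all hypothetical. The stand-in only allows a break after a run of spaces so that the sketch runs without the Localization Kit, and it does not reproduce locale-specific rules.

```ts
// Minimal shape of the iterator used by this sketch (assumption: it mirrors
// the first() and next() calls shown in this section).
interface LineBreakCursor {
  first(): number;
  next(index?: number): number;
}

// Hypothetical helper: walks the cursor from the start of the text and
// records every line break point until next() returns -1.
function collectBreakPoints(cursor: LineBreakCursor): number[] {
  const points: number[] = [cursor.first()];
  let pos: number = cursor.next();
  while (pos !== -1) {
    points.push(pos);
    pos = cursor.next();
  }
  return points;
}

// Stand-in for a real break iterator: allows a break after each run of
// spaces and at the end of the text. This is only a rough approximation.
class SpaceBreakCursor implements LineBreakCursor {
  private readonly points: number[] = [0];
  private index: number = 0;

  constructor(text: string) {
    for (let i = 1; i < text.length; i++) {
      if (text[i - 1] === ' ' && text[i] !== ' ') {
        this.points.push(i);
      }
    }
    if (text.length > 0) {
      this.points.push(text.length);
    }
  }

  // Moves to the first break point, which is always position 0.
  first(): number {
    this.index = 0;
    return this.points[0];
  }

  // Moves by the given number of break points, or returns -1 if out of range.
  next(step: number = 1): number {
    const target: number = this.index + step;
    if (target < 0 || target >= this.points.length) {
      return -1;
    }
    this.index = target;
    return this.points[target];
  }
}

const sampleText: string = 'Apple is my favorite fruit.';
const breakPoints: number[] = collectBreakPoints(new SpaceBreakCursor(sampleText));
// breakPoints = [0, 6, 9, 12, 21, 27]
```

In an application, the same loop would call `first()` and `next()` directly on the **BreakIterator** object created by `i18n.getLineInstance`.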

### File Path Mirroring

File path mirroring localizes a file path for an RTL language so that the path is displayed in mirrored order in that language. You can implement this function by using the [getUnicodeWrappedFilePath](../reference/apis-localization-kit/js-apis-i18n.md#getunicodewrappedfilepath18) API of the **I18NUtil** class. The development procedure is as follows:

1. Import the **i18n** and **intl** modules.
   ```ts
   import { i18n, intl } from '@kit.LocalizationKit';
   ```

2. Perform file path mirroring.
   ```ts
   // Parameters: the file path, an optional path delimiter, and an optional intl.Locale object.
   let mirrorPath: string = i18n.I18NUtil.getUnicodeWrappedFilePath('data/out/tmp', '/', new intl.Locale('ar'));
   ```


**Development Example**
```ts
// Import the required modules.
import { BusinessError } from '@kit.BasicServicesKit';
import { i18n, intl } from '@kit.LocalizationKit';

try {
  // Perform file path mirroring when the locale is an RTL language.
  let path: string = 'data/out/tmp';
  let delimiter: string = '/';
  let locale: intl.Locale = new intl.Locale('ar');
  // mirrorPath = 'tmp/out/data/'
  let mirrorPath: string = i18n.I18NUtil.getUnicodeWrappedFilePath(path, delimiter, locale);

  // Skip file path mirroring when the locale is not an RTL language.
  let localeZh: intl.Locale = new intl.Locale('zh');
  // unMirrorPath = '/data/out/tmp'
  let unMirrorPath: string = i18n.I18NUtil.getUnicodeWrappedFilePath(path, delimiter, localeZh);
} catch (error) {
  // Cast the caught error to BusinessError to read its code and message.
  let err: BusinessError = error as BusinessError;
  console.error(`call I18NUtil.getUnicodeWrappedFilePath failed, error code: ${err.code}, message: ${err.message}.`);
}
```
<!--RP1--><!--RP1End-->

<!--no_check-->