# tex-hyphen
## Introduction
tex-hyphen is a hyphenation pattern library for the TeX system. It can correctly hyphenate words in multiple languages to improve typesetting quality.

Source: tex-hyphen
URL: https://github.com/hyphenation/tex-hyphen
Version: CTAN-2024.12.31
License: Various combinations

## Background
In multilingual document processing and typesetting, correct hyphenation is crucial. tex-hyphen provides a comprehensive set of hyphenation patterns that support multiple languages, ensuring high-quality typesetting. Introducing tex-hyphen into OpenHarmony can significantly enhance the typesetting quality of multilingual documents.

## Language Classification
The tex directory contains hyphenation pattern files for many languages, each released under its own open-source license. The licenses fall into the following groups:
* MIT License
* GPL, GPL 2
* LGPL 1, LGPL 2.1
* LPPL 1, LPPL 1.2, LPPL 1.3
* MPL 1.1
* BSD 3

The languages used in OHOS are listed below. All of them use a user-friendly open-source license:
* be - Belarusian
* cs - Czech
* cy - Welsh
* da - Danish
* de-1901 - German (1901 orthography)
* de-ch-1901 - Swiss German (1901 orthography)
* el-monoton - Modern Greek (monotonic)
* el-polyton - Modern Greek (polytonic)
* en-gb - British English
* en-us - American English
* es - Spanish
* et - Estonian
* fr - French
* ga - Irish
* gl - Galician
* hr - Croatian
* hu - Hungarian
* id - Indonesian
* is - Icelandic
* it - Italian
* ka - Georgian
* lt - Lithuanian
* lv - Latvian
* mk - Macedonian
* mn-cyrl - Mongolian (Cyrillic script)
* nl - Dutch
* pt - Portuguese
* ru - Russian
* sh-cyrl - Serbo-Croatian (Cyrillic script)
* sh-latn - Serbo-Croatian (Latin script)
* sk - Slovak
* sl - Slovenian
* sr-cyrl - Serbian (Cyrillic script)
* sv - Swedish
* th - Thai
* tk - Turkmen
* tr - Turkish
* uk - Ukrainian
* zh-latn-pinyin - Chinese (Pinyin)

## Directory Structure
```
third_party_tex-hyphen
├── collaboration
│   ├── original
│   ├── repository
│   └── source
├── data/language-codes
├── docs
│   └── languages
├── encoding
│   └── data
├── hyph-utf8
│   ├── doc
│   ├── source
│   └── tex
├── misc
├── ohos
│   ├── src
│   └── hpb-binary
├── old
├── source
├── tests
├── TL
├── tools
└── webpage
```
- collaboration/: JavaScript dependencies and XML configuration files required by the tex-hyphen official website.
- ohos/: OpenHarmony compilation files and hpb binary files.
- data/: Language library.
- docs/: Documentation related to hyphenation.
- encoding/: Files related to character set encodings, used to handle different character sets.
- hyph-utf8/: Hyphenation pattern package for TeX, providing hyphenation patterns encoded in UTF-8.
- misc/: An example of a hyphenation file for the en-gb language.
- old/: Older hyphenation pattern files that may have been updated or replaced.
- source/: Source code files used to generate and process hyphenation patterns.
- TL/: tlpsrc resource files, which are package source files in the TeX Live system and describe the metadata of TeX Live packages.
- tools/: Utility scripts that assist in processing hyphenation pattern files.
- webpage/: tex-hyphen official homepage, providing detailed information and resources about the hyph-utf8 package.

## Value Brought to OpenHarmony
**1. Improved Typesetting Quality:** By introducing tex-hyphen, OpenHarmony can achieve more accurate hyphenation, improving the readability and aesthetics of documents.

**2. Enhanced Small Screen Experience:** With hyphenation, small-screen devices can fit more content into the same area, which enhances the reading experience.

## How to Use tex-hyphen in OpenHarmony
### 1. Compile the HPB Binary
#### Compilation Steps
Open a terminal (or command prompt), navigate to the directory containing the [hyphen_pattern_processor.cpp](ohos%2Fsrc%2Fhyphen-build%2Fhyphen_pattern_processor.cpp) file, and run the following command to compile the code:

```
cd ohos/src/hyphen-build/
g++ -g -Wall hyphen_pattern_processor.cpp -o transform
```

Explanation of the command:
- g++: Invokes the GCC C++ compiler.
- -g: Adds debugging information.
- -Wall: Enables the commonly used set of compiler warnings.
- hyphen_pattern_processor.cpp: The source code file.
- -o transform: Specifies the output executable file name as transform.

#### Execution Steps
After compilation, you can run the generated executable file and process the specified .tex file using the following command:

```
./transform hyph-en-us.tex ./out/
```

Explanation of the command:
- ./transform: Runs the generated transform executable file.
- hyph-en-us.tex: The input file (the .tex file to be processed).
- ./out/: The output directory (the processed files will be stored in this directory).

After successful execution, the processed files will be stored in the ./out/ directory.

#### Batch Compilation
- Dependencies:
```
jq: JSON file parsing tool
```
- Configure the files to be compiled using the JSON configuration file [build-tex.json](ohos%2Fbuild%2Fbuild-tex.json):
```
[
    {
        "filename": "example1.tex"
    },
    {
        "filename": "example2.tex"
    }
]
```
filename: Specifies the name of the TeX file to be compiled. The file must be located in the [tex](hyph-utf8%2Ftex%2Fgeneric%2Fhyph-utf8%2Fpatterns%2Ftex) directory.

The build-tex.json file defines all supported languages, and the script compiles all of them by default. Developers can add or remove languages by editing build-tex.json.

For example, to remove the example2 language, modify the file as follows:
```
[
    {
        "filename": "example1.tex"
    }
]
```
To add the example3 language, modify the file as follows:
```
[
    {
        "filename": "example1.tex"
    },
    {
        "filename": "example2.tex"
    },
    {
        "filename": "example3.tex"
    }
]
```
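
The same edits can be scripted. The following is a minimal Python sketch, assuming build-tex.json keeps the flat list-of-objects layout shown above. The helper name `update_languages` is hypothetical and is not part of the repository.

```python
import json


def update_languages(config_text, add=(), remove=()):
    """Return new build-tex.json text with entries removed and/or appended.

    Assumes the layout shown above: a JSON list of {"filename": ...} objects.
    """
    to_remove = set(remove)
    entries = [e for e in json.loads(config_text) if e["filename"] not in to_remove]
    present = {e["filename"] for e in entries}
    for name in add:
        if name not in present:  # avoid duplicate entries
            entries.append({"filename": name})
            present.add(name)
    return json.dumps(entries, indent=4)


original = '[{"filename": "example1.tex"}, {"filename": "example2.tex"}]'
print(update_languages(original, add=["example3.tex"], remove=["example2.tex"]))
```

Returning text rather than writing the file in place keeps the sketch free of side effects; the caller decides where the result is saved.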

- Open a terminal (or command prompt), navigate to the directory containing the [build.sh](ohos%2Fbuild%2Fbuild.sh) file, and run the following commands to compile the code:
```
chmod +x build.sh
./build.sh
```
After successful compilation, the compiled output will be placed in the ./out_hpb directory.
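
To make the batch step concrete, here is a Python sketch that builds one transform invocation per entry in build-tex.json. It is not a transcription of build.sh, whose internals are not described in this document. The function name, the default paths, and the assumption that each file is passed to transform in the same way as in the single-file command are all illustrative.

```python
import json
import os


def transform_commands(config_text, tex_dir, out_dir="./out_hpb", transform="./transform"):
    """Build one argument list per configured .tex file.

    Assumption: transform takes an input file and an output directory,
    as in the single-file example (./transform hyph-en-us.tex ./out/).
    """
    commands = []
    for entry in json.loads(config_text):
        tex_path = os.path.join(tex_dir, entry["filename"])
        commands.append([transform, tex_path, out_dir])
    return commands


config = '[{"filename": "example1.tex"}, {"filename": "example2.tex"}]'
for command in transform_commands(config, "tex"):
    print(" ".join(command))
```

Each argument list can be handed to `subprocess.run(command, check=True)` once the transform executable has been built.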

### 2. Parse Word Hyphenation Positions Using HPB
#### Compilation Steps
Open a terminal (or command prompt), navigate to the directory containing the [hyphen_pattern_reader.cpp](ohos%2Fsrc%2Fhyphen-build%2Fhyphen_pattern_reader.cpp) file, and run the following command to compile the code:

```
cd ohos/src/hyphen-build/
g++ -g -Wall hyphen_pattern_reader.cpp -o reader
```
Explanation of the command:
- g++: Invokes the GCC C++ compiler.
- -g: Adds debugging information.
- -Wall: Enables the commonly used set of compiler warnings.
- hyphen_pattern_reader.cpp: The source code file.
- -o reader: Specifies the output executable file name as reader.

#### Running Steps
After compilation, you can parse the hyphenation positions of words in the specified language using the following command:

```
./reader hyph-en-us.hpb helloworld
```
Explanation of the command:
- ./reader: Runs the generated reader executable.
- hyph-en-us.hpb: The input file (the binary file to be parsed).
- helloworld: The word to be parsed.

After successful execution, the log will output the hyphenation information of the parsed word.
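
To give a sense of what resolving hyphenation positions involves, the sketch below applies TeX-style patterns to a word using the classic Liang scheme: digits in a pattern are weights for the gaps between letters, the highest weight at each gap wins, and an odd result permits a break. This is an illustration only. It works on plain-text patterns rather than the HPB binary, it does not claim to mirror hyphen_pattern_reader.cpp, and `TOY_PATTERNS` is a small set chosen for the example rather than a real language's pattern file. The `left_min` and `right_min` defaults are assumed values.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and gap weights."""
    letters = []
    weights = [0]  # weights[i] applies to the gap before letters[i]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the word split at every gap whose winning weight is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    padded = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(padded) + 1)   # scores[i] is the gap before padded[i]
    for start in range(len(padded)):
        for end in range(start + 1, len(padded) + 1):
            weights = table.get(padded[start:end])
            if weights is not None:
                for offset, weight in enumerate(weights):
                    scores[start + offset] = max(scores[start + offset], weight)
    parts, last = [], 0
    # A break before word[k] corresponds to scores[k + 1] because of the leading dot.
    for k in range(left_min, len(word) - right_min + 1):
        if scores[k + 1] % 2 == 1:
            parts.append(word[last:k])
            last = k
    parts.append(word[last:])
    return parts


TOY_PATTERNS = ["1p", "2ph", "hy3ph"]  # illustrative only
print("-".join(hyphenate("hyphen", TOY_PATTERNS)))
print("-".join(hyphenate("alphabet", TOY_PATTERNS)))
```

With these toy patterns, "hyphen" is split before "ph" because the weight 3 from `hy3ph` outranks the even weight 2 from `2ph`, while "alphabet" stays whole because the even weight 2 outranks the odd weight 1 from `1p`.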

### 3. Batch Verification
You can use the [generate_report.py](ohos%2Ftest%2Fgenerate_report.py) Python script to read the [report_config.json](ohos%2Ftest%2Freport_config.json) configuration file and perform batch verification to check the validity of the generated binary files.
#### Preparation
- Python 3.x
- transform and reader executables, placed in the same directory as the script.
- report_config.json configuration file

#### Usage
1. Prepare the configuration file. First, create a JSON configuration file named report_config.json with the following content:
```
{
    "file_path": "path/to/tex/files",
    "tex_files": [
        {
            "filename": "example.tex",
            "words": ["word1", "word2", "word3", "word4", "word5", "word6", "word7", "word8", "word9", "word10"]
        },
        ...
    ]
}
```
2. Run the script. Run the following command in the terminal:
```
python generate_report.py report_config.json
```
3. Check the log files. The script generates a timestamped subdirectory under the report directory, containing the following log files:
```
match.log: Records successful matches.
unmatch.log: Records unsuccessful matches.
```
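
A malformed configuration is an easy way for a batch run to fail, so it can help to check report_config.json before running the script. The following Python sketch checks only the layout shown in step 1. The function name `check_report_config` and the specific rules (a string `file_path`, a non-empty `tex_files` list, filenames ending in .tex, `words` as a list of strings) are assumptions drawn from that example, not requirements documented for generate_report.py. Note that the `...` in the example is a placeholder and must be removed for the file to be valid JSON.

```python
import json


def check_report_config(config_text):
    """Return a list of problems; an empty list means the layout looks right."""
    problems = []
    config = json.loads(config_text)
    if not isinstance(config.get("file_path"), str):
        problems.append("file_path should be a string")
    tex_files = config.get("tex_files")
    if not isinstance(tex_files, list) or not tex_files:
        problems.append("tex_files should be a non-empty list")
        return problems
    for index, entry in enumerate(tex_files):
        if not str(entry.get("filename", "")).endswith(".tex"):
            problems.append(f"tex_files[{index}]: filename should end with .tex")
        words = entry.get("words")
        if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
            problems.append(f"tex_files[{index}]: words should be a list of strings")
    return problems


sample = """
{
    "file_path": "path/to/tex/files",
    "tex_files": [
        {"filename": "example.tex", "words": ["word1", "word2"]}
    ]
}
"""
for problem in check_report_config(sample) or ["no problems found"]:
    print(problem)
```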