# tex-hyphen
## Introduction
tex-hyphen is a collection of hyphenation patterns for the TeX typesetting system. The patterns tell software where words in many languages may be broken across lines, which improves typesetting quality.

* Source: tex-hyphen
* URL: https://github.com/hyphenation/tex-hyphen
* Version: CTAN-2021.03.21
* License: Various; each language's patterns carry their own license (see Language Classification)
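
TeX hyphenation patterns are applied with Frank Liang's algorithm: each pattern is a string of letters with digits between them, every pattern that matches part of a word contributes its digits, the highest digit at each position wins, and an odd value allows a break while an even value forbids it. The Python sketch below illustrates only these pattern semantics; it does not describe the HPB format or the transform/reader tools. The helper names `parse_pattern` and `hyphenate` and the tiny pattern set are hypothetical and chosen for illustration, and the default minimums of 2 and 3 letters mirror plain TeX's `\lefthyphenmin` and `\righthyphenmin` for English.

```python
"""Minimal sketch of Liang-style hyphenation with TeX patterns (illustrative only)."""


def parse_pattern(pattern):
    """Split a TeX pattern such as 'hy3ph' into its letters and inter-letter values.

    The returned list has len(letters) + 1 entries; entry i is the value in
    front of letter i, and the last entry is the value after the final letter.
    """
    letters = []
    values = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Insert '-' at every break point allowed by the patterns."""
    text = "." + word.lower() + "."  # '.' marks word boundaries, as in TeX patterns
    points = [0] * (len(text) + 1)
    # Every substring that matches a pattern raises the values it covers (max wins).
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            values = patterns.get(text[start:end])
            if values is not None:
                for offset, value in enumerate(values):
                    points[start + offset] = max(points[start + offset], value)
    result = []
    for i, ch in enumerate(word):
        # points[i + 1] is the value in front of word[i]; odd values allow a break.
        if left_min <= i <= len(word) - right_min and points[i + 1] % 2 == 1:
            result.append("-")
        result.append(ch)
    return "".join(result)


# A tiny illustrative pattern set, keyed by its letters for direct lookup.
PATTERNS = dict(
    parse_pattern(p)
    for p in ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"]
)

print(hyphenate("hyphenation", PATTERNS))  # hy-phen-ation
```

Real pattern files contain thousands of patterns and are usually stored in a trie rather than a dictionary, but the "maximum value wins, odd means break" rule is the same.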

## Background
Correct hyphenation is essential in multilingual document processing and typesetting. tex-hyphen provides a comprehensive set of hyphenation patterns covering many languages, enabling high-quality typesetting. Introducing tex-hyphen into OpenHarmony improves the typesetting quality of multilingual documents.

## Language Classification
The tex directory contains hyphenation patterns taken from the TeX hyphenation patterns project. Individual pattern files are distributed under different open-source licenses, which fall into the following groups:
* MIT License
* GPL, GPL 2
* LGPL 1, LGPL 2.1
* LPPL 1, LPPL 1.2, LPPL 1.3
* MPL 1.1
* BSD 3

The following languages are used in OpenHarmony (OHOS); all of them are distributed under open-source licenses that are friendly to downstream use:
* be - Belarusian
* cs - Czech
* cy - Welsh
* da - Danish
* de-1901 - German (1901 orthography)
* de-ch-1901 - Swiss German (1901 orthography)
* el-monoton - Modern Greek (monotonic)
* el-polyton - Modern Greek (polytonic)
* en-gb - British English
* en-us - American English
* es - Spanish
* et - Estonian
* fr - French
* ga - Irish
* gl - Galician
* hr - Croatian
* hu - Hungarian
* hy - Armenian
* id - Indonesian
* is - Icelandic
* it - Italian
* ka - Georgian
* lt - Lithuanian
* lv - Latvian
* mk - Macedonian
* mn-cyrl - Mongolian (Cyrillic script)
* nl - Dutch
* pt - Portuguese
* ru - Russian
* sh-cyrl - Serbo-Croatian (Cyrillic script)
* sh-latn - Serbo-Croatian (Latin script)
* sk - Slovak
* sl - Slovenian
* sr-cyrl - Serbian (Cyrillic script)
* sv - Swedish
* th - Thai
* tk - Turkmen
* tr - Turkish
* uk - Ukrainian
* zh-latn-pinyin - Chinese (Pinyin)

## Directory Structure
```
third_party_tex-hyphen
├── collaboration
│   ├── original
│   ├── repository
│   └── source
├── data/language-codes
├── docs
│   └── languages
├── encoding
│   └── data
├── hyph-utf8
│   ├── doc
│   ├── source
│   └── tex
├── misc
├── ohos
│   ├── src
│   └── hpb-binary
├── old
├── source
├── tests
├── TL
├── tools
└── webpage
```
- collaboration/: JavaScript dependencies and XML configuration files required by the tex-hyphen official website
- data/: Language data (language codes)
- docs/: Documentation related to hyphenation
- encoding/: Files related to character-set encodings, used to handle different character sets
- hyph-utf8/: Hyphenation pattern package for TeX, providing hyphenation patterns encoded in UTF-8
- misc/: An example hyphenation file for the en-gb language
- ohos/: OpenHarmony build files and HPB binary files
- old/: Older hyphenation pattern files that may have been updated or replaced
- source/: Source files used to generate and process hyphenation patterns
- TL/: tlpsrc files, the TeX Live package source files that describe the metadata of TeX Live packages
- tools/: Utility scripts for processing hyphenation pattern files
- webpage/: The tex-hyphen official homepage, with detailed information and resources about the hyph-utf8 package

## Value Brought to OpenHarmony
**1. Improved Typesetting Quality:** With tex-hyphen, OpenHarmony can hyphenate words more accurately, improving the readability and appearance of documents.

**2. Better Small-Screen Experience:** On small-screen devices, hyphenation allows more content to fit in the same display area, improving the reading experience.

## How to Use tex-hyphen in OpenHarmony
### 1. Compile the HPB Binary
#### Compilation Steps
Open a terminal (or command prompt) at the repository root and run the following commands to enter the directory containing the [hyphen_pattern_processor.cpp](ohos%2Fsrc%2Fhyphen-build%2Fhyphen_pattern_processor.cpp) file and compile it:

```
cd ohos/src/hyphen-build/
g++ -g -Wall hyphen_pattern_processor.cpp -o transform
```

Explanation of the command:
- g++: Invokes the GCC C++ compiler.
- -g: Adds debugging information.
- -Wall: Enables a broad set of common warnings.
- hyphen_pattern_processor.cpp: The source file.
- -o transform: Names the output executable transform.

#### Execution Steps
After compilation, run the generated executable to process a .tex pattern file:

```
./transform hyph-en-us.tex ./out/
```

Explanation of the command:
- ./transform: Runs the generated transform executable.
- hyph-en-us.tex: The input file (the .tex pattern file to process).
- ./out/: The output directory where the processed files are written.

After successful execution, the processed files are stored in the ./out/ directory.

#### Batch Compilation
- Dependency: jq (a command-line JSON processor)
- List the files to compile in the JSON configuration file [build-tex.json](ohos%2Fbuild%2Fbuild-tex.json):
```
[
    {
        "filename": "example1.tex"
    },
    {
        "filename": "example2.tex"
    }
]
```
filename: The name of the TeX pattern file to compile. The file must be located in the [tex](hyph-utf8%2Ftex%2Fgeneric%2Fhyph-utf8%2Fpatterns%2Ftex) directory.

build-tex.json lists all supported languages, and the script compiles all of them by default. To add or remove languages, edit build-tex.json.
For example, to remove the example2 language, change the file to:
```
[
    {
        "filename": "example1.tex"
    }
]
```
To add the example3 language, change the file to:
```
[
    {
        "filename": "example1.tex"
    },
    {
        "filename": "example2.tex"
    },
    {
        "filename": "example3.tex"
    }
]
```
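
Because build-tex.json is a plain JSON array of `{"filename": ...}` objects, it can also be generated from a list of language codes instead of being edited by hand. The sketch below assumes the hyph-utf8 naming convention `hyph-<code>.tex` (as in hyph-en-us.tex above); `make_build_config` and `missing_files` are hypothetical helpers, and the check only confirms that each listed file exists in the given tex directory.

```python
import json
from pathlib import Path


def make_build_config(language_codes):
    """Build the build-tex.json structure: one {"filename": ...} entry per language.

    Assumes pattern files follow the hyph-<code>.tex naming used by hyph-utf8.
    """
    return [{"filename": f"hyph-{code}.tex"} for code in language_codes]


def missing_files(config, tex_dir):
    """Return the configured filenames that do not exist in tex_dir."""
    tex_path = Path(tex_dir)
    return [
        entry["filename"]
        for entry in config
        if not (tex_path / entry["filename"]).is_file()
    ]


if __name__ == "__main__":
    config = make_build_config(["en-us", "en-gb", "de-1901"])
    # indent=4 reproduces the layout of the build-tex.json examples.
    print(json.dumps(config, indent=4))
```

Writing the result to build-tex.json with `json.dump(config, f, indent=4)` keeps the file in the same shape the batch script expects, and running `missing_files` against the tex directory catches typos before the build starts.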

- Open a terminal (or command prompt), navigate to the directory containing the [build.sh](ohos%2Fbuild%2Fbuild.sh) file, and run the following commands to compile the patterns:
```
chmod +x build.sh
./build.sh
```
After successful compilation, the output is placed in the ./out_hpb directory.
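
The exact behavior of build.sh is defined by the script itself. As a rough, assumed picture, a batch build reads build-tex.json and runs transform once per listed file. The Python sketch below builds those command lines as a dry run, assuming transform takes `<input.tex> <output dir>` as shown earlier and that results go to ./out_hpb; `plan_batch` and `run_batch` are hypothetical names, and the output directory may need to exist before transform is run.

```python
import json
import subprocess
from pathlib import Path


def plan_batch(config_text, tex_dir, out_dir="./out_hpb", transform="./transform"):
    """Return one transform command line per entry in a build-tex.json document."""
    entries = json.loads(config_text)
    return [
        [transform, str(Path(tex_dir) / entry["filename"]), out_dir]
        for entry in entries
    ]


def run_batch(commands):
    """Run each command in turn; check=True stops at the first non-zero exit."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    sample = '[{"filename": "hyph-en-us.tex"}, {"filename": "hyph-en-gb.tex"}]'
    tex_dir = "hyph-utf8/tex/generic/hyph-utf8/patterns/tex"
    for cmd in plan_batch(sample, tex_dir):
        print(" ".join(cmd))  # dry run: print instead of executing
```

Keeping planning separate from execution makes it easy to review the commands before calling `run_batch` on them.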
### 2. Parse Word Hyphenation Positions Using HPB
#### Compilation Steps
Open a terminal (or command prompt) at the repository root and run the following commands to enter the directory containing the [hyphen_pattern_reader.cpp](ohos%2Fsrc%2Fhyphen-build%2Fhyphen_pattern_reader.cpp) file and compile it:

```
cd ohos/src/hyphen-build/
g++ -g -Wall hyphen_pattern_reader.cpp -o reader
```
Explanation of the command:
- g++: Invokes the GCC C++ compiler.
- -g: Adds debugging information.
- -Wall: Enables a broad set of common warnings.
- hyphen_pattern_reader.cpp: The source file.
- -o reader: Names the output executable reader.

#### Running Steps
After compilation, run the following command to look up the hyphenation positions of a word in the specified language:

```
./reader hyph-en-us.hpb helloworld
```
Explanation of the command:
- ./reader: Runs the generated reader executable.
- hyph-en-us.hpb: The input file (the HPB binary to read).
- helloworld: The word to hyphenate.

After successful execution, the hyphenation information for the word is written to the log output.

### 3. Batch Verification
The [generate_report.py](ohos%2Ftest%2Fgenerate_report.py) Python script reads the [report_config.json](ohos%2Ftest%2Freport_config.json) configuration file and verifies in batch that the generated binary files are valid.
#### Preparation
- Python 3.x
- The transform and reader executables, placed in the same directory as the script
- A report_config.json configuration file

#### Usage
1. Prepare the configuration file: create a JSON configuration file named report_config.json with the following content:
```
{
    "file_path": "path/to/tex/files",
    "tex_files": [
        {
            "filename": "example.tex",
            "words": ["word1", "word2", "word3", "word4", "word5", "word6", "word7", "word8", "word9", "word10"]
        },
        ...
    ]
}
```
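
Before running the script, it can help to sanity-check report_config.json once the `...` placeholder has been replaced with real entries (the example as shown is not valid JSON). The sketch below checks only the fields that appear in the example above (`file_path`, and `filename`/`words` in each `tex_files` entry); generate_report.py may require more or different fields, and `check_report_config` is a hypothetical helper.

```python
import json


def check_report_config(text):
    """Return a list of problems found in a report_config.json document."""
    config = json.loads(text)
    if not isinstance(config, dict):
        return ["top level must be an object"]
    problems = []
    if not isinstance(config.get("file_path"), str):
        problems.append("file_path must be a string")
    tex_files = config.get("tex_files")
    if not isinstance(tex_files, list):
        return problems + ["tex_files must be a list"]
    for index, entry in enumerate(tex_files):
        if not isinstance(entry, dict):
            problems.append(f"tex_files[{index}] must be an object")
            continue
        if not str(entry.get("filename", "")).endswith(".tex"):
            problems.append(f"tex_files[{index}].filename must name a .tex file")
        words = entry.get("words")
        if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
            problems.append(f"tex_files[{index}].words must be a list of strings")
    return problems


if __name__ == "__main__":
    sample = '{"file_path": "p", "tex_files": [{"filename": "hyph-en-us.tex", "words": ["hello"]}]}'
    print(check_report_config(sample) or "config looks OK")
```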
2. Run the script: run the following command in a terminal:
```
python generate_report.py report_config.json
```
3. Check the log files: the script creates a timestamped subdirectory under the report directory containing the following log files:
- match.log: Records successful matches.
- unmatch.log: Records unsuccessful matches.
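
For a quick pass/fail count after a run, the two logs can be tallied. The sketch below assumes one record per non-empty line in each log; that is an assumption about the log format, not documented behavior. `count_records` and `summarize_report` are hypothetical helpers that take the timestamped report subdirectory.

```python
import sys
from pathlib import Path


def count_records(log_path):
    """Count non-empty lines, assuming one record per line (format not documented)."""
    path = Path(log_path)
    if not path.is_file():
        return 0
    with path.open(encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def summarize_report(report_dir):
    """Return (matched, unmatched) record counts for one report subdirectory."""
    report = Path(report_dir)
    return count_records(report / "match.log"), count_records(report / "unmatch.log")


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    matched, unmatched = summarize_report(target)
    print(f"matched: {matched}, unmatched: {unmatched}")
```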