# tex-hyphen
## Introduction
tex-hyphen is a hyphenation pattern library for the TeX system. It hyphenates words correctly in many languages, which improves typesetting quality.

Source: tex-hyphen
URL: https://github.com/hyphenation/tex-hyphen
Version: CTAN-2024.12.31
License: Various combinations

## Background
Correct hyphenation is crucial in multilingual document processing and typesetting. tex-hyphen provides a comprehensive set of hyphenation patterns covering many languages. Introducing tex-hyphen into OpenHarmony can significantly improve the typesetting quality of multilingual documents.

## Language Classification
The tex directory contains the TeX hyphenation patterns for many languages, and the patterns are released under different open-source licenses. The licenses fall into the following groups:
* MIT License
* GPL, GPL 2
* LGPL 1, LGPL 2.1
* LPPL 1, LPPL 1.2, LPPL 1.3
* MPL 1.1
* BSD 3

The following languages are used in OHOS. All of them use a user-friendly open-source license:
* be - Belarusian
* cs - Czech
* cy - Welsh
* da - Danish
* de-1901 - German (1901 orthography)
* de-ch-1901 - Swiss German (1901 orthography)
* el-monoton - Modern Greek (monotonic)
* el-polyton - Modern Greek (polytonic)
* en-gb - British English
* en-us - American English
* es - Spanish
* et - Estonian
* fr - French
* ga - Irish
* gl - Galician
* hr - Croatian
* hu - Hungarian
* id - Indonesian
* is - Icelandic
* it - Italian
* ka - Georgian
* lt - Lithuanian
* lv - Latvian
* mk - Macedonian
* mn-cyrl - Mongolian (Cyrillic script)
* nl - Dutch
* pt - Portuguese
* ru - Russian
* sh-cyrl - Serbo-Croatian (Cyrillic script)
* sh-latn - Serbo-Croatian (Latin script)
* sk - Slovak
* sl - Slovenian
* sr-cyrl - Serbian (Cyrillic script)
* sv - Swedish
* th - Thai
* tk - Turkmen
* tr - Turkish
* uk - Ukrainian
* zh-latn-pinyin - Chinese (Pinyin)

## Directory Structure
```
third_party_tex-hyphen
├── collaboration
│   ├── original
│   ├── repository
│   └── source
├── data/language-codes
├── docs
│   └── languages
├── encoding
│   └── data
├── hyph-utf8
│   ├── doc
│   ├── source
│   └── tex
├── misc
├── ohos
│   ├── src
│   └── hpb-binary
├── old
├── source
├── tests
├── TL
├── tools
└── webpage
```
* collaboration/: JavaScript dependencies and XML configuration files required by the tex-hyphen official website.
* ohos/: OpenHarmony build files and HPB binary files.
* data/: Language library.
* docs/: Documentation related to hyphenation.
* encoding/: Files for handling different character set encodings.
* hyph-utf8/: Hyphenation pattern package for TeX, providing hyphenation patterns encoded in UTF-8.
* misc/: An example hyphenation file for the en-gb language.
* old/: Older hyphenation pattern files that may have been updated or replaced.
* source/: Source code used to generate and process hyphenation patterns.
* TL/: tlpsrc resource files, the package source files that describe package metadata in the TeX Live system.
* tools/: Utility scripts that assist in processing hyphenation pattern files.
* webpage/: The tex-hyphen official homepage, with detailed information and resources about the hyph-utf8 package.


## Value Brought to OpenHarmony
**1. Improved Typesetting Quality:** With tex-hyphen, OpenHarmony can hyphenate more accurately, improving the readability and appearance of documents.

**2. Enhanced Small Screen Experience:** Hyphenation lets small-screen devices fit more content into the same area, improving the reading experience.

## How to Use tex-hyphen in OpenHarmony
### 1. Compile the HPB Binary
#### Compilation Steps
Open a terminal (or command prompt), navigate to the directory containing the [hyphen_pattern_processor.cpp](ohos%2Fsrc%2Fhyphen-build%2Fhyphen_pattern_processor.cpp) file, and run the following commands to compile the code:

```
cd ohos/src/hyphen-build/
g++ -g -Wall hyphen_pattern_processor.cpp -o transform
```

Explanation of the command:
- g++: Invokes the GNU C++ compiler (part of GCC).
- -g: Adds debugging information.
- -Wall: Enables the most commonly used compiler warnings.
- hyphen_pattern_processor.cpp: The source code file.
- -o transform: Names the output executable transform.

#### Execution Steps
After compilation, run the generated executable on a .tex pattern file with the following command:

```
./transform hyph-en-us.tex ./out/
```

Explanation of the command:
- ./transform: Runs the generated transform executable.
- hyph-en-us.tex: The input file (the .tex file to be processed).
- ./out/: The output directory.

After successful execution, the processed files are stored in the ./out/ directory.

#### Batch Compilation
- Dependencies:
```
jq: JSON file parsing tool
```
- Configure the files to be compiled using the JSON configuration file [build-tex.json](ohos%2Fbuild%2Fbuild-tex.json):
```
[
    {
        "filename": "example1.tex"
    },
    {
        "filename": "example2.tex"
    }
]
```
filename: The name of the TeX file to be compiled. The file must be located in the [tex](hyph-utf8%2Ftex%2Fgeneric%2Fhyph-utf8%2Fpatterns%2Ftex) directory.

The build-tex.json file lists all supported languages, and the script compiles all of them by default. Developers can add or remove languages by editing build-tex.json.

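Because every filename in build-tex.json must exist in the patterns directory, it can be useful to check the list before starting a batch build. The following is a minimal Python sketch, not part of the repository: the helper name `missing_patterns` is hypothetical, and it assumes only the config layout shown above (a JSON array of objects with a `filename` key).

```python
import json
from pathlib import Path


def missing_patterns(config_path, patterns_dir):
    """Return the filenames listed in the config that are absent from patterns_dir.

    Hypothetical helper: assumes the config is a JSON array of objects,
    each with a "filename" key, as in the build-tex.json example above.
    """
    entries = json.loads(Path(config_path).read_text(encoding="utf-8"))
    names = [entry["filename"] for entry in entries]
    # Keep the config order so the report is easy to compare with the file.
    return [name for name in names if not (Path(patterns_dir) / name).is_file()]
```

Called from the repository root as `missing_patterns("ohos/build/build-tex.json", "hyph-utf8/tex/generic/hyph-utf8/patterns/tex")`, it returns an empty list when every configured pattern file is present. Both paths are taken from the links in this document and may need adjusting for your checkout.
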
For example, to remove the example2 language from build-tex.json, modify the file as follows:
```
[
    {
        "filename": "example1.tex"
    }
]
```
To add the example3 language, modify the file as follows:
```
[
    {
        "filename": "example1.tex"
    },
    {
        "filename": "example2.tex"
    },
    {
        "filename": "example3.tex"
    }
]
```

- Open a terminal (or command prompt), navigate to the directory containing the [build.sh](ohos%2Fbuild%2Fbuild.sh) file, and run the following commands to compile the code:
```
chmod +x build.sh
./build.sh
```
After successful compilation, the compiled output is placed in the ./out_hpb directory.

### 2. Parse Word Hyphenation Positions Using HPB
#### Compilation Steps
Open a terminal (or command prompt), navigate to the directory containing the [hyphen_pattern_reader.cpp](ohos%2Fsrc%2Fhyphen-build%2Fhyphen_pattern_reader.cpp) file, and run the following commands to compile the code:

```
cd ohos/src/hyphen-build/
g++ -g -Wall hyphen_pattern_reader.cpp -o reader
```
Explanation of the command:
- g++: Invokes the GNU C++ compiler (part of GCC).
- -g: Adds debugging information.
- -Wall: Enables the most commonly used compiler warnings.
- hyphen_pattern_reader.cpp: The source code file.
- -o reader: Names the output executable reader.

#### Running Steps
After compilation, parse the hyphenation positions of a word in the specified language with the following command:

```
./reader hyph-en-us.hpb helloworld
```
Explanation of the command:
- ./reader: Runs the generated reader executable.
- hyph-en-us.hpb: The input file (the binary file to be parsed).
- helloworld: The word to be parsed.

After successful execution, the log shows the hyphenation information for the parsed word.

### 3. Batch Verification
You can use the [generate_report.py](ohos%2Ftest%2Fgenerate_report.py) Python script to read the [report_config.json](ohos%2Ftest%2Freport_config.json) configuration file and verify the generated binary files in batch.
#### Preparation
- Python 3.x
- The transform and reader executables, placed in the same directory as the script
- The report_config.json configuration file

#### Usage
1. Prepare the configuration file. Create a JSON configuration file named report_config.json with the following content:
```
{
    "file_path": "path/to/tex/files",
    "tex_files": [
        {
            "filename": "example.tex",
            "words": ["word1", "word2", "word3", "word4", "word5", "word6", "word7", "word8", "word9", "word10"]
        },
        ...
    ]
}
```
2. Run the script. Run the following command in the terminal:
```
python generate_report.py report_config.json
```
3. Check the log files. The script creates a timestamped subdirectory under the report directory, containing the following log files:
```
match.log: Records successful matches.
unmatch.log: Records unsuccessful matches.
```
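
A malformed report_config.json is easier to diagnose before the verification run than from its logs. The sketch below checks a parsed configuration against the structure shown in step 1. It is an illustration only: the function name `check_report_config` is hypothetical, and the rules it applies are inferred from the example above rather than from generate_report.py itself, which may accept or require other fields.

```python
import json


def check_report_config(config):
    """Return a list of problems found in a parsed report_config.json object.

    Assumed schema (inferred from the example in this README):
    {"file_path": str, "tex_files": [{"filename": "*.tex", "words": [str, ...]}]}
    """
    if not isinstance(config, dict):
        return ["top-level value must be an object"]
    problems = []
    if not isinstance(config.get("file_path"), str):
        problems.append("file_path must be a string")
    tex_files = config.get("tex_files")
    if not isinstance(tex_files, list) or not tex_files:
        problems.append("tex_files must be a non-empty list")
        return problems
    for index, entry in enumerate(tex_files):
        if not isinstance(entry, dict):
            problems.append(f"tex_files[{index}] must be an object")
            continue
        filename = entry.get("filename")
        if not isinstance(filename, str) or not filename.endswith(".tex"):
            problems.append(f"tex_files[{index}].filename must name a .tex file")
        words = entry.get("words")
        if not isinstance(words, list) or not all(isinstance(w, str) for w in words):
            problems.append(f"tex_files[{index}].words must be a list of strings")
    return problems
```

A typical use is to load the file with `json.load`, pass the result to `check_report_config`, and run generate_report.py only when the returned list is empty.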