# tex-hyphen
## Introduction
tex-hyphen is a collection of hyphenation patterns for the TeX system. The patterns allow words in many languages to be hyphenated correctly, which improves typesetting quality.

Source: tex-hyphen
URL: https://github.com/hyphenation/tex-hyphen
Version: CTAN-2021.03.21
License: Multiple licenses (they vary by pattern file; see Language Classification)

## Background
Correct hyphenation is essential for processing and typesetting multilingual documents. tex-hyphen provides a comprehensive set of hyphenation patterns covering many languages, which supports high-quality typesetting. Bringing tex-hyphen into OpenHarmony improves the typesetting quality of multilingual documents.

## Language Classification
The tex directory contains hyphenation pattern files from the TeX hyphenation patterns project. The files are distributed under different open-source licenses, grouped as follows:
* MIT License
* GPL, GPL 2
* LGPL 1, LGPL 2.1
* LPPL 1, LPPL 1.2, LPPL 1.3
* MPL 1.1
* BSD 3

The following languages are used in OHOS. All of them are distributed under user-friendly open-source licenses:
* be - Belarusian
* cs - Czech
* cy - Welsh
* da - Danish
* de-1901 - German (1901 orthography)
* de-ch-1901 - Swiss German (1901 orthography)
* el-monoton - Modern Greek (monotonic)
* el-polyton - Modern Greek (polytonic)
* en-gb - British English
* en-us - American English
* es - Spanish
* et - Estonian
* fr - French
* ga - Irish
* gl - Galician
* hr - Croatian
* hu - Hungarian
* hy - Armenian
* id - Indonesian
* is - Icelandic
* it - Italian
* ka - Georgian
* lt - Lithuanian
* lv - Latvian
* mk - Macedonian
* mn-cyrl - Mongolian (Cyrillic script)
* nl - Dutch
* pt - Portuguese
* ru - Russian
* sh-cyrl - Serbo-Croatian (Cyrillic script)
* sh-latn - Serbo-Croatian (Latin script)
* sk - Slovak
* sl - Slovenian
* sr-cyrl - Serbian (Cyrillic script)
* sv - Swedish
* th - Thai
* tk - Turkmen
* tr - Turkish
* uk - Ukrainian
* zh-latn-pinyin - Chinese (Pinyin)

## Directory Structure
```
third_party_tex-hyphen
├── collaboration
│   ├── original
│   ├── repository
│   └── source
├── data/language-codes
├── docs
│   └── languages
├── encoding
│   └── data
├── hyph-utf8
│   ├── doc
│   ├── source
│   └── tex
├── misc
├── ohos
│   ├── src
│   └── hpb-binary
├── old
├── source
├── tests
├── TL
├── tools
└── webpage
```
- `collaboration/`: JavaScript dependencies and XML configuration files used by the tex-hyphen website
- `ohos/`: OpenHarmony build files and HPB binary files
- `data/`: language data (language codes)
- `docs/`: documentation on hyphenation
- `encoding/`: files for handling different character set encodings
- `hyph-utf8/`: the hyph-utf8 TeX package, which provides hyphenation patterns encoded in UTF-8
- `misc/`: an example hyphenation file for en-gb
- `old/`: older hyphenation pattern files that may since have been updated or replaced
- `source/`: source files used to generate and process hyphenation patterns
- `TL/`: tlpsrc files, the TeX Live package source files that describe package metadata
- `tools/`: utility scripts for processing hyphenation pattern files
- `webpage/`: the tex-hyphen homepage, with detailed information and resources about the hyph-utf8 package


## Value Brought to OpenHarmony
**1. Improved Typesetting Quality:** With tex-hyphen, OpenHarmony can hyphenate words more accurately, improving the readability and appearance of documents.

**2. Better Small-Screen Experience:** Hyphenation lets more text fit into the same area on small-screen devices, improving the reading experience.

## How to Use tex-hyphen in OpenHarmony
### 1. Compile the HPB Binary
#### Compilation Steps
Open a terminal (or command prompt) at the repository root, change to the directory containing [hyphen_pattern_processor.cpp](ohos/src/hyphen-build/hyphen_pattern_processor.cpp), and compile it:

```
cd ohos/src/hyphen-build/
g++ -g -Wall hyphen_pattern_processor.cpp -o transform
```

Explanation of the command:
- `g++`: invokes the GCC C++ compiler.
- `-g`: includes debugging information.
- `-Wall`: enables the common set of compiler warnings.
- `hyphen_pattern_processor.cpp`: the source file.
- `-o transform`: names the output executable `transform`.

#### Execution Steps
After compilation, run the generated executable on a .tex pattern file:

```
./transform hyph-en-us.tex ./out/
```

Explanation of the command:
- `./transform`: runs the generated `transform` executable.
- `hyph-en-us.tex`: the input .tex file to process.
- `./out/`: the output directory where the processed files are written.

After successful execution, the processed files are stored in the `./out/` directory.

#### Batch Compilation
- Dependencies: `jq` (command-line JSON processor)
- List the files to compile in the JSON configuration file [build-tex.json](ohos/build/build-tex.json):
```
[
  {
    "filename": "example1.tex"
  },
  {
    "filename": "example2.tex"
  }
]
```
`filename`: the name of the TeX file to compile. The file must be located in the [tex](hyph-utf8/tex/generic/hyph-utf8/patterns/tex) directory.

The build-tex.json file lists all supported languages, and the build script compiles every language listed by default. To add or remove languages, edit build-tex.json.
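As a rough illustration of what the batch step does with this file, the Python sketch below reads a build-tex.json-style list and runs `transform` once per entry. It is hypothetical and is not the contents of build.sh (which uses jq); the function names are invented for illustration, and it assumes `transform` takes `<input.tex> <output-dir>` as in the single-file example above.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def load_tex_filenames(config_path: str) -> List[str]:
    """Read a build-tex.json style file: a JSON array of {"filename": ...} objects."""
    with open(config_path, encoding="utf-8") as f:
        entries = json.load(f)
    return [entry["filename"] for entry in entries]


def build_commands(filenames: List[str], tex_dir: str, out_dir: str,
                   transform: str = "./transform") -> List[List[str]]:
    """Build one `transform <input.tex> <output-dir>` call per configured file."""
    return [[transform, str(Path(tex_dir) / name), out_dir] for name in filenames]


def run_all(commands: List[List[str]]) -> None:
    """Run each command in order; check=True stops at the first failing file."""
    for cmd in commands:
        # Assumption: create the output directory up front in case transform does not.
        Path(cmd[-1]).mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)
```

Typical use would be `run_all(build_commands(load_tex_filenames("build-tex.json"), tex_dir, "./out_hpb/"))`, where `tex_dir` is the path to the tex pattern directory. Building the command list separately from running it makes the selection easy to inspect first; drop `check=True` if you prefer to continue past failures. Either way, editing build-tex.json is what changes the set of files that get compiled.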
For example, to remove the example2 language, modify the file as follows:
```
[
  {
    "filename": "example1.tex"
  }
]
```
To add the example3 language, modify the file as follows:
```
[
  {
    "filename": "example1.tex"
  },
  {
    "filename": "example2.tex"
  },
  {
    "filename": "example3.tex"
  }
]
```

- Open a terminal (or command prompt), change to the directory containing [build.sh](ohos/build/build.sh), and run the following commands to compile every language listed in build-tex.json:
```
chmod +x build.sh
./build.sh
```
After successful compilation, the output is placed in the `./out_hpb` directory.
### 2. Parse Word Hyphenation Positions Using HPB
#### Compilation Steps
Open a terminal (or command prompt) at the repository root, change to the directory containing [hyphen_pattern_reader.cpp](ohos/src/hyphen-build/hyphen_pattern_reader.cpp), and compile it:

```
cd ohos/src/hyphen-build/
g++ -g -Wall hyphen_pattern_reader.cpp -o reader
```
Explanation of the command:
- `g++`: invokes the GCC C++ compiler.
- `-g`: includes debugging information.
- `-Wall`: enables the common set of compiler warnings.
- `hyphen_pattern_reader.cpp`: the source file.
- `-o reader`: names the output executable `reader`.

#### Running Steps
After compilation, parse the hyphenation positions of a word in a given language with the following command:

```
./reader hyph-en-us.hpb helloworld
```
Explanation of the command:
- `./reader`: runs the generated `reader` executable.
- `hyph-en-us.hpb`: the input HPB binary file to parse.
- `helloworld`: the word whose hyphenation positions are parsed.

After successful execution, the log shows the hyphenation information for the word.

### 3. Batch Verification
The [generate_report.py](ohos/test/generate_report.py) Python script reads the [report_config.json](ohos/test/report_config.json) configuration file and performs batch verification to check that the generated binary files are valid.
#### Preparation
- Python 3.x
- The `transform` and `reader` executables, placed in the same directory as the script
- A `report_config.json` configuration file

#### Usage
1. Prepare the configuration file. Create a JSON configuration file named report_config.json with the following content:
```
{
  "file_path": "path/to/tex/files",
  "tex_files": [
    {
      "filename": "example.tex",
      "words": ["word1", "word2", "word3", "word4", "word5", "word6", "word7", "word8", "word9", "word10"]
    },
    ...
  ]
}
```
2. Run the script. Run the following command in a terminal:
```
python generate_report.py report_config.json
```
3. Check the log files. The script creates a timestamped subdirectory under the report directory containing the following log files:
- `match.log`: records successful matches.
- `unmatch.log`: records unsuccessful matches.
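The hyphenation positions checked during verification come from TeX patterns. As background for reading the output of `reader` and the match/unmatch logs, here is a Python sketch of Liang's algorithm, the pattern-matching scheme that TeX hyphenation patterns are written for. It is an illustration under stated assumptions, not the logic of hyphen_pattern_reader.cpp: it takes plain pattern strings rather than an .hpb file, ignores the hyphenation exception lists that some pattern files also contain, defaults to plain TeX's minimum fragment lengths for English (2 letters before the first break, 3 after the last), and uses a small hand-picked pattern set rather than the real hyph-en-us.tex patterns.

```python
from typing import Dict, List, Tuple


def parse_pattern(pattern: str) -> Tuple[str, List[int]]:
    """Split a TeX pattern such as 'hy3ph' into its letters ('hyph') and the
    inter-letter values ([0, 0, 3, 0, 0]); values[i] sits before letter i."""
    letters: List[str] = []
    values: List[int] = [0]
    for ch in pattern:
        if ch.isdigit():
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word: str, patterns: List[str],
              left_min: int = 2, right_min: int = 3) -> List[str]:
    """Split the lower-cased `word` at the breaks allowed by the patterns."""
    table: Dict[str, List[int]] = dict(parse_pattern(p) for p in patterns)
    lowered = word.lower()
    work = "." + lowered + "."              # '.' marks the word boundaries
    points = [0] * (len(work) + 1)          # points[i]: value before work[i]
    for i in range(len(work)):
        for j in range(i + 1, len(work) + 1):
            values = table.get(work[i:j])
            if values is None:
                continue
            for k, v in enumerate(values):  # keep the maximum value seen
                points[i + k] = max(points[i + k], v)
    pieces, start = [], 0
    # A break before lowered[m] corresponds to points[m + 1] (work has a leading '.').
    for m in range(left_min, len(lowered) - right_min + 1):
        if points[m + 1] % 2 == 1:          # odd value: a hyphen is allowed here
            pieces.append(lowered[start:m])
            start = m
    pieces.append(lowered[start:])
    return pieces


# Hypothetical illustrative pattern set, not taken from hyph-en-us.tex.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"]
print("-".join(hyphenate("hyphenation", PATTERNS)))  # prints hy-phen-ation
```

Each digit in a pattern rates the gap at that position; the maximum over all matching patterns wins, and only odd values allow a break. That is why `hen5at` outweighs `n2at` and permits `phen-ation`, while `hena4` outweighs `1tio` and suppresses a break before `tion`.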