Hyphen - hyphenation library to use converted TeX hyphenation patterns

(C) 1998 Raph Levien
(C) 2001 ALTLinux, Moscow
(C) 2006, 2007, 2008 László Németh

This was part of the libHnj library by Raph Levien.

Peter Novodvorsky from ALTLinux extracted the hyphenation part of libHnj
to use it in OpenOffice.org.

Compound word and non-standard hyphenation support by László Németh.

License is the original LibHnj license:
LibHnj is dual licensed under LGPL and MPL (see also README.libhnj).

Because LGPL allows GPL relicensing, COPYING now contains an
LGPL/GPL/MPL tri-license for explicit Mozilla source compatibility.

The original Libhnj source with OOo's patches is managed by Rene Engelhard
and Chris Halls at Debian:

http://packages.debian.org/stable/libdevel/libhnj-dev
and http://packages.debian.org/unstable/source/libhnj


OTHER FILES

This distribution is also the source of the en_US hyphenation patterns
"hyph_en_US.dic". See README_hyph_en_US.txt.

Source files of hyph_en_US.dic in the distribution:

hyphen.tex (en_US hyphenation patterns from plain TeX)

  Source: http://tug.ctan.org/text-archive/macros/plain/base/hyphen.tex

tbhyphext.tex: hyphenation exception log from TugBoat archive

  Source of the hyphenation exception list:
  http://www.ctan.org/tex-archive/info/digests/tugboat/tb0hyf.tex

  Generated with the hyphenex script
  (http://www.ctan.org/tex-archive/info/digests/tugboat/hyphenex.sh)

  sh hyphenex.sh <tb0hyf.tex >tbhyphext.tex


INSTALLATION

./configure
make
make install

UNIT TESTS (WITH VALGRIND DEBUGGER)

make check
VALGRIND=memcheck make check

USAGE

./example hyph_en_US.dic mywords.txt

or (under Linux)

echo example | ./example hyph_en_US.dic /dev/stdin

NOTE: In the case of Unicode encoded input, convert your words
to lowercase before hyphenation (under UTF-8 console environment):

cat mywords.txt | awk '{print tolower($0)}' >mywordslow.txt

DEVELOPMENT

See README.hyphen for the hyphenation algorithm, README.nonstandard
and doc/tb87nemeth.pdf for non-standard hyphenation,
README.compound for compound word hyphenation, and tests/*.

Description of the dictionary format:

The first line contains the character encoding (ISO8859-x, UTF-8).

Possible options in the following lines:

LEFTHYPHENMIN num           minimal hyphenation distance from the left word end
RIGHTHYPHENMIN num          minimal hyphenation distance from the right word end
COMPOUNDLEFTHYPHENMIN num   min. hyph. dist. from the left compound word boundary
COMPOUNDRIGHTHYPHENMIN num  min. hyph. dist. from the right comp. word boundary

hyphenation patterns        see README.* files

NEXTWORD                    separate the two compound sets (see README.compound)

Default values:
Without explicit declarations, the hyphenmin fields of the dict struct
are zeroes, but in this case lefthyphenmin and righthyphenmin
default to 2 during hyphenation (for backward compatibility).

Comments

Use a percent sign at the beginning of a line to add comments to your
hyphenation patterns (after the character encoding in the first line):

% comment

*****************************************************************************
* Warning! Correct working of Libhnj *needs* prepared hyphenation patterns. *

For example, generating hyph_en_US.dic from "hyphen.us" TeX patterns:

perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1

or with default LEFTHYPHENMIN and RIGHTHYPHENMIN values:

perl substrings.pl hyphen.us hyph_en_US.dic ISO8859-1 2 3
perl substrings.pl hyphen.gb hyph_en_GB.dic ISO8859-1 3 3
****************************************************************************

OTHERS

Java hyphenation: Peter B. West (Folio project) implements a hyphenator with
non-standard hyphenation facilities based on extended Libhnj. The HyFo module
is released in binary form as jar files and in source form as zip files.
See http://sourceforge.net/project/showfiles.php?group_id=119136

László Németh
<nemeth (at) openoffice (dot) org>
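
EXAMPLE (DICTIONARY HEADER)

The dictionary format described under DEVELOPMENT can be illustrated with
a short Python sketch. It is not the library's own parser (that is part of
the C sources); it only mirrors the rules stated in this README: encoding in
the first line, optional *HYPHENMIN lines, percent-sign comments, and the
fallback of 2 for unset left and right minima. The names read_dic_header
and effective_hyphenmin and the sample pattern "a1b" are hypothetical, and
NEXTWORD handling is deliberately left out.

```python
# Illustrative sketch only: function names and sample data are assumptions,
# not part of the Hyphen library.

KNOWN_OPTIONS = (
    "LEFTHYPHENMIN",
    "RIGHTHYPHENMIN",
    "COMPOUNDLEFTHYPHENMIN",
    "COMPOUNDRIGHTHYPHENMIN",
)


def read_dic_header(lines):
    """Split dictionary lines into (encoding, options, remaining lines)."""
    it = iter(lines)
    # The first line carries the character encoding, e.g. ISO8859-1 or UTF-8.
    encoding = next(it, "").strip()
    # Unset options stay at zero, as the README describes for the dict struct.
    options = {name: 0 for name in KNOWN_OPTIONS}
    remaining = []
    for raw in it:
        line = raw.strip()
        # Skip blank lines and comment lines starting with a percent sign.
        if not line or line.startswith("%"):
            continue
        fields = line.split()
        if len(fields) == 2 and fields[0] in options and fields[1].isdigit():
            options[fields[0]] = int(fields[1])
        else:
            # Everything else (patterns, NEXTWORD) is passed through untouched.
            remaining.append(line)
    return encoding, options, remaining


def effective_hyphenmin(options):
    """Apply the backward-compatible fallback of 2 to zero left/right minima."""
    left = options["LEFTHYPHENMIN"] or 2
    right = options["RIGHTHYPHENMIN"] or 2
    return left, right


if __name__ == "__main__":
    sample = "UTF-8\n% comment\nRIGHTHYPHENMIN 3\na1b\n"
    encoding, options, remaining = read_dic_header(sample.splitlines())
    print(encoding, effective_hyphenmin(options), remaining)
```

In the sample, LEFTHYPHENMIN is not declared, so the sketch falls back to 2
for the left side while keeping the declared right-side value.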