Non-standard hyphenation
------------------------

Some languages use non-standard hyphenation: `discretionary'
character changes at hyphenation points. For example,
Catalan: paral·lel -> paral-lel,
Dutch: omaatje -> oma-tje,
German (before the new orthography): Schiffahrt -> Schiff-fahrt,
Hungarian: asszonnyal -> asz-szony-nyal (multiple occurrences!),
Swedish: tillata -> till-lata.

Using this extended library, you can define
non-standard hyphenation patterns. For example:

l·1l/l=l
a1atje./a=t,1,3
.schif1fahrt/ff=f,5,2
.as3szon/sz=sz,2,3
n1nyal./ny=ny,1,3
.til1lata./ll=l,3,2

or with narrow boundaries:

l·1l/l=,1,2
a1atje./a=,1,1
.schif1fahrt/ff=,5,1
.as3szon/sz=,2,1
n1nyal./ny=,1,1
.til1lata./ll=,3,1

Note: Libhnj uses patterns preprocessed by substrings.pl.
Unfortunately, this conversion step (non-standard -> standard pattern
conversion) can currently generate bad non-standard patterns, so using
narrow boundaries may be better for recent versions of Libhnj. For example,
substrings.pl generates a few bad patterns for the Hungarian hyphenation
patterns, resulting in bad non-standard hyphenation in a few cases. Using
narrow boundaries solves this problem. The Java HyFo module can check for
this problem.

Syntax of the non-standard hyphenation patterns
-----------------------------------------------

pat1tern/change[,start,cut]

If this pattern matches the word, and this pattern wins (see README.hyphen)
in the change region of the pattern, then the pattern[start, start + cut - 1]
substring will be replaced with the "change".

For example, a German ff -> ff-f hyphenation:

f1f/ff=f

or with expansion

f1f/ff=f,1,2

will replace every "ff" with "ff=f" at hyphenation.
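
As an illustration only, the following C sketch parses one non-standard
pattern line and applies the change to the pattern text itself. It is not
part of Libhnj: NonStdPattern, parse_nonstd() and apply_change() are
hypothetical names. It assumes that the change contains no comma and that a
pattern without start and cut covers the whole pattern text (as the
expansion of f1f/ff=f to f1f/ff=f,1,2 suggests). Positions are counted in
UTF-8 characters, not bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one parsed pattern line; not a Libhnj type. */
typedef struct {
    char letters[64];  /* pattern with digits and boundary dots removed */
    char change[64];   /* replacement text; '=' marks the hyphenation point */
    int start;         /* 1-based start of the change region */
    int cut;           /* number of characters removed from the pattern text */
} NonStdPattern;

/* Number of characters in a UTF-8 string (continuation bytes are skipped). */
static int utf8_len(const char *s) {
    int n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80) n++;
    return n;
}

/* Byte offset of the character with 0-based index idx. */
static size_t utf8_offset(const char *s, int idx) {
    size_t i = 0;
    while (s[i] && idx > 0) {
        i++;
        while (((unsigned char)s[i] & 0xC0) == 0x80) i++;
        idx--;
    }
    return i;
}

/* Parse "pat1tern/change[,start,cut]". Returns 0 on success, -1 on error. */
int parse_nonstd(const char *line, NonStdPattern *p) {
    const char *slash = strchr(line, '/');
    if (!slash) return -1;              /* standard pattern: nothing to parse */

    /* Copy the pattern part, dropping digits and boundary dots. */
    size_t n = 0;
    for (const char *c = line; c < slash; c++) {
        if ((*c >= '0' && *c <= '9') || *c == '.') continue;
        if (n + 1 >= sizeof p->letters) return -1;
        p->letters[n++] = *c;
    }
    p->letters[n] = '\0';

    /* Assumption: the change runs up to the first comma, if there is one. */
    const char *comma = strchr(slash + 1, ',');
    size_t clen = comma ? (size_t)(comma - slash - 1) : strlen(slash + 1);
    if (clen >= sizeof p->change) return -1;
    memcpy(p->change, slash + 1, clen);
    p->change[clen] = '\0';

    /* Assumption: without ",start,cut" the whole pattern text is replaced. */
    p->start = 1;
    p->cut = utf8_len(p->letters);
    if (comma && sscanf(comma, ",%d,%d", &p->start, &p->cut) != 2) return -1;
    if (p->start < 1 || p->cut < 0) return -1;
    return 0;
}

/* Replace characters start .. start+cut-1 of the pattern text with the
 * change. The caller frees the returned string. */
char *apply_change(const NonStdPattern *p) {
    assert(p->start >= 1 && p->cut >= 0);
    size_t from = utf8_offset(p->letters, p->start - 1);
    size_t to = utf8_offset(p->letters, p->start - 1 + p->cut);
    char *out = malloc(from + strlen(p->change) + strlen(p->letters + to) + 1);
    if (!out) return NULL;
    memcpy(out, p->letters, from);
    strcpy(out + from, p->change);
    strcat(out, p->letters + to);
    return out;
}
```

With this sketch, both schif3fahrt/ff=f,5,2 and the narrow form
.schif1fahrt/ff=,5,1 turn the pattern text "schiffahrt" into "schiff=fahrt".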

A more realistic example:

% simple ff -> f-f hyphenation
f1f
% Schiffahrt -> Schiff-fahrt hyphenation
%
schif3fahrt/ff=f,5,2

Specification

- Pattern: matching patterns of the original Liang's algorithm
  - patterns must contain only one hyphenation point in the change region,
    marked with a one-digit odd number (1, 3, 5, 7 or 9).
    This point may be at a subregion boundary: schif3fahrt/ff=,5,1
  - only the greater value guarantees the win (don't mix standard and
    non-standard patterns with the same value; for example,
    instead of f3f and schif3fahrt/ff=f,5,2 use f3f and schif5fahrt/ff=f,5,2)

- Change: new characters.
  Arbitrary character sequence. The equals sign (=) marks the hyphenation
  point for OpenOffice.org (as in the example). (In a possible German LaTeX
  preprocessor, ff could be replaced with "ff; in a Hungarian one, ssz
  with `ssz, according to the German and Hungarian Babel settings.)

- Start: starting position of the change region.
  - begins with 1 (not 0): schif3fahrt/ff=f,5,2
  - start dot doesn't matter: .schif3fahrt/ff=f,5,2
  - numbers don't matter: .s2c2h2i2f3f2ahrt/ff=f,5,2
  - In UTF-8 encoding, use Unicode character positions: össze/sz=sz,2,3
    ("össze" looks like "Ã¶ssze" in an ISO 8859-1 8-bit editor).

- Cut: length of the removed character sequence in the original word.
  - In UTF-8 encoding, use Unicode character length: paral·1lel/l=l,5,3
    ("paral·1lel" looks like "paralÂ·1lel" in an ISO 8859-1 8-bit editor).

Dictionary developing
---------------------

There is no extended PatGen pattern generator for non-standard
hyphenation patterns yet.

Fortunately, non-standard hyphenation points are forbidden in the
PatGen-generated hyphenation patterns, so with a little patch, non-standard
hyphenation patterns can be developed in this case, too.
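
While developing a dictionary, the Pattern rules of the specification can be
checked mechanically. The C sketch below is only an illustration, not a
Libhnj function: change_region_value() and guaranteed_win() are hypothetical
names. It assumes that a hyphenation point belongs to the change region when
it lies inside the region or on either of its boundaries, and it counts
positions in UTF-8 characters.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the value (1, 3, 5, 7 or 9) of the single odd hyphenation point
 * inside or on the boundary of the change region, or -1 if the region holds
 * no such point or more than one. pattern is the part before the '/';
 * start is 1-based; start and cut count characters, not bytes. */
int change_region_value(const char *pattern, int start, int cut) {
    assert(pattern != NULL && start >= 1 && cut >= 0);
    int chars = 0;   /* characters seen so far, digits and dots excluded */
    int value = -1;
    int found = 0;
    for (const unsigned char *c = (const unsigned char *)pattern; *c; c++) {
        if (*c == '.') continue;                 /* word boundary marker */
        if (*c >= '0' && *c <= '9') {
            int odd = (*c - '0') % 2 == 1;
            /* Assumption: both region boundaries count as "in the region". */
            int in_region = chars >= start - 1 && chars <= start - 1 + cut;
            if (odd && in_region) {
                value = *c - '0';
                found++;
            }
            continue;
        }
        if ((*c & 0xC0) != 0x80) chars++;        /* skip continuation bytes */
    }
    return found == 1 ? value : -1;
}

/* A non-standard pattern is only guaranteed to win against a competing
 * standard pattern when its value is strictly greater. */
int guaranteed_win(int nonstd_value, int std_value) {
    return nonstd_value > std_value;
}
```

For example, schif3fahrt/ff=f,5,2 yields the value 3, which does not
guarantee a win against f3f, while schif5fahrt/ff=f,5,2 yields 5 and does.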

Warning: If you use UTF-8 Unicode encoding in your patterns, call
substrings.pl with the UTF-8 parameter to calculate the right
character positions for non-standard hyphenation:

./substrings.pl input output UTF-8

Programming
-----------

Use hyphenate2() or hyphenate3() to handle non-standard hyphenation.
See hyphen.h for the documentation of the hyphenate*() functions.
See example.c for processing the output of the hyphenate*() functions.

Warning: change characters are lower-cased in the source, so you may need
case conversion of the change characters based on input word case detection.
For example, see the OpenOffice.org source
(lingucomponent/source/hyphenator/altlinuxhyph/hyphen/hyphenimp.cxx).

László Németh
<nemeth (at) openoffice.org>