Non-standard hyphenation
------------------------

Some languages use non-standard hyphenation, that is, `discretionary'
character changes at hyphenation points. For example,
Catalan: paral·lel -> paral-lel,
Dutch: omaatje -> oma-tje,
German (before the new orthography): Schiffahrt -> Schiff-fahrt,
Hungarian: asszonnyal -> asz-szony-nyal (multiple occurrences!),
Swedish: tillata -> till-lata.

Using this extended library, you can define
non-standard hyphenation patterns. For example:

l·1l/l=l
a1atje./a=t,1,3
.schif1fahrt/ff=f,5,2
.as3szon/sz=sz,2,3
n1nyal./ny=ny,1,3
.til1lata./ll=l,3,2

or with narrow boundaries:

l·1l/l=,1,2
a1atje./a=,1,1
.schif1fahrt/ff=,5,1
.as3szon/sz=,2,1
n1nyal./ny=,1,1
.til1lata./ll=,3,1

Note: Libhnj uses patterns modified by the substrings.pl preprocessing
script. Unfortunately, this conversion step (non-standard -> standard
pattern conversion) can currently generate bad non-standard patterns, so
narrow boundaries may be the better choice for recent Libhnj. For example,
substrings.pl generates a few bad patterns from the Hungarian hyphenation
patterns, resulting in bad non-standard hyphenation in a few cases. Using
narrow boundaries solves this problem. The Java HyFo module can check for
this problem.

Syntax of the non-standard hyphenation patterns
------------------------------------------------

pat1tern/change[,start,cut]

If this pattern matches the word, and this pattern wins (see README.hyphen)
in the change region of the pattern, then the pattern[start, start + cut - 1]
substring will be replaced with the "change".

For example, a German ff -> ff-f hyphenation:

f1f/ff=f

or, in expanded form,

f1f/ff=f,1,2

will replace every "ff" with "ff=f" at a hyphenation point.
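
The substitution rule can be illustrated with a minimal Python sketch. It
is not the library's implementation: apply_change() is a hypothetical
helper that applies the change to the letters of the pattern itself, and
the default for an omitted start and cut (the whole pattern) is an
assumption inferred from the f1f example above.

```python
import re

def apply_change(pattern, change, start=None, cut=None):
    """Replace pattern[start, start + cut - 1] with the change string.

    Positions are 1-based and count characters after the level digits
    and the word-boundary dots have been removed from the pattern.
    """
    letters = re.sub(r"[0-9.]", "", pattern)
    if start is None:
        # Assumed default: the change region covers the whole pattern.
        start, cut = 1, len(letters)
    return letters[:start - 1] + change + letters[start - 1 + cut:]

print(apply_change("f1f", "ff=f"))                 # ff=f
print(apply_change("f1f", "ff=f", 1, 2))           # ff=f
print(apply_change(".schif1fahrt", "ff=f", 5, 2))  # schiff=fahrt
print(apply_change(".schif1fahrt", "ff=", 5, 1))   # schiff=fahrt
```

The last two calls show that the wide form (ff=f,5,2) and the narrow form
(ff=,5,1) of the same pattern give the same result.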

A more realistic example:

% simple ff -> f-f hyphenation
f1f
% Schiffahrt -> Schiff-fahrt hyphenation
%
schif3fahrt/ff=f,5,2
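
Why the second pattern takes effect in "Schiffahrt" can be sketched in
Python. This is only an illustration of the level competition in Liang's
algorithm, not the code of the library: parse() and winners() are
hypothetical helpers, and resolving equal levels by list order is an
arbitrary choice of the sketch.

```python
def parse(pattern):
    """Split a Liang pattern into its letters and inter-letter levels."""
    letters, levels = "", [0]
    for ch in pattern:
        if ch.isdigit():
            levels[-1] = int(ch)   # level of the gap before the next letter
        else:
            letters += ch
            levels.append(0)
    return letters, levels

def winners(word, patterns):
    """For every gap of '.word.', return the highest (level, pattern)."""
    text = "." + word + "."
    best = [(0, None)] * (len(text) + 1)
    for pat in patterns:
        # Only the part before "/" takes part in the matching.
        letters, levels = parse(pat.split("/")[0])
        for i in range(len(text) - len(letters) + 1):
            if text.startswith(letters, i):
                for k, level in enumerate(levels):
                    if level > best[i + k][0]:
                        best[i + k] = (level, pat)
    return best

gap = 6  # the gap between the two f's in ".schiffahrt."
print(winners("schiffahrt", ["f1f", "schif3fahrt/ff=f,5,2"])[gap])
# (3, 'schif3fahrt/ff=f,5,2')
```

Level 3 of the non-standard pattern beats level 1 of the standard f1f
pattern, so the ff -> ff=f change is applied at this hyphenation point.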

Specification

- Pattern: a matching pattern of Liang's original algorithm
  - a pattern must contain exactly one hyphenation point in the change
    region, marked with a one-digit odd number (1, 3, 5, 7 or 9).
    This point may be at a subregion boundary: schif3fahrt/ff=,5,1
  - only the greater value guarantees the win (don't mix standard and
    non-standard patterns with the same value; for example,
    instead of f3f and schif3fahrt/ff=f,5,2 use f3f and schif5fahrt/ff=f,5,2)

- Change: new characters.
  Arbitrary character sequence. The equals sign (=) marks the hyphenation
  point for OpenOffice.org (as in the example). (In a possible German LaTeX
  preprocessor, ff could be replaced with "ff; in a Hungarian one, ssz
  with `ssz, according to the German and Hungarian Babel settings.)

- Start: starting position of the change region.
  - begins with 1 (not 0): schif3fahrt/ff=f,5,2
  - a leading dot doesn't count: .schif3fahrt/ff=f,5,2
  - numbers don't count: .s2c2h2i2f3f2ahrt/ff=f,5,2
  - In UTF-8 encoding, use Unicode character positions: össze/sz=sz,2,3
    ("össze" looks like "Ã¶ssze" in an ISO 8859-1 8-bit editor).

- Cut: length of the removed character sequence in the original word.
  - In UTF-8 encoding, use Unicode character length: paral·1lel/l=l,5,3
    ("paral·lel" looks like "paralÂ·lel" in an ISO 8859-1 8-bit editor).
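
The rules of this specification can be summarized in a small Python
sketch that splits a pattern line into its fields. It is only an
illustration under stated assumptions: NonStandardPattern, parse_line()
and change_region() are hypothetical names, the default for an omitted
start and cut is inferred from the f1f example, and a change string
containing a comma is not handled.

```python
import re
from typing import NamedTuple

class NonStandardPattern(NamedTuple):
    letters: str  # pattern without level digits and boundary dots
    change: str   # replacement, "=" marks the hyphenation point
    start: int    # 1-based position, counted in Unicode characters
    cut: int      # number of characters removed

def parse_line(line):
    pattern, _, rest = line.partition("/")
    # The leading dot and the level numbers do not count for positions.
    letters = re.sub(r"[0-9.]", "", pattern)
    change, *numbers = rest.split(",")
    if numbers:
        start, cut = int(numbers[0]), int(numbers[1])
    else:
        # Assumed default: the change region covers the whole pattern.
        start, cut = 1, len(letters)
    return NonStandardPattern(letters, change, start, cut)

def change_region(p):
    """Return the characters of the pattern that the change replaces."""
    return p.letters[p.start - 1:p.start - 1 + p.cut]

for line in (".s2c2h2i2f3f2ahrt/ff=f,5,2",
             "\u00f6ssze/sz=sz,2,3",        # össze
             "paral\u00b71lel/l=l,5,3"):    # paral·1lel
    p = parse_line(line)
    print(change_region(p), "->", p.change)
```

Python strings are sequences of Unicode characters, so the positions in
the sketch are character positions, as the specification requires.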

Dictionary development
----------------------

There is no extended PatGen pattern generator for non-standard
hyphenation patterns yet.

Fortunately, non-standard hyphenation points are forbidden in
PatGen-generated hyphenation patterns, so non-standard hyphenation
patterns can be developed in this case too, with a small patch.

Warning: If you use UTF-8 Unicode encoding in your patterns, call
substrings.pl with the UTF-8 parameter to calculate the right
character positions for non-standard hyphenation:

./substrings.pl input output UTF-8
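
The following Python sketch shows why the encoding matters for the
positions. It does not reproduce what substrings.pl does;
char_to_byte_offset() is a hypothetical helper that only compares
character positions with UTF-8 byte offsets.

```python
def char_to_byte_offset(text, char_pos):
    """Convert a 1-based character position to a 0-based UTF-8 byte offset."""
    return len(text[:char_pos - 1].encode("utf-8"))

word = "\u00f6ssze"  # össze, change region "ssz" = characters 2..4

# Counted in characters, the region is found as expected.
print(word[1:4])                           # ssz

# Counted in bytes, the same numbers cut into the two-byte "ö".
print(word.encode("utf-8")[1:4] == b"ssz") # False

# The region really starts at byte offset 2, not 1.
print(char_to_byte_offset(word, 2))        # 2
```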

Programming
-----------

Use hyphenate2() or hyphenate3() to handle non-standard hyphenation.
See hyphen.h for the documentation of the hyphenate*() functions.
See example.c for processing the output of the hyphenate*() functions.

Warning: the change characters are lowercase in the source, so you may need
to convert their case based on the detected case of the input word.
For example, see the OpenOffice.org source
(lingucomponent/source/hyphenator/altlinuxhyph/hyphen/hyphenimp.cxx).
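
A minimal Python sketch of such a case conversion follows. It is not the
OpenOffice.org implementation: match_case() is a hypothetical helper that
handles only all-uppercase words and leaves every other word unchanged.

```python
def match_case(word, change):
    """Adapt a lowercase change string to the case of the input word."""
    if word.isupper():
        # All-uppercase word: the inserted characters must be uppercase too.
        return change.upper()
    return change

print(match_case("SCHIFFAHRT", "ff=f"))  # FF=F
print(match_case("Schiffahrt", "ff=f"))  # ff=f
```

A capitalized word needs extra care only if the change region starts at
the first character of the word, which this sketch does not cover.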

László Németh
<nemeth (at) openoffice.org>